Convolutional Neural Networks

January 19, 2023 - 3 minute read - Category: Intro - Tags: Deep learning

Topics

This post covers the second lecture in the course: “Convolutional Neural Networks.”

Convolutional Neural Networks (CNNs) transformed computer vision and played a central role in launching the deep learning revolution. In this lecture, we discuss the evolution of CNNs, which form the backbone of many vision applications.

If you need a refresher on CNN architecture and concepts, consider reading the blog post listed first under “Other Resources” before watching the lecture or going through the accompanying notes.
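
As a quick refresher on the core operation, here is a minimal sketch of a single-channel 2D convolution in plain Python. It is not taken from the lecture: the function name `conv2d_valid`, the toy image, and the hand-picked kernel are all illustrative assumptions, and real frameworks add channels, padding, batching, and learned weights on top of this idea.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` with no padding and return the feature map.

    Both arguments are lists of lists of numbers. This computes
    cross-correlation (no kernel flip), which is what deep learning
    libraries usually call "convolution".
    """
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    # Output size with no padding: floor((in - k) / stride) + 1
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Weighted sum of the image patch under the kernel
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked 2x2 kernel that responds to left-to-right intensity increases.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is nonzero only where the kernel straddles the edge, which is the sense in which a convolutional filter acts as a local pattern detector. In a CNN the kernel values are learned rather than chosen by hand.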

Lecture Video

Lecture notes

References Cited in Lecture 2: Convolutional Neural Networks

Academic Papers

LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. “Gradient-based learning applied to document recognition.” Proceedings of the IEEE 86, no. 11 (1998): 2278-2324.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” Advances in Neural Information Processing Systems 25 (2012): 1097-1105 (AlexNet).
Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. “ImageNet: A large-scale hierarchical image database.” In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255. 2009. (Note: this dataset has been used to develop and test many advances in computer vision, and the paper is one of the most cited in deep learning. Benchmark datasets like this play a central role in computer science research.)
Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014) (VGGNet).
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going deeper with convolutions.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9. 2015 (GoogLeNet).
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778. 2016 (ResNet).
Xie, Saining, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. “Aggregated residual transformations for deep neural networks.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1492-1500. 2017 (ResNeXt).
Liu, Zhuang, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. “A ConvNet for the 2020s.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976-11986. 2022 (ConvNeXt; this may also be clearer after the Vision Transformers lecture).
Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, and Hartwig Adam. “Searching for MobileNetV3.” In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314-1324. 2019 (MobileNetV3).

Other Resources

Blog post by C. Thomas
Slightly more in-depth blog post by S. Saha, also very helpful for basic concepts.

Code Bases

PyTorch ImageNet examples: https://github.com/pytorch/examples/tree/main/imagenet. Implements a variety of convolutional architectures.
Official ResNet implementation
Official ResNeXt implementation from FAIR
Official ConvNeXt implementation from FAIR
timm: API and implementations for a wide variety of vision models, including ConvNeXt, ResNet, and MobileNet

Image Source: https://www.mathworks.com/help/deeplearning/ug/layers-of-a-convolutional-neural-network.html