Vision and Audio Transformers

January 23, 2023 - 2 minute read - Category: Intro - Tags: Deep learning


This post covers the sixth lecture in the course: “Vision and Audio Transformers.”

The transformer architecture has made major inroads in computer vision in recent years. This lecture covers the key advances behind that shift, along with the parallel developments in audio.

Lecture Video

Watch the video

Lecture Notes

References Cited in Lecture 6: Vision and Audio Transformers

Academic Papers

Original Vision Transformer Paper

Further Work with Image Transformers

Other Resources

Code Bases

Using timm to implement these models is strongly recommended. Here are a few official implementations:
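
To make the timm recommendation concrete, here is a minimal sketch of building a Vision Transformer through timm. It is an illustration, not code from the lecture: the model name `vit_base_patch16_224`, the 10-class head, and the helper names `num_patch_tokens`, `build_vit`, and `demo_forward` are all assumptions chosen for the example, and it assumes `timm` and `torch` are installed.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Count the non-overlapping square patches a ViT cuts a square image into.

    Hypothetical helper for illustration; it is not part of timm.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


def build_vit(model_name: str = "vit_base_patch16_224", num_classes: int = 10):
    """Create a randomly initialised ViT with a fresh classification head.

    The model name and class count are assumptions for this sketch.
    timm is imported lazily so the helper above works without it.
    """
    import timm

    # pretrained=False avoids downloading weights; set it to True to
    # start from published weights when fine-tuning.
    return timm.create_model(model_name, pretrained=False, num_classes=num_classes)


def demo_forward():
    """Run one random 224x224 RGB image through the model and return the logits."""
    import torch

    model = build_vit()
    model.eval()
    with torch.no_grad():
        return model(torch.randn(1, 3, 224, 224))
```

Calling `demo_forward()` should return a tensor of shape `(1, 10)`, one logit per class. With 16x16 patches on a 224x224 input, `num_patch_tokens(224, 16)` gives 196 patch tokens, which is the sequence length the transformer encoder sees before the class token is added.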
