Vision and Audio Transformers

January 23, 2023 - 2 minute read - Category: Intro - Tags: Deep learning


This post covers the sixth lecture in the course: “Vision and Audio Transformers.”

The transformer architecture has made major inroads in computer vision in recent years, and this lecture covers the key advances. It also covers similar recent advances in audio.

Lecture video

Lecture notes

Using the timm library (PyTorch Image Models) to implement these models is strongly recommended. Here are a few official implementations:

