The Transformer

January 21, 2023 - 2 minute read - Category: Intro - Tags: Deep learning


This post covers the fourth lecture in the course: “The Transformer.”

The transformer architecture revolutionized NLP and has since made substantial inroads in most areas of deep learning (vision, audio, reinforcement learning…). This lecture will cover substantial ground that will be foundational to the rest of the course. Please plan to devote sufficient attention (no pun intended) to this material.
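The heart of the architecture is scaled dot-product attention from Vaswani et al. (2017). As a preview of the lecture material, here is a minimal, illustrative NumPy sketch (single head, no masking, no learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_q, n_k) similarity scores
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted average of value rows

# toy example: 3 queries and 3 key/value pairs of dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

Real transformer blocks wrap this in multiple heads with learned query/key/value projections, residual connections, and layer normalization; the lecture covers those details.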

Lecture Video

Watch the video

Lecture notes

References Cited in Lecture 4: The Transformer and Transformer Language Models

Academic Papers

The Original Transformer

  • Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. “Attention is all you need.” In Advances in Neural Information Processing Systems, pp. 5998-6008. 2017.

Transformer Language Models

Other Resources

Code Bases

  • OPT (Open Pre-trained Transformers) from FAIR
  • Hugging Face's open-source library, offering a large variety of NLP models. See the Transformers repo for most applications. Most models referenced above (BERT, BERTweet, RoBERTa, DistilBERT) are implemented and easily accessed through the Hugging Face APIs.

Image Source: Vaswani et al. (2017), "Attention Is All You Need."