Language Modeling, Recurrent Neural Nets and Seq2Seq

January 20, 2023 - 2 minute read - Category: Intro - Tags: Deep learning

Topics

This post covers the third lecture in the course: “Language Modeling, Recurrent Neural Nets and Seq2Seq.”

TThis lecture will cover the history of language modeling prior to transformers. Even though you should typically use a transformer-based language model, this lecture will provide valuable context for understanding NLP and how the field has evolved. It will also introduce sequence-tosequence models and attention, key pre-requisites for understanding transformers and various applications.

The Lecture is broken up into three parts:

Language Modeling, covering early examples of language modeling
Word Embeddings, covering GloVe, Word2Vec, etc
Seq2Seq, covering Sequence to Sequence models (LSTMs)

Lecture Videos

Language Modeling

Word Embeddings

Sequence to Sequence Models

Lecture notes: Language Modeling, Word Embeddings, Seq2Seq

References Cited in Lecture 3: Language Modeling, Recurrent Neural Nets and Seq2Seq

Academic Papers

Jurafsky, Daniel and James Martin, “N-Gram Language Models” Speech and Language Processing, 2020 .
Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. “Learning long-term dependencies with gradient descent is difficult.” IEEE transactions on neural networks 5, no. 2 (1994): 157-166.
Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9, no. 8 (1997): 1735-1780.
Greff, Klaus, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. “LSTM: A search space odyssey.” IEEE transactions on neural networks and learning systems 28, no. 10 (2016): 2222-2232.
Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. “Empirical evaluation of gated recurrent neural networks on sequence modeling.” arXiv preprint arXiv:1412.3555 (2014).
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. “Sequence to sequence learning with neural networks.” arXiv preprint arXiv:1409.3215 (2014).
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. “Neural machine translation by jointly learning to align and translate.” arXiv preprint arXiv:1409.0473 (2014).
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. “Distributed representations of words and phrases and their compositionality.” Advances in Neural Information Processing Systems 26 (2013): 3111-3119. (Notes: this paper developed the word2vec model commonly used in NLP)
Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. “Glove: Global vectors for word representation.” In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532-1543. 2014.

Other Resources

Olah, Chris, and Shan Carter. “Attention and augmented recurrent neural networks.” Distill 1, no. 9 (2016): e1.
Olah, Christopher. “Deep Learning, NLP, and Representation”, 2014.

Code Bases

Pytorch Examples: Classifying MNIST digits with an RNN

Image Source: https://thorirmar.com/post/insight_into_lstm/