Language Modeling, Recurrent Neural Nets and Seq2Seq
This post covers the third lecture in the course: “Language Modeling, Recurrent Neural Nets and Seq2Seq.”
This lecture covers the history of language modeling prior to transformers. Even though you should typically use a transformer-based language model today, this lecture provides valuable context for understanding NLP and how the field has evolved. It also introduces sequence-to-sequence models and attention, key prerequisites for understanding transformers and their various applications.
The lecture is broken into three parts:
- Language Modeling, covering early examples of language modeling
- Word Embeddings, covering GloVe, Word2Vec, etc.
- Seq2Seq, covering Sequence to Sequence models (LSTMs)
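To make the first topic concrete, here is a minimal sketch of an early-style language model: a bigram model estimated by maximum-likelihood counts, in the spirit of the n-gram models described in Jurafsky and Martin. The toy corpus and function names are illustrative, not from the lecture, and no smoothing is applied.

```python
from collections import defaultdict, Counter

# Toy corpus; a real n-gram model would be trained on a large text collection.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams: how often each word follows each preceding word.
bigrams = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigrams[prev][word] += 1

def next_word_prob(prev, word):
    """P(word | prev), estimated by maximum likelihood (no smoothing)."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

print(next_word_prob("the", "cat"))  # "cat" follows "the" in 1 of 4 cases -> 0.25
```

Unsmoothed counts assign zero probability to any bigram unseen in training, which is the sparsity problem that motivated smoothing techniques and, eventually, neural language models.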
Lecture notes: Language Modeling, Word Embeddings, Seq2Seq
References Cited in Lecture 3: Language Modeling, Recurrent Neural Nets and Seq2Seq
Jurafsky, Daniel, and James Martin. “N-Gram Language Models.” Speech and Language Processing, 2020.
Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. “Learning long-term dependencies with gradient descent is difficult.” IEEE transactions on neural networks 5, no. 2 (1994): 157-166.
Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9, no. 8 (1997): 1735-1780.
Greff, Klaus, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. “LSTM: A search space odyssey.” IEEE transactions on neural networks and learning systems 28, no. 10 (2017): 2222-2232.
Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. “Empirical evaluation of gated recurrent neural networks on sequence modeling.” arXiv preprint arXiv:1412.3555 (2014).
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. “Sequence to sequence learning with neural networks.” arXiv preprint arXiv:1409.3215 (2014).
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. “Neural machine translation by jointly learning to align and translate.” arXiv preprint arXiv:1409.0473 (2014).
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. “Distributed representations of words and phrases and their compositionality.” Advances in Neural Information Processing Systems 26 (2013): 3111-3119. (Notes: this paper developed the word2vec model commonly used in NLP)
Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. “Glove: Global vectors for word representation.” In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532-1543. 2014.
Olah, Chris, and Shan Carter. “Attention and augmented recurrent neural networks.” Distill 1, no. 9 (2016): e1.
Olah, Christopher. “Deep Learning, NLP, and Representations,” 2014.
- PyTorch Examples: Classifying MNIST digits with an RNN
Image Source: https://thorirmar.com/post/insight_into_lstm/