Text Retrieval

March 20, 2023 - 2 minute read - Category: Intro - Tags: Deep learning


This post covers the twelfth lecture in the course: “Text Retrieval.”

Retrieval, locating relevant information in a large knowledge base, is core to a variety of applications. It relates closely to semantic similarity (previous lecture) and entity disambiguation (following lecture). Knowledge-intensive NLP is also covered.
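To make the setup concrete, here is a minimal sketch of retrieval by embedding similarity: encode the query and each passage as a vector, then rank passages by cosine similarity. The toy character-count `embed` function below is a hypothetical stand-in for a real learned encoder (e.g., the BERT-based question/passage encoders used in DPR, cited in the notes).

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy "encoder": a normalized bag-of-letters vector.
    # A real dense retriever would use a learned neural encoder here.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def retrieve(query: str, passages: list[str], k: int = 1) -> list[str]:
    # Score every passage by cosine similarity to the query,
    # then return the top-k passages in descending order.
    q = embed(query)
    scores = [float(q @ embed(p)) for p in passages]
    order = np.argsort(scores)[::-1][:k]
    return [passages[i] for i in order]

passages = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
]
print(retrieve("capital of France", passages, k=1))
```

In practice the passage embeddings are precomputed and stored in an approximate nearest-neighbor index so that a query touches only a fraction of the collection.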

Lecture Video

Intro and Question Answering

Watch the video

Retrieval and Open Domain Question Answering

Watch the video

Retrieval Augmented Language Modeling

Watch the video

Lecture notes

References Cited in Lecture 12: Text Retrieval

Academic Papers

Karpukhin, Vladimir, Barlas Oğuz, Sewon Min, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. “Dense passage retrieval for open-domain question answering.” arXiv preprint arXiv:2004.04906 (2020).

Khattab, Omar, and Matei Zaharia. “ColBERT: Efficient and effective passage search via contextualized late interaction over BERT.” In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 39-48. 2020.

Santhanam, Keshav, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. “ColBERTv2: Effective and efficient retrieval via lightweight late interaction.” arXiv preprint arXiv:2112.01488 (2021).

Luan, Yi, Jacob Eisenstein, Kristina Toutanova, and Michael Collins. “Sparse, dense, and attentional representations for text retrieval.” Transactions of the Association for Computational Linguistics 9 (2021): 329-345.

Gao, Luyu, Xueguang Ma, Jimmy Lin, and Jamie Callan. “Precise Zero-Shot Dense Retrieval without Relevance Labels.” arXiv preprint arXiv:2212.10496 (2022).

Tam, Weng Lam, Xiao Liu, Kaixuan Ji, Lilong Xue, Xingjian Zhang, Yuxiao Dong, Jiahua Liu, Maodi Hu, and Jie Tang. “Parameter-efficient prompt tuning makes generalized and calibrated neural text retrievers.” arXiv preprint arXiv:2207.07087 (2022).
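The ColBERT papers above introduce “late interaction” scoring: instead of collapsing query and passage into single vectors, both keep one vector per token, and the score sums, over query tokens, the maximum similarity against any passage token (MaxSim). A rough sketch, using random matrices as stand-ins for real contextualized token embeddings:

```python
import numpy as np

def normalize(m: np.ndarray) -> np.ndarray:
    # L2-normalize each row so dot products become cosine similarities.
    return m / np.linalg.norm(m, axis=-1, keepdims=True)

def maxsim_score(q_tokens: np.ndarray, p_tokens: np.ndarray) -> float:
    # Similarity matrix of shape (num_query_tokens, num_passage_tokens).
    sim = normalize(q_tokens) @ normalize(p_tokens).T
    # For each query token, keep its best-matching passage token; sum over query tokens.
    return float(sim.max(axis=1).sum())

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))    # 4 query token vectors, dim 8 (placeholder for BERT outputs)
p = rng.normal(size=(20, 8))   # 20 passage token vectors
print(maxsim_score(q, p))
```

Because each query token contributes at most cosine 1, the score is bounded by the number of query tokens; the per-token structure is what lets ColBERT precompute passage token embeddings offline while retaining fine-grained matching at query time.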

Self-supervised training

Ram, Ori, Gal Shachaf, Omer Levy, Jonathan Berant, and Amir Globerson. “Learning to retrieve passages without supervision.” arXiv preprint arXiv:2112.07708 (2021).

Sachan, Devendra Singh, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, and Luke Zettlemoyer. “Improving Passage Retrieval with Zero-Shot Question Generation.” arXiv preprint arXiv:2204.07496 (2022).

Sachan, Devendra Singh, Mike Lewis, Dani Yogatama, Luke Zettlemoyer, Joelle Pineau, and Manzil Zaheer. “Questions are all you need to train a dense passage retriever.” arXiv preprint arXiv:2206.10658 (2022).

Knowledge Intensive NLP

Lewis, Patrick, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler et al. “Retrieval-augmented generation for knowledge-intensive nlp tasks.” Advances in Neural Information Processing Systems 33 (2020): 9459-9474.

Borgeaud, Sebastian, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche et al. “Improving language models by retrieving from trillions of tokens.” In International Conference on Machine Learning, pp. 2206-2240. PMLR, 2022.

Other Resources

DPR Codebase

Advances towards ubiquitous neural information retrieval (Meta AI blog post)

RETRO Blog Post

Image Source: Chen, Danqi, Adam Fisch, Jason Weston, and Antoine Bordes. “Reading Wikipedia to Answer Open-Domain Questions.” (2017).