This post covers the eleventh lecture in the course: “Semantic and Syntactic Similarity.”
Measuring textual similarity – whether detecting noisy duplicates or capturing semantic similarity – is core to many fascinating social science applications of NLP. We will also discuss bi-encoders and cross-encoders in depth.
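The bi-encoder/cross-encoder distinction matters in practice: a bi-encoder maps each text independently to a fixed-size vector, so corpus embeddings can be precomputed once and similarity search reduces to fast dot products, whereas a cross-encoder must run a full forward pass on every (query, document) pair. Below is a minimal sketch of the bi-encoder retrieval pattern; the `encode` helper here is a toy stand-in (deterministic random vectors) for a real sentence encoder such as Sentence-BERT, and the example texts are illustrative, not from the lecture:

```python
import numpy as np

def encode(texts):
    """Toy stand-in for a sentence encoder (e.g. Sentence-BERT):
    maps each text independently to a fixed-size vector.
    Here: random vectors seeded on the text, so identical texts
    always get identical embeddings within a run."""
    vecs = np.stack([
        np.random.default_rng(abs(hash(t)) % (2**32)).standard_normal(8)
        for t in texts
    ])
    # L2-normalize so a dot product equals cosine similarity
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

corpus = ["the cat sat on the mat",
          "stocks fell sharply today",
          "congress passed the budget bill"]

index = encode(corpus)           # precompute once: one encoder call per document

query = encode(["the cat sat on the mat"])  # one encoder call per query
scores = index @ query.T         # cosine similarity to every document at once
best = int(np.argmax(scores))
print(corpus[best], float(scores[best, 0]))
```

A cross-encoder would instead concatenate the query with each candidate and score every pair jointly, which is typically more accurate but costs one forward pass per pair per query; a common compromise is to retrieve candidates with a bi-encoder and rerank the top hits with a cross-encoder.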
References Cited in Lecture 11: Semantic and Syntactic Similarity
Reimers, Nils, and Iryna Gurevych. “Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks.” arXiv preprint arXiv:1908.10084 (2019).
Johnson, Jeff, Matthijs Douze, and Hervé Jégou. “Billion-Scale Similarity Search with GPUs.” IEEE Transactions on Big Data 7, no. 3 (2019): 535–547. See also this slide deck.
Silcock, Emily, Luca D’Amico-Wong, Jinglin Yang, and Melissa Dell. “Noise-Robust De-Duplication at Scale.”
Smith, David A., Ryan Cordell, and Abby Mullen. “Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers.” American Literary History 27, no. 3 (2015): E1–E15.
Image Source: www.sbert.net