Semantic and Syntactic Similarity

March 2, 2023 - 1 minute read - Category: Intro - Tags: Deep learning


This post covers the eleventh lecture in the course: “Semantic and Syntactic Similarity.”

Measuring textual similarity – whether detecting noisy duplicates or assessing semantic similarity – is core to a variety of fascinating social science applications of NLP. We will also discuss bi-encoders and cross-encoders in depth.
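To make the bi-encoder/cross-encoder distinction concrete, here is a minimal sketch of the two interfaces. The toy `toy_embed` and word-overlap scorer are placeholders standing in for real models (e.g. an S-BERT bi-encoder and a BERT cross-encoder); only the calling pattern is the point: a bi-encoder embeds each text independently, so corpus vectors can be precomputed and indexed, while a cross-encoder must score every pair jointly.

```python
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a bi-encoder like S-BERT: maps a sentence to a
    fixed unit vector, independently of any other sentence."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def bi_encoder_scores(query: str, corpus: list[str]) -> np.ndarray:
    # Embed query and documents separately; similarity is then a cheap
    # dot product, so corpus embeddings can be precomputed and indexed
    # (e.g. with FAISS for billion-scale search).
    q = toy_embed(query)
    C = np.stack([toy_embed(d) for d in corpus])
    return C @ q

def cross_encoder_score(query: str, doc: str) -> float:
    # A cross-encoder feeds the *pair* through one model jointly, which
    # is more accurate but requires a forward pass per pair. Toy stand-in:
    # Jaccard word overlap between the two texts.
    a, b = set(query.lower().split()), set(doc.lower().split())
    return len(a & b) / len(a | b)

corpus = ["the cat sat on the mat", "stock prices fell sharply"]
print(bi_encoder_scores("a cat on a mat", corpus).shape)  # (2,)
print(cross_encoder_score("a cat on a mat", corpus[0]))   # 0.5
```

In practice the cross-encoder's per-pair cost is why large-scale pipelines first retrieve candidates with a bi-encoder and only rerank the top hits with a cross-encoder.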

Lecture Video

Watch the video

Lecture notes

References Cited in Lecture 11: Semantic and Syntactic Similarity

Reimers, Nils, and Iryna Gurevych. “Sentence-bert: Sentence embeddings using siamese bert-networks.” arXiv preprint arXiv:1908.10084 (2019).

Johnson, Jeff, Matthijs Douze, and Hervé Jégou. “Billion-scale similarity search with gpus.” IEEE Transactions on Big Data 7, no. 3 (2019): 535-547. See also this slide deck.

Silcock, Emily, Luca D’Amico-Wong, Jinglin Yang, and Melissa Dell. “Noise-Robust De-Duplication at Scale.”

Smith, David A., Ryan Cordell, and Abby Mullen. “Computational methods for uncovering reprinted texts in antebellum newspapers.” American Literary History 27, no. 3 (2015): E1–E15.

Other Resources

S-BERT Loss Functions

cuML documentation
