American Stories

A Billion-Scale Dataset of Structured Texts and Layouts from U.S. Public Domain Newspapers

HEADLINES

A Massive-Scale Semantic Similarity Dataset of Historical Newspaper Headlines

HomoglyphsCJK

A Python Package for Deep Learning-Assisted String Matching

Layout Parser

A Python Library for Document Image Analysis

LinkTransformer

A Unified Python Package for Record Linkage with Transformer Models