American Stories

A Billion Scale Dataset of Structured Texts and Layouts from U.S. Public Domain Newspapers

HEADLINES

A Massive Scale Semantic Similarity Dataset of Historical Newspaper Headlines

Layout Parser

A Python Library for Document Image Analysis

LinkTransformer

A Unified Python Package for Record Linkage with Transformer Models