Information Extraction with Complex Layouts

Abstract: Recent innovations have advanced layout analysis of document images, significantly improving our ability to identify text and non-text regions. However, extracting information from within a text region remains challenging because the region may itself have a complex structure. In this paper, we present a new dataset with complex tabular structure and propose new methods to robustly retrieve information from complex text regions.
