Graduate Theses and Dissertations

Template-Based Document Information Extraction Using Neural Network Keypoint Filtering

Author

Dylan M. Flaute (ORCID 0000-0001-5601-459X), University of Dayton

Date of Award

2024

Degree Name

M.S. in Electrical Engineering

Department

Department of Electrical and Computer Engineering

Advisor/Chair

Russell Hardie

Abstract

Documents like invoices, receipts, and forms are essential to many modern business operations. We develop a system for autonomously processing common United States Air Force contract front forms. The system takes in a form and extracts a key-value pair for each box in the form, a task called key information extraction. In a structured document, the layout is the same from instance to instance (perhaps allowing for rigid transforms). Our documents are semi-structured: although their layouts are similar, some of the content may sit in slightly different places from one instance of the form to the next. This makes information extraction harder because the response regions are not at fixed positions. We demonstrate that, despite the added difficulty, template matching and registration make for a strong baseline on our semi-structured forms. Additionally, we propose a filtering approach for keypoints based on their position in the layout. Specifically, we use a trained U-Net model to identify intersections and end-points in the form's "wire-frame." Then, the pipeline only uses keypoints that are close to those landmarks. We demonstrate that this method improves the registration quality over our baseline, results in a more intuitive distribution of keypoints across the image, and potentially speeds up processing since fewer keypoints need matching.
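The keypoint filtering step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `filter_keypoints`, the Euclidean distance test, the `max_dist` threshold, and the sample coordinates are all assumptions made for illustration. The landmarks stand in for the intersections and end-points that the U-Net would predict, and the keypoints stand in for the output of whatever feature detector the pipeline uses.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


def filter_keypoints(
    keypoints: Sequence[Point],
    landmarks: Sequence[Point],
    max_dist: float,
) -> List[Point]:
    """Keep only keypoints within max_dist pixels of at least one landmark.

    Hypothetical helper: the distance metric and threshold are assumptions,
    not values taken from the thesis.
    """
    kept = []
    for kx, ky in keypoints:
        # A keypoint survives if any landmark lies within the radius.
        if any(hypot(kx - lx, ky - ly) <= max_dist for lx, ly in landmarks):
            kept.append((kx, ky))
    return kept


# Made-up landmarks (e.g. wire-frame intersections) and detector keypoints.
landmarks = [(10.0, 10.0), (100.0, 10.0)]
keypoints = [(12.0, 11.0), (50.0, 50.0), (98.0, 13.0), (300.0, 200.0)]

filtered = filter_keypoints(keypoints, landmarks, max_dist=5.0)
print(filtered)
```

Only the keypoints that pass the filter would then be matched against the template, which is where the potential speed-up mentioned in the abstract would come from. For large keypoint sets, a spatial index or a distance transform of the landmark mask would likely be preferable to this brute-force loop.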

Keywords

Deep learning; semantic segmentation; automatic document processing; information extraction

Recommended Citation

Flaute, Dylan M., "Template-Based Document Information Extraction Using Neural Network Keypoint Filtering" (2024). Graduate Theses and Dissertations. 7406.
https://ecommons.udayton.edu/graduate_theses/7406
