layout-parser - Comprehensive Toolset for Document Image Processing Using Deep Learning Models

Introduction to LayoutParser

LayoutParser is a toolkit for deep learning-based Document Image Analysis (DIA). It provides tools for common DIA tasks, such as detecting the layout of a page and running OCR (Optical Character Recognition) on scanned documents.

Key Features of LayoutParser

Deep Learning Models for Layout Detection: LayoutParser offers a library of deep learning models that are accessible through a unified set of APIs, so layout detection takes only a few lines of code. For example:
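```
import layoutparser as lp

# `image` is assumed to be a page image that has already been loaded
model = lp.AutoLayoutModel('lp://EfficientDete/PubLayNet')  # load a pretrained layout model
layout = model.detect(image)  # detect the layout elements on the page
```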
Powerful Layout Data Structures: The toolkit includes data structures designed for document image analysis, with APIs for manipulating page layouts, such as selecting specific elements and operating on them. For instance, you can select the elements located in the left column of a page:
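```
image_width = image.size[0]
# Horizontal interval covering the left half of the page
left_column = lp.Interval(0, image_width/2, axis='x')
# Keep the elements whose center lies inside that interval
left_layout = layout.filter_by(left_column, center=True)
```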
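To make the idea of center-based selection concrete, here is a minimal pure-Python sketch. It assumes that filtering with `center=True` keeps the elements whose center point falls inside the interval. The `Box` class and `filter_by_x_center` function are hypothetical stand-ins for illustration, not part of LayoutParser, and the coordinates are made up.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for a layout element (not a LayoutParser class)."""
    x_1: float
    y_1: float
    x_2: float
    y_2: float

    @property
    def center_x(self) -> float:
        # Horizontal midpoint of the box
        return (self.x_1 + self.x_2) / 2


def filter_by_x_center(boxes, start, end):
    """Keep the boxes whose horizontal center lies within [start, end]."""
    return [box for box in boxes if start <= box.center_x <= end]


image_width = 1000  # illustrative page width in pixels
boxes = [
    Box(50, 100, 450, 200),   # left column
    Box(550, 100, 950, 200),  # right column
    Box(400, 300, 560, 400),  # straddles the midline, center at x = 480
]
left_column = filter_by_x_center(boxes, 0, image_width / 2)
```

Note that the third box is kept even though it extends past the midline, because only its center is tested. A stricter rule would require the whole box to lie inside the interval.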
Efficient OCR Integration: LayoutParser simplifies the OCR process. After detecting layout regions, you can run OCR on each region and extract its text with a few lines of code:
```
ocr_agent = lp.TesseractAgent()
for layout_region in layout:
    # Cut the region out of the page image
    image_segment = layout_region.crop_image(image)
    # Run OCR on that region only
    text = ocr_agent.detect(image_segment)
```
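The loop above overwrites `text` on every iteration, so in practice the results need to be collected. The following sketch shows one way to do that, processing regions from top to bottom. It uses toy stand-ins so it runs without an OCR engine: the "image" is a list of text rows, and `toy_crop` and `toy_ocr` are hypothetical helpers, not LayoutParser or Tesseract calls.

```python
def ocr_regions(regions, image, crop_fn, ocr_fn):
    """Crop each region, run OCR on it, and collect the text from top to bottom."""
    ordered = sorted(regions, key=lambda region: region["y_1"])
    return [ocr_fn(crop_fn(image, region)) for region in ordered]


# Toy stand-ins: the "image" is a list of text rows,
# cropping slices rows, and "OCR" joins the rows of a segment.
toy_image = ["Title", "First paragraph,", "continued.", "Footer"]
toy_regions = [
    {"y_1": 1, "y_2": 3},  # body text
    {"y_1": 0, "y_2": 1},  # title
]


def toy_crop(image, region):
    # Return the rows covered by the region
    return image[region["y_1"]:region["y_2"]]


def toy_ocr(segment):
    # Pretend recognition: concatenate the rows
    return " ".join(segment)


texts = ocr_regions(toy_regions, toy_image, toy_crop, toy_ocr)
```

Passing the crop and OCR steps in as functions keeps the collection logic independent of any particular OCR backend. With a real agent, those two arguments would wrap the cropping and detection calls shown earlier.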
Versatile Visualization Tools: Flexible APIs let you draw the detected layout on the page image, which makes it easier to inspect the document's structure:
```
lp.draw_box(image, layout, box_width=1, show_element_id=True, box_alpha=0.25)
```
Data Conversion and Loading: LayoutParser can load layout data from several formats, including JSON, CSV, and PDF, which makes it easier to integrate with existing workflows:
```
layout = lp.load_json("path/to/json")
layout = lp.load_csv("path/to/csv")
pdf_layout = lp.load_pdf("path/to/pdf")
```
Community-driven Platform: The project is an open platform that encourages sharing of models and DIA pipelines, thereby fostering a collaborative ecosystem.

Installation

Installing LayoutParser is straightforward, and it allows you to select only the necessary components for your project:

```
pip install layoutparser # Base library
pip install "layoutparser[layoutmodels]" # DL layout model toolkit
pip install "layoutparser[ocr]" # OCR toolkit
```

Additional steps are required if you intend to use Detectron2-based models. See the project's installation instructions for details.
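Because the components are installed separately, it can help to check which ones are present in the current environment. The sketch below uses only the standard library. The module names in `components` (`layoutparser`, `detectron2`, `pytesseract`) are assumptions about what each component exposes, so adjust them to your setup.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed mapping from component to an importable top-level module name
components = {
    "base library": "layoutparser",
    "Detectron2 models": "detectron2",
    "Tesseract OCR bindings": "pytesseract",
}

status = {name: is_installed(module) for name, module in components.items()}
for name, available in status.items():
    print(f"{name}: {'available' if available else 'missing'}")
```

The check only confirms that a module can be located. It does not verify versions or external dependencies, such as the Tesseract binary itself.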

Practical Examples

To help users get started, LayoutParser provides practical examples. These include guides on performing OCR on tables and parsing results, as well as examples demonstrating deep layout parsing with complex documents.

Contribution and Citation

LayoutParser welcomes contributions from the community and provides guidelines for contributing. If you find LayoutParser useful in your work, you are encouraged to cite it using the provided BibTeX entry. This helps support the ongoing development and improvement of the tool.

In summary, LayoutParser brings layout detection, layout data structures, OCR, visualization, and data loading together in one toolkit for deep learning-based document image analysis, which makes it useful for researchers and developers who need to parse documents.