tr - Comprehensive Offline Text Recognition SDK for Enhanced Document Processing

tr - Text Recognition Project Introduction

The tr project offers an offline Text Recognition SDK specifically designed for scanned documents. It is primarily developed in C++ with a provided Python interface. The intended compilation environment for this tool is Ubuntu 16.04.

Why was tr Developed?

At the inception of the tr project, the landscape of open-source OCR solutions for text recognition from scanned documents was limited to a few options like ChineseOCR and Tesseract. Since the introduction of tr, the open-source community has seen the arrival of new and effective OCR tools. The tr project has largely fulfilled its initial objectives and is transitioning towards a research-focused endeavor, prioritizing technological innovation over practicality.

End-to-End Document Understanding (Under Development)

Multimodal large models are gaining influence much as the internet once did, and integrating OCR algorithms into the multimodal ecosystem will be vital. One approach runs OCR to recognize text lines and passes the results downstream in a format such as JSON or XML, but this discards image information and lets recognition errors propagate into later document understanding. An end-to-end approach, which encodes the image into a one-dimensional sequence that a Transformer decoder then processes, is a more feasible solution.
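To make "encoding an image into a one-dimensional sequence" concrete, here is a minimal sketch that cuts a grayscale image into fixed-size patches and lists them in row-major order. This patch-flattening scheme is an assumption chosen for illustration; the source does not say how tr builds its sequence, and the function name `image_to_patch_sequence` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def image_to_patch_sequence(image: Image, patch: int) -> List[List[int]]:
    """Split an H x W image into non-overlapping patch x patch tiles.

    Tiles are returned in row-major order, each flattened to a 1-D vector.
    Hypothetical helper for illustration; not part of the tr API.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            sequence.append(tile)
    return sequence


# A 4x4 toy image becomes a sequence of four 2x2 patches.
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patch_sequence(toy, patch=2)
print(len(tokens), tokens[0])  # 4 [0, 1, 4, 5]
```

In a real model each flattened patch would be projected to an embedding before reaching the decoder; that step is omitted here.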

Related Research:

TransformerDecoder

CRNN for Multi-Line Text Recognition

The project combines CRNN with a Transformer encoder/decoder to support multi-line text recognition. This reduces the need for per-line bounding-box annotations and helps with curved text. Images that existing OCR tools struggle with can benefit from this multi-line CRNN approach.

Experience Multi-Line CRNN:
- Multi-Line CRNN

Challenging Object Recognition with CRNN

If objects are viewed as characters, is image recognition akin to text recognition? Initial tests on the PASCAL VOC dataset suggest that multi-line CRNN can recognize both object categories and their counts. However, because Transformers memorize training data readily, overfitting is a concern, so data augmentation and more training samples are needed.
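One way to picture "objects as characters" is to turn an image's object labels into a short text target that states each category and its count. The sketch below does that for a list of labels; the `name:count` format and the function name `objects_to_target` are assumptions for illustration, not the encoding tr actually uses.

```python
from collections import Counter
from typing import Iterable


def objects_to_target(labels: Iterable[str]) -> str:
    """Encode object labels as a text target of category:count pairs.

    Categories are sorted so the same set of objects always yields the
    same string. Hypothetical format; not taken from the tr project.
    """
    counts = Counter(labels)
    return " ".join(f"{name}:{counts[name]}" for name in sorted(counts))


# Labels as they might be read from one PASCAL VOC annotation file.
target = objects_to_target(["person", "dog", "person"])
print(target)  # dog:1 person:2
```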

Experience CRNN for Image Recognition:
- Image Recognition

Addressing Challenges in Large Language Models

How can CRNN technology be applied to large language models (LLMs)? Modifying multi-line CRNN to accept text input yields a variant called ChatCRNN. Recent experiments indicate that, although current large models struggle with multi-digit multiplication, ChatCRNN can handle a three-digit multiplication task after brief training.
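For a sense of what a three-digit multiplication task looks like as text data, the following sketch generates prompt/answer pairs. The `a*b=` prompt format, the fixed seed, and the function name `make_multiplication_samples` are assumptions for illustration; the source does not describe ChatCRNN's actual training format.

```python
import random
from typing import List, Tuple


def make_multiplication_samples(n: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Return n (prompt, answer) pairs for three-digit multiplication.

    Hypothetical data format; not taken from the tr project.
    """
    rng = random.Random(seed)  # seeded so the samples are reproducible
    samples = []
    for _ in range(n):
        a = rng.randint(100, 999)
        b = rng.randint(100, 999)
        samples.append((f"{a}*{b}=", str(a * b)))
    return samples


samples = make_multiplication_samples(3)
for prompt, answer in samples:
    print(prompt + answer)
```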

Experience ChatCRNN:
- ChatCRNN

CRNN with Transformer

The latest advancements incorporate Transformer Encoder structures to enhance contextual error correction and lessen dependence on real samples. The current training set includes only around 100 genuine samples.

Installation:

pip install tr==2.8.2 -i https://pypi.tuna.tsinghua.edu.cn/simple

For Windows 64-bit systems:

pip install tr==2.8.6 -i https://pypi.org/simple/

Example Code:

import tr
crnn = tr.CRNN()
chars, scores = crnn.run("imgs/line.png")
print("".join(chars))
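Since `crnn.run` returns both `chars` and `scores`, a natural follow-up is to flag characters the model is unsure about. The sketch below assumes that `scores` holds one confidence value per entry in `chars` and that higher means more confident; neither assumption is confirmed by the source. The helper `flag_low_confidence` and the 0.5 threshold are hypothetical, and the sample values stand in for real output so the snippet runs without tr installed.

```python
from typing import List, Sequence, Tuple


def flag_low_confidence(chars: Sequence[str], scores: Sequence[float],
                        threshold: float = 0.5) -> Tuple[str, List[int]]:
    """Join chars into text and list the indices scoring below threshold.

    Assumes chars and scores are parallel sequences (an assumption).
    """
    if len(chars) != len(scores):
        raise ValueError("chars and scores must have the same length")
    low = [i for i, score in enumerate(scores) if score < threshold]
    return "".join(chars), low


# Stand-in values; in practice these would come from crnn.run(...).
chars = ["t", "r", "0", "k"]
scores = [0.98, 0.95, 0.41, 0.90]
text, low = flag_low_confidence(chars, scores)
print(text, low)  # tr0k [2]
```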

GUI Screenshot Recognition:

# Requires PyQt5, PIL dependencies
python -m tr.gui

Updates and Installation

The tr project now provides a C++ interface, supports Python 2, and has removed dependencies to simplify deployment. It supports multi-threading and offers GPU installation options for older graphics cards, subject to specific requirements. Docker deployment is available for those who want to avoid installing CUDA/cuDNN.
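As a rough illustration of multi-threaded use, the sketch below fans a list of image paths out to a thread pool using the standard library (Python 3). The recognizer is passed in as a plain callable, and `fake_recognize` is a stand-in so the snippet runs without tr. Whether a single tr recognizer object can be shared safely across threads is not stated in the source, so treat that as an open assumption and check the project's own multi-threading test.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def recognize_many(paths: List[str], recognize: Callable[[str], str],
                   workers: int = 4) -> Dict[str, str]:
    """Run recognize(path) for every path on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input paths.
        results = list(pool.map(recognize, paths))
    return dict(zip(paths, results))


def fake_recognize(path: str) -> str:
    """Stand-in recognizer; a real one would run OCR on the image."""
    return path.upper()


out = recognize_many(["a.png", "b.png"], fake_recognize, workers=2)
print(out)
```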

Installation Options:

Clone and install from the repository:

git clone https://github.com/myhub/tr.git
cd ./tr
sudo python setup.py install

Direct installation via pip:

sudo pip install git+https://github.com/myhub/tr.git@master

Testing:

Tests are provided for Python 2 compatibility, visualization under Python 3, multi-threading, and screenshot recognition.

Associated Projects

For web-based implementations, tr recommends the TrWebOCR project.

Full Functionality Demonstration

Detailed demonstrations of the software's capabilities in text recognition, including both Python and C++ examples, are available as part of the project's comprehensive documentation.