tr - Text Recognition Project Introduction
The tr project is an offline text recognition (OCR) SDK designed for scanned documents. The core is written in C++, with a Python interface provided on top. The reference build environment is Ubuntu 16.04.
Why was tr Developed?
At the inception of the tr project, the landscape of open-source OCR solutions for text recognition from scanned documents was limited to a few options like ChineseOCR and Tesseract. Since the introduction of tr, the open-source community has seen the arrival of new and effective OCR tools. The tr project has largely fulfilled its initial objectives and is transitioning towards a research-focused endeavor, prioritizing technological innovation over practicality.
End-to-End Document Understanding (Under Development)
Multimodal large models are gaining influence much as the internet once did, and integrating OCR algorithms into the multimodal ecosystem will be vital. One approach is to run OCR to detect and recognize text lines, then pass the results to the model as JSON or XML. This has two drawbacks: information in the image is lost, and recognition errors carry over into the subsequent document understanding step. An end-to-end approach, which encodes the image into a one-dimensional sequence that a Transformer decoder processes directly, is a more feasible solution.
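The OCR-then-serialize approach described above can be sketched as follows. This is only an illustration: the field names (`text`, `box`, `score`), the sample values, and the helper `lines_to_prompt` are assumptions made for the example and are not part of tr.

```python
import json

# Hypothetical OCR output: one dict per detected text line.
# Field names and values are illustrative assumptions, not tr's format.
ocr_lines = [
    {"text": "Invoice No. 12345", "box": [40, 32, 310, 58], "score": 0.98},
    {"text": "Total: 86.50", "box": [40, 410, 220, 436], "score": 0.91},
]

def lines_to_prompt(lines):
    """Serialize OCR lines to JSON text for a downstream language model."""
    return json.dumps({"lines": lines}, ensure_ascii=False, indent=2)

prompt = lines_to_prompt(ocr_lines)
print(prompt)
```

Only the text, coordinates, and scores reach the model in this form. Anything else in the image (fonts, stamps, table rules) is gone, and a misread character is passed along unchanged.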
Related Research:
CRNN for Multi-Line Text Recognition
The project combines CRNN with a Transformer encoder/decoder to support multi-line text recognition. This reduces the need for line-level bounding box annotations and helps with curved text. Images that existing OCR tools struggle with may benefit from this multi-line CRNN approach.
- Experience Multi-Line CRNN:
Challenging Object Recognition with CRNN
If objects are treated as characters, is image recognition essentially the same problem as text recognition? Initial tests on the PASCAL VOC dataset suggest that multi-line CRNN can recognize both the categories of objects and how many of each are present. However, Transformers memorize training data readily, so overfitting is a concern. Data augmentation and a larger training set are needed to counter it.
- Experience CRNN for Image Recognition:
Addressing Challenges in Large Language Models
How can CRNN technology be applied to large language models (LLMs)? Multi-line CRNN was modified to accept text input, producing a version called ChatCRNN. Current large models struggle with multi-digit multiplication, but recent experiments indicate that ChatCRNN can handle a three-digit multiplication task after brief training.
- Experience ChatCRNN:
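To make the three-digit multiplication task concrete, here is a minimal sketch of how such samples could be generated and scored. The `"a*b="` question format and both helper functions are assumptions for illustration. The source does not describe how ChatCRNN's training data is actually encoded.

```python
import random

def make_multiplication_samples(n, seed=0):
    """Generate n (question, answer) string pairs for three-digit multiplication."""
    rng = random.Random(seed)  # fixed seed so the samples are reproducible
    samples = []
    for _ in range(n):
        a = rng.randint(100, 999)
        b = rng.randint(100, 999)
        # The "a*b=" text format is an assumption, not ChatCRNN's real format.
        samples.append((f"{a}*{b}=", str(a * b)))
    return samples

def exact_match_accuracy(predictions, samples):
    """Fraction of predictions that equal the reference answer exactly."""
    correct = sum(p == ans for p, (_, ans) in zip(predictions, samples))
    return correct / len(samples)

samples = make_multiplication_samples(5)
for question, answer in samples:
    print(question, answer)
```

Exact-match scoring is used here because a product with a single wrong digit is simply wrong.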
CRNN with Transformer
The latest advancements incorporate Transformer Encoder structures to enhance contextual error correction and lessen dependence on real samples. The current training set includes only around 100 genuine samples.
Installation:
pip install tr==2.8.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
For Windows 64-bit systems:
pip install tr==2.8.6 -i https://pypi.org/simple/
Example Code:
import tr
crnn = tr.CRNN()
chars, scores = crnn.run("imgs/line.png")
print("".join(chars))
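The example above returns `chars` and `scores`. As a hedged sketch of one way to use them together, the helper below joins the characters and flags low-scoring positions. It assumes that `scores` holds one confidence value per character, aligned with `chars` and in the range 0 to 1. The source does not confirm this, and `summarize_line`, its threshold, and the sample values are hypothetical.

```python
def summarize_line(chars, scores, threshold=0.5):
    """Join characters into text and list positions whose score is below threshold."""
    if len(chars) != len(scores):
        raise ValueError("chars and scores must have the same length")
    text = "".join(chars)
    low = [(i, c, s) for i, (c, s) in enumerate(zip(chars, scores)) if s < threshold]
    mean = sum(scores) / len(scores) if scores else 0.0
    return text, mean, low

# Made-up values standing in for the result of crnn.run(...).
chars = ["t", "r", "1"]
scores = [0.99, 0.97, 0.42]
text, mean_score, low = summarize_line(chars, scores)
print(text, round(mean_score, 2), low)
```

Flagged positions can be routed to manual review instead of being accepted silently.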
GUI Screenshot Recognition:
# Requires PyQt5, PIL dependencies
python -m tr.gui
Updates and Installation
The tr project now provides a C++ interface, supports Python 2, and has dropped dependencies to simplify deployment. It supports multi-threading and offers a GPU installation option for older graphics cards, subject to specific requirements. Docker deployment is available for those who want to avoid installing CUDA/cuDNN.
Installation Options:
- Clone and install from the repository:
git clone https://github.com/myhub/tr.git
cd ./tr
sudo python setup.py install
- Direct installation via pip:
sudo pip install git+https://github.com/myhub/tr.git@master
Testing:
- The repository provides a Python 2 compatibility test and a Python 3 visualization test, along with tests for multi-threading and screenshot recognition.
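As a rough sketch of what a multi-threaded run might look like, the snippet below fans a list of image paths out to a thread pool. The `recognize` function is a stand-in that only returns a string, and the paths are hypothetical. Replace it with the tr call you actually use. Whether a single tr model object can safely be shared across threads is not stated here, so treat that as something to verify.

```python
from concurrent.futures import ThreadPoolExecutor

def recognize(path):
    """Stand-in for a real recognition call; it does no OCR."""
    return f"text from {path}"

def recognize_all(paths, workers=4):
    """Run recognize() over paths in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognize, paths))

# Hypothetical image paths, used only for illustration.
results = recognize_all(["imgs/a.png", "imgs/b.png", "imgs/c.png"])
print(results)
```

`pool.map` returns results in the same order as the input paths, which keeps each result matched to its image.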
Associated Projects
For web-based deployments, the TrWebOCR project is recommended.
Full Functionality Demonstration
Detailed demonstrations of the software's capabilities in text recognition, including both Python and C++ examples, are available as part of the project's comprehensive documentation.