Project Overview: BetterOCR
BetterOCR is an innovative tool designed to enhance the quality of Optical Character Recognition (OCR) by integrating outputs from multiple OCR engines and utilizing a Large Language Model (LLM) for improved text accuracy and reconstruction. It effectively tackles problems associated with traditional OCR systems, which often struggle with results, especially for languages with inadequate training data.
OCR Engines Integrated
BetterOCR currently supports three major OCR platforms:
- EasyOCR: Developed by JaidedAI, EasyOCR is known for its simplicity and efficiency.
- Tesseract: A popular OCR engine developed by Google, known for its robustness and versatility.
- Pororo: An OCR module from KakaoBrain, used specifically for Korean and English text processing, leveraging BrainOCR for text recognition and EasyOCR for text detection.
Pororo is automatically excluded if the required dependencies are not met, or if the specified languages do not include English or Korean.
Leveraging Large Language Models
The project utilizes chat models from OpenAI to enhance text correction and reconstruction. These models help refine the OCR results by addressing inaccuracies and noise in the data, making the output more readable and reliable.
Custom Context Capability
A unique feature of BetterOCR is the ability for users to input custom contexts, including proper nouns and product names, which aids in correcting spelling and identifying noise. This feature ensures accurate outputs, even for uncommon or specialized vocabulary.
Upcoming Features and Contribution
BetterOCR is continuously evolving, with anticipated improvements such as enhanced interface design, async support, and more efficient box detection. The project is open to contributions, welcoming developers to collaborate in its growth and refinement.
Installation and Usage
Users can install BetterOCR via pip:
pip install betterocr
For text detection, the library can be used as follows:
import betterocr
text = betterocr.detect_text(
"demo.png",
["ko", "en"], # Language codes
context="", # Optional context
tesseract={ # Tesseract specific options
"config": "--tessdata-dir ./tessdata"
},
openai={ # OpenAI specific options
"API_KEY": "sk-xxxxxxx"
}
)
print(text)
Example Use Cases
- Example 1: English text detection with noise, highlighting the transformative capacity of BetterOCR in reconstructing clear text from noisy OCR outputs.
- Example 2: Korean and English text, showcasing its multilingual capacity and how it handles mixed-language content accurately.
- Example 3: Usage of custom context for effective correction in Korean texts.
- Example 4: Hindi text recognition demonstrating the model's versatility across languages.
License and Support
BetterOCR is distributed under the MIT license, encouraging open-source development and collaboration. Users who find the project helpful are encouraged to star it on GitHub and follow the creator, Junho Yeo, for regular updates and new innovations.
For more details, users can explore performance examples and upcoming features, contributing their insights and expertise to enhance BetterOCR further.