#OCR

PaddleOCR
PaddleOCR is an OCR library that gives developers tools for training models as well as deploying them. Its features include real-time layout parsing, low-code workflows that reduce development cost, and several deployment options, including high-performance inference and service-based deployment. Model integration is simplified through PaddleX, which exposes a broad set of models through an easy-to-use Python API. The project also supports adaptation across a range of hardware platforms, which makes it usable at industrial scale for tasks such as text correction, layout detection, and formula recognition.
TTime
TTime is a translation tool that supports input, screenshot, and selected-text translation, along with OCR. It is available for Windows and macOS, offers customizable shortcuts, and integrates with multiple services, including Google, DeepL, and OpenAI. It combines text recognition and translation in a single application without excessive complexity.
Octopii
Octopii uses OCR and NLP to detect and extract personally identifiable information (PII) from images, PDFs, and documents. It helps identify security exposures by scanning public-facing data for sensitive information such as government IDs and contact details. It is simple to install and can scan several kinds of sources, including local filesystems and cloud URLs, helping organizations protect private data.
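To make the "scan extracted text for contact details" step concrete, here is a minimal regex-based sketch that runs over text an OCR engine has already produced. The patterns, the `PII_PATTERNS` table, and `find_pii` are hypothetical illustrations, not Octopii's code; the real tool combines OCR with NLP and may detect PII quite differently.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
# Real PII detection needs locale-aware rules and NLP, not just two regexes.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d{1,3}[ -]?\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return every match of each PII pattern found in OCR output text."""
    return {name: pattern.findall(text) for name, pattern in PII_PATTERNS.items()}
```

For example, `find_pii("Contact Jane at jane.doe@example.com or +1 555-123-4567.")` reports one email address and one phone number. Regexes like these are cheap but produce false positives and misses, which is why OCR-based scanners typically add NLP or keyword context on top.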
Bob
Bob is a translation and OCR tool for macOS that supports several translation modes, including selected-text, screenshot, and input translation. It integrates with a range of translation and speech-synthesis services, such as Google, Microsoft, and Tencent. Its OCR features include silent screenshot OCR and flexible image selection for precise text recognition. AppleScript and PopClip integration support advanced workflows, and offline modes improve both security and availability.
surya
Surya is a document OCR toolkit that recognizes text in over 90 languages, with accuracy the project describes as comparable to leading cloud services. Its features include text detection, layout analysis, reading-order detection, and table recognition across diverse document types. It provides a straightforward Python API for processing PDFs and images. It is suited to research and personal use, and commercial use is subject to specific licensing provisions.
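Reading order means deciding the sequence in which detected text lines should be read. As a point of comparison, the sketch below implements the simplest possible baseline: group boxes into rows by vertical position, then read each row left to right. `TextBox`, `naive_reading_order`, and the `line_tolerance` parameter are hypothetical names for this illustration; Surya itself uses a learned model for this step, not this heuristic.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """A detected text line: top-left corner (x, y) and its recognized text."""
    x: float
    y: float
    text: str


def naive_reading_order(boxes: list[TextBox], line_tolerance: float = 10.0) -> list[str]:
    """Order boxes top-to-bottom, then left-to-right within each row.

    A box joins the current row if its y is within line_tolerance of the
    row's first box; otherwise it starts a new row.
    """
    rows: list[list[TextBox]] = []
    for box in sorted(boxes, key=lambda b: b.y):
        if rows and abs(box.y - rows[-1][0].y) <= line_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [b.text for row in rows for b in sorted(row, key=lambda b: b.x)]
```

This baseline breaks on multi-column pages, where it interleaves lines from neighboring columns; that failure mode is the main reason toolkits model reading order explicitly.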
video-subtitle-extractor
video-subtitle-extractor extracts subtitles embedded in video frames into separate SRT files. Its pipeline includes keyframe extraction, subtitle-region localization, and text recognition. Non-subtitle regions can be filtered out, and watermark text can optionally be removed. It supports batch extraction in 87 languages and offers three modes: Fast, Auto, and Precise. OCR runs locally without online APIs, which preserves privacy, and GPU acceleration improves speed. It runs on Windows, macOS, and Linux and provides both a GUI and a CLI.
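The final step of such a pipeline, turning per-frame OCR results into an SRT file, can be sketched as follows. This assumes you already have one recognized string per sampled frame (empty when no subtitle is visible) and a known frame rate; `merge_frames` and `to_srt` are hypothetical helpers, not the tool's actual code, and a real extractor would compare texts by similarity rather than exact equality to tolerate OCR noise.

```python
def merge_frames(frame_texts: list[str], fps: float) -> list[tuple[float, float, str]]:
    """Merge runs of identical per-frame OCR text into (start, end, text) segments."""
    segments = []
    run_start = 0
    for i in range(1, len(frame_texts) + 1):
        # Close the current run at the end of input or when the text changes.
        if i == len(frame_texts) or frame_texts[i] != frame_texts[run_start]:
            text = frame_texts[run_start]
            if text:  # skip runs of frames with no subtitle
                segments.append((run_start / fps, i / fps, text))
            run_start = i
    return segments


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build SRT text: numbered blocks of 'start --> end' plus text, separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

For example, `to_srt(merge_frames(["", "Hi", "Hi", "", "Bye"], fps=2))` yields two cues: "Hi" from 00:00:00,500 to 00:00:01,500 and "Bye" from 00:00:02,000 to 00:00:02,500.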
tesseract.js
tesseract.js is a JavaScript library that extracts text from images in many languages, using a WebAssembly build of the Tesseract OCR engine. It runs in browsers (via webpack, a script tag, or a CDN) and on Node.js servers. Recognizing text takes only a few lines of code. It accepts a variety of image formats, and the community has built a number of projects on top of it. Recent versions improve performance, reduce file sizes, and support modern runtimes, with features such as real-time video recognition and more efficient resource management.
receipt-scanner
receipt-scanner is an AI-powered Laravel package that extracts structured data from receipts supplied as images, PDFs, and emails. It integrates OpenAI and AWS Textract to handle these input formats. Key features include plain-text parsing and support for diverse document types. It is configured through Laravel and lets you choose between AI models to trade off speed against accuracy.
deepdoctection
deepdoctection is an open-source Python library for document extraction and layout analysis built on established deep learning libraries. It lets you build flexible pipelines that combine popular libraries for object detection, OCR, and natural language processing. It works with both TensorFlow and PyTorch and provides features such as language detection, image deskewing, and table recognition. Outputs are customizable, and the project offers tutorials and pre-trained models for a range of industry applications.
arxiv-translator
The arxiv-translator project translates arXiv papers into Korean using Nougat OCR, giving faster access to newly published papers. Rather than relying on Ar5iv, whose updates lag behind new submissions, it extracts and presents papers itself. The translations are meant as an aid to understanding; the original papers remain the recommended source for details. A list of translated papers links each one to its arXiv page.
layout-parser
LayoutParser is a unified toolkit for document image analysis that provides deep learning models and APIs for layout detection, OCR, and visualization. It supports formats such as JSON, CSV, and PDF, and its community platform lets users share models and pipelines. It is easy to install and modular, which makes it suitable for developers working with complex document structures, and it comes with thorough documentation.
screen-pipe
ScreenPipe is an open-source tool that records your screen and voice 24/7, intended to give AI applications context about what you see and hear. It is available as a CLI tool, a desktop app, and a Rust or WASM library, and it supports Chinese text as well as native OCR on Apple platforms and Windows. Its plugin system runs code in a sandbox, allowing extensive customization. The project is updated regularly and has an active community.
tr
tr is a text recognition SDK written in C++ with Python bindings, designed for offline use on scanned documents. It combines CRNN with Transformer models to improve multi-line text recognition and document understanding. By treating images as sequences, it aims to go beyond the limits of traditional OCR. The SDK supports multi-threading and includes a lightweight Transformer framework for contextual error correction. It is designed to handle curved text and complex document layouts.
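CRNN-style recognizers read an image as a sequence: they emit a score for every character class at each horizontal position (time step), and they are commonly trained with CTC loss. Assuming tr's CRNN stage follows that convention (the description above does not say), the per-step scores are turned into text by taking the best class per step, collapsing repeats, and dropping a special blank symbol. The greedy decoder that follows is a generic sketch of that step, not tr's code; `ctc_greedy_decode`, the alphabet, and the scores are made up for illustration.

```python
BLANK = 0  # index of the CTC blank class; class k > 0 maps to alphabet[k - 1]


def ctc_greedy_decode(frame_scores: list[list[float]], alphabet: str) -> str:
    """Greedy CTC decoding: best class per step, collapse repeats, drop blanks.

    frame_scores[t][k] is the score of class k at time step t.
    """
    # Pick the highest-scoring class at every time step (first one wins ties).
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    chars = []
    previous = None
    for k in best:
        # A repeated class only counts again if a blank (or other class) separates it.
        if k != previous and k != BLANK:
            chars.append(alphabet[k - 1])
        previous = k
    return "".join(chars)
```

With alphabet `"ab"` and per-step best classes `1, 1, 0, 1, 2, 2`, the decoder produces `"aab"`: the blank between the two runs of class 1 is what allows a doubled letter. Greedy decoding is the cheapest option; beam search or a language model (such as the Transformer correction stage mentioned above) can recover from locally wrong choices.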
clifs
CLIFS uses OpenAI's CLIP model to search video frames with free-text queries. It encodes frames with CLIP's image encoder and the query with its text encoder, then returns the frames whose features best match the query. A Django web server provides the search interface, and the demo examples, including OCR-like text queries, use the UrbanTracker Dataset. Docker support simplifies deployment on both CPU and GPU setups.
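The matching step in this kind of search is a nearest-neighbor lookup: score every frame embedding against the query embedding and keep the best few. The sketch below shows that ranking with cosine similarity over toy two-dimensional vectors standing in for CLIP outputs (real CLIP embeddings have hundreds of dimensions). `cosine_similarity` and `top_k_frames` are hypothetical helpers, not CLIFS's implementation, which works on tensors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_frames(
    query: list[float], frames: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Rank frames by similarity to the query embedding and return the k best."""
    scored = [(frame_id, cosine_similarity(query, vec)) for frame_id, vec in frames.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For a query vector `[1, 0]` and frames `{"f1": [1, 0], "f2": [0, 1], "f3": [0.6, 0.8]}`, `top_k_frames(..., k=2)` ranks `f1` first and `f3` second. Precomputing and normalizing the frame embeddings once lets each query reduce to dot products, which is what makes free-text search over long videos fast.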
tesseract.js-core
tesseract.js-core ports the Tesseract OCR engine from C++ to JavaScript through WebAssembly. It includes Docker-based build instructions and documents the specific changes made to the upstream code. Aimed at developers, the modified build adds features such as page-angle detection, parameter control, and progress logging in both browser and Node.js environments.
EasyOCR
EasyOCR is an OCR tool that recognizes text in over 80 languages and in writing scripts such as Latin, Chinese, and Arabic. A web demo on Hugging Face Spaces, built with Gradio, lets you try it without any setup. Regular updates improve compatibility, and planned features include handwritten text recognition. It installs through pip and comes with tutorials, API documentation, and command-line options. It can recognize several languages in the same image.
MouseTooltipTranslator
MouseTooltipTranslator is a browser extension that translates text instantly when you hover over or select it. Using Google and Bing translation, it can translate text in input boxes, show dual YouTube subtitles, and run OCR on images such as manga. PDF translation is supported through PDF.js, and Google TTS provides pronunciation. It is available from the Chrome and Edge extension stores and is aimed at language learners and anyone reading across languages.
STranslate
STranslate is a translation and OCR app built with WPF. It is easy to set up and is available from GitHub and Gitee. Users discuss the project on GitHub, and contributions from multiple developers have shaped its translation features. Documentation is available for getting started.
Pix2Text
Pix2Text is an open-source Python tool intended as an alternative to Mathpix, with recognition of layouts, tables, and mathematical formulas. It converts images and PDFs with complex layouts into Markdown, combining multiple models to achieve high accuracy. It supports text recognition in over 80 languages, with the strongest support for English and Simplified Chinese. For users unfamiliar with Python, online services, a desktop application, and a web interface provide access without writing code.
marker
Marker converts PDFs to Markdown accurately and is optimized for books and scientific papers, with support for all languages. It removes unwanted elements such as headers, footers, and other artifacts, formats the remaining content, and runs on GPU, CPU, or MPS. It has some limitations, but it works well for batch document conversion.
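The "remove unwanted elements, then format" step can be illustrated with a small sketch that assumes layout analysis has already labeled each block. The `Block` class, its `kind` labels, and `blocks_to_markdown` are hypothetical and do not reflect Marker's internal data model; the sketch only shows the general idea of dropping page furniture and mapping block types to Markdown syntax.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A labeled layout block. The kind labels here are hypothetical."""
    kind: str  # "title", "heading", "text", "page_header", or "page_footer"
    text: str


# Page furniture that carries no document content.
SKIP_KINDS = {"page_header", "page_footer"}


def blocks_to_markdown(blocks: list[Block]) -> str:
    """Drop headers/footers and render the remaining blocks as Markdown."""
    parts = []
    for block in blocks:
        if block.kind in SKIP_KINDS:
            continue
        if block.kind == "title":
            parts.append(f"# {block.text}")
        elif block.kind == "heading":
            parts.append(f"## {block.text}")
        else:
            parts.append(block.text)
    # Blank lines separate Markdown paragraphs and headings.
    return "\n\n".join(parts)
```

Given a page header, a title "Results", a text block, and a page number footer, the function returns only `# Results` followed by the paragraph. The hard part in practice is the labeling itself, which tools like Marker delegate to layout models.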