ddddocr

DdddOcr: A Universal Offline Local CAPTCHA Recognition SDK

DdddOcr is an open-source project created by sml2h3 and kerlomz. It provides a straightforward, general-purpose CAPTCHA recognizer built on trained deep learning models, and it handles a wide range of text styles. The project does not target any specific CAPTCHA provider, and accuracy depends on the CAPTCHA in question: it may recognize some and fail on others.

DdddOcr emphasizes minimal dependencies to reduce setup and usage costs, aiming to provide a comfortable experience for all users.

The project repository is at https://github.com/sml2h3/ddddocr.

Sponsorship Partners

YesCaptcha: Offers commercial-grade recognition interfaces for Google reCaptcha, hCaptcha, funCaptcha.
超级鹰: A leading global vendor in intelligent image classification and recognition.
Malenia: An enterprise-grade proxy IP gateway platform.
雨云VPS: Offers budget-friendly, high-bandwidth servers with a Zhejiang node.

Getting Started

Supported Environments

Operating System	CPU	GPU	Max Python Version	Notes
Windows 64-bit	√	√	3.12	Some versions may require the VC runtime library.
Windows 32-bit	×	×	-	-
Linux 64-bit / ARM64	√	√	3.12	-
Linux 32-bit	×	×	-	-
macOS x64	√	√	3.12	See the M1/M2/M3 guidance.
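
The constraints in the table can be checked before installing. The sketch below is an illustration and not part of ddddocr: the helper name is_supported_environment is hypothetical, and it encodes only the two limits the table states (a 64-bit interpreter and Python no newer than 3.12).

```python
import sys

# Upper bound taken from the "Max Python Version" column of the table.
MAX_PYTHON = (3, 12)


def is_supported_environment(version=None, is_64bit=None):
    """Return True if the interpreter matches the table's constraints.

    Hypothetical helper: both arguments default to the running interpreter
    and exist only so the check can be exercised with other values.
    """
    if version is None:
        version = sys.version_info[:2]
    if is_64bit is None:
        # sys.maxsize exceeds 2**32 only on 64-bit builds of Python.
        is_64bit = sys.maxsize > 2**32
    return bool(is_64bit) and tuple(version[:2]) <= MAX_PYTHON


if __name__ == "__main__":
    print("supported:", is_supported_environment())
```

Passing explicit values, for example is_supported_environment((3, 13), True), makes it easy to see how a planned interpreter upgrade would fare.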

Installation Steps

Install from PyPI

pip install ddddocr

Install from Source

git clone https://github.com/sml2h3/ddddocr.git
cd ddddocr
python setup.py install

Note: Do not run "import ddddocr" from inside the repository's root directory. Python would pick up the local ddddocr folder instead of the installed package, which causes conflicts.
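
If an import behaves unexpectedly, it can help to see which file Python would actually load. The following is a small diagnostic sketch using only the standard library; module_origin is a hypothetical helper name, not something ddddocr provides.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # A path inside the current working directory, rather than inside
    # site-packages, suggests a local "ddddocr" folder is shadowing the
    # installed package.
    print(module_origin("ddddocr"))
```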

Directory Structure

ddddocr 
├── MANIFEST.in
├── LICENSE
├── README.md
├── /ddddocr/
│  ├── __init__.py            Main library file
│  ├── common.onnx            New OCR model
│  ├── common_det.onnx        Object detection model
│  ├── common_old.onnx        Old OCR model
│  ├── logo.png
│  ├── README.md
│  ├── requirements.txt
├── logo.png
└── setup.py

Project Base Support

The models are trained with dddd_trainer, which is built on PyTorch, and inference runs on onnxruntime. The range of Python versions the project supports therefore depends mainly on which versions onnxruntime supports.

User Guide

Basic OCR Recognition Ability

DdddOcr recognizes single-line text, chiefly CAPTCHAs made up of English letters and digits, along with some special characters.

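import ddddocr

# Create the recognizer once; construction loads the ONNX model.
ocr = ddddocr.DdddOcr()

# classification() takes the raw bytes of the image file.
with open("example.jpg", "rb") as f:
    image = f.read()

result = ocr.classification(image)
print(result)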
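
When many images need recognizing, the same call can be wrapped in a loop. The sketch below is one possible arrangement, not part of the library: classify_directory is a hypothetical helper, and it assumes only that the object passed in has a classification(bytes) method, as a ddddocr.DdddOcr() instance does.

```python
from pathlib import Path


def classify_directory(ocr, directory, pattern="*.jpg"):
    """Run ocr.classification on every matching file.

    Returns a dict mapping file name to recognized text, in sorted
    file-name order. `ocr` is any object with a classification(bytes)
    method; the file pattern is an illustrative default.
    """
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        results[path.name] = ocr.classification(path.read_bytes())
    return results
```

A call such as classify_directory(ocr, "captchas") reuses one recognizer for the whole folder instead of creating a new one per image.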

DdddOcr ships with two OCR models. To use the second one, pass beta=True when creating the instance:

import ddddocr

# beta=True selects the second OCR model.
ocr = ddddocr.DdddOcr(beta=True)

with open("example.jpg", "rb") as f:
    image = f.read()

result = ocr.classification(image)
print(result)

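Note: Create the DdddOcr instance once and reuse it. Every initialization loads the model again, so constructing a new instance for each image makes recognition much slower.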
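
One way to guarantee a single instance per configuration is to cache the constructor call. This is a minimal sketch under stated assumptions: build_once, _make_ocr and get_ocr are hypothetical names introduced here, and ddddocr is imported lazily so the caching logic can be read and tested on its own.

```python
from functools import lru_cache


def build_once(factory):
    """Wrap a factory so each distinct set of keyword options is built once."""

    @lru_cache(maxsize=None)
    def get(**options):
        return factory(**options)

    return get


def _make_ocr(**options):
    # Imported here so that merely defining the helper does not load ddddocr.
    import ddddocr

    return ddddocr.DdddOcr(**options)


# get_ocr() and get_ocr(beta=True) each construct their recognizer on first
# use and return the same object on every later call.
get_ocr = build_once(_make_ocr)
```

Request handlers or worker functions can then call get_ocr() freely without paying the model-loading cost more than once.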

Object Detection Ability

With det=True, DdddOcr detects likely target locations in an image and returns a bounding box, [x1, y1, x2, y2], for each one.

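import ddddocr
import cv2

# det=True enables the object detection model.
det = ddddocr.DdddOcr(det=True)

with open("test.jpg", "rb") as f:
    image = f.read()

# Each bounding box is [x1, y1, x2, y2] in pixel coordinates.
bboxes = det.detection(image)
print(bboxes)

# Draw each box in red (BGR order) and save the annotated image.
im = cv2.imread("test.jpg")

for bbox in bboxes:
    x1, y1, x2, y2 = bbox
    im = cv2.rectangle(im, (x1, y1), (x2, y2), color=(0, 0, 255), thickness=2)

cv2.imwrite("result.jpg", im)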
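
A common follow-up is to reduce each box to a single point, for example to decide where to click. The helper below is an illustrative sketch, not a ddddocr function; bbox_centers is a hypothetical name, and it relies only on the [x1, y1, x2, y2] layout unpacked in the example above.

```python
def bbox_centers(bboxes):
    """Return (cx, cy) centre points for the boxes, ordered left to right."""
    # Sort by x1 so the points follow reading order for a single line.
    ordered = sorted(bboxes, key=lambda box: box[0])
    return [((x1 + x2) // 2, (y1 + y2) // 2) for x1, y1, x2, y2 in ordered]


if __name__ == "__main__":
    print(bbox_centers([[100, 10, 140, 50], [0, 0, 20, 40]]))
```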

Slider Detection

This feature locates the slider's target position using OpenCV algorithms. Two algorithms are provided, and the choice depends on which images the CAPTCHA supplies.

Algorithm 1

import ddddocr

# Use this when the slider piece and the background are separate images.
# Neither model is needed for slider matching, so both are disabled.
det = ddddocr.DdddOcr(det=False, ocr=False)

with open('target.png', 'rb') as f:
    target_bytes = f.read()

with open('background.png', 'rb') as f:
    background_bytes = f.read()

res = det.slide_match(target_bytes, background_bytes)
print(res)
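
The printed result still has to be turned into a drag distance. The sketch below rests on an assumption that should be checked against the actual output: that the result is a dict whose "target" entry is a bounding box [x1, y1, x2, y2] in background-image pixels. Both slide_offset and its scale parameter are hypothetical and introduced only for illustration.

```python
def slide_offset(match_result, scale=1.0):
    """Return the horizontal drag distance in pixels.

    ASSUMPTION: match_result["target"] is [x1, y1, x2, y2], measured in
    pixels of the background image. `scale` is a hypothetical correction
    for pages that display the image at a different size than the file
    (displayed width divided by file width).
    """
    x1 = match_result["target"][0]
    return round(x1 * scale)


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(slide_offset({"target": [200, 0, 250, 50]}, scale=0.5))
```

If the slider image has a transparent border, the reported box may need a further manual correction.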

Algorithm 2

import ddddocr

# Use this when there is no separate slider image: compare the background
# that shows the gap (bg.jpg here) with the complete image (fullpage.jpg here).
slide = ddddocr.DdddOcr(det=False, ocr=False)

with open('bg.jpg', 'rb') as f:
    target_bytes = f.read()

with open('fullpage.jpg', 'rb') as f:
    background_bytes = f.read()

res = slide.slide_comparison(target_bytes, background_bytes)
print(res)
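
The idea behind comparing two versions of the same image can be shown on plain lists of pixel values. This is a conceptual sketch only and should not be read as the library's implementation: first_differing_column is a hypothetical function, and its threshold and min_pixels defaults are illustrative choices.

```python
def first_differing_column(gapped, full, threshold=80, min_pixels=5):
    """Return the index of the leftmost column where two images differ.

    Both images are equal-sized 2D lists of grayscale values (0-255).
    A column counts as different when at least `min_pixels` of its pixels
    differ by more than `threshold`. Returns None if no column qualifies.
    """
    height, width = len(full), len(full[0])
    for x in range(width):
        changed = sum(
            1
            for y in range(height)
            if abs(gapped[y][x] - full[y][x]) > threshold
        )
        if changed >= min_pixels:
            return x
    return None


if __name__ == "__main__":
    full = [[200] * 8 for _ in range(6)]
    gapped = [row[:] for row in full]
    for y in range(6):
        for x in range(3, 6):
            gapped[y][x] = 50  # darken a 3-pixel-wide "gap"
    print(first_differing_column(gapped, full))
```

Requiring several changed pixels per column keeps isolated compression noise from being mistaken for the gap.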

OCR Probability Output

For finer control over the result, classification can return per-character probabilities instead of a plain string. set_ranges limits the characters considered.

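import ddddocr

ocr = ddddocr.DdddOcr()

with open("test.jpg", "rb") as f:
    image = f.read()

# Limit the candidate characters to digits and arithmetic symbols.
ocr.set_ranges("0123456789+-x/=")
result = ocr.classification(image, probability=True)

# For each position, take the character with the highest probability.
s = ""
for i in result['probability']:
    s += result['charsets'][i.index(max(i))]

print(s)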
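
The probabilities can also be kept alongside the text, for example to reject uncertain results. The helper below is a sketch, not a ddddocr API: decode_with_confidence is a hypothetical name, and it assumes only the two keys used in the example above, "probability" (one list of probabilities per position) and "charsets" (the characters those entries correspond to).

```python
def decode_with_confidence(result):
    """Return (text, confidences) from a probability-style result dict."""
    chars, confidences = [], []
    for row in result["probability"]:
        # Index of the most probable character at this position.
        best = max(range(len(row)), key=row.__getitem__)
        chars.append(result["charsets"][best])
        confidences.append(row[best])
    return "".join(chars), confidences


if __name__ == "__main__":
    # Made-up data in the same shape as the example above.
    fake = {
        "charsets": ["", "1", "+", "2"],
        "probability": [
            [0.1, 0.7, 0.1, 0.1],
            [0.05, 0.05, 0.8, 0.1],
            [0.0, 0.1, 0.1, 0.8],
        ],
    }
    text, confidences = decode_with_confidence(fake)
    print(text, min(confidences))
```

A caller might retry or discard a CAPTCHA when min(confidences) falls below a cutoff chosen by experiment.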

Custom OCR Model Import

DdddOcr supports importing custom models trained using dddd_trainer.

import ddddocr

ocr = ddddocr.DdddOcr(det=False, ocr=False, import_onnx_path="myproject.onnx", charsets_path="charsets.json")

with open('test.jpg', 'rb') as f:
    image_bytes = f.read()

res = ocr.classification(image_bytes)
print(res)
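
A mistyped path is an easy mistake when loading a custom model, so it can be worth validating both files first. The following sketch uses only the standard library; check_model_files is a hypothetical helper, and the only assumption about charsets.json is that it contains valid JSON, as its extension suggests.

```python
import json
from pathlib import Path


def check_model_files(onnx_path, charsets_path):
    """Validate the two files needed for a custom model.

    Raises FileNotFoundError if either file is missing and ValueError if
    the charsets file is not valid JSON. Returns both paths as Path objects.
    """
    onnx_file, charsets_file = Path(onnx_path), Path(charsets_path)
    missing = [str(p) for p in (onnx_file, charsets_file) if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing model file(s): " + ", ".join(missing))
    # json.JSONDecodeError is a subclass of ValueError.
    json.loads(charsets_file.read_text(encoding="utf-8"))
    return onnx_file, charsets_file
```

Calling check_model_files("myproject.onnx", "charsets.json") before constructing DdddOcr turns a path typo into an immediate, readable error.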

Version Control

DdddOcr uses Git for version management. The available versions can be found in the repository.

Author

sml2h3

For queries and discussions, please use GitHub issues.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Donations

Support the project by visiting the repository.