CTCWordBeamSearch - Refined Sequence Recognition Using CTC and Language Models

CTC Word Beam Search Decoding Algorithm

The CTC Word Beam Search is a sophisticated decoding algorithm designed for Connectionist Temporal Classification (CTC) tasks, often utilized in text recognition and automatic speech recognition. This algorithm stands out due to its integration of a dictionary and an optional Language Model (LM), providing a remarkable balance of accuracy and speed.

Key Updates

2024: Compatibility with Python 3.11 and 3.12.
2021: Default installation method shifted to a Python package.
2020: Introduction of an installable Python package.

Installation and Setup

To get started with the Word Beam Search decoding algorithm, users can follow these steps:

Navigate to the root level of the repository.
Install the package using pip install ..
To ensure successful installation, execute pytest in the tests/ directory.

How to Use

The Word Beam Search algorithm can be applied using a simple yet illustrative example. Imagine a scenario where a model is capable of recognizing three characters: "a", "b", and " " (whitespace). The provided language model is trained on a corpus containing just two words: "a" and "ba".

Here’s a brief code snippet illustrating its usage:

import numpy as np
from word_beam_search import WordBeamSearch

corpus = 'a ba'
chars = 'ab '
word_chars = 'ab'

# RNN output represented as a 3D numpy array
mat = np.array([[[0.9, 0.1, 0.0, 0.0]], 
                [[0.0, 0.0, 0.0, 1.0]],
                [[0.6, 0.4, 0.0, 0.0]]]) 

# Initialize and compute word beam search
wbs = WordBeamSearch(25, 'Words', 0.0, corpus.encode('utf8'), chars.encode('utf8'), word_chars.encode('utf8'))
label_str = wbs.compute(mat)

The above code decodes an instance based on the given RNN output. It returns a list of decoded label strings, which are then translated into character strings using a simple mapping.

Documentation of Parameters

When initializing WordBeamSearch, several critical parameters are involved:

Beam Width: Defines the number of beams maintained at each time-step.
Scoring Mode: Options include "Words" (dictionary only), "NGrams" (dictionary with LM scoring), and more, each varying in computational complexity.
Smoothing: Allows word pairs unknown to the training text, with values ranging between 0 and 1.
Corpus: UTF8 encoded string from which the dictionary and LM are constructed.
Characters: Defines the order of output probabilities in the RNN and must include the CTC-blank label.
Word Characters: Dictates which characters form words and must be a subset of recognized characters.

Algorithm Highlights

The Word Beam Search algorithm excels in several areas:

Dictionary Constrained Words: It uses a dictionary to restrict valid words.
Non-word Character Flexibility: It permits arbitrary non-word characters between recognized words.
Optional Word-level LM: Can enhance the recognizable accuracy of the algorithm.
Efficiency: Known to be faster than traditional token-passing methods.

Practical Comparison

An example of the algorithm's performance is depicted through a comparison of five decoders, demonstrating the superior accuracy of Word Beam Search attributable to its intricate handling of dictionary constraints and non-word characters.

Additional Resources

Additional materials and resources regarding the Word Beam Search include Python prototypes, TensorFlow custom operations, and links to further reading and references, offering a comprehensive understanding for enthusiasts and researchers alike.

Citation

For academic and research purposes, users are encouraged to cite the related paper to acknowledge the effort and innovation behind Word Beam Search.

@inproceedings{scheidl2018wordbeamsearch,
	title = {Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm},
	author = {Scheidl, H. and Fiel, S. and Sablatnig, R.},
	booktitle = {16th International Conference on Frontiers in Handwriting Recognition},
	pages = {253--258},
	year = {2018},
	organization = {IEEE}
}

With its nuanced handling of language modeling and decoding, CTC Word Beam Search remains a pivotal tool in the fields of text and speech recognition.