Calliar Project Overview
Calliar is a project that provides a dataset of online (stroke-based) Arabic calligraphy. The dataset captures the style of Arabic calligraphy as annotated pen strokes, providing a resource for researchers and enthusiasts alike.
Understanding the Dataset
The Calliar dataset is a collection of 2,500 JSON files containing manually annotated strokes of Arabic calligraphy. The files are organized so that users can study Arabic script at every scale, from individual strokes to full sentences. Annotation is provided at four levels: stroke, character, word, and sentence.
Dataset Statistics
The dataset is divided into three primary subsets:
- Training Set: 2,000 samples containing 6,065 words, 24,722 characters, and 36,561 strokes.
- Validation Set: 250 samples containing 738 words, 2,946 characters, and 4,410 strokes.
- Test Set: 250 samples containing 753 words, 3,052 characters, and 4,601 strokes.
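As a quick consistency check on the figures above, the per-split counts can be summed. The dictionary layout and variable names below are illustrative only and are not part of the Calliar code.

```python
# Per-split counts copied from the statistics listed above.
splits = {
    "train": {"samples": 2000, "words": 6065, "characters": 24722, "strokes": 36561},
    "valid": {"samples": 250, "words": 738, "characters": 2946, "strokes": 4410},
    "test": {"samples": 250, "words": 753, "characters": 3052, "strokes": 4601},
}

# Sum each quantity across the three splits.
keys = ("samples", "words", "characters", "strokes")
totals = {key: sum(split[key] for split in splits.values()) for key in keys}

print(totals)
# {'samples': 2500, 'words': 7556, 'characters': 30720, 'strokes': 45572}
```

The sample total of 2,500 agrees with the number of JSON files stated earlier.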
File Formats
Calliar's dataset is available in two formats:
- JSON Format: Each file is a list of strokes represented as dictionaries that map characters to primitive stroke sequences. This format offers a detailed breakdown of the calligraphy into its basic components.
- NPZ Format: A compressed version of the dataset, which is just 8.6 MB in size. It uses the Ramer-Douglas-Peucker Algorithm to simplify strokes by reducing the number of points per stroke, similar to the QuickDraw dataset methodology.
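The Ramer-Douglas-Peucker simplification mentioned for the NPZ format can be sketched in plain Python as follows. This is a minimal textbook version, not the code Calliar uses to build its NPZ files. The function names, the representation of a stroke as a list of (x, y) tuples, and the epsilon value are all assumptions made for illustration.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from `point` to the line through `start` and `end`."""
    (px, py), (sx, sy), (ex, ey) = point, start, end
    dx, dy = ex - sx, ey - sy
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate case: start and end coincide.
        return math.hypot(px - sx, py - sy)
    return abs(dy * (px - sx) - dx * (py - sy)) / length


def rdp(points, epsilon):
    """Simplify a polyline, keeping points farther than `epsilon` from the chord."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior point farthest from the chord start -> end.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], start, end)
        if dist > max_dist:
            max_dist, index = dist, i
    if max_dist > epsilon:
        # Keep that point and simplify both halves recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    # Every interior point is within tolerance: keep only the endpoints.
    return [start, end]


# Hypothetical L-shaped stroke with redundant points on each arm.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(rdp(stroke, epsilon=0.5))  # [(0, 0), (2, 0), (2, 2)]
```

A larger epsilon removes more points and gives a smaller file at the cost of a coarser stroke shape.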
Visualization Tools
To help users visualize and understand the data better, Calliar includes a vis.py file, which offers Python methods for visual representation. This includes functions to draw sample JSON files and create animations that showcase the calligraphy strokes dynamically.
How to Visualize
The visualization process involves loading a JSON file and using provided Python scripts to render the strokes or create animations. These visual tools allow users to see the handwriting strokes in action, providing a more intuitive understanding of the dataset.
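The load-then-render flow described above can be illustrated with the standard library alone. This sketch does not use vis.py, whose function names are not given here. It assumes each JSON file is a list of single-key dictionaries mapping a character to a list of [x, y] points. That layout, the helper names load_strokes and strokes_to_svg, and the inline sample are hypothetical, so the sketch should be checked against a real Calliar file before reuse.

```python
import json


def load_strokes(text):
    """Parse JSON text into a list of (character, points) pairs.

    Assumed layout: [{"<char>": [[x, y], ...]}, ...]
    """
    strokes = []
    for entry in json.loads(text):
        for char, points in entry.items():
            strokes.append((char, [(float(x), float(y)) for x, y in points]))
    return strokes


def strokes_to_svg(strokes, padding=5.0):
    """Render each stroke as an SVG polyline and return the SVG source."""
    xs = [x for _, pts in strokes for x, _ in pts]
    ys = [y for _, pts in strokes for _, y in pts]
    min_x, min_y = min(xs), min(ys)
    width = max(xs) - min_x + 2 * padding
    height = max(ys) - min_y + 2 * padding
    lines = [
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width:g} {height:g}">'
    ]
    for _, pts in strokes:
        # Shift points so the drawing starts at the padded origin.
        coords = " ".join(
            f"{x - min_x + padding:g},{y - min_y + padding:g}" for x, y in pts
        )
        lines.append(f'  <polyline points="{coords}" fill="none" stroke="black"/>')
    lines.append("</svg>")
    return "\n".join(lines)


# Hypothetical two-stroke sample standing in for the contents of a JSON file.
sample = '[{"ب": [[0, 0], [10, 5], [20, 0]]}, {"ب": [[10, 10], [10, 11]]}]'
svg = strokes_to_svg(load_strokes(sample))
print(svg)
```

Writing the returned string to a file with an .svg extension gives a static drawing that opens in a browser. Revealing the polylines one at a time would approximate the stroke-by-stroke animations.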
Annotation Server
For those interested in custom annotations, Calliar provides an annotation server. By installing Django and navigating to the calliar_server directory, users can run a live server to manage and expand their annotations, enhancing the dataset with personalized contributions.
Sample Animations
Calliar offers multimedia elements, including animations of calligraphy strokes, available in video format. These animations help demonstrate the flow and style of Arabic calligraphy, bringing the dataset to life.
About the Authors
The dataset supports the academic paper "Calliar: An Online Handwritten Dataset for Arabic Calligraphy," authored by Zaid Alyafeai, Maged S. Al-shaibani, Mustafa Ghaleb, and Yousif Ahmed Al-Wajih. This work highlights the importance of calligraphy in Arabic culture and the effort to digitize this art form for wider use and accessibility.
By providing such an extensive and meticulously annotated dataset, Calliar plays a crucial role in preserving the rich heritage of Arabic calligraphy and facilitating its study in the digital age. Whether for academic research, artistic exploration, or just personal interest, Calliar offers a comprehensive resource for anyone interested in the beauty and complexity of Arabic script.