Photo to Cartoon: A Comprehensive Project Overview
Introduction
Photo to Cartoon is a project developed by MiniVision Technology that transforms realistic photographs into cartoon-style images while retaining the identity and textural details of the original. It learns the photo-to-cartoon mapping from a large collection of photo and cartoon data. Cartoon faces exaggerate features (larger eyes and slimmer jaws, for example), so photos and cartoons have no one-to-one correspondence and paired training data is unavailable. For this reason the project uses unpaired image translation rather than paired methods such as pix2pix.
Key Methodologies
CycleGAN is the most widely used unpaired translation method, but its results often contain noticeable artifacts. To improve stability and reduce artifacts, the project builds on U-GAT-IT, which combines two ideas: AdaLIN, a normalization layer that learns how to weight Instance Normalization against Layer Normalization, and an attention mechanism that helps produce fine anime-style transformations. To keep the cartoon recognizable as the same person, the project adds a Face ID Loss: a pre-trained face recognition model extracts identity features, and a cosine distance constraint keeps the cartoon's identity close to the photo's.
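The two ideas above can be illustrated with a minimal, framework-free sketch. It is not the project's implementation: the function names, the flat per-channel list layout, and the use of population variance are assumptions made for readability, and in practice the embeddings would come from the pre-trained face recognition model.

```python
import math


def cosine_id_loss(photo_embedding, cartoon_embedding):
    """Cosine distance (1 - cosine similarity) between two identity vectors."""
    dot = sum(p * c for p, c in zip(photo_embedding, cartoon_embedding))
    norm_p = math.sqrt(sum(p * p for p in photo_embedding))
    norm_c = math.sqrt(sum(c * c for c in cartoon_embedding))
    return 1.0 - dot / (norm_p * norm_c)


def _mean_var(values):
    """Mean and population variance of a flat list (a simplification)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def adalin(features, rho, gamma, beta, eps=1e-5):
    """AdaLIN-style normalization for one sample.

    features: list of channels, each a flat list of spatial activations.
    rho, gamma, beta: per-channel lists; rho in [0, 1] weights the
    instance-normalized value against the layer-normalized value.
    """
    # Layer Norm statistics: computed over all channels and positions.
    ln_mean, ln_var = _mean_var([v for channel in features for v in channel])
    output = []
    for channel, r, g, b in zip(features, rho, gamma, beta):
        # Instance Norm statistics: computed per channel.
        in_mean, in_var = _mean_var(channel)
        normalized = []
        for v in channel:
            inst = (v - in_mean) / math.sqrt(in_var + eps)
            layer = (v - ln_mean) / math.sqrt(ln_var + eps)
            normalized.append(g * (r * inst + (1.0 - r) * layer) + b)
        output.append(normalized)
    return output


if __name__ == "__main__":
    # Toy values, purely illustrative.
    print(cosine_id_loss([1.0, 0.0], [0.6, 0.8]))
    print(adalin([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],
                 rho=[0.5, 0.5], gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```

With rho set to 1 the layer behaves like Instance Normalization, and with rho set to 0 it behaves like Layer Normalization; intermediate values mix the two.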
The project also introduces Soft-AdaLIN, which blends encoder (photo feature) and decoder (cartoon feature) statistics during normalization to improve conversion quality. The architecture is further extended with hourglass modules placed before the encoder and after the decoder, which progressively improve feature abstraction and reconstruction.
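One plausible reading of the Soft-AdaLIN blending step is a per-channel interpolation between affine parameters derived from the encoder side and from the decoder side. The sketch below shows only that interpolation. The function name, the argument names, and the linear form of the blend are assumptions for illustration, not a description of the project's code.

```python
def soft_blend(encoder_params, decoder_params, weights):
    """Per-channel linear blend of two parameter lists.

    A weight of 0 keeps the decoder-side value, a weight of 1 keeps the
    encoder-side value. In a trained model the weights would be learned;
    here they are passed in explicitly (hypothetical interface).
    """
    return [
        (1.0 - w) * dec + w * enc
        for enc, dec, w in zip(encoder_params, decoder_params, weights)
    ]


if __name__ == "__main__":
    # Toy per-channel scale (gamma) values, purely illustrative.
    encoder_gamma = [2.0, 4.0]
    decoder_gamma = [0.0, 0.0]
    print(soft_blend(encoder_gamma, decoder_gamma, weights=[0.5, 0.25]))
```

The blended scale and shift would then be used as the affine parameters of the normalization layer.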
Data Processing
Because the experimental data is limited, the project uses a streamlined pre-processing strategy to make training easier. It detects the face and its landmarks in each image, uses the landmarks to correct the face orientation, and crops the result to a standardized size. A portrait segmentation model then removes the background. The same steps are applied uniformly across the training datasets.
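The geometry behind the orientation and cropping steps can be sketched as follows. This is a simplified illustration, not the project's code: it assumes the landmarks are already available as (x, y) pixel tuples, uses only the two eye centers to estimate the in-plane rotation, and the `expand` factor is a hypothetical parameter.

```python
import math


def eye_line_angle(left_eye, right_eye):
    """Angle (radians) of the line through the eyes relative to the x-axis."""
    return math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])


def level_points(points, center, angle):
    """Rotate points about center by -angle so the eye line becomes horizontal."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    cx, cy = center
    leveled = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        leveled.append((cx + cos_a * dx + sin_a * dy,
                        cy - sin_a * dx + cos_a * dy))
    return leveled


def square_crop_box(points, expand=1.5):
    """Square box (left, top, right, bottom) around the landmarks.

    The box is centered on the landmark bounding box and its side is the
    longer bounding-box edge scaled by `expand` (hypothetical margin).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx = (min(xs) + max(xs)) / 2.0
    cy = (min(ys) + max(ys)) / 2.0
    half = max(max(xs) - min(xs), max(ys) - min(ys)) * expand / 2.0
    return (cx - half, cy - half, cx + half, cy + half)


if __name__ == "__main__":
    # Made-up landmark coordinates for a slightly tilted face.
    landmarks = [(100.0, 120.0), (160.0, 130.0), (130.0, 180.0)]
    angle = eye_line_angle(landmarks[0], landmarks[1])
    leveled = level_points(landmarks, center=landmarks[0], angle=angle)
    print(square_crop_box(leveled))
```

The same rotation would be applied to the image itself before cropping, and the crop would then be resized to the standard input size.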
Implementation Steps
Installation
Before starting, ensure these dependencies are installed:
- Python 3.6
- PyTorch 1.4
- TensorFlow-GPU 1.14
- face-alignment
- dlib
- onnxruntime
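A quick way to see which of these packages are importable in the current environment is a small check like the one below. It is a convenience sketch rather than part of the project. The import names (`torch`, `tensorflow`, `face_alignment`) are the usual module names for those packages, and the check does not verify versions.

```python
import importlib.util
import sys

# Display name -> importable module name.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "TensorFlow-GPU": "tensorflow",
    "face-alignment": "face_alignment",
    "dlib": "dlib",
    "onnxruntime": "onnxruntime",
}


def missing_dependencies(modules=None):
    """Return the display names of modules that cannot be found."""
    modules = REQUIRED_MODULES if modules is None else modules
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```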
Cloning the Repository
Begin by cloning the project repository:
git clone https://github.com/minivision-ai/photo2cartoon.git
cd ./photo2cartoon
Resources
Download essential resources like pre-trained models and segmentation models from the provided Google Drive or Baidu links.
Testing
Test the conversion of a photo to cartoon style using:
python test.py --photo_path ./images/photo_test.jpg --save_path ./images/cartoon_result.png
For ONNX model testing, use:
python test_onnx.py --photo_path ./images/photo_test.jpg --save_path ./images/cartoon_result.png
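To convert several photos in one go, the same command can be generated per file. The helper below is a hypothetical convenience wrapper, not part of the repository: it only builds the argument list using the `--photo_path` and `--save_path` flags shown above, and the `_cartoon.png` output naming is an assumption.

```python
from pathlib import Path


def build_test_command(photo_path, save_dir, script="test.py"):
    """Build the argument list for converting one photo.

    The result is saved in save_dir as <photo stem>_cartoon.png
    (an illustrative naming choice).
    """
    photo = Path(photo_path)
    save_path = Path(save_dir) / (photo.stem + "_cartoon.png")
    return [
        "python", script,
        "--photo_path", str(photo),
        "--save_path", str(save_path),
    ]


if __name__ == "__main__":
    # Print one command per photo; pass each list to subprocess.run to execute.
    for photo in sorted(Path("./images").glob("*.jpg")):
        print(" ".join(build_test_command(photo, "./images")))
```

Passing `script="test_onnx.py"` produces the equivalent command for the ONNX model.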
Training
Prepare data by detecting faces and landmarks, correcting orientation, and cropping to a standard size. After processing, store images in specified training and testing folders. For training, run:
python train.py --dataset photo2cartoon
Use pre-trained weights if available:
python train.py --dataset photo2cartoon --pretrained_weights models/photo2cartoon_weights.pt
For multi-GPU training with batch size adjustments, use:
python train.py --dataset photo2cartoon --batch_size 4 --gpu_ids 0 1 2 3
Frequently Asked Questions
- Model Variance with App: The open-source model differs from the app version because the app uses customized data and a higher input resolution, which produces more refined outputs.
- Model Selection: Train the model for 200k iterations and select the best checkpoint by FID, which is usually reached around the 90k-iteration mark.
- Facial Feature Extraction: Using internally developed recognition models yields better training results than open-source models.
- Scope of Portrait Segmentation Model: The model is designed specifically for facial regions and does not support full-body segmentation.
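The model selection rule from the list above amounts to picking the checkpoint with the lowest FID. A minimal sketch follows, assuming the FID of each saved checkpoint has already been computed by a separate tool. The function name and the numbers in the usage example are made-up placeholders, not measured results.

```python
def select_checkpoint(fid_by_iteration):
    """Return the iteration whose checkpoint has the lowest FID."""
    if not fid_by_iteration:
        raise ValueError("no FID scores provided")
    return min(fid_by_iteration, key=fid_by_iteration.get)


if __name__ == "__main__":
    # Placeholder scores for illustration only (lower FID is better).
    placeholder_scores = {30000: 60.0, 90000: 40.0, 200000: 45.0}
    print(select_checkpoint(placeholder_scores))
```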
Additional Information
The open-source model is aimed mainly at young Asian women, so the dataset may need to be adjusted to cover a broader population. MiniVision offers more inclusive cartoon conversion services on its platform and welcomes business inquiries about customized cartoon styles.
References
The project takes inspiration and techniques from U-GAT-IT and InsightFace_Pytorch, integrating advanced normalization and face recognition capabilities to achieve its conversion goals.