photo2cartoon - Realistic Cartoon Style Rendering using Unpaired Image Translation

Photo to Cartoon: A Comprehensive Project Overview

Introduction

Photo to Cartoon is a project developed by MiniVision Technology that transforms realistic photographs into cartoon-style images while retaining the subject's identity and the textural details of the original image. It learns the photo-to-cartoon mapping from a large collection of photos and cartoon images. Unlike pix2pix-style methods, which need paired data, the project uses unpaired image translation because photo and cartoon features have no one-to-one correspondence: cartoon faces tend to have larger eyes and slimmer jaws, for example.

Key Methodologies

CycleGAN is the best-known unpaired translation method and is widely used, but its results often contain noticeable artifacts. To improve stability and reduce artifacts, the project builds on U-GAT-IT, which contributes two components: AdaLIN, a normalization layer that learns how to mix Instance Normalization and Layer Normalization, and an attention mechanism that helps produce fine anime-style translations. To keep the cartoon realistic enough that the subject remains recognizable, the project adds a Face ID Loss: a pre-trained face recognition model extracts identity features from the photo and from the generated cartoon, and the cosine distance between the two is penalized.
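
The two formulas behind this paragraph can be sketched in a few lines. The code below is a minimal, dependency-free illustration, not the project's implementation: the function names, the list-based tensor layout, and the use of a biased variance are assumptions made here for readability. AdaLIN is shown for a single sample as rho * IN(x) + (1 - rho) * LN(x), followed by a scale and shift; the Face ID Loss is shown as the cosine distance 1 - cos(a, b) between two identity embeddings.

```python
import math


def _normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization of a flat list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def adalin(feature_map, rho, gamma, beta, eps=1e-5):
    """AdaLIN for one sample (illustrative layout, not the project's tensors).

    feature_map: list of channels, each a flat list of spatial activations
                 (all channels are assumed to have the same length).
    rho:         per-channel mixing weight in [0, 1]; 1 means pure Instance Norm.
    gamma, beta: per-channel scale and shift.
    """
    # Instance Norm: statistics are computed separately for each channel.
    inst = [_normalize(channel, eps) for channel in feature_map]
    # Layer Norm: statistics are computed over all channels and positions.
    flat = _normalize([v for channel in feature_map for v in channel], eps)
    width = len(feature_map[0])
    layer = [flat[c * width:(c + 1) * width] for c in range(len(feature_map))]

    out = []
    for c in range(len(feature_map)):
        mixed = [rho[c] * i + (1.0 - rho[c]) * l
                 for i, l in zip(inst[c], layer[c])]
        out.append([gamma[c] * m + beta[c] for m in mixed])
    return out


def face_id_loss(photo_embedding, cartoon_embedding):
    """Cosine distance between two identity embeddings: 1 - cos(a, b)."""
    dot = sum(a * b for a, b in zip(photo_embedding, cartoon_embedding))
    norm_a = math.sqrt(sum(a * a for a in photo_embedding))
    norm_b = math.sqrt(sum(b * b for b in cartoon_embedding))
    return 1.0 - dot / (norm_a * norm_b)
```

In a real training loop, rho, gamma, and beta would be learned or predicted by the network, and the embeddings would come from the pre-trained face recognition model.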

The project also introduces Soft-AdaLIN, which blends encoder (photo feature) statistics with decoder (cartoon feature) statistics during normalization to improve conversion quality. In addition, hourglass modules are added before the encoder and after the decoder so that features are abstracted and reconstructed progressively.
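
One way to read the Soft-AdaLIN description is as a weighted blend of two sets of normalization parameters, one derived from encoder features and one from decoder features, applied before the normalization step. The sketch below shows only that blend. The function name, the per-channel weight `w`, and the choice of a convex combination are assumptions made for illustration and are not taken from the project's code.

```python
def soft_blend(encoder_params, decoder_params, w):
    """Per-channel blend: w * encoder + (1 - w) * decoder.

    encoder_params: values derived from photo (encoder) features.
    decoder_params: values derived from cartoon (decoder) features.
    w:              hypothetical per-channel weights in [0, 1].
    """
    if not (len(encoder_params) == len(decoder_params) == len(w)):
        raise ValueError("all inputs must have one entry per channel")
    return [wi * e + (1.0 - wi) * d
            for e, d, wi in zip(encoder_params, decoder_params, w)]


# Example: blend a scale (gamma) and a shift (beta) for two channels.
gamma = soft_blend([2.0, 1.0], [4.0, 3.0], [0.5, 0.0])
beta = soft_blend([0.0, 0.2], [1.0, 0.4], [0.5, 0.0])
```

The blended gamma and beta would then act as the scale and shift of an AdaLIN-style layer.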

Data Processing

Because the experimental data is limited, the project uses a simple pre-processing pipeline to make training easier. It detects the face and its landmarks in each image, uses the landmarks to correct the face orientation, and crops the result to a standard size. A portrait segmentation model then removes the background. The same steps are applied across the training datasets.
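
The geometry behind these steps can be sketched without any image library. The helpers below are illustrative assumptions, not the project's code: they estimate the in-plane rotation from two eye positions, rotate a landmark so the eye line becomes level, compute a square crop box around the landmarks, and blend a pixel toward white using a segmentation mask value. The margin value and the white background are choices made here for the example.

```python
import math


def roll_angle(left_eye, right_eye):
    """Angle in degrees of the eye line relative to the horizontal axis."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def level_point(point, center, angle_deg):
    """Rotate point about center by -angle_deg, undoing the measured tilt."""
    theta = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    x = center[0] + math.cos(theta) * dx + math.sin(theta) * dy
    y = center[1] - math.sin(theta) * dx + math.cos(theta) * dy
    return (x, y)


def square_crop_box(points, margin=0.25):
    """Square box (left, top, right, bottom) around points, padded by margin."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * (1 + 2 * margin)
    half = side / 2
    return (cx - half, cy - half, cx + half, cy + half)


def whiten_background(pixel, alpha):
    """Blend an RGB pixel toward white; alpha=1 keeps it, alpha=0 gives white."""
    return tuple(alpha * c + (1.0 - alpha) * 255.0 for c in pixel)
```

In practice the landmarks would come from a detector such as face-alignment or dlib, and the rotation and crop would be applied to the image itself rather than to individual points.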

Implementation Steps

Installation

Before starting, ensure these dependencies are installed:

Python 3.6
PyTorch 1.4
TensorFlow-GPU 1.14
face-alignment
dlib
onnxruntime
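
A quick way to confirm that these packages are visible to the current interpreter is sketched below. The import names in `REQUIRED_MODULES` are assumed to match the packages listed above, and the script does not check versions.

```python
import importlib.util
import sys

# Import names assumed for the packages in the dependency list.
REQUIRED_MODULES = ["torch", "tensorflow", "face_alignment", "dlib", "onnxruntime"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```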

Cloning the Repository

Begin by cloning the project repository:

git clone https://github.com/minivision-ai/photo2cartoon.git
cd ./photo2cartoon

Resources

Download the required resources, including the pre-trained model weights and the portrait segmentation model, from the Google Drive or Baidu links provided by the project.

Testing

Test the conversion of a photo to cartoon style using:

python test.py --photo_path ./images/photo_test.jpg --save_path ./images/cartoon_result.png

For ONNX model testing, use:

python test_onnx.py --photo_path ./images/photo_test.jpg --save_path ./images/cartoon_result.png
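
To convert a whole folder of photos, the test command can be wrapped in a small script. The sketch below is a convenience wrapper written for this overview, not part of the repository: the function names, the `*.jpg` filter, and the `_cartoon.png` output suffix are assumptions. It uses only the `--photo_path` and `--save_path` flags shown above and is meant to be run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_test_command(photo_path, save_path, script="test.py"):
    """Build the argument list for one conversion run."""
    return [sys.executable, script,
            "--photo_path", str(photo_path),
            "--save_path", str(save_path)]


def convert_folder(input_dir, output_dir, script="test.py"):
    """Run the test script once per .jpg file found in input_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for photo in sorted(Path(input_dir).glob("*.jpg")):
        target = out / (photo.stem + "_cartoon.png")
        # check=True stops the loop if a conversion fails.
        subprocess.run(build_test_command(photo, target, script), check=True)
```

Passing `script="test_onnx.py"` would route the same loop through the ONNX test script.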

Training

Prepare data by detecting faces and landmarks, correcting orientation, and cropping to a standard size. After processing, store images in specified training and testing folders. For training, run:

python train.py --dataset photo2cartoon

Use pre-trained weights if available:

python train.py --dataset photo2cartoon --pretrained_weights models/photo2cartoon_weights.pt

For multi-GPU training, set the batch size and list the GPU IDs to use:

python train.py --dataset photo2cartoon --batch_size 4 --gpu_ids 0 1 2 3

Frequently Asked Questions

Model Variance with App: The open-source model differs from the app version because the app model is trained on customized data and uses a higher input resolution, which gives more refined outputs.
Model Selection: The model is trained for about 200k iterations and the best checkpoint is chosen by FID; it is usually found around the 90k-iteration mark.
Facial Feature Extraction: Identity features from MiniVision's internally developed recognition model give better training results than those from open-source models.
Scope of Portrait Segmentation Model: The model is designed for the face region only and does not support full-body segmentation.
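
The model-selection answer above amounts to picking the checkpoint with the lowest FID. A minimal sketch follows, assuming the FID of each saved checkpoint has already been computed elsewhere; the function name is hypothetical and the scores in the usage lines are made-up placeholders, not results from the project.

```python
def best_checkpoint(fid_by_iteration):
    """Return (iteration, fid) for the lowest FID; ties go to the earliest iteration."""
    if not fid_by_iteration:
        raise ValueError("no checkpoints to compare")
    iteration = min(sorted(fid_by_iteration), key=fid_by_iteration.get)
    return iteration, fid_by_iteration[iteration]


# Placeholder scores for illustration only; lower FID is better.
scores = {10000: 5.0, 20000: 3.0, 30000: 3.0}
iteration, fid = best_checkpoint(scores)
```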

Additional Information

The open-source model was trained mainly on data of young Asian women, so the dataset may need to be extended to cover a broader population. MiniVision offers more inclusive cartoon conversion services on its platform and welcomes business inquiries for customized cartoon styles.

References

The project builds on U-GAT-IT for its normalization and attention techniques and on InsightFace_Pytorch for face recognition.