Project Overview
"Zoom to Learn, Learn to Zoom" is a project that enhances digital zoom photography with machine learning by leveraging RAW sensor data. The accompanying paper, presented at CVPR 2019, demonstrates that training directly on real, unprocessed (RAW) sensor data yields substantially better computational-zoom results than training on processed images. The project is implemented in TensorFlow and tested on Ubuntu 16.04 LTS.
SR-RAW Dataset
Using SR-RAW
The SR-RAW dataset is a critical component of this project and is used for both training and testing the model. Both splits are available for download online: the testing set is approximately 7 GB, and the training set is around 58 GB. The training set is also mirrored on Baidu Drive for easier access in some regions.
Trying with Your Own Data
The project is designed to be flexible with respect to data formats. The released model is trained on Sony digital camera RAW data, but it can be fine-tuned to other RAW formats, such as the DNG files produced by devices like iPhones. Users working with a different format should adjust the model accordingly.
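Adapting to another RAW format mostly comes down to handling that sensor's black level, white level, and Bayer pattern. A minimal sketch of the normalization step, using illustrative (not camera-specific) level values:

```python
import numpy as np

def normalize_raw(bayer, black_level, white_level):
    """Map raw sensor values to [0, 1], clipping below the black level.

    black_level and white_level differ per camera; the 512/16383 values
    used below are illustrative assumptions for a 14-bit sensor.
    """
    bayer = bayer.astype(np.float32)
    out = (bayer - black_level) / (white_level - black_level)
    return np.clip(out, 0.0, 1.0)

# Illustrative 14-bit sensor values.
raw = np.array([[512, 16383], [8447, 500]], dtype=np.uint16)
norm = normalize_raw(raw, black_level=512, white_level=16383)
```

Values at or below the black level map to 0 and the white level maps to 1, so images from different sensors land in a common range before training.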
Quick Inference
For quick tests, users can download a pre-trained model along with example data. By following the provided bash script commands, they can easily set up a working environment and run inference either on a batch of images or on individual images. The results of these inferences are saved in designated folders for easy access and review.
Training
CoBi Loss
A significant aspect of the project is CoBi (Contextual Bilateral) loss, a loss function adapted from the earlier contextual loss. By combining feature-space and spatial distances, it lets the model train on image pairs that are not perfectly aligned, which is essential for learning from real captured data.
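As a rough illustration of the core idea (not the repository's TensorFlow implementation, which operates on deep features), here is a NumPy sketch of the nearest-neighbour search under a combined feature-plus-spatial distance:

```python
import numpy as np

def cobi_loss(P, Q, pos_P, pos_Q, ws=0.1):
    """Simplified sketch of Contextual Bilateral (CoBi) loss.

    P, Q: (N, C) and (M, C) feature vectors from the two images.
    pos_P, pos_Q: (N, 2) and (M, 2) spatial coordinates of those features.
    For each source feature, find the target feature minimizing a
    combined feature + ws * spatial distance, then average over sources.
    """
    # Cosine distance between every feature pair: (N, M).
    Pn = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-8)
    Qn = Q / (np.linalg.norm(Q, axis=1, keepdims=True) + 1e-8)
    d_feat = 1.0 - Pn @ Qn.T
    # Squared spatial distance between feature locations: (N, M).
    diff = pos_P[:, None, :] - pos_Q[None, :, :]
    d_spatial = np.sum(diff ** 2, axis=-1)
    # Nearest neighbour under the combined distance, averaged over P.
    return np.mean(np.min(d_feat + ws * d_spatial, axis=1))

feats = np.array([[1.0, 0.0], [0.0, 1.0]])
coords = np.array([[0.0, 0.0], [1.0, 1.0]])
loss_same = cobi_loss(feats, feats, coords, coords)  # ~0 for identical inputs
```

The spatial term keeps each feature from matching a visually similar patch far away in the image, which is what makes the loss tolerant of small misalignments rather than exact pixel correspondence.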
Data Pre-processing
The project also provides scripts and functions to align images pre-training, accounting for camera-related movements such as handshakes during capture. Though this step is not mandatory (since CoBi loss is robust against such misalignments), it can speed up the convergence of the model. The scripts include:
run_align.sh: Aligns the fields of view between image pairs and compensates for hand-motion misalignments.
run_wb.sh: Computes and applies white-balance adjustments to match the camera's internal processing.
For training, either follow the expected directory structure or modify the scripts as needed; the alignment results are also useful for visualization.
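The white-balance step amounts to a per-channel gain. The `apply_white_balance` helper and gain values below are hypothetical; in the repository the actual gains come from the camera-specific computation in run_wb.sh:

```python
import numpy as np

def apply_white_balance(rgb, gains):
    """Apply per-channel white-balance gains to a float RGB image.

    rgb: (H, W, 3) image in [0, 1]; gains: (r_gain, g_gain, b_gain).
    Both the function and the gains here are illustrative assumptions.
    """
    out = rgb * np.asarray(gains, dtype=np.float32)
    return np.clip(out, 0.0, 1.0)

# A gray patch rendered with a warm cast, corrected by assumed gains.
img = np.full((2, 2, 3), 0.5, dtype=np.float32) * [1.0, 0.8, 0.6]
balanced = apply_white_balance(img, gains=(1.0, 1.25, 5 / 3))
```

Correcting white balance before training matters because the model should learn detail reconstruction, not the camera's color cast.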
Preparing RAW-RGB Pair Demo
A Jupyter notebook is provided to guide users through preparing aligned RAW-RGB pairs, an essential step for training the model to map RAW sensor input to processed RGB output.
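A common way to prepare the RAW side of such pairs is to pack the Bayer mosaic into four half-resolution channels, one per color site. The sketch below assumes an RGGB layout, which varies by sensor:

```python
import numpy as np

def pack_bayer(bayer):
    """Pack an RGGB Bayer mosaic (H, W) into a half-resolution
    4-channel array (H/2, W/2, 4), one channel per Bayer site.
    The RGGB layout is an assumption; other sensors differ.
    """
    H, W = bayer.shape
    return np.stack([bayer[0:H:2, 0:W:2],   # R
                     bayer[0:H:2, 1:W:2],   # G1
                     bayer[1:H:2, 0:W:2],   # G2
                     bayer[1:H:2, 1:W:2]],  # B
                    axis=-1)

mosaic = np.arange(16).reshape(4, 4)
packed = pack_bayer(mosaic)  # shape (2, 2, 4)
```

Packing preserves every sensor value without demosaicing, letting the network see the unprocessed measurements while keeping channels spatially aligned with each other.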
Citation
Researchers and developers who utilize or draw from this project for their work should cite the paper as follows:
@inproceedings{zhang2019zoom,
title={Zoom to Learn, Learn to Zoom},
author={Zhang, Xuaner and Chen, Qifeng and Ng, Ren and Koltun, Vladlen},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2019}
}
Contact
For questions or further information about the project, contact Cecilia Zhang via email at [email protected].