3D Ken Burns
The 3D Ken Burns project brings still images to life with a virtual camera that scans and zooms through the scene. Starting from a single photograph, it synthesizes an animation with realistic motion parallax, which creates the illusion of moving through a three-dimensional scene. The project is implemented in PyTorch, an open-source machine learning library.
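To illustrate why a moving virtual camera produces parallax, the sketch below uses a simple pinhole camera model. This is an assumption for illustration, not the project's renderer. When the camera translates sideways by a distance t, a point at depth Z moves in the image by f·t/Z pixels, so near objects move more than distant ones. The function name parallax_shift and the example numbers are hypothetical.

```python
def parallax_shift(focal_px: float, baseline: float, depth: float) -> float:
    """Magnitude of the horizontal image shift (in pixels) of a point at `depth`
    when a pinhole camera with focal length `focal_px` (in pixels) translates
    sideways by `baseline` (same units as `depth`).

    From the pinhole projection u = f * X / Z: moving the camera by `baseline`
    changes the point's camera-space X by `baseline`, so its projection moves
    by f * baseline / Z (in the direction opposite to the camera motion).
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth


if __name__ == "__main__":
    # Hypothetical scene: focal length of 1000 px, camera moves 0.1 scene units.
    for depth in (1.0, 2.0, 10.0):
        shift = parallax_shift(1000.0, 0.1, depth)
        print(f"depth {depth:5.1f} -> shift {shift:6.1f} px")
```

Doubling the depth halves the shift. This depth-dependent motion is the cue that distinguishes a 3D Ken Burns effect from a flat 2D pan and zoom.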
Setup
Several functions in the implementation are written in CUDA, NVIDIA's parallel computing platform, and are executed via CuPy, so CuPy is a required dependency. It can be installed using:
pip install cupy
Alternatively, use one of the binary packages described in the CuPy repository. Make sure the CUDA_HOME environment variable is configured. To generate the output videos, also install the moviepy library:
pip install moviepy
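Before running the scripts, it can help to confirm that the requirements above are visible to Python. The following is a minimal sketch using only the standard library. The function name check_setup is hypothetical and not part of the project. It checks only that the torch, cupy, and moviepy packages are importable and that CUDA_HOME points to an existing directory. It does not verify that CUDA itself works.

```python
import importlib.util
import os


def check_setup() -> dict:
    """Report whether the Setup requirements appear to be satisfied.

    Returns a dict mapping each requirement to True or False. Only presence
    is checked: packages must be findable on the import path, and CUDA_HOME
    must name an existing directory. Nothing is compiled or run on the GPU.
    """
    cuda_home = os.environ.get("CUDA_HOME", "")
    return {
        "torch": importlib.util.find_spec("torch") is not None,
        "cupy": importlib.util.find_spec("cupy") is not None,
        "moviepy": importlib.util.find_spec("moviepy") is not None,
        "CUDA_HOME": bool(cuda_home) and os.path.isdir(cuda_home),
    }


if __name__ == "__main__":
    for name, ok in check_setup().items():
        print(f"{name:10s} {'ok' if ok else 'MISSING'}")
```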
Usage
The project can be used in four ways:
- Automatic Animation: to generate a 3D Ken Burns video from an image automatically, run:
  python autozoom.py --in ./images/doublestrike.jpg --out ./autozoom.mp4
- Manual Interface: to adjust the camera path by hand, start the interface with:
  python interface.py
  Then open http://localhost:8080/ in a browser and load your image using the button in the bottom-right corner.
- Depth Estimation: to obtain raw depth estimates for an image, run:
  python depthestim.py --in ./images/doublestrike.jpg --out ./depthestim.npy
  Note that this script does not perform depth adjustment; more information on depth adjustment can be found in the project's GitHub issues.
- Benchmarking Depth Estimation: to verify that the depth estimation works as expected, run either:
  python benchmark-ibims.py
  or
  python benchmark-nyu.py
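The raw depth estimates written by depthestim.py are saved as a NumPy .npy file, which numpy.load can read. The sketch below assumes the file holds a single 2D floating-point array, because the exact shape and value range are not documented here. It rescales that array to an 8-bit image for quick visual inspection. The function name depth_to_uint8 is hypothetical and not part of the project.

```python
import sys

import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Linearly rescale a depth map to the range 0..255 for display.

    Assumes `depth` is a 2D numeric array; singleton dimensions (for example
    a leading channel axis) are squeezed away first.
    """
    depth = np.squeeze(np.asarray(depth, dtype=np.float64))
    if depth.ndim != 2:
        raise ValueError(f"expected a 2D depth map, got shape {depth.shape}")
    lo, hi = depth.min(), depth.max()
    if hi == lo:
        # Constant depth: nothing to stretch, so return a flat image.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # now in [0, 1]
    return np.round(scaled * 255.0).astype(np.uint8)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_depth.py ./depthestim.npy
    raw = np.load(sys.argv[1])
    print("shape:", raw.shape, "min:", float(raw.min()), "max:", float(raw.max()))
    preview = depth_to_uint8(raw)
    print("preview dtype:", preview.dtype, "shape:", preview.shape)
```

The script name inspect_depth.py in the usage comment is likewise hypothetical. The resulting uint8 array can be saved with any image library for viewing.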
Colab
If you cannot run this project locally, Google Colab offers a free way to run it in the cloud. Several authors have provided Colab notebooks:
Dataset
The dataset used in this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0) and may only be used for non-commercial purposes. See the LICENSE file in the project for details.
The dataset consists of scenes captured in different modes, such as flying and walking, and provides color, depth, and normal data for them. Each part can be downloaded separately, and the download size (in GB) is listed for each.