Introduction to Stable-Diffusion-Videos
Stable-Diffusion-Videos is a project for generating videos with Stable Diffusion. Given a list of text prompts, it creates a video by interpolating through the model's latent space between them, so the image generated for one prompt morphs smoothly into the next. You can also try it out directly in Google Colab with just a few steps.
Installation
To get started with Stable-Diffusion-Videos, users need to install the package using the command:
pip install stable_diffusion_videos
This installs the package and its dependencies.
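If you want a quick sanity check that the install worked, a minimal sketch like the one below imports the pipeline class and confirms that a CUDA GPU is visible (the examples in this article assume one); torch.cuda.is_available() is standard PyTorch, nothing project-specific.

import torch
from stable_diffusion_videos import StableDiffusionWalkPipeline  # the import should succeed after installation

print("CUDA available:", torch.cuda.is_available())  # the examples below run the pipeline on a CUDA GPU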
Usage
The package ships with several example scripts that show how to get the most out of the tool. Let's walk through the core functionality:
Making Videos
Creating a video is straightforward: import StableDiffusionWalkPipeline, load the pre-trained model, and call walk. The example below generates a video that transitions between two prompts:
from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

# Load the pre-trained weights in half precision and move the pipeline to the GPU
pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],   # one image is generated per prompt
    seeds=[42, 1337],             # one seed per prompt, for reproducibility
    num_interpolation_steps=3,    # frames generated between each pair of prompts
    height=512,                   # output frame height, in pixels
    width=512,                    # output frame width, in pixels
    output_dir='dreams',          # top-level directory where results are saved
    name='animals_test',          # subdirectory of output_dir for this run
    guidance_scale=8.5,           # higher values follow the prompt more closely
    num_inference_steps=50,       # diffusion steps per generated image
)
This example morphs from a cat to a dog: the pipeline generates an image for each prompt, interpolates three frames between them, then stitches the frames into a video and returns its path.
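The same walk call scales to longer sequences. As a hedged sketch (reusing the pipeline from above; the third prompt, seed, and run name here are made up for illustration), you can pass more prompts, one seed per prompt, and a list giving the number of interpolation frames for each transition:

video_path = pipeline.walk(
    prompts=['a cat', 'a dog', 'a horse'],   # hypothetical third prompt added
    seeds=[42, 1337, 2024],                  # one seed per prompt
    num_interpolation_steps=[3, 3],          # one entry per transition (N prompts give N-1 transitions)
    height=512,
    width=512,
    output_dir='dreams',
    name='animals_test_long',                # hypothetical run name
    guidance_scale=8.5,
    num_inference_steps=50,
)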
Making Music Videos
A newer feature synchronizes video generation with an audio file: the audio drives how quickly or slowly the interpolation moves, effectively locking the visuals to the beat of the music. You pick timestamps in the track (in seconds) and derive the number of interpolation frames from them:
from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# Timestamps (in seconds) in the audio file; each consecutive pair defines one transition
audio_offsets = [146, 148]
fps = 30

# Convert each gap between offsets into a frame count: (148 - 146) seconds * 30 fps = 60 frames
num_interpolation_steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]

video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],
    seeds=[42, 1337],
    num_interpolation_steps=num_interpolation_steps,
    audio_filepath='audio.mp3',        # soundtrack used to drive the interpolation
    audio_start_sec=audio_offsets[0],  # where in the track the video starts
    fps=fps,                           # frame rate of the output video
    height=512,
    width=512,
    output_dir='dreams',
    guidance_scale=7.5,
    num_inference_steps=50,
)
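The same arithmetic extends to longer tracks: provide one audio offset per prompt, and each consecutive pair of offsets becomes a transition lasting (next - current) seconds at the chosen fps. A small sketch with hypothetical offsets (the prompt and seed lists passed to walk would then need three entries as well):

audio_offsets = [146, 148, 152]  # hypothetical: one timestamp (in seconds) per prompt
fps = 30

# Two transitions: (148 - 146) * 30 = 60 frames, then (152 - 148) * 30 = 120 frames
num_interpolation_steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]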
Using the User Interface
The project also offers a browser-based interface for those who prefer a visual workflow over writing code.
from stable_diffusion_videos import StableDiffusionWalkPipeline, Interface
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

interface = Interface(pipeline)
interface.launch()
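A small usage note: if you save this as a standalone script rather than running it inside a notebook, it is conventional to wrap the launch in a main guard, as sketched below. (If the interface is a Gradio app under the hood, Gradio's usual launch options such as a shareable public link may also apply, but that is an assumption rather than something shown here.)

# Sketch: guard the launch so the app only starts when the file is executed directly
if __name__ == '__main__':
    interface.launch()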
Credits
The project builds on a script originally shared by @karpathy, adapted and extended by @nateraw and other contributors. This collaborative effort has produced a robust tool for creative video generation.
Contributing
Community members are welcome to contribute to the project. Any issues or feature suggestions can be submitted through the project's GitHub page.
Stable-Diffusion-Videos is an open-source project that continues to evolve, providing exciting opportunities for both developers and creatives to explore the intersection of art and technology.