StoryTeller: A Fascinating Multimodal AI Tool
Overview
StoryTeller is an innovative multimodal AI storytelling tool that combines language and image generation to create engaging narratives. Using advanced AI models like Stable Diffusion, GPT, and neural text-to-speech (TTS), this tool turns a simple story prompt into a fully animated short story, complete with sound and imagery.
How It Works
The process begins with a user-provided story prompt, serving as the opening line. From there, GPT, a powerful text generation model, crafts the rest of the narrative. Simultaneously, Stable Diffusion generates images that visualize each line of the story, and a TTS model reads the story aloud. This combination results in an animated video that brings the story to life with both audio and visual elements.
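The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration of the flow only: the three stand-in functions below are hypothetical placeholders for the real GPT writer, Stable Diffusion painter, and TTS speaker, and the function names are not part of StoryTeller's actual API.

```python
def write_story(prompt, num_lines=3):
    """Stand-in for the GPT writer: extends the prompt line by line."""
    return [prompt] + [f"Continuation line {i}" for i in range(1, num_lines)]

def paint_image(line):
    """Stand-in for the Stable Diffusion painter: one image per story line."""
    return f"image_for({line!r})"

def speak_line(line):
    """Stand-in for the TTS speaker: one narration clip per story line."""
    return f"audio_for({line!r})"

def generate(prompt):
    """Each story line becomes one video segment: text, image, and audio."""
    lines = write_story(prompt)
    return [(line, paint_image(line), speak_line(line)) for line in lines]

segments = generate("Once upon a time, unicorns roamed the Earth.")
```

In the real tool, the segments are then stitched together into the final animated video.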
Installation
Using PyPI:
StoryTeller can be easily installed via Python's package installer, pip, by running the following command:
$ pip install storyteller-core
From Source:
- Clone the repository from GitHub and navigate to the project directory:
$ git clone https://github.com/jaketae/storyteller.git
$ cd storyteller
- Install the necessary dependencies with pip:
$ pip install .
Note for Apple Silicon users: You may need to perform additional steps for dependencies like mecab-python3.
Quickstart
To quickly test the capabilities of StoryTeller, you can use the command line interface (CLI). Simply run:
$ storyteller
This command will initiate the storytelling process with the default prompt: "Once upon a time, unicorns roamed the Earth." For a custom story beginning, use the --writer_prompt argument:
$ storyteller --writer_prompt "The ravenous cat, driven by an insatiable craving for tuna, devised a daring plan to break into the local fish market's coveted tuna reserve."
The final video will be saved as /out/out.mp4, alongside its supporting files in the /out directory.
Usage Options
StoryTeller offers various CLI options to customize storytelling, such as setting the number of images, selecting text and image generation models, and specifying output preferences. Display all options by typing:
$ storyteller --help
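As a rough illustration of how such flags map to settings, the parsing side of a CLI like this can be sketched with argparse. Only flags that actually appear in this README are shown; the parser below is a hypothetical sketch, not StoryTeller's real implementation, and the default values are assumptions.

```python
import argparse

# Sketch of a parser covering only the flags mentioned in this README.
parser = argparse.ArgumentParser(prog="storyteller")
parser.add_argument("--writer_prompt",
                    default="Once upon a time, unicorns roamed the Earth.")
parser.add_argument("--writer_device", default="cpu")
parser.add_argument("--painter_device", default="cpu")

# Equivalent to: storyteller --writer_device cuda --painter_device cuda
args = parser.parse_args(["--writer_device", "cuda", "--painter_device", "cuda"])
```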
Advanced Usage
GPU Acceleration:
If you have a CUDA-capable machine, you can leverage the GPU for faster processing:
$ storyteller --writer_device cuda --painter_device cuda
For multiple GPUs or advanced settings, such as half-precision inference for faster generation, additional configuration options can be specified.
Apple Silicon:
For Apple devices with MPS support, run:
$ storyteller --writer_device mps --painter_device mps
For further performance enhancements on this architecture, enable attention slicing to manage memory more efficiently.
Python Integration:
Developers can integrate StoryTeller directly into Python projects for more customized use:
- Load the default model:
from storyteller import StoryTeller

story_teller = StoryTeller.from_default()
story_teller.generate(...)
- Configure with custom settings:
from storyteller import StoryTeller, StoryTellerConfig

config = StoryTellerConfig(
    writer="gpt2-large",
    painter="CompVis/stable-diffusion-v1-4",
    max_new_tokens=100,
)
story_teller = StoryTeller(config)
story_teller.generate(...)
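The config-object pattern shown above can be illustrated with a standard-library dataclass. This is a stand-in, not StoryTeller's actual `StoryTellerConfig` class: the field names follow the README example, but the default values here are assumptions.

```python
from dataclasses import dataclass

# Stand-in mirroring the config object shown above. Field names follow
# the README example; the defaults are illustrative assumptions.
@dataclass
class StoryTellerConfig:
    writer: str = "gpt2"
    painter: str = "CompVis/stable-diffusion-v1-4"
    max_new_tokens: int = 50

# Override only the settings you care about; the rest keep their defaults.
config = StoryTellerConfig(writer="gpt2-large", max_new_tokens=100)
```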
License
StoryTeller is released under the MIT License, allowing wide usage and contribution opportunities for developers and storytellers worldwide.