StoryTeller: A Fascinating Multimodal AI Tool
Overview
StoryTeller is an innovative multimodal AI storytelling tool that combines language and image generation to create engaging narratives. Using advanced AI models like Stable Diffusion, GPT, and neural text-to-speech (TTS), this tool turns a simple story prompt into a fully animated short story, complete with sound and imagery.
How It Works
The process begins with a user-provided story prompt, serving as the opening line. From there, GPT, a powerful text generation model, crafts the rest of the narrative. Simultaneously, Stable Diffusion generates images that visualize each line of the story, and a TTS model reads the story aloud. This combination results in an animated video that brings the story to life with both audio and visual elements.
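The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration of the flow only: the three stand-in functions below are hypothetical placeholders for the real GPT writer, Stable Diffusion painter, and TTS speaker, and the function names are not part of StoryTeller's actual API.

```python
def write_story(prompt, num_lines=3):
    """Stand-in for the GPT writer: extends the prompt line by line."""
    return [prompt] + [f"Continuation line {i}" for i in range(1, num_lines)]

def paint_image(line):
    """Stand-in for the Stable Diffusion painter: one image per story line."""
    return f"image_for({line!r})"

def speak_line(line):
    """Stand-in for the TTS speaker: one narration clip per story line."""
    return f"audio_for({line!r})"

def generate(prompt):
    """Each story line becomes one video segment: text, image, and audio."""
    lines = write_story(prompt)
    return [(line, paint_image(line), speak_line(line)) for line in lines]

segments = generate("Once upon a time, unicorns roamed the Earth.")
```

In the real tool, the segments are then stitched together into the final animated video.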
Installation
Using PyPI:
StoryTeller can be easily installed via Python's package installer, pip, by running the following command:
$ pip install storyteller-core
From Source:
- Clone the repository from GitHub and navigate to the project directory:
$ git clone https://github.com/jaketae/storyteller.git
$ cd storyteller
- Install the necessary dependencies with pip:
$ pip install .
Note for Apple Silicon users: You may need to perform additional steps for dependencies like mecab-python3.
Quickstart
To quickly test the capabilities of StoryTeller, you can use the command line interface (CLI). Simply run:
$ storyteller
This command will initiate the storytelling process with the default prompt: "Once upon a time, unicorns roamed the Earth." For a custom story beginning, use the --writer_prompt argument:
$ storyteller --writer_prompt "The ravenous cat, driven by an insatiable craving for tuna, devised a daring plan to break into the local fish market's coveted tuna reserve."
The final video will be saved as /out/out.mp4, alongside its supporting files in the /out directory.
Usage Options
StoryTeller offers various CLI options to customize storytelling, such as setting the number of images, selecting text and image generation models, and specifying output preferences. Display all options by typing:
$ storyteller --help
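As a rough illustration of how such flags map to settings, the parsing side of a CLI like this can be sketched with argparse. Only flags that actually appear in this README are shown; the parser below is a hypothetical sketch, not StoryTeller's real implementation, and the default values are assumptions.

```python
import argparse

# Sketch of a parser covering only the flags mentioned in this README.
parser = argparse.ArgumentParser(prog="storyteller")
parser.add_argument("--writer_prompt",
                    default="Once upon a time, unicorns roamed the Earth.")
parser.add_argument("--writer_device", default="cpu")
parser.add_argument("--painter_device", default="cpu")

# Equivalent to: storyteller --writer_device cuda --painter_device cuda
args = parser.parse_args(["--writer_device", "cuda", "--painter_device", "cuda"])
```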
Advanced Usage
GPU Acceleration:
If you have a CUDA-capable machine, you can leverage the GPU for faster processing:
$ storyteller --writer_device cuda --painter_device cuda
For multiple GPUs or advanced settings, such as half-precision inference for faster generation, additional configuration options can be specified.
Apple Silicon:
For Apple devices with MPS support, run:
$ storyteller --writer_device mps --painter_device mps
For further performance enhancements on this architecture, enable attention slicing to manage memory more efficiently.
Python Integration:
Developers can integrate StoryTeller directly into Python projects for more customized use:
- Load the default model:
from storyteller import StoryTeller

story_teller = StoryTeller.from_default()
story_teller.generate(...)
- Configure with custom settings:
from storyteller import StoryTeller, StoryTellerConfig

config = StoryTellerConfig(
    writer="gpt2-large",
    painter="CompVis/stable-diffusion-v1-4",
    max_new_tokens=100,
)
story_teller = StoryTeller(config)
story_teller.generate(...)
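The config-object pattern shown above can be illustrated with a standard-library dataclass. This is a stand-in, not StoryTeller's actual `StoryTellerConfig` class: the field names follow the README example, but the default values here are assumptions.

```python
from dataclasses import dataclass

# Stand-in mirroring the config object shown above. Field names follow
# the README example; the defaults are illustrative assumptions.
@dataclass
class StoryTellerConfig:
    writer: str = "gpt2"
    painter: str = "CompVis/stable-diffusion-v1-4"
    max_new_tokens: int = 50

# Override only the settings you care about; the rest keep their defaults.
config = StoryTellerConfig(writer="gpt2-large", max_new_tokens=100)
```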
License
StoryTeller is released under the MIT License, allowing wide usage and contribution opportunities for developers and storytellers worldwide.