ospeak - Leverage the OpenAI Text-to-Speech API for Versatile Speech Synthesis

Introduction to ospeak

ospeak is a command-line (CLI) tool that converts text to speech using the OpenAI Text-to-Speech API. It can speak text aloud from your terminal or save the audio to a file, which is useful when you want text-to-speech without leaving your development environment.

Installation

The recommended way to install ospeak is with pipx, which installs Python applications into isolated environments:

pipx install ospeak

On macOS, however, there is a known compatibility issue with Python 3.12. To work around it, point pipx at a Python 3.11 interpreter:

pipx install --python /path/to/python3.11 ospeak

ospeak also requires ffmpeg, a tool for processing audio and video. On macOS, you can install it with Homebrew:

brew install ffmpeg
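
Because ospeak depends on ffmpeg, it can help to confirm that both commands are on your PATH before first use. The sketch below uses a hypothetical helper, check_tools (the name is illustrative and not part of ospeak), built on the standard command -v lookup.

```shell
# check_tools is a hypothetical helper, not part of ospeak.
# It reports every named command that cannot be found on PATH
# and returns non-zero if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_tools ospeak ffmpeg succeeds only when both commands are installed.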

Because of dependency conflicts, ospeak and some other tools, such as LLM, need to be installed in separate virtual environments. pipx handles this for you, since it gives each application its own isolated environment.

Usage

To have your computer speak a sentence, pass the text as an argument:

ospeak "Hello there"

ospeak needs an OpenAI API key before it can do this. You can set the key as an environment variable:

export OPENAI_API_KEY="..."

Alternatively, pass the API key on the command line with --token:

ospeak --token "..." "Hello there"
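
In scripts, a missing key is easy to overlook until the request fails. The following is a minimal sketch of a guard that checks for the environment variable first; require_openai_key is a hypothetical name chosen for illustration, not an ospeak feature.

```shell
# require_openai_key is a hypothetical helper, not part of ospeak.
# It succeeds only if OPENAI_API_KEY is set to a non-empty value.
require_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
}
```

A script could then run require_openai_key && ospeak "Hello there", so that ospeak is only invoked when the key is present.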

You can also pipe text into ospeak:

echo "Hello there" | ospeak
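
Piping is also a convenient way to read a longer text file aloud. The sketch below wraps that in a hypothetical helper, speak_file (the name and the readability check are illustrative assumptions); it relies only on the piping behaviour shown above.

```shell
# speak_file is a hypothetical helper, not part of ospeak.
# It pipes the contents of a text file into ospeak, and fails early
# with a message if the file cannot be read.
speak_file() {
  if [ ! -r "$1" ]; then
    echo "cannot read file: $1" >&2
    return 1
  fi
  # A pipe is used (rather than a redirect) to mirror the documented usage.
  cat "$1" | ospeak
}
```

For example, speak_file notes.txt would read a file named notes.txt aloud, where notes.txt is a placeholder for your own file.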

Customization Options

Voices: Choose different voices with the -v/--voice option. Available voices include alloy, echo, fable, onyx, nova, and shimmer. Use -v all to hear a demonstration of all voice options.
```
ospeak "This is my voice" -v all
```
Models: Select the model with the -m/--model option. The default model is tts-1; for higher-quality audio, use -m tts-1-hd.
```
ospeak "This is higher quality" -m tts-1-hd
```
Speed: Adjust the speaking speed with -x/--speed. Valid values range from 0.25 to 4; the default is 1.0.
```
ospeak "This is my fast voice" -x 2
```
Output: Save the spoken audio to a file with -o/--output, specifying a filename ending in .mp3 or .wav.
```
ospeak "This is my voice" -o voice.mp3
```
If you also want to hear the audio live while saving, add the -s/--speak option.
```
ospeak "This is my voice" -o voice.mp3 -s
```
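
These options can be combined in a loop, for instance to save one sample per voice for side-by-side comparison instead of listening to -v all. This is a sketch under assumptions: save_voice_samples and the voice-NAME.mp3 file naming are hypothetical, and it presumes that -v and -o can be used together in a single invocation.

```shell
# save_voice_samples is a hypothetical helper, not part of ospeak.
# It runs ospeak once per voice listed in this document and writes
# each result to voice-<name>.mp3 in the current directory.
save_voice_samples() {
  text="$1"
  for voice in alloy echo fable onyx nova shimmer; do
    # Stop at the first failure (for example, a missing API key).
    ospeak "$text" -v "$voice" -o "voice-$voice.mp3" || return 1
  done
}
```

Calling save_voice_samples "This is my voice" would then make six requests, one per voice.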

Help and Support

To view all available options, run:

ospeak --help

Development and Contribution

To contribute to ospeak, first check out the code. Then create a new virtual environment and install the dependencies, including the test extras:

cd ospeak
python -m venv venv
source venv/bin/activate
pip install -e '.[test]'

Run tests to ensure everything is working:

pytest

ospeak provides text-to-speech directly from the command line, whether you want to hear text spoken aloud or save it as an audio file.