edge-tts - Integrate Microsoft Edge's text-to-speech capabilities into Python projects

Introduction to edge-tts

The edge-tts project is a Python module that uses Microsoft Edge's online text-to-speech service. It can be used from Python code or through the edge-tts and edge-playback command-line tools. It suits developers and users who want to add text-to-speech to a project with little setup.

Installation

Installation is simple. The module can be installed with pip, the Python package manager:

$ pip install edge-tts

For those who only want the command-line tools and do not plan to write Python code, installing with pipx is recommended:

$ pipx install edge-tts

Usage

Basic Usage

Once installed, edge-tts can convert text to speech, saving the audio to a media file and the matching subtitles to a subtitle file:

$ edge-tts --text "Hello, world!" --write-media hello.mp3 --write-subtitles hello.vtt
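For scripts that need to drive the command-line tool, the same invocation can be wrapped in Python. The following is a minimal sketch, assuming the edge-tts executable is on the PATH; the helper names build_edge_tts_command and synthesize_to_file are hypothetical and not part of the package.

```python
import subprocess


def build_edge_tts_command(text, media_path, subtitles_path=None):
    """Build the argument list for the edge-tts command shown above."""
    command = ["edge-tts", "--text", text, "--write-media", media_path]
    if subtitles_path is not None:
        command += ["--write-subtitles", subtitles_path]
    return command


def synthesize_to_file(text, media_path, subtitles_path=None):
    """Run edge-tts; raises CalledProcessError if the command fails."""
    # Passing a list (no shell) avoids quoting problems in the text.
    command = build_edge_tts_command(text, media_path, subtitles_path)
    subprocess.run(command, check=True)
```

Calling synthesize_to_file("Hello, world!", "hello.mp3", "hello.vtt") would run the same command as above. It needs network access, since synthesis happens in Microsoft's online service.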

For immediate playback with subtitles, use the edge-playback command. It requires the command-line player mpv to be installed:

$ edge-playback --text "Hello, world!"

Note that the options accepted by edge-tts also work with edge-playback.

Changing the Voice

To change the language or the voice, first list the available voices:

$ edge-tts --list-voices

This command displays the available voices along with details such as locale and gender. To select one, pass its name with --voice:

$ edge-tts --voice ar-EG-SalmaNeural --text "مرحبا كيف حالك؟" --write-media hello_in_arabic.mp3 --write-subtitles hello_in_arabic.vtt
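The voice list can also be filtered from Python. The sketch below assumes that edge_tts.list_voices() is a coroutine returning a list of dictionaries with "ShortName", "Locale", and "Gender" keys, as in recent releases; filter_voices and find_voices are hypothetical helper names.

```python
import asyncio


def filter_voices(voices, locale_prefix=None, gender=None):
    """Return the ShortName of each voice matching the locale prefix and gender."""
    matches = []
    for voice in voices:
        if locale_prefix and not voice["Locale"].startswith(locale_prefix):
            continue
        if gender and voice["Gender"] != gender:
            continue
        matches.append(voice["ShortName"])
    return matches


async def find_voices(locale_prefix=None, gender=None):
    # Imported here so filter_voices stays usable without the package installed.
    import edge_tts

    voices = await edge_tts.list_voices()  # assumption: needs network access
    return filter_voices(voices, locale_prefix, gender)


def main():
    # Prints the short names of female Egyptian Arabic voices, if any are listed.
    print(asyncio.run(find_voices("ar-EG", "Female")))
```

Calling main() prints names that can be passed to --voice. Keeping the filtering separate from the network call makes the selection logic easy to check offline.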

Custom SSML

As of version 5.0.0, support for custom SSML (Speech Synthesis Markup Language) has been removed. Microsoft changed its service so that custom SSML is no longer accepted through this tool.

Adjusting Rate, Volume, and Pitch

The rate, volume, and pitch of the generated speech can also be adjusted. For example:

$ edge-tts --rate=-50% --text "Hello, world!" --write-media hello_with_rate_halved.mp3 --write-subtitles hello_with_rate_halved.vtt
$ edge-tts --volume=-50% --text "Hello, world!" --write-media hello_with_volume_halved.mp3 --write-subtitles hello_with_volume_halved.vtt
$ edge-tts --pitch=-50Hz --text "Hello, world!" --write-media hello_with_pitch_lowered.mp3 --write-subtitles hello_with_pitch_lowered.vtt

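These options must be written with an equals sign (e.g., --rate=-50% rather than --rate -50%). Without it, the value -50% starts with a hyphen and would be interpreted as a separate argument instead of the value of --rate.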
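The reason for the equals sign can be reproduced with Python's standard argparse module. This is a standalone sketch that assumes the command-line tool parses its options in an argparse-like way; the parser below is a hypothetical stand-in, not the tool's actual parser.

```python
import argparse


def make_parser():
    # Hypothetical parser with a single option, for demonstration only.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--rate", default="+0%")
    return parser


def try_parse(argv):
    """Return the parsed rate, or None if argparse rejects the arguments."""
    try:
        return make_parser().parse_args(argv).rate
    except SystemExit:
        # argparse prints its error message to stderr and then exits.
        return None
```

Here try_parse(["--rate=-50%"]) returns "-50%", while try_parse(["--rate", "-50%"]) returns None: argparse treats the separate "-50%" token as another option because it starts with a hyphen and is not a plain negative number.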

Note on edge-playback Command

edge-playback is a wrapper around edge-tts that plays the audio immediately after synthesis. It accepts the same arguments as edge-tts.

Using edge-tts as a Python Module

For deeper integration, edge-tts can be used directly within Python scripts, which allows more customization and automation. The project also hosts several example applications.
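As a rough orientation, module usage might look like the sketch below. It assumes the edge_tts.Communicate class with an asynchronous save() method, and that the rate, volume, and pitch keyword arguments mirror the command-line options (the pitch keyword is assumed to exist only in recent releases). The synthesize and main names are hypothetical.

```python
import asyncio


async def synthesize(text, voice, output_file,
                     rate="+0%", volume="+0%", pitch="+0Hz"):
    # Imported inside the function so the sketch loads without edge-tts installed.
    import edge_tts

    # Assumption: these keyword arguments match the CLI options of the same name.
    communicate = edge_tts.Communicate(
        text, voice, rate=rate, volume=volume, pitch=pitch
    )
    # Contacts Microsoft's online service and writes the audio to output_file.
    await communicate.save(output_file)


def main():
    asyncio.run(
        synthesize("مرحبا كيف حالك؟", "ar-EG-SalmaNeural", "hello_in_arabic.mp3")
    )
```

Calling main() would produce the same audio file as the earlier --voice example. Because synthesize is a coroutine, it can also be awaited from an existing asyncio application rather than run through asyncio.run.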

In summary, edge-tts provides a simple interface for adding text-to-speech to both standalone applications and Python scripts, backed by Microsoft’s online voice technology.