Introduction to Vocode-Core
Vocode-core is an open-source library that simplifies building voice-based applications on top of Large Language Models (LLMs). With Vocode, developers can set up real-time streaming conversations with LLMs and deploy them to phone calls, Zoom meetings, and more. Whether the goal is a personalized assistant or an interactive voice application like voice-based chess, Vocode provides the necessary tools and integrations in a single library.
Key Features of Vocode-Core
Vocode-core offers a range of impressive features tailored for developers seeking to create voice-based applications:
- Real-time Conversations: start a streaming conversation with an LLM using your system's audio, with minimal setup.
- Inbound and Outbound Telephony: set up phone numbers that answer with an LLM-based agent, and place outbound calls driven by one (see the sketch after this list).
- Zoom Integration: dial an agent into Zoom calls seamlessly using Vocode's capabilities.
- Langchain Integration: make outbound calls to real phone numbers using a Langchain agent.
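To give a flavor of the telephony API, here is a minimal sketch of placing an outbound call, loosely following the pattern of Vocode's telephony quickstart. The phone numbers and BASE_URL are placeholders; the sketch assumes a Vocode telephony server reachable at BASE_URL, a local Redis instance for the config manager, and Twilio credentials available in your environment.

import asyncio
import os

from vocode.streaming.models.agent import ChatGPTAgentConfig
from vocode.streaming.models.message import BaseMessage
from vocode.streaming.telephony.config_manager.redis_config_manager import RedisConfigManager
from vocode.streaming.telephony.conversation.outbound_call import OutboundCall

# Assumption: BASE_URL points at a publicly reachable Vocode telephony
# server (e.g. behind an ngrok tunnel).
BASE_URL = os.environ["BASE_URL"]

async def main():
    outbound_call = OutboundCall(
        base_url=BASE_URL,
        to_phone="+15555555555",    # placeholder: the number to call
        from_phone="+15555555555",  # placeholder: your provisioned number
        config_manager=RedisConfigManager(),  # assumes Redis running locally
        agent_config=ChatGPTAgentConfig(
            openai_api_key=os.environ["OPENAI_API_KEY"],
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="Have a pleasant conversation about life",
        ),
    )
    await outbound_call.start()

if __name__ == "__main__":
    asyncio.run(main())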
Integrations
Out of the box, Vocode provides integrations with various transcription, language, and synthesis services, ensuring flexibility and choice (a sketch of swapping providers follows this list):
- Transcription Services: AssemblyAI, Deepgram, Gladia, Google Cloud, Microsoft Azure, RevAI, Whisper, and others.
- Language Models: OpenAI, Anthropic, and more for integrating with advanced LLMs.
- Synthesis Services: Rime.ai, Microsoft Azure, Google Cloud, Play.ht, Eleven Labs, Cartesia, Coqui (OSS), gTTS, StreamElements, Bark, AWS Polly, and others.
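Because each service is wired in through a config class, switching providers usually means changing only the config object you pass to StreamingConversation. As an illustrative sketch, here is how the Azure synthesizer from the quickstart below could be swapped for Eleven Labs (the api_key parameter here is an assumption; exact config fields can vary between versions):

from vocode.streaming.models.synthesizer import ElevenLabsSynthesizerConfig
from vocode.streaming.synthesizer.eleven_labs_synthesizer import ElevenLabsSynthesizer

# Drop-in replacement for the AzureSynthesizer in the quickstart below;
# the rest of the StreamingConversation setup stays the same.
synthesizer = ElevenLabsSynthesizer(
    ElevenLabsSynthesizerConfig.from_output_device(
        speaker_output,  # the same speaker_output created in the quickstart
        api_key="ENTER_YOUR_ELEVEN_LABS_API_KEY_HERE",
    )
)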
Contribution and Roadmap
Vocode is an open-source project that welcomes contributors to enhance features, add new integrations, and improve documentation. Anyone interested in contributing can follow the Contribution Guide and engage with the community on Discord for discussions and collaboration. The project roadmap is available to give insights into future developments and priorities.
Quickstart Guide
To get started with Vocode-core, developers can quickly install it using pip:
pip install vocode
Here's a simplified example of setting up a streaming conversation:
import asyncio
import signal

from pydantic_settings import BaseSettings  # reads settings from env vars / defaults

from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.streaming.agent.chat_gpt_agent import ChatGPTAgent
from vocode.streaming.models.agent import ChatGPTAgentConfig
from vocode.streaming.models.message import BaseMessage
from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig, PunctuationEndpointingConfig
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.streaming.synthesizer.azure_synthesizer import AzureSynthesizer
from vocode.streaming.transcriber.deepgram_transcriber import DeepgramTranscriber


# Configure the settings: each field can be overridden by an environment
# variable of the same name (e.g. OPENAI_API_KEY).
class Settings(BaseSettings):
    openai_api_key: str = "ENTER_YOUR_OPENAI_API_KEY_HERE"
    azure_speech_key: str = "ENTER_YOUR_AZURE_KEY_HERE"
    deepgram_api_key: str = "ENTER_YOUR_DEEPGRAM_API_KEY_HERE"
    azure_speech_region: str = "eastus"


settings = Settings()


async def main():
    # Hook the conversation up to the system microphone and speaker.
    microphone_input, speaker_output = create_streaming_microphone_input_and_speaker_output(
        use_default_devices=False,
    )

    conversation = StreamingConversation(
        output_device=speaker_output,
        # Speech-to-text: Deepgram, with punctuation-based endpointing to
        # decide when the user has finished speaking.
        transcriber=DeepgramTranscriber(
            DeepgramTranscriberConfig.from_input_device(
                microphone_input,
                endpointing_config=PunctuationEndpointingConfig(),
                api_key=settings.deepgram_api_key,
            ),
        ),
        # The LLM agent that generates responses.
        agent=ChatGPTAgent(
            ChatGPTAgentConfig(
                openai_api_key=settings.openai_api_key,
                initial_message=BaseMessage(text="What up"),
                prompt_preamble="""The AI is having a pleasant conversation about life.""",
            )
        ),
        # Text-to-speech: Azure.
        synthesizer=AzureSynthesizer(
            AzureSynthesizerConfig.from_output_device(speaker_output),
            azure_speech_key=settings.azure_speech_key,
            azure_speech_region=settings.azure_speech_region,
        ),
    )
    await conversation.start()
    print("Conversation started, press Ctrl+C to end")
    # Terminate the conversation cleanly on Ctrl+C.
    signal.signal(signal.SIGINT, lambda _0, _1: asyncio.create_task(conversation.terminate()))
    # Pump microphone audio into the conversation until it ends.
    while conversation.is_active():
        chunk = await microphone_input.get_audio()
        conversation.receive_audio(chunk)


if __name__ == "__main__":
    asyncio.run(main())
The example above shows how quickly a working voice application can be assembled with Vocode: a transcriber, an agent, and a synthesizer composed into a single StreamingConversation.
Comprehensive Documentation
For those interested in exploring more about Vocode-core, its comprehensive documentation is available at docs.vocode.dev.
Vocode-core gives developers a powerful toolset for building voice-based applications with ease, backed by broad provider integrations and support for a range of audio devices and platforms.