pipecat - Enhance Conversational Agents with Pipecat’s Multimodal Voice Framework

Pipecat: Building Conversational Agents Simplified

Pipecat is a framework for building voice and multimodal conversational agents, such as personal coaches, meeting assistants, storytelling toys for kids, customer support bots, intake flows, and social companions with personality. It provides a set of example applications to help users get started and explore its capabilities.

Starting with Voice Agents

Pipecat can run on a local machine first, and the agent processes can move to the cloud once they are ready. Agents can also be extended with a telephone number, image output, video input, and different language models.

To begin, install the package with pip:

pip install pipecat-ai

Then copy the provided environment template to .env and fill in the API keys for the services the agent uses:

cp dot-env.template .env
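
After copying the template, it can help to confirm that the keys are actually set before starting an agent. The sketch below is a minimal, standard-library-only check. The parsing is deliberately simplistic, and the key names DAILY_API_KEY and CARTESIA_API_KEY are assumptions for illustration, not names confirmed by the template.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # Hypothetical key names; use whatever your .env template lists.
    required = ["DAILY_API_KEY", "CARTESIA_API_KEY"]
    env_path = Path(".env")
    values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    print("Missing keys:", missing_keys(values, required))
```

Running the script from the directory that holds .env prints the keys that still need a value.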

Pipecat is designed to be lightweight by default, but it can be extended with additional third-party AI services and transports to suit specific project needs. These optional dependencies can be installed with:

pip install "pipecat-ai[option,...]"

Each option names an AI service or transport, so an installation includes only what the project needs.
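
One way to see which options an installed package declares is to read its metadata with the standard library. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the extras "daily" and "cartesia" in the usage lines are assumed from the services named in this document, not confirmed here.

```python
from importlib import metadata


def declared_extras(dist_name: str) -> list[str]:
    """Return the optional-dependency groups a distribution declares.

    Returns an empty list if the distribution is not installed or declares none.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    return sorted(meta.get_all("Provides-Extra") or [])


def requirement_with_extras(dist_name: str, extras: list[str]) -> str:
    """Build the requirement string to pass to pip, e.g. name[a,b]."""
    return f"{dist_name}[{','.join(extras)}]" if extras else dist_name


if __name__ == "__main__":
    print("Declared extras:", declared_extras("pipecat-ai"))
    # Assumed extra names, for illustration only.
    print("pip install", f'"{requirement_with_extras("pipecat-ai", ["daily", "cartesia"])}"')
```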

Simple Voice Agent Example

Below is a simple example of a Pipecat application running locally. This application uses Daily for real-time media transport and Cartesia for text-to-speech services. The bot welcomes users as they join a real-time session.

import asyncio

from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport


async def main():
    # Daily transport: carries real-time audio between the bot and the room.
    transport = DailyTransport(
        room_url=...,
        token="",
        bot_name="Bot Name",
        params=DailyParams(audio_out_enabled=True),
    )

    # Cartesia text-to-speech: converts text frames into audio.
    tts = CartesiaTTSService(
        api_key=...,
        voice_id=...,
    )

    # Frames flow through the TTS service, then out through the transport.
    pipeline = Pipeline([tts, transport.output()])
    runner = PipelineRunner()
    task = PipelineTask(pipeline)

    # Greet the first participant by name when they join.
    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        participant_name = participant.get("info", {}).get("userName", "")
        await task.queue_frame(TextFrame(f"Hello there, {participant_name}!"))

    # Stop the pipeline when a participant leaves.
    @transport.event_handler("on_participant_left")
    async def on_participant_left(transport, participant, reason):
        await task.queue_frame(EndFrame())

    await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

To execute the app, save it as app.py and run:

python app.py

While the app is running, users can join the room through Daily’s WebRTC interface and hear the bot greet them.
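
The example's Pipeline([tts, transport.output()]) line chains processors so that each frame queued on the task passes through them in order. The following is a conceptual sketch of that idea in plain asyncio, not Pipecat's implementation. Every name in it (ToyTextFrame, ToyAudioFrame, ToyEndFrame, fake_tts, make_fake_output, run_pipeline) is hypothetical and invented for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyTextFrame:
    text: str


@dataclass
class ToyAudioFrame:
    payload: bytes


class ToyEndFrame:
    """Signals that the pipeline should stop."""


async def fake_tts(frame):
    # Stand-in for a TTS service: turn text into "audio" bytes.
    if isinstance(frame, ToyTextFrame):
        return ToyAudioFrame(frame.text.encode("utf-8"))
    return frame


def make_fake_output(sink: list):
    async def fake_output(frame):
        # Stand-in for a transport output: record audio instead of playing it.
        if isinstance(frame, ToyAudioFrame):
            sink.append(frame.payload)
        return frame

    return fake_output


async def run_pipeline(processors, queue: asyncio.Queue) -> None:
    """Take frames off the queue and pass each through every processor in order."""
    while True:
        frame = await queue.get()
        if isinstance(frame, ToyEndFrame):
            break
        for process in processors:
            frame = await process(frame)


async def demo() -> list:
    played = []
    queue = asyncio.Queue()
    await queue.put(ToyTextFrame("Hello there!"))
    await queue.put(ToyEndFrame())
    await run_pipeline([fake_tts, make_fake_output(played)], queue)
    return played


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The design point is that each stage only needs to know the frame types it handles and passes everything else along unchanged, which is what makes stages easy to swap.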

WebRTC for Production

For production applications, WebRTC is preferred over WebSockets for client-server audio because it is a protocol designed for real-time media transport. Signing up for a Daily developer account, which includes free usage each month, is a quick way to get started with WebRTC.

Voice Activity Detection (VAD)

Voice Activity Detection (VAD) determines when a user has finished speaking, which lets the bot respond at the right moment. Pipecat uses WebRTC VAD by default; Silero VAD is available as an optional dependency for better accuracy:

pip install "pipecat-ai[silero]"
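
To make the idea of detecting the end of speech concrete, here is a minimal sketch that uses a plain energy threshold. It is a much simpler stand-in for WebRTC VAD or Silero VAD and is not how either of them works; the function names, the threshold, and the silence window are all assumptions chosen for illustration.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def find_end_of_speech(frames, threshold: float = 0.1, silence_frames: int = 3):
    """Return the index of the frame at which the speaker is judged to have stopped.

    Speech is considered finished once `silence_frames` consecutive quiet
    frames follow at least one loud frame. Returns None if that never happens.
    """
    heard_speech = False
    quiet_run = 0
    for index, frame in enumerate(frames):
        if rms(frame) >= threshold:
            # Loud frame: the user is speaking, so reset the silence counter.
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return index
    return None


if __name__ == "__main__":
    loud = [0.5, -0.5] * 80
    quiet = [0.0] * 160
    # A brief pause (one quiet frame) does not end the utterance; three in a row do.
    frames = [quiet, loud, loud, quiet, loud, quiet, quiet, quiet, quiet]
    print(find_end_of_speech(frames))
```

The silence window is the trade-off to tune: a short window makes the bot respond quickly but risks cutting in during natural pauses, while a long window does the opposite.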

Development and Testing

Developers who want to contribute to Pipecat can set up a virtual environment, build the package, and run the tests. From the root directory of the repository, the test suite runs with:

pytest --doctest-modules --ignore-glob="*to_be_updated*" --ignore-glob="*pipeline_source*" src tests
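
The --doctest-modules flag tells pytest to also run the examples embedded in docstrings of the modules under the listed paths. As a rough illustration of what such a test looks like, here is a hypothetical helper (clamp_volume is not part of Pipecat) whose docstring doubles as its test.

```python
import doctest


def clamp_volume(level: float) -> float:
    """Clamp a volume level to the range 0.0 to 1.0.

    >>> clamp_volume(1.5)
    1.0
    >>> clamp_volume(-0.2)
    0.0
    >>> clamp_volume(0.25)
    0.25
    """
    return max(0.0, min(1.0, level))


if __name__ == "__main__":
    # Runs the docstring examples above; pytest --doctest-modules
    # collects the same examples when it scans this file.
    doctest.testmod()
```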

Configure Your Editor

Pipecat uses strict PEP 8 formatting, enforced with Ruff. Configurations for Emacs and Visual Studio Code are provided so that editors apply the same formatting.

Community and Support

Pipecat has an active community. Users can join the Pipecat Discord for support and discussion, or reach the project on X (formerly Twitter). Together with the documentation, these are the main places to get help.