Introduction to JARVIS
JARVIS is a voice personal assistant that turns spoken commands into spoken replies. It transcribes speech to text, generates a response with a language model, reads that response aloud, and shows the conversation in a web interface.
How JARVIS Operates
Each interaction with JARVIS passes through six steps:
- Speaking: The interaction begins when a user speaks into their microphone.
- Speech to Text: The speech is transcribed into text using Deepgram.
- Response Generation: The text is sent to OpenAI's GPT-3 API, which generates a response.
- Text to Speech: The response is converted back into speech using ElevenLabs, so the reply sounds natural.
- Audio Playback: The generated speech is played back to the user using Pygame.
- Web Display: Finally, the conversation is displayed on a web interface built with Taipy.
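The steps above can be sketched as a small pipeline. This is only an illustration of the data flow, not the project's actual code: `run_turn` and the stand-in stage functions are hypothetical names, and the real Deepgram, OpenAI, ElevenLabs, and Pygame calls are replaced by plain Python callables so the sketch runs without API keys or audio hardware.

```python
import time
from typing import Callable, Dict, Tuple


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> Tuple[str, str, Dict[str, float]]:
    """Run one conversation turn and time each stage (hypothetical helper)."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    user_text = transcribe(audio)  # speech to text (Deepgram in JARVIS)
    timings["transcribing"] = time.perf_counter() - start

    start = time.perf_counter()
    reply_text = respond(user_text)  # response generation (OpenAI in JARVIS)
    timings["generating response"] = time.perf_counter() - start

    start = time.perf_counter()
    reply_audio = synthesize(reply_text)  # text to speech (ElevenLabs in JARVIS)
    timings["generating audio"] = time.perf_counter() - start

    play(reply_audio)  # audio playback (Pygame in JARVIS)
    return user_text, reply_text, timings


# Stand-in stages: assumptions for illustration only, not real service calls.
played = []
user_text, reply_text, timings = run_turn(
    b"good morning jarvis",
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
    play=played.append,
)

for stage, seconds in timings.items():
    print(f"Finished {stage} in {seconds:.2f} seconds.")
print(f"--- USER: {user_text}")
print(f"--- JARVIS: {reply_text}")
```

Passing each stage in as a callable keeps the order of the steps visible in one place and lets any single service be swapped or stubbed during testing.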
Demonstration Video
A video demonstration is available for those interested in seeing JARVIS interact with a user.
System Requirements
JARVIS requires Python 3.8 to 3.11. Users also need API keys for Deepgram, OpenAI, and ElevenLabs.
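To confirm that the interpreter falls in the supported range before installing anything, a check along these lines may help. The function name `is_supported` is hypothetical and is not part of JARVIS.

```python
import sys


def is_supported(version_info=sys.version_info) -> bool:
    """Return True for Python 3.8 through 3.11 inclusive."""
    return (3, 8) <= tuple(version_info[:2]) <= (3, 11)


if not is_supported():
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Warning: JARVIS expects Python 3.8 to 3.11, found {found}")
```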
Installation Process
Setting up JARVIS is a straightforward process:
- Clone Repository: Begin by cloning the JARVIS repository from GitHub:
  git clone https://github.com/AlexandreSajus/JARVIS.git
- Install Dependencies: Navigate into the cloned directory and install all necessary packages:
  pip install -r requirements.txt
- Configure Environment: Create a .env file in the root directory and input your obtained API keys:
  DEEPGRAM_API_KEY=XXX...XXX
  OPENAI_API_KEY=sk-XXX...XXX
  ELEVENLABS_API_KEY=XXX...XXX
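A missing or empty key is an easy mistake to make at this step. The sketch below checks the text of a .env file for the three keys using only the standard library. `parse_env` and `missing_keys` are hypothetical helpers written for this illustration, and how JARVIS itself loads the file is not covered here.

```python
from typing import Dict, List

REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Example file contents with one empty key and one key commented out.
example = "DEEPGRAM_API_KEY=abc\nOPENAI_API_KEY=\n# ELEVENLABS_API_KEY not set yet\n"
print(missing_keys(parse_env(example)))
# prints: ['OPENAI_API_KEY', 'ELEVENLABS_API_KEY']
```

To check a real file, pass its contents instead of `example`, for instance `parse_env(open(".env").read())`.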
Using JARVIS
To start using JARVIS:
- Launch the web interface by executing:
  python display.py
- In another terminal, initialize the voice assistant:
  python main.py
When both the terminal and the web interface show "Listening...", JARVIS is ready to accept voice input. It then processes the spoken request, speaks the generated response, and displays it on the web interface. An example interaction could look like this:
Listening...
Done listening
Finished transcribing in 1.21 seconds.
Finished generating response in 0.72 seconds.
Finished generating audio in 1.85 seconds.
Speaking...
--- USER: good morning jarvis
--- JARVIS: Good morning, Alex! How can I assist you today?
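The timing lines in this output make it easy to see where a turn spends its time. As a rough sketch, the following parses lines of the form shown above and sums the stage durations. The regular expression and the `stage_timings` helper are assumptions based only on this sample output.

```python
import re
from typing import Dict

# Matches lines such as "Finished transcribing in 1.21 seconds."
TIMING_LINE = re.compile(
    r"^Finished (?P<stage>.+) in (?P<seconds>\d+(?:\.\d+)?) seconds\.$"
)


def stage_timings(log: str) -> Dict[str, float]:
    """Map each stage name to its reported duration in seconds."""
    timings: Dict[str, float] = {}
    for line in log.splitlines():
        match = TIMING_LINE.match(line.strip())
        if match:
            timings[match.group("stage")] = float(match.group("seconds"))
    return timings


log = """Listening...
Done listening
Finished transcribing in 1.21 seconds.
Finished generating response in 0.72 seconds.
Finished generating audio in 1.85 seconds.
Speaking..."""

timings = stage_timings(log)
total = sum(timings.values())
print(f"Total processing time: {total:.2f} seconds")
# prints: Total processing time: 3.78 seconds
```

In this example, the user waits about 3.78 seconds between finishing the request and hearing the reply, with audio generation taking the largest share.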
In conclusion, JARVIS combines speech recognition, a language model, and speech synthesis into a personal assistant that can be set up and started with a few commands.