wukong-robot - Open-Source Chinese Voice Assistant with BCI and Smart Home Features

Introduction to Wukong-Robot

Wukong-Robot is a Chinese voice assistant and smart speaker project designed to be simple, flexible, and elegant. Its main goal is to let Chinese makers and hackers quickly build personalized smart speakers. It may also be the first open-source smart speaker that can be woken through a brain-computer interface (BCI).

Key Features

Modular Design: Function plugins, speech recognition (ASR), speech synthesis (TTS), and dialogue robots are maintained as independent modules, which makes it easy to extend the project and develop custom plugins.
Extensive Chinese Language Support: Integrates speech recognition and synthesis services from multiple providers, including Baidu, iFlytek, Alibaba, Tencent, OpenAI Whisper, Apple, and Microsoft Edge, and can be extended with more.
Dialogue Support: Supports a local dialogue system based on AnyQ as well as online dialogue robots such as Turing Robot and ChatGPT.
Global Listening and Offline Wake-Up: Supports Porcupine and snowboy for offline wake-word detection, and also offers brain-computer interface and gesture-based wake-up options.
Flexibility: Users can customize the robot's name and choose preferred speech recognition and synthesis plugins.
Smart Home Integration: Works with smart home protocols and devices such as Xiao Ai speakers, Siri, MQTT, and Home Assistant, enabling voice control of smart appliances.
Backend Support: Provides a management backend for remote control, configuration changes, and log viewing.
Open API: The backend exposes an open API that can be used to build richer functionality.
Simplicity and Versatility: Installation is simpler than for dingdang-robot because PocketSphinx is no longer required. The code is also less complex and easier to maintain, and it runs on more platforms, including macOS and Linux.
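
To illustrate the modular design and the configurable choice of engines described above, here is a minimal sketch of how swappable speech-synthesis plugins could be registered and selected by name. It is not Wukong-Robot's actual code: the names TTSEngine, TTS_ENGINES, load_tts, the two toy engines, and the configuration keys are all hypothetical and exist only for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type

# Hypothetical registry mapping an engine name (as a user might write it in a
# config file) to the class that implements it.
TTS_ENGINES: Dict[str, Type["TTSEngine"]] = {}


class TTSEngine(ABC):
    """Illustrative base class; not Wukong-Robot's real plugin interface."""

    SLUG = ""  # unique name used to select the engine

    def __init_subclass__(cls, **kwargs):
        # Every subclass that defines a SLUG registers itself automatically.
        super().__init_subclass__(**kwargs)
        if cls.SLUG:
            TTS_ENGINES[cls.SLUG] = cls

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return audio for the given text (faked as bytes in this sketch)."""


class EchoTTS(TTSEngine):
    SLUG = "echo-tts"

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class ShoutTTS(TTSEngine):
    SLUG = "shout-tts"

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")


def load_tts(config: dict) -> TTSEngine:
    """Instantiate the engine named by the (assumed) 'tts_engine' config key."""
    slug = config["tts_engine"]
    if slug not in TTS_ENGINES:
        raise ValueError(f"unknown TTS engine: {slug!r}")
    return TTS_ENGINES[slug]()


if __name__ == "__main__":
    # The robot name and engine choice are plain configuration values here.
    config = {"robot_name": "Wukong", "tts_engine": "shout-tts"}
    engine = load_tts(config)
    print(engine.synthesize(f"hello from {config['robot_name']}"))
```

With this kind of layout, adding a new engine means writing one new class; nothing else in the program has to change, which is the property the modular design is meant to provide.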

Usage and Growth

As of March 31, 2023, Wukong-Robot had been installed on more than 13,000 devices and woken more than 700,000 times, a sign of its growing popularity.

How It Works

Once Wukong-Robot has been woken, the user's spoken command is first converted to text by the automatic speech recognition (ASR) engine. The text is then parsed and matched to the skill plugin that should handle it. The plugin's response is converted back into speech by the text-to-speech (TTS) engine and played to the user.

A single interaction can involve several network requests, one for each online service in the chain, and every step can be customized. As networks get faster in the 5G era, Wukong-Robot aims to offer speed and performance without compromising personal customization.
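
The flow described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's implementation: fake_asr and fake_tts are stand-ins that treat "audio" as UTF-8 text, and Skill, match_skill, respond, and the two sample skills are hypothetical names. The fixed fallback reply is also an assumption; it stands in for whatever happens when no skill matches.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Skill:
    """A toy skill plugin: a matcher plus a handler (hypothetical shape)."""
    name: str
    is_valid: Callable[[str], bool]   # does this skill want the command?
    handle: Callable[[str], str]      # produce the text reply


def fake_asr(audio: bytes) -> str:
    """Stand-in for an ASR engine: the 'audio' here is just UTF-8 text."""
    return audio.decode("utf-8").strip().lower()


def fake_tts(text: str) -> bytes:
    """Stand-in for a TTS engine: returns the reply encoded as bytes."""
    return text.encode("utf-8")


def match_skill(text: str, skills: List[Skill]) -> Optional[Skill]:
    """Return the first skill whose matcher accepts the recognized text."""
    for skill in skills:
        if skill.is_valid(text):
            return skill
    return None


def respond(audio: bytes, skills: List[Skill],
            fallback: str = "Sorry, I did not understand that.") -> bytes:
    """Run one interaction after wake-up: ASR, skill matching, then TTS."""
    text = fake_asr(audio)
    skill = match_skill(text, skills)
    reply = skill.handle(text) if skill else fallback
    return fake_tts(reply)


# Skills are tried in order, so more specific matchers should come first.
SKILLS = [
    Skill("echo", lambda t: t.startswith("echo "), lambda t: t[len("echo "):]),
    Skill("greeting", lambda t: "hello" in t, lambda t: "Hello! How can I help?"),
]

if __name__ == "__main__":
    print(respond(b"Hello there", SKILLS))
    print(respond(b"echo testing", SKILLS))
```

Because each stage is an ordinary function, any one of them can be replaced, for example by a call to an online service, without touching the others.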

Demonstrations

Video Demos:
- Integrating Wukong-Robot with ChatGPT for streaming dialogues.
- Customized versions showcasing dialogues, music, open APIs, and smart home management.
- Waking Wukong-Robot through a brain-computer interface.
- Google AIY Voice Kit integration with Wukong-Robot.

Overall, Wukong-Robot gives enthusiasts and developers an open-source, highly customizable platform for experimenting with voice interaction. The project continues to evolve as new technologies appear.