A quick/messy proof of concept of low-latency, natural-feeling (and interruptible) voice-to-voice chat using LLMs.
- Clone the repo: `git clone https://github.com/scf4/shapevoice.git`
- Install required packages: `pip install -r requirements.txt`
- Rename `.env.example` to `.env` and add your Cartesia and Groq API keys.
- Run `main.py`
- Wear headphones (see below for limitations)
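A minimal sketch of how the keys might be loaded once `.env` is in place. This assumes `python-dotenv` and the variable names shown; check `.env.example` for the names the project actually expects.

```python
# Assumptions: python-dotenv is used and the variables are named as below;
# .env.example has the actual names the project expects.
import os
from dotenv import load_dotenv

load_dotenv()  # loads key/value pairs from .env into the process environment
cartesia_key = os.environ["CARTESIA_API_KEY"]  # assumed name
groq_key = os.environ["GROQ_API_KEY"]          # assumed name
```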
Note: You will likely need to adjust some of the constants, such as the silence threshold.
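For illustration, the tunables tend to look something like this; the names and values below are hypothetical, not the actual constants in `main.py`.

```python
# Illustrative only: names and values are hypothetical, not the constants in main.py.
SILENCE_THRESHOLD = 500        # RMS energy below which an audio chunk counts as silence
SILENCE_DURATION_S = 0.8       # how long silence must last before the user's turn is considered over
MIN_SPEECH_DURATION_S = 0.3    # ignore blips shorter than this
CHUNK_SIZE = 1024              # frames read from the microphone per callback
```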
- Improve voice activity detection (VAD). Currently it is very sensitive to background noise, which Whisper transcribes as spurious phrases such as "Thank you" or "Thanks for watching" (see the VAD sketch after this list)
- Echo cancellation to prevent the assistant from hearing itself and responding.
- Stream LLM output to TTS (see the streaming sketch after this list)
- When the user interrupts, replace the assistant's response in the message history with a version cut off at the point of interruption (see the history sketch after this list)
- Don't queue up multiple user messages/responses at once; instead, concatenate the pending user messages into a single message
- Remove noise from the audio input, automatically adjust thresholds per user, etc.
- General refactor/cleanup
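VAD sketch: one common improvement over a raw energy threshold is frame-level classification with the `webrtcvad` package. This is not what `main.py` currently does; the sample rate, frame size, and aggressiveness below are assumptions.

```python
# Frame-level VAD with webrtcvad (sketch; parameters are assumptions).
import webrtcvad

SAMPLE_RATE = 16000     # webrtcvad accepts 8/16/32/48 kHz, 16-bit mono PCM
FRAME_MS = 30           # frames must be 10, 20, or 30 ms long
vad = webrtcvad.Vad(2)  # aggressiveness 0-3; higher filters more background noise

def is_speech(frame_bytes: bytes) -> bool:
    """Return True if a single 30 ms PCM frame contains speech."""
    return vad.is_speech(frame_bytes, SAMPLE_RATE)
```

Only passing chunks that clear this check on to Whisper should cut down on the "Thank you"-style hallucinations triggered by silence and noise.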
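Streaming sketch: flush text to the synthesizer at sentence boundaries instead of waiting for the full completion. The model name is an assumption, and `speak()` is a hypothetical stand-in for the Cartesia TTS call and audio playback.

```python
# Stream the LLM reply and hand complete sentences to TTS as they arrive (sketch).
import re
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def speak(sentence: str) -> None:
    """Hypothetical stand-in: synthesize `sentence` with Cartesia and play it."""
    ...

def stream_reply(messages: list[dict]) -> str:
    buffer, full_reply = "", ""
    stream = client.chat.completions.create(
        model="llama3-8b-8192",  # assumed model name
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        buffer += delta
        full_reply += delta
        # Flush whole sentences to TTS as soon as they appear in the buffer.
        while (m := re.search(r"[.!?]\s", buffer)):
            speak(buffer[: m.end()])
            buffer = buffer[m.end():]
    if buffer.strip():
        speak(buffer)
    return full_reply
```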
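History sketch: the two history-related items above (interruption and message queueing) come down to small operations on the chat history. `spoken_chars` is a hypothetical counter of how much of the reply had actually been played back before the interrupt.

```python
# Sketches only; messages follow the usual {"role": ..., "content": ...} chat format.

def truncate_last_assistant_message(messages: list[dict], spoken_chars: int) -> None:
    """On interruption, keep only the part of the assistant's reply that was spoken aloud."""
    if messages and messages[-1]["role"] == "assistant":
        messages[-1]["content"] = messages[-1]["content"][:spoken_chars]

def coalesce_user_messages(messages: list[dict]) -> list[dict]:
    """Merge consecutive user messages into one instead of queueing separate responses."""
    merged: list[dict] = []
    for msg in messages:
        if merged and msg["role"] == "user" and merged[-1]["role"] == "user":
            merged[-1]["content"] += " " + msg["content"]
        else:
            merged.append(dict(msg))
    return merged
```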