This project provides a wrapper for large language models (LLMs) and related audio processing tools. It is designed for Ubuntu and WSL2, supporting GPU acceleration and audio workflows for speech-to-text, text-to-speech, and model conversion.
- Ubuntu 24 or WSL2 (Windows Subsystem for Linux)
- Python (recommended: pyenv)
- CUDA toolkit (for GPU support)
- PulseAudio (for audio in WSL)
- PyTorch
- Required Python packages: lark-parser, numpy, empy, catkin_pkg, setuptools, wheel, pygame, pyyaml, sounddevice
- FFmpeg
- Terminator
Make setup.sh executable and run:
chmod +x setup.sh
./setup.sh
- Install WSL from Windows PowerShell:
wsl --install
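To confirm the distribution is installed and running under WSL 2 (standard WSL commands, not specific to this project):
wsl --status
wsl -l -v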
- Install pyenv dependencies:
sudo apt update
sudo apt install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev git
- Install pyenv:
curl https://pyenv.run | bash
- Add to ~/.bashrc:
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
Then reload the shell configuration:
source ~/.bashrc
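After reloading, confirm pyenv is on your PATH:
pyenv --version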
- Install and activate Python:
pyenv install <version>
pyenv virtualenv <version> <env_name>
pyenv activate <env_name>
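As a concrete example (the version and environment name below are placeholders; use whatever the project requires):
pyenv install 3.10.14               # placeholder Python version
pyenv virtualenv 3.10.14 llm_env    # placeholder environment name
pyenv activate llm_env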
To fetch all of the submodules, run:
git submodule update --init
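If any submodule nests further submodules, the recursive variant fetches those as well, and git submodule status shows the checked-out commit of each one:
git submodule update --init --recursive
git submodule status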
Upgrade pip and install requirements:
python -m pip install --upgrade pip
pip install -r requirements.txt --upgrade
pip install --user -U nltk
python src/lip_sync/misc/download_nltk.py
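If the script fails (for example behind a proxy), NLTK data can also be fetched manually. The exact corpora are defined in src/lip_sync/misc/download_nltk.py; the corpus name below is only an illustrative guess:
python -c "import nltk; nltk.download('cmudict')"  # corpus name is a placeholder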
- For PulseAudio (WSLg):
export PULSE_SERVER=unix:/mnt/wslg/PulseServer
paplay /usr/share/sounds/alsa/Front_Center.wav
- Install PulseAudio utilities:
sudo apt install -y pulseaudio-utils
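To verify that recording works as well as playback (standard pulseaudio-utils commands):
parecord test.wav   # speak into the microphone, stop with Ctrl+C
paplay test.wav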
Install the CUDA toolkit: follow the official NVIDIA instructions, or run:
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda-repo-wsl-ubuntu-12-9-local_12.9.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-9-local_12.9.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-9-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-9
- Add CUDA to PATH by appending to ~/.bashrc (e.g. with nano):
export PATH=$PATH:/usr/local/cuda/bin
Then reload the shell:
source ~/.bashrc
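To verify that the toolkit and driver are visible:
nvcc --version
nvidia-smi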
- Install PyTorch (CUDA 12.9 build):
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
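To confirm that PyTorch sees the GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"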
- Terminator (optional):
sudo apt install terminator
- FFmpeg:
sudo apt update && sudo apt install ffmpeg
Export the workspace root for zonos:
export WORKSPACE_ROOT=/ # Add path to your ros2_ws here
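For example, if your workspace sits in your home directory (the path below is a placeholder):
export WORKSPACE_ROOT=$HOME/ros2_ws   # placeholder; use your actual workspace path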
In the root directory (Head), run:
colcon build
Then source the install setup:
source install/setup.bash
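During development you can rebuild a single package instead of the whole workspace, e.g. (package name taken from the run commands below):
colcon build --packages-select lip_sync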
Run the gaze tracking and lip sync nodes:
ros2 run gaze_tracking head_tracker
ros2 run lip_sync lip_sync_sub
Server:
llama-server -m models/gpt-oss-20b.gguf --jinja -ngl 99 -fa --n-cpu-moe
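Assuming the server is on its default port 8080 (pass --port to change it), a quick health check:
curl http://localhost:8080/health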
Bridge:
ros2 launch brain llama_bridge.launch.py
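To confirm the bridge came up, the standard ROS2 introspection commands help (exact node and topic names depend on the launch file):
ros2 node list
ros2 topic list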
Text-to-speech service:
ros2 run text_to_speech service
Listen through the microphone:
ros2 run speech_to_text listener
Running the model:
ros2 run speech_to_text stt_model
- For ROS2 setup, see the official documentation and consider adding source /opt/ros/jazzy/setup.bash to your ~/.bashrc.
- For troubleshooting audio, use parecord and paplay.
- For model conversion and usage, refer to the scripts and documentation in the repository.
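To see which input and output devices PulseAudio exposes while troubleshooting (standard pactl commands, not project-specific):
pactl list short sources
pactl list short sinks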