This project provides a wrapper for large language models (LLMs) and related audio processing tools. It is designed for Ubuntu and WSL2, supporting GPU acceleration and audio workflows for speech-to-text, text-to-speech, and model conversion.
- Ubuntu 24 or WSL2 (Windows Subsystem for Linux)
- Python (recommended: pyenv)
- CUDA toolkit (for GPU support)
- PulseAudio (for audio in WSL)
- PyTorch
- Required Python packages: lark-parser, numpy, empy, catkin_pkg, setuptools, wheel, pygame, pyyaml, sounddevice
- FFmpeg
- Terminator
Make setup.sh executable and run:
chmod +x setup.sh
./setup.sh
- Install WSL from Windows PowerShell:
wsl --install
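To confirm the distribution is installed and running under WSL 2 (standard WSL commands, not specific to this project):
wsl --status
wsl -l -v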
- Install pyenv dependencies:
sudo apt update
sudo apt install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev git
- Install pyenv:
curl https://pyenv.run | bash
- Add to ~/.bashrc:
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
Then reload the shell configuration:
source ~/.bashrc
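After reloading, confirm pyenv is on your PATH:
pyenv --version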
- Install and activate Python:
pyenv install <version>
pyenv virtualenv <version> <env_name>
pyenv activate <env_name>
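As a concrete example (the version and environment name below are placeholders; use whatever the project requires):
pyenv install 3.10.14               # placeholder Python version
pyenv virtualenv 3.10.14 llm_env    # placeholder environment name
pyenv activate llm_env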
To fetch all of the submodules, run:
git submodule update --init
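If any submodule nests further submodules, the recursive variant fetches those as well, and git submodule status shows the checked-out commit of each one:
git submodule update --init --recursive
git submodule status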
Upgrade pip and install requirements:
python -m pip install --upgrade pip
pip install -r requirements.txt --upgrade
pip install --user -U nltk
python src/lip_sync/misc/download_nltk.py
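If the script fails (for example behind a proxy), NLTK data can also be fetched manually. The exact corpora are defined in src/lip_sync/misc/download_nltk.py; the corpus name below is only an illustrative guess:
python -c "import nltk; nltk.download('cmudict')"  # corpus name is a placeholder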
- For PulseAudio (WSLg):
export PULSE_SERVER=unix:/mnt/wslg/PulseServer
paplay /usr/share/sounds/alsa/Front_Center.wav
- Install PulseAudio utilities:
sudo apt install -y pulseaudio-utils
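To verify that recording works as well as playback (standard pulseaudio-utils commands):
parecord test.wav   # speak into the microphone, stop with Ctrl+C
paplay test.wav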
Install the CUDA toolkit: follow the official NVIDIA instructions, or run:
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda-repo-wsl-ubuntu-12-9-local_12.9.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-9-local_12.9.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-9-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-9
- Add CUDA to PATH by appending to ~/.bashrc (e.g. with nano):
export PATH=$PATH:/usr/local/cuda/bin
Then reload the shell:
source ~/.bashrc
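To verify that the toolkit and driver are visible:
nvcc --version
nvidia-smi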
- Install PyTorch (CUDA 12.9 build):
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
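To confirm that PyTorch sees the GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"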
- Terminator (optional):
sudo apt install terminator
- FFmpeg:
sudo apt update && sudo apt install ffmpeg
Export the workspace root for zonos:
export WORKSPACE_ROOT=/ # Add path to your ros2_ws here
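For example, if your workspace sits in your home directory (the path below is a placeholder):
export WORKSPACE_ROOT=$HOME/ros2_ws   # placeholder; use your actual workspace path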
In the root directory (Head), run:
colcon build
Then source the install setup:
source install/setup.bash
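During development you can rebuild a single package instead of the whole workspace, e.g. (package name taken from the run commands below):
colcon build --packages-select lip_sync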
Run the gaze tracking and lip sync nodes:
ros2 run gaze_tracking head_tracker
ros2 run lip_sync lip_sync_sub
Server:
llama-server -m models/gpt-oss-20b.gguf --jinja -ngl 99 -fa --n-cpu-moe
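Assuming the server is on its default port 8080 (pass --port to change it), a quick health check:
curl http://localhost:8080/health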
Bridge:
ros2 launch brain llama_bridge.launch.py
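To confirm the bridge came up, the standard ROS2 introspection commands help (exact node and topic names depend on the launch file):
ros2 node list
ros2 topic list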
Text-to-speech service:
ros2 run text_to_speech service
Listen through the microphone:
ros2 run speech_to_text listener
Running the model:
ros2 run speech_to_text stt_model
- For ROS2 setup, see the official documentation and consider adding source /opt/ros/jazzy/setup.bash to your ~/.bashrc.
- For troubleshooting audio, use parecord and paplay.
- For model conversion and usage, refer to the scripts and documentation in the repository.
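To see which input and output devices PulseAudio exposes while troubleshooting (standard pactl commands, not project-specific):
pactl list short sources
pactl list short sinks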