- Python 3.13
- Linux/macOS/Windows
- espeak-ng (for text-to-speech functionality)
Linux:
sudo apt update
sudo apt install espeak-ng
macOS:
brew install espeak-ng
Windows: Download and install from espeak-ng releases
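To confirm the prerequisites before continuing, a small check like the following may help. It is only a sketch: the function name `missing_prerequisites` is hypothetical and not part of this project, and it assumes the requirement means exactly Python 3.13 and that the `espeak-ng` executable is reachable on `PATH` (on Windows the installer may not add it automatically).

```python
import shutil
import sys

# Assumption: the project targets exactly Python 3.13 (major.minor).
REQUIRED_PYTHON = (3, 13)


def missing_prerequisites(python_version=sys.version_info[:2], which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version) != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in python_version)
        problems.append(f"Python 3.13 expected, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if which("espeak-ng") is None:
        problems.append("espeak-ng not found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    if issues:
        for issue in issues:
            print("missing:", issue)
    else:
        print("all prerequisites found")
```

The version check is mostly informational if you let uv create the environment, since `uv venv -p 3.13` selects the interpreter for you.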
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment using uv
uv venv -p 3.13
# Activate virtual environment
# Linux/macOS
source .venv/bin/activate
# Windows
.venv\Scripts\activate
# Install in editable mode (recommended for development)
uv pip install -e .
# Or sync dependencies from lock file
uv sync
- Create a wikipedia.id folder in the project root:
mkdir -p wikipedia.id
- Download the parquet files from HuggingFace:
  - Visit: https://huggingface.co/datasets/wikimedia/wikipedia/tree/main/20231101.id
  - Download all parquet files
  - Save them in the wikipedia.id folder
- Preprocess the data:
python preprocessing.py
- Train the model:
python train.py
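The manual download step above could also be scripted. The sketch below is one possible approach, not part of this project: it assumes the `huggingface_hub` package is installed (it is not listed among the requirements here), and the helper names `flatten_parquet` and `download_subset` are hypothetical. It also assumes `preprocessing.py` expects the parquet files directly inside `wikipedia.id`, so files fetched into a nested `20231101.id` subfolder are moved up one level.

```python
import shutil
from pathlib import Path

REPO_ID = "wikimedia/wikipedia"
SUBSET = "20231101.id"


def flatten_parquet(root: Path) -> list[Path]:
    """Move nested *.parquet files directly into root and return them sorted."""
    root.mkdir(parents=True, exist_ok=True)
    for path in list(root.rglob("*.parquet")):
        if path.parent != root:
            shutil.move(str(path), str(root / path.name))
    return sorted(root.glob("*.parquet"))


def download_subset(target: Path = Path("wikipedia.id")) -> list[Path]:
    """Download the parquet files of the subset into target (needs network access)."""
    # Imported lazily so flatten_parquet works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=f"{SUBSET}/*.parquet",
        local_dir=target,
    )
    return flatten_parquet(target)
```

Calling `download_subset()` from the project root should leave the parquet files in `wikipedia.id`, after which the preprocessing and training commands can be run as listed.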