A lightweight CLI for data preprocessing, model training, evaluation, and optimization.
nano-llm is a streamlined command-line interface for simple transformer model development. It provides a complete workflow from data preprocessing to model training, evaluation, and text generation, designed for educational purposes and small-scale experiments.
- Tokenizer Training: Train custom tokenizers on your text data
- Model Training: Train transformer-based language models
- Model Evaluation: Evaluate models with perplexity metrics
- Text Generation: Generate text using trained models
- YAML Configuration: Simple configuration-based approach
- Model Optimization: Implement pruning and distillation techniques
- Advanced Positional Encoding: Add rotary position embeddings (RoPE) for better performance
- Efficient Normalization: Replace LayerNorm with RMSNorm for improved efficiency
- HuggingFace Integration: Use the transformers package to enable model deployment to HuggingFace Hub
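Two items in the list above reduce to short formulas: perplexity and RMSNorm. The following is a minimal, dependency-free sketch of both, written for illustration only. It is not taken from the nano-llm source; the function names `perplexity` and `rms_norm`, the plain-list inputs, and the `eps` default are assumptions made here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity is exp of the mean negative log-likelihood per token.

    `token_log_probs` holds the natural-log probability the model assigned
    to each observed token (hypothetical input format for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by sqrt(mean(x^2) + eps), then scale per feature.

    Unlike LayerNorm, no mean is subtracted and no bias is added, which is
    where the efficiency gain comes from.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token behaves like a
    # uniform choice among 4 options.
    print(perplexity([math.log(0.25)] * 4))
    # After normalization the mean of squares is approximately 1.
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
```

A real implementation would operate on tensors and keep `weight` as a learned parameter; the list-based version only shows the arithmetic.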
# Clone the repository
git clone https://github.com/ssubedir/nano-llm.git
cd nano-llm
# Install dependencies
uv sync

For a complete workflow example, see the Getting Started Guide.
For detailed information about commands, configuration, and examples, see the docs folder:
- Getting Started - Complete workflow tutorial
- Configuration Reference - All configuration options
- Command Guides - Detailed command documentation:
nano-llm/
├── app/ # Core application code
│ ├── cli/ # Command-line interface
│ ├── data/ # Data processing
│ └── model/ # Model architecture
├── configs/ # Configuration files
├── dataset/ # Sample datasets
└── docs/ # Documentation
- Python 3.12+
- PyTorch
- CUDA-compatible GPU (recommended for training)
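A quick way to confirm these requirements before training is a small check script. The sketch below is not part of nano-llm; the function name `check_environment` and the report keys are hypothetical. It uses only the standard library plus `torch.cuda.is_available()` when PyTorch is installed.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 12)):
    """Report whether the interpreter and PyTorch meet the requirements above."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            # A GPU is recommended rather than required, so a False value
            # here is a warning, not a failure.
            report["cuda_available"] = torch.cuda.is_available()
        except ImportError:
            # The package was found but could not be loaded.
            report["torch_installed"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```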
# Lint all files in the current directory
uvx ruff check
# Format all files in the current directory
uvx ruff format

Contributions are welcome!
- Fork the repository
- Create a feature branch
- Make your changes
- Run linting and formatting:
  uvx ruff check
  uvx ruff format
- Submit a pull request
See the TODO section for areas that need work.
MIT License - see the LICENSE file for details.