A high-performance text translation service using Meta's NLLB-200 models for 100+ languages, designed as part of a speech-to-speech translation pipeline.
This microservice provides text translation capabilities with a focus on:
- Performance: Optimized for near real-time translation
- Flexibility: Supports multiple model sizes and compute options
- Scalability: Containerized for Kubernetes deployment
- Robustness: Comprehensive error handling and monitoring
- Translation between 100+ languages
- Model size options: small, medium, large, xl
- CPU/GPU acceleration with quantization options
- Language detection for automatic source identification
- Synchronous and asynchronous translation endpoints
- Translation caching
- Prometheus metrics
- Kubernetes/KServe integration
- Python 3.11
- Docker/Podman (for containerization)
- Kubernetes (for cluster deployment)
# Setup environment
./scripts/setup_dev.sh
# or
make setup-local
# Run service (default: port 8000)
./scripts/run_dev.sh
# or
make run-local

# Using Docker
docker build -t translation-service:latest .
docker run -p 8000:8000 translation-service:latest
# Using Podman
podman build -t translation-service:latest .
podman run -p 8000:8000 translation-service:latest
# Using Makefile (auto-detects Docker or Podman)
make build
make run-container

The project automatically detects whether Docker or Podman is available on your system and uses the appropriate runtime.
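Because the model is loaded on startup, the container can take a while before it answers requests. A minimal sketch of a readiness poller (the `wait_for_ready` helper name is illustrative, not part of the service):

```python
import time
import urllib.error
import urllib.request


def wait_for_ready(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll a health endpoint until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet; keep polling
        time.sleep(interval)
    return False


# Example: block until the container answers, then proceed
# wait_for_ready("http://localhost:8000/health")
```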
curl -X POST http://localhost:8000/translate \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, how are you?",
"options": {
"source_lang": "en",
"target_lang": "fr"
}
}'

- POST /translate - Translate single text
- POST /batch_translate - Batch translation
- POST /detect_language - Detect language
- POST /async/translate - Asynchronous translation
- GET /async/status/{task_id} - Check async task status
- GET /health - Health check
- GET /ready - Readiness check
- GET /live - Liveness check
- GET /config - Service configuration
- GET /cache/stats - Cache statistics
- POST /cache/clear - Clear cache
- GET /metrics - Prometheus metrics (port 8001)
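The same request shown with curl above can be issued from Python using only the standard library. This is a sketch; the `build_translate_request` and `translate` helper names are illustrative, but the request body matches the schema in the curl example:

```python
import json
import urllib.request


def build_translate_request(text: str, source_lang: str, target_lang: str) -> dict:
    """Assemble a /translate request body in the shape the service expects."""
    return {
        "text": text,
        "options": {"source_lang": source_lang, "target_lang": target_lang},
    }


def translate(base_url: str, text: str, source_lang: str, target_lang: str) -> dict:
    """POST the payload to /translate and return the decoded JSON response."""
    body = json.dumps(build_translate_request(text, source_lang, target_lang)).encode()
    req = urllib.request.Request(
        f"{base_url}/translate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (requires the service running locally):
# translate("http://localhost:8000", "Hello, how are you?", "en", "fr")
```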
Configure via environment variables:
| Variable | Description | Default |
|---|---|---|
| MODEL_SIZE | Model size (small, medium, large, xl) | small |
| MODEL_DEVICE | Device (cpu, cuda, mps) | cpu |
| MODEL_COMPUTE_TYPE | Compute type (float32, float16, int8) | float32 |
| SERVER_PORT | Server port | 8000 |
| METRICS_PORT | Metrics port | 8001 |
| CACHE_MAX_SIZE | Cache size | 1000 |
| CACHE_TTL | Cache TTL (seconds) | 3600 |
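A minimal sketch of how these variables could be read with their defaults applied (the `load_config` helper is illustrative, not the service's actual configuration code):

```python
import os

# Defaults mirror the configuration table above.
DEFAULTS = {
    "MODEL_SIZE": "small",
    "MODEL_DEVICE": "cpu",
    "MODEL_COMPUTE_TYPE": "float32",
    "SERVER_PORT": "8000",
    "METRICS_PORT": "8001",
    "CACHE_MAX_SIZE": "1000",
    "CACHE_TTL": "3600",
}

# These settings are numeric and get coerced to int after lookup.
INT_KEYS = {"SERVER_PORT", "METRICS_PORT", "CACHE_MAX_SIZE", "CACHE_TTL"}


def load_config(env=None) -> dict:
    """Read each setting from the environment, falling back to its default."""
    if env is None:
        env = os.environ
    config = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key, default)
        config[key] = int(raw) if key in INT_KEYS else raw
    return config
```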
export REGISTRY_TYPE=acr
export REGISTRY_NAME=your-acr-name
make acr-login
make acr-build

To deploy the service to KServe:

- Create a Kubernetes secret for pulling from your container registry:

kubectl create secret docker-registry acr-secret \
  --docker-server=your-registry.azurecr.io \
  --docker-username=00000000-0000-0000-0000-000000000000 \
  --docker-password=$(az acr login --name your-registry --expose-token --query accessToken -o tsv) \
  --namespace=default

- Apply the KServe InferenceService manifest:

kubectl apply -f k8s/translation-inferenceservice.yaml

- Check the service status:

kubectl get inferenceservice

- Get the service URL:

kubectl get inferenceservice translation-service -o jsonpath='{.status.url}'

For local testing with KServe, you can use nip.io to access the service:
# Test the service using nip.io
curl -X POST "http://lexi-shift.default.${EXTERNAL_IP}.nip.io/translate" \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, how are you?",
"options": {
"source_lang": "en",
"target_lang": "fr"
}
}'

Note: In production environments, replace the nip.io URL with your actual domain.
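The asynchronous endpoints can be exercised with a small submit-then-poll client. This is a sketch only: the `task_id` and `status` field names and the "completed"/"failed" status values are assumptions about the response shape, not confirmed by the service's API schema.

```python
import json
import time
import urllib.request


def submit_async(base_url: str, text: str, source_lang: str, target_lang: str) -> str:
    """POST to /async/translate and return the task id (field name assumed)."""
    payload = {
        "text": text,
        "options": {"source_lang": source_lang, "target_lang": target_lang},
    }
    req = urllib.request.Request(
        f"{base_url}/async/translate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["task_id"]


def is_terminal(status: str) -> bool:
    """Stop polling once the task has completed or failed (values assumed)."""
    return status in {"completed", "failed"}


def poll_task(base_url: str, task_id: str, interval: float = 1.0, timeout: float = 60.0) -> dict:
    """Poll /async/status/{task_id} until the task reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/async/status/{task_id}") as resp:
            result = json.load(resp)
        if is_terminal(result.get("status", "")):
            return result
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")


# Example (requires the service running locally):
# task_id = submit_async("http://localhost:8000", "Hello", "en", "fr")
# print(poll_task("http://localhost:8000", task_id))
```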
Run make help to see all available commands, including:
- make setup-local - Setup development environment
- make run-local - Run service locally
- make test - Run tests
- make build - Build container image
- make acr-build - Build and push to ACR
lexi-shift/
├── k8s/ # Kubernetes manifests
├── scripts/ # Utility scripts
├── src/ # Source code
│ ├── api/ # API endpoints
│ ├── models/ # Translation models
│ └── utils/ # Utility functions
└── tests/ # Test suite
The project includes git hooks for code quality that automatically format and lint your code on commit:
- Pre-commit hook runs black, isort, and flake8 on Python files
- Hooks are automatically installed when you run ./scripts/setup_dev.sh
- Alternatively, install manually with ./scripts/install_hooks.sh
- Memory intensive, especially for larger models
- First requests are slower while the model loads
- int8 quantization compatibility varies by platform
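Because int8 support varies by platform and float16 is generally only worthwhile on GPU, a deployment may want to fall back to float32 when the requested compute type is unavailable. A hedged sketch of that fallback logic (the `resolve_compute_type` helper is illustrative, not the service's actual behavior):

```python
def resolve_compute_type(requested: str, device: str, int8_supported: bool) -> str:
    """Pick a usable compute type, degrading to float32 when needed.

    - int8 falls back to float32 if the platform lacks int8 kernels
    - float16 falls back to float32 on CPU, where it rarely helps
    """
    if requested == "int8" and not int8_supported:
        return "float32"
    if requested == "float16" and device == "cpu":
        return "float32"
    return requested
```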
This project is licensed under the MIT License - see the LICENSE file for details.