WEAVE (Well-structured Empirical workflows in Analysis, Visualized selection, and Efficient binarization) is a unified workbench for LLM instruction data engineering. It integrates three core modules:
- Grouped Refinement of Organized Variability Estimation (GROVE) for automatic visualization of the entire dataset and hybrid data selection that combines verb-anchored grouping with model-centric variability scoring,
- Mixture Optimization for Structured Subtasks (MOSS) for budget-aware task composition analysis, and
- ZEro annotation Behavior-based Response Alignment (ZEBRA) for zero-annotation preference binarization using knowledge of model behavior. (The ZEBRA page provides a prototype automatic binarization feature to support alignment tuning.)
Together, these components enable practitioners to achieve better accuracy-per-token and accuracy-per-GPU-hour than training on unstructured, fully scaled datasets.
The following software must be installed:
- Docker: 20.10 or higher
- Docker Compose: 2.0 or higher
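The minimum versions above can also be checked from a script. This is a minimal sketch, not part of WEAVE: it assumes bash, and `normalize_version` and `version_ge` are illustrative helper names. Versions are compared field by field as numbers, so `20.10.21` counts as newer than `20.9`, which a plain string comparison would get wrong.

```shell
#!/usr/bin/env bash
# Sketch (not part of WEAVE): compare Docker / Compose versions against the
# minimums listed above. Helper names are illustrative, not an official tool.

# Strip a leading "v" and any suffix starting at the first character that is
# neither a digit nor a dot, e.g. "v2.24.5" -> "2.24.5", "24.0.7-ce" -> "24.0.7".
normalize_version() {
  local v="${1#v}"
  printf '%s\n' "${v%%[!0-9.]*}"
}

# Return 0 if version $1 >= version $2, comparing dot-separated fields
# numerically (missing fields count as 0, so "20.10" equals "20.10.0").
version_ge() {
  local -a have want
  local i n x y
  IFS=. read -r -a have <<< "$(normalize_version "$1")"
  IFS=. read -r -a want <<< "$(normalize_version "$2")"
  n=${#have[@]}
  if (( ${#want[@]} > n )); then n=${#want[@]}; fi
  for (( i = 0; i < n; i++ )); do
    x=${have[i]:-0}
    y=${want[i]:-0}
    # 10# forces base-10 so fields like "03" are not read as octal.
    if (( 10#$x > 10#$y )); then return 0; fi
    if (( 10#$x < 10#$y )); then return 1; fi
  done
  return 0
}
```

For example, `version_ge "$(docker version --format '{{.Server.Version}}')" 20.10` and `version_ge "$(docker compose version --short)" 2.0` should both succeed on a setup that meets the requirements.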
Verify installation:

```shell
docker --version
docker compose version
```

```shell
# Download the Compose file and the environment template
curl -O https://raw.githubusercontent.com/Jeesu-Jung/weave/main/docker-compose.yml
curl -O https://raw.githubusercontent.com/Jeesu-Jung/weave/main/.env.example
```

```shell
# Copy .env.example to .env
cp .env.example .env
# Edit the .env file to set environment variables
```

```shell
# Build and run all services (including Milvus + Redis)
docker compose up -d
# View logs
docker compose logs -f
# Check status
docker compose ps
```

Run the optional profiles once to populate the embedding data in Milvus (OpenAI API costs may apply):
```shell
# MOSS embedding index (seed_sentence + instruction_alpaca)
docker compose --profile embed run --rm grove-task-mixture-embed
# Weavy document embedding (for the RAG chatbot)
docker compose --profile ingest run --rm weavy-ingest
```

| Service | URL | Description |
|---|---|---|
| Frontend | http://localhost | Main Web UI |
| Attu | http://localhost:8000 | Milvus GUI Management Tool |
Check the status of all services:

```shell
# Overall service status
docker compose ps
# Individual service health checks
curl http://localhost:8080/actuator/health # Cache Service
curl http://localhost:8081/actuator/health # Task Mixture
curl http://localhost:8082/actuator/health # Zebra Service
curl http://localhost:8083/health # Model Centric
curl http://localhost:8084/actuator/health # Weavy
curl http://localhost/health # Frontend
```

The Compose stack includes the following services:
- Milvus (Vector Database) with etcd and MinIO
- Attu (Milvus GUI Management Tool)
- Redis (Caching)
- Backend Services × 5 (Cache, Task Mixture, Zebra, Model Centric, Weavy)
- Frontend
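The individual `curl` health checks listed earlier can be wrapped in a single loop. This is a minimal sketch, not part of WEAVE: it assumes bash, assumes the `*_PORT` variables from the `.env` file (for example `CACHE_SERVICE_PORT`) are the host ports the services are published on, and falls back to the documented default ports when they are unset. `health_urls` and `check_health` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch (not part of WEAVE): run the per-service health checks in one loop.
# Assumption: the *_PORT variables from .env are the host ports the services
# are published on; unset variables fall back to the documented defaults.

# Print one "name url" pair per line for every health endpoint.
health_urls() {
  printf '%s %s\n' \
    cache         "http://localhost:${CACHE_SERVICE_PORT:-8080}/actuator/health" \
    task-mixture  "http://localhost:${TASK_MIXTURE_PORT:-8081}/actuator/health" \
    zebra         "http://localhost:${ZEBRA_SERVICE_PORT:-8082}/actuator/health" \
    model-centric "http://localhost:${MODEL_CENTRIC_PORT:-8083}/health" \
    weavy         "http://localhost:${WEAVY_PORT:-8084}/actuator/health" \
    frontend      "http://localhost:${FRONTEND_PORT:-80}/health"
}

# Query each endpoint, print UP/DOWN per service, and return 1 if any fails.
check_health() {
  local name url failed=0
  while read -r name url; do
    # -f: treat HTTP errors (4xx/5xx) as failures; --max-time bounds each call.
    if curl -fsS --max-time 5 -o /dev/null "$url" < /dev/null; then
      printf '%-14s UP    %s\n' "$name" "$url"
    else
      printf '%-14s DOWN  %s\n' "$name" "$url"
      failed=1
    fi
  done < <(health_urls)
  return "$failed"
}
```

Running `check_health` after `docker compose up -d` prints one line per service and returns a non-zero status if any endpoint does not answer successfully, which makes it usable in a CI step or a wait loop.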
Key environment variables in the .env file:
```shell
# OpenAI API Key (required for Weavy service)
OPENAI_API_KEY=sk-proj-your-actual-key-here
# Hugging Face Token (required for model downloads in grove-model-centric-service)
# Get your token at https://huggingface.co/settings/tokens
HF_TOKEN=hf_your-token-here
```

```shell
# Redis settings (default: redis:6379)
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_URL=redis://redis:6379
# Milvus settings (default: milvus:19530)
MILVUS_HOST=milvus
MILVUS_PORT=19530
MILVUS_URI=http://milvus:19530
# Service ports (keeping the defaults is recommended)
CACHE_SERVICE_PORT=8080
TASK_MIXTURE_PORT=8081
ZEBRA_SERVICE_PORT=8082
MODEL_CENTRIC_PORT=8083
WEAVY_PORT=8084
FRONTEND_PORT=80
```

```shell
# Start the full stack (background)
docker compose up -d
# Start the full stack (foreground, with logs)
docker compose up
# Stop the full stack
docker compose down
# Stop + remove volumes (reset all data)
docker compose down -v
```

```shell
# Follow all service logs in real time
docker compose logs -f
# Follow logs for a specific service
docker compose logs -f grove-cache-service
docker compose logs -f weavy
# View only the last 100 lines
docker compose logs --tail=100
```

```shell
# Restart all services
docker compose restart
# Restart a specific service
docker compose restart grove-cache-service
# Rebuild and restart a service
docker compose up -d --build grove-cache-service
```

```shell
# List running containers
docker compose ps
# Live resource usage (CPU, memory)
docker stats
# Logs for a specific service
docker compose logs grove-cache-service
```

```shell
# List volumes (volume names are prefixed with the Compose project name)
docker volume ls | grep grove
# Volume details
docker volume inspect grove_milvus-data
# Clean up unused volumes
docker volume prune
# Reset all data (caution!)
docker compose down -v
```

```shell
# Build all images
docker compose build
# Build without cache
docker compose build --no-cache
# Build a specific service
docker compose build grove-cache-service
# Parallel build (faster)
docker compose build --parallel
```

Symptom: `port is already allocated` error
Solution:

```shell
# Check which process is using the port (macOS/Linux)
lsof -i :8080
# Change the port in the .env file
CACHE_SERVICE_PORT=18080
# Or kill the existing process
kill -9 <PID>
```

Symptom: a service is terminated due to OOM (out of memory)
Solution:

```shell
# Check Docker memory settings (Docker Desktop)
# Settings > Resources > Memory: allocate at least 8GB
# Or run only selected services
docker compose up -d redis grove-cache-service grove-frontend
```

Symptom: default values are still used instead of the values in .env
Solution:

```shell
# Verify the .env file location (same directory as docker-compose.yml)
ls -la .env
# Check the resolved configuration, including environment variables
docker compose config
# Rebuild and restart
docker compose down
docker compose up -d --build
```
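The `.env` checks in the last troubleshooting entry can be partly scripted. This is a minimal sketch, not part of WEAVE: it assumes bash, `env_value` and `check_env_file` are hypothetical helper names, and the placeholder strings are the example values shown above for `OPENAI_API_KEY` and `HF_TOKEN`. It reads plain `KEY=VALUE` lines only and does not handle quoted values or inline comments.

```shell
#!/usr/bin/env bash
# Sketch (not part of WEAVE): catch common .env mistakes before `docker compose up`.
# Assumption: placeholders match the example values shown in this README.

# Print the value assigned to KEY in an env file (the last assignment wins).
env_value() {
  local file="$1" key="$2" line value=""
  while IFS= read -r line || [[ -n "$line" ]]; do
    # Skip blank lines and comment lines.
    [[ -z "$line" || "$line" == \#* ]] && continue
    if [[ "$line" == "$key="* ]]; then
      value="${line#"$key="}"
    fi
  done < "$file"
  printf '%s\n' "$value"
}

# Return 1 if the file is missing or a required key is empty or a placeholder.
check_env_file() {
  local file="${1:-.env}" problems=0 key value
  if [[ ! -f "$file" ]]; then
    echo "missing: $file (it must be in the same directory as docker-compose.yml)" >&2
    return 1
  fi
  for key in OPENAI_API_KEY HF_TOKEN; do
    value="$(env_value "$file" "$key")"
    case "$value" in
      ""|sk-proj-your-actual-key-here|hf_your-token-here)
        echo "not set: $key" >&2
        problems=1
        ;;
    esac
  done
  return "$problems"
}
```

For example, running `check_env_file .env` from the directory containing `docker-compose.yml` reports keys that are unset or still hold the template placeholders; `docker compose config` still shows what Compose actually resolves.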