A fine-tuning toolkit for training small language models on Infrastructure-as-Code using reinforcement learning (GRPO/DAPO).
InfraMind fine-tunes SLMs using GRPO/DAPO with domain-specific rewards to generate valid Terraform, Kubernetes, Docker, and CI/CD configurations.
| Model | Method | Accuracy | HuggingFace |
|---|---|---|---|
| inframind-0.5b-grpo | GRPO | 97.3% | srallabandi0225/inframind-0.5b-grpo |
| inframind-0.5b-dapo | DAPO | 96.4% | srallabandi0225/inframind-0.5b-dapo |
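The released checkpoints can be tried directly with `transformers`. The snippet below is a minimal sketch, assuming the published models keep the Qwen2.5-Instruct tokenizer and chat template; the prompt and generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "srallabandi0225/inframind-0.5b-grpo"  # see the table above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Create Terraform for an AWS S3 bucket"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```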
InfraMind is a fine-tuning toolkit that:
- Takes an existing small language model (Qwen, Llama, etc.)
- Fine-tunes it using reinforcement learning (GRPO/DAPO)
- Uses infrastructure-specific reward functions to guide learning
- Produces a model capable of generating valid Infrastructure-as-Code
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐
│ Base Model │ → │ InfraMind │ → │ Fine-tuned Model │
│ Qwen2.5-0.5B │ │ GRPO Training │ │ inframind-0.5b-grpo│
│ -Instruct │ │ + IaC Rewards │ │ (97.3% accuracy) │
└─────────────────┘ └─────────────────┘ └─────────────────────┘
│
▼
┌─────────────────────┐
│ DAPO Training │
│ inframind-0.5b-dapo│
│ (96.4% accuracy) │
└─────────────────────┘
| Component | Description |
|---|---|
| InfraMind-Bench | Benchmark dataset with 500+ IaC tasks |
| IaC Rewards | Domain-specific reward functions for Terraform, K8s, Docker, CI/CD |
| Training Pipeline | GRPO implementation for infrastructure-focused fine-tuning |
Large Language Models (GPT-4, Claude) can generate Infrastructure-as-Code, but:
- Cost: API calls add up ($100s-$1000s/month for teams)
- Privacy: Your infrastructure code is sent to external servers
- Offline use: Doesn't work in air-gapped/secure environments
- Customization: Can't fine-tune on your specific patterns
Small open-source models (< 1B parameters) fail at IaC because:
- They hallucinate resource names (`aws_ec2` instead of `aws_instance`)
- They generate invalid syntax that won't pass `terraform validate`
- They ignore security best practices
- Traditional fine-tuning (SFT/LoRA) only memorizes patterns, doesn't teach reasoning
InfraMind fine-tunes small models using reinforcement learning to reason about infrastructure, not just memorize examples.
| Approach | Method | Result |
|---|---|---|
| SFT/LoRA | "Memorize this Terraform example" | Copies patterns, fails on novel tasks |
| InfraMind | "Generate Terraform, I'll score if it's valid" | Learns reasoning, handles new tasks |
InfraMind uses domain-specific rewards:
Reward = α × Syntax + β × Correctness + γ × Format
Where:
- Syntax: Does it pass `terraform validate`?
- Correctness: Are the right resources used?
- Format: Is the structure proper?
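In other words, the final reward is a plain weighted sum of the three component scores. A minimal sketch with the default weights used later in this README (the component values here are made-up examples):

```python
# Hypothetical component scores for one generated snippet.
syntax, correctness, fmt = 1.0, 0.8, 0.9

# Default weights: alpha=0.4 (syntax), beta=0.3 (correctness), gamma=0.3 (format).
alpha, beta, gamma = 0.4, 0.3, 0.3
reward = alpha * syntax + beta * correctness + gamma * fmt
print(round(reward, 2))  # 0.91
```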
- InfraMind-Bench: 500+ tasks across Terraform, Kubernetes, Docker, CI/CD
- GRPO Training: Reinforcement learning that teaches reasoning
- Model Agnostic: Works with Qwen, Llama, Mistral, or any HuggingFace model
- Alpaca Format: Compatible with standard training pipelines
- Local-first: Runs entirely on your machine
pip install inframind

Or from source:

git clone https://github.com/saikiranrallabandi/inframind.git
cd inframind
pip install -e .

from inframind import create_dataset, InfraMindTrainer
# Load 500+ IaC tasks
dataset = create_dataset(size=100)
# Fine-tune with InfraMind (GRPO + IaC rewards)
trainer = InfraMindTrainer(model_name="Qwen/Qwen2.5-0.5B-Instruct")
trainer.train(dataset, epochs=1)
# Save your fine-tuned model
trainer.save("./qwen-0.5b-inframind")

529 infrastructure tasks in Alpaca format:
{
  "instruction": "Create Terraform for AWS EC2 instance",
  "input": "t2.micro instance type",
  "output": ""
}

| Category | Tasks | Examples |
|---|---|---|
| Terraform | 225 | EC2, S3, VPC, RDS, EKS, Lambda, IAM |
| Kubernetes | 138 | Deployments, Services, Ingress, RBAC |
| Docker | 70 | Dockerfiles, docker-compose |
| CI/CD | 96 | GitHub Actions, GitLab CI, Jenkins |
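Because the tasks use the Alpaca schema, they drop into standard pipelines with the usual instruction/input prompt template. A minimal sketch of that rendering, assuming the conventional Alpaca template rather than InfraMind's exact internal formatting:

```python
def to_prompt(example: dict) -> str:
    """Render an Alpaca-format record into a single prompt string."""
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes "
            "the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        "### Response:\n"
    )

print(to_prompt({"instruction": "Create Terraform for AWS EC2 instance",
                 "input": "t2.micro instance type"}))
```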
┌─────────────────────────────────────────────────────────────────┐
│ InfraMind TRAINING │
├─────────────────────────────────────────────────────────────────┤
│ │
│ For each IaC task: │
│ │
│ 1. GENERATE: Model produces multiple IaC outputs │
│ "Create EC2" → [output1, output2] │
│ │
│ 2. SCORE: Reward function evaluates each │
│ output1: syntax=1.0, correct=0.8, format=0.9 → 0.89 │
│ output2: syntax=0.0, correct=0.5, format=0.7 → 0.38 │
│ │
│ 3. ADVANTAGE: Compare within group (GRPO) │
│ output1: above average → positive advantage │
│ output2: below average → negative advantage │
│ │
│ 4. UPDATE: Increase probability of better outputs │
│ Model learns: "valid syntax = higher reward" │
│ │
└─────────────────────────────────────────────────────────────────┘
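The heart of step 3 is that GRPO scores each completion against the other completions sampled for the same task, rather than against a learned value function. A minimal sketch of that group-relative normalization (illustrative, not the trainer's exact code):

```python
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: z-score each reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Two completions for "Create EC2", scored by the IaC reward:
print(group_advantages([0.89, 0.38]))  # ~[+1.0, -1.0]
```

Completions that beat their group's average get a positive advantage and are reinforced in step 4; the rest are pushed down.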
from inframind import IaCReward
reward = IaCReward(alpha=0.4, beta=0.3, gamma=0.3)
# Score a Terraform output
score, details = reward.score(terraform_code, category="terraform")
# score: 0.85
# details: {"syntax": 1.0, "correctness": 0.8, "format": 0.75}

Reward components:
| Component | Weight | What it measures |
|---|---|---|
| Syntax | 0.4 | Valid resource declarations |
| Correctness | 0.3 | Right resource types used |
| Format | 0.3 | Proper structure (balanced braces, etc.) |
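As an illustration of the kind of structural check the format component refers to, a brace-balance test can be as simple as the following (a hypothetical, naive check that ignores braces inside strings, not the packaged `rewards.py` logic):

```python
def braces_balanced(code: str) -> bool:
    """Check that {}, [], and () in a config snippet are properly nested."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in code:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack

print(braces_balanced('resource "aws_instance" "web" { instance_type = "t2.micro" }'))  # True
```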
InfraMind supports multiple training environments:
| Platform | Script | GPU Required |
|---|---|---|
| Local GPU | `python train_local.py` | Yes |
| Modal.com | `modal run grpo_training.py` | Provided |
| AWS SageMaker | Upload + HF Estimator | Yes |
| GCP Vertex AI | Custom training job | Yes |
| Azure ML | HF integration | Yes |
| HuggingFace Spaces | `accelerate launch train_local.py` | Yes |
| Google Colab | Run notebook | Free GPU |
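For the SageMaker row, the usual pattern is to upload the repo and launch `train_local.py` through the HuggingFace Estimator. A hedged sketch; the role, instance type, framework versions, and hyperparameters below are placeholders to adapt, not tested values:

```python
from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="train_local.py",
    source_dir=".",                              # the inframind repo
    role="<your-sagemaker-execution-role>",      # placeholder
    instance_type="ml.g5.xlarge",                # placeholder GPU instance
    instance_count=1,
    transformers_version="4.36",                 # pick a supported combo
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters={"method": "grpo", "epochs": 3},  # forwarded as CLI flags
)
estimator.fit()
```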
# GRPO Training
python train_local.py --method grpo --epochs 3 --output ./models/grpo
# DAPO Training (from GRPO checkpoint)
python train_local.py --method dapo --checkpoint ./models/grpo --output ./models/dapo
# Quick test with 100 samples
python train_local.py --method grpo --samples 100 --epochs 1
# Evaluate trained model
python train_local.py --evaluate ./models/grpo
# Generate IaC
python train_local.py --generate ./models/grpo --prompt "Create Terraform for AWS EC2"
# Multi-GPU with Accelerate
accelerate launch train_local.py --method grpo --epochs 3

# GRPO Training (Stage 1)
modal run grpo_training.py
# DAPO Training (Stage 2 - starts from GRPO checkpoint)
modal run dapo_training.py
# Evaluate GRPO model
modal run grpo_training.py::evaluate
# Evaluate DAPO model
modal run dapo_training.py::evaluate
# Quick test DAPO (110 samples)
modal run dapo_training.py::quick_test

# Train only on Terraform
python scripts/train.py --category terraform --epochs 5
# Train only on Kubernetes
python scripts/train.py --category kubernetes --epochs 5

from inframind import create_dataset, InfraMindTrainer
# Load specific categories
dataset = create_dataset(categories=["terraform", "kubernetes"], size=200)
# Configure trainer
trainer = InfraMindTrainer(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    lr=1e-5,
    group_size=4,  # More samples per task for better GRPO
)
# Train
history = trainer.train(dataset, epochs=3)
# Check progress
for epoch in history:
    print(f"Epoch {epoch['epoch']}: Reward = {epoch['mean_reward']:.3f}")

| Project | Type | Method | IaC-specific |
|---|---|---|---|
| devops-slm-v1 | Fine-tuned Model | LoRA/SFT | Yes |
| AIAC | CLI Tool | Prompting (API) | Yes |
| GPT-4 / Claude | API Service | - | No |
| InfraMind | Fine-tuning Toolkit | GRPO | Yes |
Key differentiator: InfraMind is a fine-tuning toolkit, not a model or API wrapper. It uses reinforcement learning with infrastructure-specific rewards to fine-tune any SLM for IaC generation.
inframind/
├── inframind/
│ ├── __init__.py # Package exports
│ ├── dataset.py # InfraMind-Bench (500+ tasks)
│ ├── rewards.py # IaC reward functions
│ └── train.py # GRPO trainer
├── scripts/
│ └── train.py # Training CLI
├── examples/
│ └── quickstart.py # Quick start example
├── README.md
├── LICENSE
└── pyproject.toml
- InfraMind-Bench dataset (500+ tasks)
- Fine-tuning pipeline with GRPO/DAPO
- Domain-specific reward functions
- Release fine-tuned models on HuggingFace
- inframind-0.5b-grpo (97.3%)
- inframind-0.5b-dapo (96.4%)
- Real validation integration (`terraform validate`); see the sketch after this list
- Security scoring (`tfsec`, `checkov`)
- CLI tool (`inframind generate "create S3 bucket"`)
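For the planned `terraform validate` integration, one straightforward route is to shell out to the Terraform CLI and parse its JSON output. A sketch of that idea, assuming Terraform is installed and providers can be fetched (in practice you would cache providers rather than run `init` per sample):

```python
import json
import subprocess
import tempfile
from pathlib import Path

def terraform_syntax_reward(code: str) -> float:
    """Return 1.0 if `terraform validate` accepts the generated code, else 0.0."""
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "main.tf").write_text(code)
        subprocess.run(
            ["terraform", "init", "-backend=false", "-input=false"],
            cwd=workdir, capture_output=True, check=False,
        )
        result = subprocess.run(
            ["terraform", "validate", "-json"],
            cwd=workdir, capture_output=True, text=True, check=False,
        )
    try:
        return 1.0 if json.loads(result.stdout).get("valid") else 0.0
    except json.JSONDecodeError:
        return 0.0
```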
@misc{rallabandi2025inframind,
  title={InfraMind: Fine-tuning Small Language Models for Infrastructure-as-Code Generation with Reinforcement Learning},
  author={Rallabandi, Sai Kiran},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/srallabandi0225/inframind-0.5b-grpo}
}

Contributions welcome! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
- Qwen Team for the base model
- DeepSeek for GRPO
- NVIDIA NeMo for DAPO reference
- TRL for training infrastructure
- Stanford Alpaca for data format