Skip to content

ritazh/modelyard

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

modelyard

Kubernetes manifests and test scripts for trying new open-weight LLMs on GPU clusters.

Each folder is one model and contains everything you need to pull weights, deploy an inference server, and smoke-test reasoning and tool calling via curl. Current inference backend is vLLM; SGLang and TensorRT-LLM variants may land in the same folder structure when useful.

Layout

modelyard/
└── <model-name>/
    ├── README           # run instructions, flag reference, test results, caveats
    ├── pvc.yaml         # PersistentVolumeClaim for model weights
    ├── model-download.yaml   # one-shot Job that pulls weights from HF
    ├── <model>.yaml     # Deployment + Service (recommended aggregated config)
    ├── <model>-disagg.yaml   # optional disaggregated prefill/decode variant
    └── script.sh        # curl-based smoke tests (reasoning + tool calling)

Usage

cd <model-name>

# 1. Set your storage class in pvc.yaml, then create the PVC and download weights:
kubectl apply -f pvc.yaml
kubectl apply -f model-download.yaml

# 2. Deploy the server once weights are downloaded:
kubectl apply -f <model>.yaml

# 3. Port-forward and run the smoke tests:
kubectl port-forward pod/<pod-name> 8000:8000
./script.sh

See each folder's own README for model-specific flags, expected resources, and any quirks worth knowing about.

Available models

Folder Model Backend Notes
deepseek-v4-flash deepseek-ai/DeepSeek-V4-Flash (284B MoE) vLLM Aggregated (4× H100/H200) or disaggregated prefill/decode

Caveats

  • Not production-ready. Expect to tune resource requests, probes, and strategies for your cluster.
  • GPU assignment assumes the NVIDIA k8s device plugin. Time-slicing / MIG configurations may need extra tweaks.
  • Recommended deployment strategy is Recreate for single-replica GPU-heavy workloads — a rolling update can leave two pods fighting over the same GPUs.

License

Apache License 2.0

About

Kubernetes manifests and test scripts for trying new open-weight LLMs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages