Kubernetes manifests and test scripts for trying new open-weight LLMs on GPU clusters.
Each folder is one model and contains everything you need to pull weights, deploy an inference server, and smoke-test reasoning and tool calling via curl. The current inference backend is vLLM; SGLang and TensorRT-LLM variants may land in the same folder structure where useful.
```
modelyard/
└── <model-name>/
    ├── README               # run instructions, flag reference, test results, caveats
    ├── pvc.yaml             # PersistentVolumeClaim for model weights
    ├── model-download.yaml  # one-shot Job that pulls weights from HF
    ├── <model>.yaml         # Deployment + Service (recommended aggregated config)
    ├── <model>-disagg.yaml  # optional disaggregated prefill/decode variant
    └── script.sh            # curl-based smoke tests (reasoning + tool calling)
```
```bash
cd <model-name>

# 1. Set your storage class in pvc.yaml, then create the PVC and download weights:
kubectl apply -f pvc.yaml
kubectl apply -f model-download.yaml

# 2. Deploy the server once weights are downloaded:
kubectl apply -f <model>.yaml

# 3. Port-forward and run the smoke tests:
kubectl port-forward pod/<pod-name> 8000:8000
./script.sh
```

See each folder's own README for model-specific flags, expected resources, and any quirks worth knowing about.
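For step 1 above, a minimal sketch of the `pvc.yaml` / `model-download.yaml` pair, assuming a Hugging Face download via `huggingface-cli`; the PVC name, storage class, size, and image below are illustrative placeholders, not values from any model folder:

```yaml
# pvc.yaml -- illustrative; set storageClassName and size for your cluster
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-weights              # placeholder name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-nvme      # placeholder storage class
  resources:
    requests:
      storage: 1Ti                 # sized to fit the model's weights
---
# model-download.yaml -- illustrative one-shot download Job
apiVersion: batch/v1
kind: Job
metadata:
  name: model-download
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: download
          image: python:3.11-slim  # placeholder image
          command: ["sh", "-c"]
          args:
            - pip install -q huggingface_hub &&
              huggingface-cli download <hf-org>/<model> --local-dir /models
          volumeMounts:
            - name: weights
              mountPath: /models
      volumes:
        - name: weights
          persistentVolumeClaim:
            claimName: model-weights   # must match the PVC above
```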
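For step 2, a hedged sketch of the Deployment + Service shape the `<model>.yaml` files follow; the image tag, flags, and GPU count are assumptions for illustration, so read the real manifest in each folder rather than this one:

```yaml
# <model>.yaml -- illustrative shape only; real manifests carry model-specific flags
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  strategy:
    type: Recreate                   # see the notes below on single-replica GPU workloads
  selector:
    matchLabels: {app: vllm-server}
  template:
    metadata:
      labels: {app: vllm-server}
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest   # placeholder tag; pin a real version
          args: ["--model", "/models", "--tensor-parallel-size", "4"]
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 4      # assumes the NVIDIA device plugin
          volumeMounts:
            - name: weights
              mountPath: /models
      volumes:
        - name: weights
          persistentVolumeClaim:
            claimName: model-weights
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-server
spec:
  selector:
    app: vllm-server
  ports:
    - port: 8000
      targetPort: 8000
```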
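For step 3, the kind of request `script.sh` exercises: an OpenAI-compatible chat completion with a toy tool definition. The model name and tool schema are placeholders, and tool calling may need extra vLLM server flags (e.g. a tool-call parser), so treat each folder's script as authoritative:

```bash
# Illustrative smoke test against the port-forwarded server.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<hf-org>/<model>",
    "messages": [{"role": "user", "content": "What is the weather in Paris right now?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```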
| Folder | Model | Backend | Notes |
|---|---|---|---|
| deepseek-v4-flash | deepseek-ai/DeepSeek-V4-Flash (284B MoE) | vLLM | Aggregated (4× H100/H200) or disaggregated prefill/decode |
- Not production-ready. Expect to tune resource requests, probes, and strategies for your cluster.
- GPU assignment assumes the NVIDIA k8s device plugin. Time-slicing / MIG configurations may need extra tweaks.
- Recommended deployment strategy for single-replica GPU-heavy workloads is `Recreate` (see the sketch below): a rolling update can leave two pods fighting over the same GPUs.
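A minimal sketch of the relevant stanza, assuming a standard apps/v1 Deployment like the one in each `<model>.yaml`:

```yaml
# Recreate tears the old pod down (freeing its GPUs) before the new one schedules.
spec:
  replicas: 1
  strategy:
    type: Recreate
```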