skills

Skills that guide Claude Code, Codex, and other coding agents in using the Weights & Biases (W&B) AI developer platform to train models and build agents.

For model training

Log metrics and rich media during model training and fine-tuning
Track model training experiments
Analyze runs and experiment results to understand how the model is learning
Tune hyperparameters
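
As a rough illustration of the first two items, the sketch below logs one metric per epoch to a W&B run. It is not taken from the skill itself: the `train` helper, the toy loss curve, and the `skills-demo` project name are assumptions made for the example. Only `wandb.init`, `run.log`, `run.config`, and `run.finish` are real W&B SDK calls.

```python
import math
import os


def train(run, epochs=5):
    """Toy loop that logs one loss value per epoch to a W&B-style run object."""
    losses = []
    for epoch in range(epochs):
        loss = math.exp(-0.5 * epoch)  # placeholder for a real training loss
        run.log({"epoch": epoch, "loss": loss})
        losses.append(loss)
    return losses


def main():
    import wandb  # requires `pip install wandb`

    # "skills-demo" is a hypothetical project name.
    run = wandb.init(project="skills-demo", config={"epochs": 5})
    train(run, epochs=run.config["epochs"])
    run.finish()


# Only contact W&B when an API key is available in the environment.
if __name__ == "__main__" and os.environ.get("WANDB_API_KEY"):
    main()
```

Keeping the loop in a function that takes the run object makes it easy to exercise the logging logic with a stand-in object before pointing it at a real project.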

For agent building

Trace agentic AI applications
Analyze traces and classify them into failure modes
Evaluate models with labeled datasets
Run online evaluations for production monitoring
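
For the tracing item, a minimal sketch with W&B Weave might look like the following. The `classify_ticket` function and the `skills-demo` project name are hypothetical stand-ins for a real agent step. `weave.init` and `weave.op` are the Weave SDK calls assumed here.

```python
import os


def classify_ticket(text):
    """Toy stand-in for an agent step: label a support ticket by keyword."""
    return "billing" if "invoice" in text.lower() else "general"


def main():
    import weave  # requires `pip install weave`

    # "skills-demo" is a hypothetical project name.
    weave.init("skills-demo")
    # Wrap the function so each call is recorded as a trace.
    traced_classify = weave.op(classify_ticket)
    print(traced_classify("Where is my invoice?"))


# Only contact W&B when an API key is available in the environment.
if __name__ == "__main__" and os.environ.get("WANDB_API_KEY"):
    main()
```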

Getting Started

Install the skills with npx:

npx skills add wandb/skills

Then set your W&B API key:

export WANDB_API_KEY=<your-key>
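
If an agent session fails to authenticate, it can help to confirm that the key is actually visible to the process. The helper below is a hypothetical sketch for that check and is not part of the skill package.

```python
import os


def wandb_api_key_is_set(env=None):
    """Return True if WANDB_API_KEY is present and non-empty in the given mapping."""
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY", "").strip())


if __name__ == "__main__":
    if wandb_api_key_is_set():
        print("WANDB_API_KEY is set")
    else:
        print("WANDB_API_KEY is missing; export it before starting the agent")
```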

`npx skills` is a utility for installing skills into major coding agent CLIs. Use `--global` to install for all projects, or `--agent <name>` to target a specific agent. See the `npx skills` docs for more details.

Available Skills

Skill	Description	Status
`wandb-primary`	Primary W&B skill for broad, mixed-surface W&B project analysis and workflows across runs, Weave, Reports, Signal Builder, and Launch.	experimental

Benchmarks

We maintain Skill Bench in this repository to evaluate public skill changes across coding agents and task categories. Skill Bench uses W&B Agent Factory as the eval runtime for task definitions, agent profiles, sandbox execution, and structured bench rows.

Pull requests run package validation by default. A maintainer can trigger live Skill Bench runs for larger changes.

Plan a local benchmark without model calls:

python3 -m skillbench.cli plan \
  --wbaf-root ../WandBAgentFactory \
  --candidate-ref HEAD \
  --skill wandb-primary

Category	Tasks	Claude Code (`sonnet4.6`)	Codex (`gpt-5.3-codex`)
Weave analysis	26	97%*	63%*
Weave tooling	11	95%*	83%*
Model training	8	90%*	85%*
LLM finetuning & RL analysis	14	72%*	86%*
Failure & outlier detection	8	86%*	63%*

*Pass rates are +/- 3%. Many tasks span multiple categories.

Contributing

See CONTRIBUTING.md.
