wandb/skills

skills

Skills to guide Claude Code, Codex, and other coding agents on using the Weights & Biases AI developer platform to train models and build agents.

For model training

  • Log metrics and rich media during model training and fine-tuning
  • Track model training experiments
  • Analyze runs and experiment results to understand how the model is learning
  • Tune hyperparameters

For agent building

  • Trace agentic AI applications
  • Analyze traces and classify them into failure modes
  • Evaluate models with labeled datasets
  • Run online evaluations for production monitoring

Getting Started

npx skills add wandb/skills

Then set your W&B API key:

export WANDB_API_KEY=<your-key>
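Before starting an agent session, it can help to confirm that the key is actually exported in the current shell. The function below is a minimal sketch; `check_wandb_key` is a hypothetical helper name, not part of W&B or the skills tooling, and it only checks that the variable is non-empty, not that the key is valid.

```shell
# Hypothetical helper (not part of W&B or the skills CLI).
# Succeeds if WANDB_API_KEY is set to a non-empty value, fails otherwise.
check_wandb_key() {
  if [ -z "${WANDB_API_KEY:-}" ]; then
    echo "WANDB_API_KEY is not set; export it before running the agent" >&2
    return 1
  fi
  echo "WANDB_API_KEY is set"
}
```

Calling `check_wandb_key` at the top of a setup script lets the script stop early with a readable message instead of failing later inside the agent.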

npx skills is a utility for installing skills into major coding agent CLIs. Use --global to install for all projects, or --agent <name> to target a specific agent. See the npx skills docs for more details.
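The install variants described above can be wrapped in a small shell function. This is a sketch under stated assumptions: `install_wandb_skills` and the `DRY_RUN` variable are hypothetical names introduced here, and placing `--global` or `--agent <name>` after the repository argument is an assumption about flag order; check the npx skills docs for the exact syntax.

```shell
# Hypothetical wrapper around the install command from this README.
# Set DRY_RUN=1 to print the command instead of running it.
install_wandb_skills() {
  mode="${1:-project}"
  case "$mode" in
    project)
      # Default: install for the current project only.
      set -- npx skills add wandb/skills ;;
    global)
      # Install for all projects.
      set -- npx skills add wandb/skills --global ;;
    agent)
      # Target one specific agent; the name is supplied by the caller.
      if [ -z "${2:-}" ]; then
        echo "usage: install_wandb_skills agent <name>" >&2
        return 2
      fi
      set -- npx skills add wandb/skills --agent "$2" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2 ;;
  esac
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 install_wandb_skills global` prints the command that would run, which is a cheap way to review it before installing.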

Available Skills

| Skill | Description | Status |
| --- | --- | --- |
| wandb-primary | Primary W&B skill for broad, mixed-surface W&B project analysis and workflows across runs, Weave, Reports, Signal Builder, and Launch. | experimental |

Benchmarks

We maintain Skill Bench in this repository to evaluate public skill changes across coding agents and task categories. Skill Bench uses W&B Agent Factory as the eval runtime for task definitions, agent profiles, sandbox execution, and structured bench rows.

Pull requests run package validation by default. A maintainer can trigger live Skill Bench runs for larger changes.

Plan a local benchmark without model calls:

python3 -m skillbench.cli plan \
  --wbaf-root ../WandBAgentFactory \
  --candidate-ref HEAD \
  --skill wandb-primary

| Category | Tasks | Claude Code (sonnet4.6) | Codex (gpt-5.3-codex) |
| --- | --- | --- | --- |
| Weave analysis | 26 | 97%* | 63%* |
| Weave tooling | 11 | 95%* | 83%* |
| Model training | 8 | 90%* | 85%* |
| LLM finetuning & RL analysis | 14 | 72%* | 86%* |
| Failure & outlier detection | 8 | 86%* | 63%* |

*Pass rates are ±3%. Many tasks span multiple categories.

Contributing

See CONTRIBUTING.md.

About

Official Agent Skills for Weights & Biases Models and Weave
