The skill runtime

Open-source runner

Turn a SKILL into a product pipeline.

Write the prompt. The agent plans and runs the steps — calling tools in parallel where it can — and traces every run, step by step.

SKILL.md
You are a refund agent.
Stay within policy. Be fair.

## Check the policy
Match the order to the rules.

## Verify the order
Look up the order and charge.

## Assess the refund
Decide: full, partial, or deny.

## Issue the refund
Refund and email the customer.
done · 4 steps
Pipeline · live

  • Refund request: order · reason
  • Check the policy: refund_policy · $0.002400
  • Look up the order: orders.get · $0.006200
  • Check the charge: payments.get · $0.006200
  • Assess the refund: claude/sonnet-4-6 · $0.0142
  • Issue the refund: needs approval
  • Refunded: issued · emailed

Job total: $0.03

Every call metered to the cent — a line-item receipt per run.
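
The "calling tools in parallel where it can" idea can be sketched in plain Python. This is a minimal illustration, not the puras runtime's code: get_order and get_charge are hypothetical stand-ins for the orders.get and payments.get tools in the pipeline above, and the order data is made up.

```python
import asyncio


# Hypothetical stand-ins for two independent lookup tools.
async def get_order(order_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"order_id": order_id, "total": 42.00}


async def get_charge(order_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"order_id": order_id, "captured": 42.00}


async def verify_order(order_id: str) -> dict:
    # Neither lookup depends on the other, so both can run concurrently.
    order, charge = await asyncio.gather(
        get_order(order_id),
        get_charge(order_id),
    )
    return {
        "order": order,
        "charge": charge,
        "matches": order["total"] == charge["captured"],
    }


result = asyncio.run(verify_order("A-1001"))
```

Steps that need an earlier result, such as assessing the refund, would still run after both lookups finish.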

What is a skill

A skill is one folder.

A prompt, its tools and evals, and a typed input/output — bundled, versioned, and deployed as one.

  • SKILL.md (prompt): ## triage, ## research, ## draft
  • search.py (tool): def search(q) → results
  • fetch.py (tool): def fetch(url) → text
  • grounded (eval): assert cited, score ≥ 0.9
  • schema (typed I/O): in: question, out: answer

SKILL: deploy once · call it like an API
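
To make the typed input/output and the "grounded" eval concrete, here is a rough sketch in plain Python. It does not use the puras manifest format, which this page does not show; the Question and Answer classes, the citations field, and both functions are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Question:  # hypothetical input type ("in: question")
    text: str


@dataclass(frozen=True)
class Answer:  # hypothetical output type ("out: answer")
    text: str
    citations: tuple[str, ...] = ()


def grounded(answer: Answer) -> bool:
    """Eval check: an answer passes only if it cites at least one source."""
    return len(answer.citations) > 0


def grounded_score(answers: list[Answer]) -> float:
    """Fraction of answers that pass the grounded check."""
    return sum(grounded(a) for a in answers) / len(answers)


answers = [
    Answer("Paris is the capital of France.", ("https://example.com/a",)),
    Answer("Probably Lyon."),  # no citation, so this one fails the check
]
suite_score = grounded_score(answers)
```

With one cited answer out of two, suite_score is 0.5, which would fail a "score ≥ 0.9" bar.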

Deploy · call

Deploy once.
Call it anywhere.

One command ships your skill to a versioned endpoint — then call it from Python, TypeScript, or your coding agent over MCP.

01

Deploy with puras

pip install puras, then puras deploy — the CLI zips your skill, uploads it, and activates a version. No servers, no Dockerfile, no CI.

02

Call it from your app

import puras and call it by its workspace/skillpack/skill path — Python, TypeScript, or MCP. Get a typed result back.

Read the quickstart

Built in, not bolted on

Reliable and observable
by default.

Every pipeline the runtime builds is graded, traced, gated, and resumable, and you wire up none of it. Below are three capabilities you'd otherwise build yourself, each shown live on a real pipeline.

Reliability · evals & guardrails

Test a skill like code.

Attach an eval suite as data — graded by schema, exact match, your own code, or an LLM judge — and gate every deploy on pass-rate with --threshold, so a regression never ships. Then guardrails enforce policy at runtime: PII redaction, prompt-injection blocks, and schema or tool-call rails that stop a bad output before it leaves the run.

A mismatched invoice total or a leaked PII field can never silently pass.

puras eval · live

  • Eval suite: cases.jsonl · 5×
  • totals_balance: check · 5/5
  • no_pii_leak: check · 4/5
  • Threshold gate: --threshold 0.9
  • Safe to deploy: pass-rate 0.93

✓ pass-rate 0.93 ≥ 0.90 threshold · gate passed, ship it
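
The gating rule itself is simple arithmetic: compute the fraction of passing checks and compare it to the threshold. The sketch below illustrates that rule only. How puras aggregates results across cases and checks is not specified on this page, so the flat list of booleans and the sample data are assumptions.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of eval checks that passed."""
    if not results:
        raise ValueError("no eval results to grade")
    return sum(results) / len(results)


def gate(results: list[bool], threshold: float) -> bool:
    """True when the pass-rate meets the threshold, so the deploy may proceed."""
    return pass_rate(results) >= threshold


# Illustrative data: 14 of 15 checks pass.
results = [True] * 14 + [False]
ok_to_ship = gate(results, threshold=0.9)

# A worse run: 8 of 10 checks pass, which falls below the same bar.
blocked = not gate([True] * 8 + [False] * 2, threshold=0.9)
```

In CI, the boolean would typically map to the process exit code so that a failing gate stops the deploy.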

Human-in-the-loop

Sensitive steps wait for a person.

Mark a high-stakes step — releasing a payment, sending a contract for signature — as needing approval in skill.yaml. The run pauses for a human decision, then resumes exactly where it left off.

The guardrail between an agent and an action it shouldn't take alone.

Run · awaiting approval

  • Invoice: vendor bill
  • 3-way match: → PO + receipt
  • Release payment: needs approval
  • Post to ledger: write_entry
  • Settled: paid + posted

approved · payment released · posted to ledger
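
One way to picture "pauses, then resumes exactly where it left off" is a run that keeps a cursor into its steps. The sketch below is a toy model under that assumption, not the puras implementation. The Step and Run classes and the status strings are hypothetical, and the skill.yaml syntax for marking a step is not shown here because this page does not specify it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    name: str
    needs_approval: bool = False


@dataclass
class Run:
    steps: list[Step]
    cursor: int = 0            # index of the next step to execute
    status: str = "running"    # running | awaiting_approval | rejected | done
    log: list[str] = field(default_factory=list)

    def advance(self, approved: Optional[bool] = None) -> str:
        """Execute steps until one needs a decision or the run finishes."""
        while self.cursor < len(self.steps):
            step = self.steps[self.cursor]
            if step.needs_approval:
                if approved is None:
                    self.status = "awaiting_approval"
                    return self.status  # pause; cursor stays on this step
                if not approved:
                    self.status = "rejected"
                    return self.status
                approved = None  # one decision covers one gated step
            self.log.append(step.name)
            self.cursor += 1
        self.status = "done"
        return self.status


run = Run([
    Step("3-way match"),
    Step("Release payment", needs_approval=True),
    Step("Post to ledger"),
])
first = run.advance()                 # stops before "Release payment"
paused_log = list(run.log)
second = run.advance(approved=True)   # continues from the paused step
```

Because the cursor is part of the run's state, the earlier step is not executed again after the approval arrives.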

Durable runs

Resume, don't restart.

Long runs checkpoint each step. A vendor rate-limit or a crash near the end resumes from the last good step instead of re-running the whole pipeline — and you are never double-charged for work already done.

You didn't wire retries — the runtime resumed.

Run · resumable

  • Accounts: batch · 5k
  • Validate rows: checkpoint ✓
  • Enrich firmographics: vendor API
  • Score & dedupe: claude/sonnet-4-6
  • Upserted: to warehouse

resumed from the last good step · validated rows not re-charged
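
Step-level checkpointing can be sketched with a small JSON file that records each finished step. This is a simplified illustration of the idea rather than how the runtime stores state. The run_pipeline function, the file layout, and the simulated rate limit are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, checkpoint: Path) -> list:
    """Run (name, fn) steps in order, skipping any already checkpointed."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    executed = []
    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier attempt, so do not redo it
        done[name] = fn()  # may raise, e.g. on a vendor rate limit
        checkpoint.write_text(json.dumps(done))  # persist after every step
        executed.append(name)
    return executed


calls = {"validate": 0, "enrich": 0}


def validate():
    calls["validate"] += 1
    return "ok"


def enrich():
    calls["enrich"] += 1
    if calls["enrich"] == 1:
        raise RuntimeError("vendor rate limit")  # fail on the first attempt
    return "enriched"


steps = [("validate", validate), ("enrich", enrich), ("score", lambda: "scored")]

with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "run.json"
    try:
        run_pipeline(steps, ckpt)  # first attempt fails at "enrich"
    except RuntimeError:
        pass
    resumed = run_pipeline(steps, ckpt)  # second attempt skips "validate"
```

On the second attempt only "enrich" and "score" execute, so the validation work is neither repeated nor billed twice in this model.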

Built into the puras runtime

Your runtime learns from every run.

Because puras runs, traces, and grades every job, it can look back across them. Hindsight reads your recent runs — their traces, eval scores, and user feedback — and hands you concrete fixes. It's not a tool you wire up; it's the runtime improving your skill for you.

lead-enrichment · last 10 runs · Hindsight report

Open source

No lock-in.

The whole runner is open source — the same engine, tools, and capabilities run on your machine and in our cloud. The cloud just runs it for you.

Local

open source · MIT

Run it yourself, on your own keys:

  • The full agent loop & your skill code
  • Your skill.yaml — same format, unchanged
  • Tools: media, web & memory
  • Evals, traces & versioning

pip install "puras[local]" — your machine, your keys.

Cloud

managed

The same engine, run for you — plus:

  • A fresh, isolated machine for every job — secure & sandboxed
  • Managed keys — nothing to wire up
  • Durable resume & human approvals
  • Dashboard, traces & per-job billing

One command: puras deploy.

Same skill format, same engine — walk away with the open-source runner anytime. View on GitHub

FAQ

Questions, answered.

One folder. A skill.yaml manifest with a typed input/output contract, plus a SKILL.md prompt — or a plain Python function — and any tools and evals it needs. puras deploy gives it an immutable, versioned endpoint you call like an API.

You don't author a graph. You write the work; the agent plans the steps, runs them in parallel where it can, retries failures, and traces every one, then hands back a typed response, files, or side effects. You watch it run as a live pipeline; you never wire one.

The agent chooses its own path, so two runs can differ. You get confidence a different way: every run is traced step by step, you gate each deploy on an eval suite in CI, and guardrails enforce policy at runtime — so behavior is pinned by tests and rails, not by freezing a graph.

Every run is traced step by step — each model call and tool call, with timing and cost — in the dashboard and over the API. You see exactly what happened, and get a line-item receipt for what it spent.

Mark a tool as needing confirmation in skill.yaml. When the agent calls it, the runtime pauses the run for a human to approve or reject, then resumes exactly where it left off. It's enforced by the runtime — never just asked for in the prompt.

You pay per job, to the cent, from a prepaid balance — only the model tokens and media a run actually uses, plus a flat 5% platform fee. Every run returns a line-item receipt.
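
As a worked example of the pricing rule above, the sketch below adds a flat 5% fee to a run's usage and rounds the total to the cent. Only the 5% figure comes from this page. The line items are made up, and the rounding mode and whether the fee is applied before rounding are assumptions.

```python
from __future__ import annotations

from decimal import ROUND_HALF_UP, Decimal

PLATFORM_FEE = Decimal("0.05")  # the flat 5% platform fee stated above


def receipt(line_items: dict[str, Decimal]) -> dict[str, Decimal]:
    """Sum usage, add the platform fee, and round the total to the cent."""
    subtotal = sum(line_items.values(), Decimal("0"))
    fee = subtotal * PLATFORM_FEE
    # Assumption: round half up, applied once to the final total.
    total = (subtotal + fee).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"subtotal": subtotal, "fee": fee, "total": total}


# Illustrative usage for one job (amounts in dollars, invented for the example).
job = receipt({
    "tool_call_a": Decimal("0.0100"),
    "tool_call_b": Decimal("0.0250"),
    "model_tokens": Decimal("0.0600"),
})
```

Here the usage sums to $0.0950, the fee is $0.00475, and the job total rounds to $0.10.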

import puras and call it by its workspace/skillpack/skill path — a skillpack is a deployed bundle of related skills — from Python, TypeScript, raw HTTP, or your coding agent over MCP (OAuth in the browser, no key to paste). You get a typed result, files, or the side effects back.

Pick any supported family per skill — claude/*, gpt/*, or gemini/* — pinned in skill.yaml and overridable per run. For local parity, pip install "puras[local]" runs the open-source runner — the same agent loop, on your own machine and keys.

Write the skill.
Skip the plumbing.

You define the work; the runtime builds and runs it as a traced, tested pipeline. Deploy once, call it like an API. $10 free credit, no card, no subscription.

Or browse example skills already deployed on puras.