The skill runtime

Turn a SKILL into a product pipeline.

Write the prompt. The agent plans and runs the steps — calling tools in parallel where it can — and traces every run, step by step.

Create workspace · $10 free, no card
Read the quickstart

SKILL.md

You are a refund agent.
Stay within policy. Be fair.

## Check the policy
Match the order to the rules.

## Verify the order
Look up the order and charge.

## Assess the refund
Decide: full, partial, or deny.

## Issue the refund
Refund and email the customer.

done · 4 steps

Pipeline · live

Refund request

order · reason

Check the policy

refund_policy · $0.002400

Look up the order

orders.get · $0.006200

Check the charge

payments.get · $0.006200

Assess the refund

claude/sonnet-4-6 · $0.0142

Issue the refund

needs approval

Refunded

issued · emailed

Job total · $0.03

Every call metered to the cent — a line-item receipt per run.
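The receipt above is plain arithmetic, and you can check it yourself. A minimal sketch, not the puras billing code: `LineItem` and `job_total` are hypothetical names, the amounts are copied from the pipeline above, and rounding half-up to the cent is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class LineItem:
    """One metered call on a receipt (hypothetical shape)."""
    step: str
    call: str
    cost: Decimal


# Amounts copied from the refund pipeline above.
RECEIPT = [
    LineItem("Check the policy", "refund_policy", Decimal("0.002400")),
    LineItem("Look up the order", "orders.get", Decimal("0.006200")),
    LineItem("Check the charge", "payments.get", Decimal("0.006200")),
    LineItem("Assess the refund", "claude/sonnet-4-6", Decimal("0.0142")),
]


def job_total(items: list[LineItem]) -> Decimal:
    """Sum the line items exactly, then round to the cent (assumed half-up)."""
    exact = sum((item.cost for item in items), Decimal("0"))
    return exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(job_total(RECEIPT))  # 0.03
```

Using `Decimal` rather than floats keeps sub-cent line items exact until the final rounding step.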

What is a skill

A skill is one folder.

A prompt, its tools and evals, and a typed input/output — bundled, versioned, and deployed as one.

SKILL.md · prompt

## triage

## research

## draft

search.py · tool

def search(q)

→ results

fetch.py · tool

def fetch(url)

→ text

grounded · eval

assert cited

score ≥ 0.9

schema · typed I/O

in: question

out: answer

SKILL · deploy once · call it like an API
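To make the typed I/O and the grounded eval in the folder above concrete, here is a minimal sketch in plain Python. It is not the puras schema or eval format: `Question`, `Answer`, `is_cited`, and the scoring helpers are hypothetical names, and scoring by "fraction of answers with a citation" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Typed input (hypothetical): the 'in: question' side of the contract."""
    text: str


@dataclass(frozen=True)
class Answer:
    """Typed output (hypothetical): the 'out: answer' side of the contract."""
    text: str
    citations: tuple[str, ...] = ()


def is_cited(answer: Answer) -> bool:
    """The 'assert cited' check: an answer must carry at least one source."""
    return len(answer.citations) > 0


def grounded_score(answers: list[Answer]) -> float:
    """Fraction of answers that cite a source (assumed scoring rule)."""
    if not answers:
        return 0.0
    return sum(is_cited(a) for a in answers) / len(answers)


def passes_grounded_eval(answers: list[Answer], minimum: float = 0.9) -> bool:
    """Mirror the 'score ≥ 0.9' bar from the diagram."""
    return grounded_score(answers) >= minimum
```

The point of the typed contract is that a caller never parses free text: it receives an `Answer`-shaped value or the run fails.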

Deploy · call

Deploy once.
Call it anywhere.

One command ships your skill to a versioned endpoint — then call it from Python, TypeScript, or your coding agent over MCP.

Deploy with puras

pip install puras, then puras deploy — the CLI zips your skill, uploads it, and activates a version. No servers, no Dockerfile, no CI.

deploy · cli

Call it from your app

import puras and call it by its workspace/skillpack/skill path — Python, TypeScript, or MCP. Get a typed result back.

outreach.py
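A call addresses a skill by a three-part path. The sketch below shows only that addressing scheme, not the puras client: `SkillPath` is a hypothetical helper, and `acme/sales/outreach` is a made-up path used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillPath:
    """A workspace/skillpack/skill address (hypothetical helper)."""
    workspace: str
    skillpack: str
    skill: str

    @classmethod
    def parse(cls, path: str) -> "SkillPath":
        """Split a path into its three parts, rejecting anything else."""
        parts = path.split("/")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected workspace/skillpack/skill, got {path!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.workspace}/{self.skillpack}/{self.skill}"


# Made-up example path; substitute your own workspace, skillpack, and skill.
target = SkillPath.parse("acme/sales/outreach")
print(target.skill)
```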

Read the quickstart

Built in, not bolted on

Reliable and observable by default.

Every pipeline the runtime builds is graded, traced, gated, and resumable, and you wire up none of it. Below are three capabilities you'd otherwise build yourself, each shown live on a real pipeline.

Reliability · evals & guardrails

Test a skill like code.

Attach an eval suite as data — graded by schema, exact match, your own code, or an LLM judge — and gate every deploy on pass-rate with --threshold, so a regression never ships. Then guardrails enforce policy at runtime: PII redaction, prompt-injection blocks, and schema or tool-call rails that stop a bad output before it leaves the run.

A mismatched invoice total or a leaked PII field can never silently pass.
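Two of those rails are easy to picture in code. This is a sketch under stated assumptions, not the puras guardrail API: `redact_emails`, `totals_balance`, and `enforce` are hypothetical names, the invoice is assumed to be a dict with amounts in integer cents, and the email pattern is deliberately simple rather than a full address validator.

```python
import re

# Simple email matcher for illustration; real PII detection covers far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def totals_balance(invoice: dict) -> bool:
    """Schema-style rail: line amounts (in cents) must sum to the stated total."""
    line_sum = sum(line["amount_cents"] for line in invoice["lines"])
    return line_sum == invoice["total_cents"]


def enforce(invoice: dict, note: str) -> str:
    """Block a mismatched invoice, then redact the outgoing note."""
    if not totals_balance(invoice):
        raise ValueError("guardrail: invoice total does not match its line items")
    return redact_emails(note)
```

Raising instead of logging is the design choice that matters here: a failed rail stops the output rather than annotating it.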

puras eval · live

Eval suite

cases.jsonl · 5×

totals_balance

check · 5/5

no_pii_leak

check · 4/5

Threshold gate

--threshold 0.9

Safe to deploy

pass-rate 0.93

✓ 0.93 ≥ 0.90 — ship it

pass-rate 0.93 ≥ 0.90 threshold · gate passed
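The gate itself is a one-line comparison. Here is a minimal sketch of the logic behind a pass-rate threshold, assuming each check yields a plain pass or fail; `pass_rate`, `gate`, and `exit_code` are hypothetical names, not the internals of `puras eval`, and the 14-of-15 sample is illustrative data.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of eval checks that passed."""
    if not results:
        raise ValueError("no eval results to grade")
    return sum(results) / len(results)


def gate(results: list[bool], threshold: float) -> bool:
    """True when the suite is safe to deploy at the given threshold."""
    return pass_rate(results) >= threshold


def exit_code(results: list[bool], threshold: float) -> int:
    """Map the gate to a process exit code so CI can block a deploy."""
    return 0 if gate(results, threshold) else 1


# Illustrative data only: 14 of 15 checks passing is roughly 0.93.
sample = [True] * 14 + [False]
verdict = "ship it" if gate(sample, 0.9) else "blocked"
print(f"pass-rate {pass_rate(sample):.2f} -> {verdict}")
```

A non-zero exit code is what lets a CI job stop the deploy step without any extra wiring.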

Human-in-the-loop

Sensitive steps wait for a person.

Mark a high-stakes step — releasing a payment, sending a contract for signature — as needing approval in skill.yaml. The run pauses for a human decision, then resumes exactly where it left off.

The guardrail between an agent and an action it shouldn't take alone.

Run · awaiting approval

Invoice

vendor bill

3-way match

→ PO + receipt

Release payment

needs approval

Post to ledger

write_entry

Settled

paid + posted

approved · payment released · posted to ledger
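Pause-and-resume comes down to remembering where the run stopped. The sketch below models that with a cursor over the steps. It is an illustration, not the runtime's implementation: `Run`, `advance`, and the step tuples are hypothetical, state lives in memory only, and a rejection path is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable

# (name, action, needs_approval): a hypothetical step shape.
Step = tuple[str, Callable[[], str], bool]


@dataclass
class Run:
    steps: list[Step]
    cursor: int = 0  # index of the next step to execute
    log: list[str] = field(default_factory=list)
    status: str = "running"

    def advance(self, approved: bool = False) -> str:
        """Run steps until finished, or until one needs approval."""
        while self.cursor < len(self.steps):
            _name, action, needs_approval = self.steps[self.cursor]
            if needs_approval and not approved:
                self.status = "awaiting approval"
                return self.status  # pause; the cursor stays on this step
            self.log.append(action())
            self.cursor += 1
            approved = False  # an approval covers one step only
        self.status = "done"
        return self.status


run = Run(steps=[
    ("3-way match", lambda: "matched PO + receipt", False),
    ("Release payment", lambda: "payment released", True),
    ("Post to ledger", lambda: "posted to ledger", False),
])
print(run.advance())               # awaiting approval
print(run.advance(approved=True))  # done
```

Because the cursor never moves past a gated step without approval, earlier steps are not repeated when the run resumes.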

Durable runs

Resume, don't restart.

Long runs checkpoint each step. A vendor rate-limit or a crash near the end resumes from the last good step instead of re-running the whole pipeline — and you are never double-charged for work already done.

You didn't wire retries — the runtime resumed.

Run · resumable

Accounts

batch · 5k

Validate rows

checkpoint ✓

Enrich firmographics

vendor API

Score & dedupe

claude/sonnet-4-6

Upserted

to warehouse

↻ resumed · not re-charged

resumed from the last good step · validated rows not re-charged
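Step-level checkpointing is what makes a resume cheap. A minimal sketch of the idea, assuming step names are unique and a JSON file on disk stands in for the runtime's durable store; `run_with_checkpoints` is a hypothetical helper, not a puras function.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_checkpoints(
    steps: list[tuple[str, Callable[[], None]]],
    checkpoint_file: Path,
) -> list[str]:
    """Run steps in order, skipping any that a previous attempt completed.

    Returns the names of the steps executed in this attempt.
    """
    done: list[str] = []
    if checkpoint_file.exists():
        done = json.loads(checkpoint_file.read_text())

    executed: list[str] = []
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier attempt; do not re-run it
        action()  # may raise, for example on a vendor rate limit
        done.append(name)
        checkpoint_file.write_text(json.dumps(done))  # checkpoint each step
        executed.append(name)
    return executed
```

If a step raises, every step before it is already recorded, so calling the function again with the same checkpoint file picks up at the failed step.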

Built into the puras runtime

Your runtime learns from every run.

Because puras runs, traces, and grades every job, it can look back across them. Hindsight reads your recent runs — their traces, eval scores, and user feedback — and hands you concrete fixes. It's not a tool you wire up; it's the runtime improving your skill for you.

lead-enrichment

last 10 runs

runs #1042 · #1041 · #1040 · #1039 · #1038 · #1037

Hindsight report

Outcome

Open source

No lock-in.

The whole runner is open source — the same engine, tools, and capabilities run on your machine and in our cloud. The cloud just runs it for you.

Local

open source · MIT

Run it yourself, on your own keys:

The full agent loop & your skill code
Your skill.yaml — same format, unchanged
Tools: media, web & memory
Evals, traces & versioning

pip install "puras[local]" — your machine, your keys.

Cloud

managed

The same engine, run for you — plus:

A fresh, isolated machine for every job — secure & sandboxed
Managed keys — nothing to wire up
Durable resume & human approvals
Dashboard, traces & per-job billing

One command: puras deploy.

Same skill format, same engine — walk away with the open-source runner anytime. View on GitHub

FAQ

Questions, answered.

One folder. A skill.yaml manifest with a typed input/output contract, plus a SKILL.md prompt — or a plain Python function — and any tools and evals it needs. puras deploy gives it an immutable, versioned endpoint you call like an API.

You don't author a graph. You write the work; the agent plans the steps, runs them in parallel, retries, and traces every one — then hands back a typed response, files, or the side effects. You watch it run as a live pipeline; you never wire one.

The agent chooses its own path, so two runs can differ. You get confidence a different way: every run is traced step by step, you gate each deploy on an eval suite in CI, and guardrails enforce policy at runtime — so behavior is pinned by tests and rails, not by freezing a graph.

Every run is traced step by step — each model call and tool call, with timing and cost — in the dashboard and over the API. You see exactly what happened, and get a line-item receipt for what it spent.

Mark a tool as needing confirmation in skill.yaml. When the agent calls it, the runtime pauses the run for a human to approve or reject, then resumes exactly where it left off. It's enforced by the runtime — never just asked for in the prompt.

You pay per job, to the cent, from a prepaid balance — only the model tokens and media a run actually uses, plus a flat 5% platform fee. Every run returns a line-item receipt.
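As a worked example of that pricing, here is a small sketch. The 5% rate comes from the text above; everything else is an assumption: `job_charge` is a hypothetical helper, the fee is assumed to apply to the usage subtotal, and the final charge is assumed to round half-up to the cent. The usage figures are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

PLATFORM_FEE_RATE = Decimal("0.05")  # the flat 5% fee stated above


def job_charge(usage_costs: list[Decimal]) -> dict[str, Decimal]:
    """Break one job's charge into usage, platform fee, and rounded total."""
    usage = sum(usage_costs, Decimal("0"))
    fee = usage * PLATFORM_FEE_RATE  # assumed: fee applies to the subtotal
    total = (usage + fee).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"usage": usage, "fee": fee, "total": total}


# Illustrative usage: $1.20 of model tokens plus $0.80 of media.
charge = job_charge([Decimal("1.20"), Decimal("0.80")])
print(charge["total"])  # 2.10
```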

import puras and call it by its workspace/skillpack/skill path — a skillpack is a deployed bundle of related skills — from Python, TypeScript, raw HTTP, or your coding agent over MCP (OAuth in the browser, no key to paste). You get a typed result, files, or the side effects back.

Pick any supported family per skill — claude/*, gpt/*, or gemini/* — pinned in skill.yaml and overridable per run. For local parity, pip install "puras[local]" runs the open-source runner — the same agent loop, on your own machine and keys.

Write the skill.
Skip the plumbing.

You define the work; the runtime builds and runs it as a traced, tested pipeline. Deploy once, call it like an API. $10 free credit, no card, no subscription.

Read the quickstart
Create workspace · $10 free

Or browse example skills already deployed on puras.