pkspec

[experimental] A language-agnostic test runner built on Pkl. Generalizes the retry / sharding / retry-on-fail machinery that playwright test provides for one ecosystem to any kind of test — shell, HTTP, browser, SQL, and whatever you teach the runner next. First-class support for spec-driven authoring, property-based testing, fuzzing, snapshot, and differential testing across language implementations.

Start with the quick-start guide:

amends "./pkspec/Test.pkl"

tests {
  new {
    name = "login_smoke"
    specRef { "LOGIN-001" }
    steps {
      new { http = new HttpRequest { url = "http://localhost/login"; method = "POST"; body = "..." }
            expectStatus = 200 }
      new { name = "judge_message"
            http = new HttpRequest { url = "http://localhost/login/welcome" }
            expectAi = new AiAssertion {
              prompt = "the response acknowledges the user in English"
              cmd = "claude --no-stream"
              snapshotName = "login-welcome"
            } }
    }
  }
}

pkspec exec -f Test.pkl --shard=2/4         # 4-way history-balanced split
pkspec exec -f Test.pkl --rerun-failed      # only previously failed tests
pkspec timings -f Test.pkl --shard=2/4      # preview the shard without running
pkspec spec tests/**/*.pkl                  # render SPEC.md from Scenario tags

Why Pkl

Tests are typed values, not bash scripts or YAML. Pkl gives the schema:

Static checks at author time: a Test with both cmd and steps is rejected before the runner ever starts.
Composition: a step body is just a Step value — reuse it across scenarios, parameterize it via Pkl import, generate it from a property-based input.
Language-independent: the schema lives in pkl/, the runner in Go (cmd/pkspec/). Low-level step kinds can still be Go executors, while whole native runners are described by the Pkl adapter DSL so every ecosystem does not become a hard-coded core dependency.
Reusable with pkl test: pkspec rendered output is a Pkl module; Pkl's own facts / examples / snapshot machinery still applies, and pkspec run wraps pkl test so its unreliable exit code becomes CI-trustworthy.

Generalized retry, sharding, and timing

The features playwright test ships as built-ins (--retries, --shard=K/N, last-failed re-runs) generalized to every kind:

feature	flag / schema
per-attempt retry	`Test.retries`, `Test.flakyAcceptable`
cross-run shard split	`pkspec exec --shard=K/N` (LPT bin-packing)
rerun last fail set	`pkspec exec --rerun-failed`
global wall-clock cap	`pkspec exec --total-timeout=5m`
per-test wall-clock cap	`Test.timeoutSec`
polling / eventually	`Step.eventually = new { intervalMs; timeoutSec }`
inspection / preview	`pkspec timings -f Test.pkl --shard=K/N`

Sharding uses an append-only .pkspec/timings.jsonl history, median of the most recent 5 runs per test, Longest-Processing-Time bin-packing with deterministic tie-breaking. The same input produces the same shard assignment on every machine. See docs/notes/timing-shard.md, including the GitHub Actions matrix recipe.

Built-in test kinds

kind	schema class	what it does
`shell`	`Step.cmd`	spawn a subprocess; assert exit / stdout / stderr / contains / regex / JSONPath / snapshot
`http`	`Step.http`	HTTP request; assert status / headers / body / jsonpath / cassette
`playwright`	`Step.playwright`	embedded Node harness — single page, pixel diff, console asserts
`playwrightTest`	`Step.playwrightTest`	wrap `@playwright/test` — fixtures, traces, JUnit roundtrip
`sql`	`Step.sql`	embedded SQLite (`modernc.org/sqlite`) — read + DML

A new low-level Step kind is three things:

a Pkl class on the Step (<Kind>Spec)
a Go executor under internal/executor/<kind>.go
a value for Step.kind (the computed discriminator that drives dispatch)

See docs/notes/runner-design.md for the architectural sketch, and the per-kind notes: playwright / playwright-test / http-dsl / cassettes / sql / shell output assertions.

Adapter suites

For existing native runners, prefer the Pkl adapter DSL over adding one Go executor per ecosystem. pkl/Adapter.pkl defines an abstract Adapter, and built-ins are ordinary Pkl subclasses:

pkl/adapters/Vitest.pkl
pkl/adapters/Playwright.pkl
pkl/adapters/NodeTest.pkl
pkl/adapters/GoTest.pkl
pkl/adapters/MoonTest.pkl

Projects select and specialize adapters with extends:

amends "./pkspec/Adapter.pkl"

import "./pkspec/adapters/Vitest.pkl" as Vitest

local class WebVitest extends Vitest.Vitest {
  configPath = "packages/web/vitest.config.ts"
  include = new { "src/**/*.test.ts" }
}

suites {
  new {
    name = "web-unit"
    adapter = new WebVitest {}
    overlays {
      ["src/parser.test.ts::empty input"] = new CaseOverlay {
        specRef { "parser.empty" }
      }
    }
  }
}

Run adapter modules with pkspec adapter -f Adapter.pkl. The runtime executes the generic protocol (discover JSON, manifest run, JSONL events) and post-run coverage collectors. Native shim commands are installed as pkspec-adapter-vitest, pkspec-adapter-playwright, pkspec-adapter-node-test, pkspec-adapter-go-test, and pkspec-adapter-moon-test; built-in adapters select those commands from Pkl instead of a Go registry. See docs/notes/adapters.md.

Spec-driven authoring

Three layers, from low to high:

Test.pkl (low) — declare concrete subprocess / HTTP / browser invocations with explicit expectations.

Spec.pkl (mid) — BDD-style Given / When / Then scenarios that desugar to Tests. A scenario tagged spec with an empty body is auto-pending — the description is the spec, the body lands later without renaming the test. pkspec spec renders Markdown SPEC.md from the scenarios.

expectAi (orthogonal) — fuzzy natural-language assertions on response bodies, delegated to an external judge command (typically an LLM wrapper). The verdict is cached by sha256(prompt + body) under .pkspec/ai-snapshots/; identical inputs reuse the cached verdict and never spawn the judge.

expectAi = new AiAssertion {
  prompt = "the response acknowledges the user in English"
  cmd = "claude --no-stream"
  snapshotName = "greeting-acknowledges-user"
}

Spec knowledge graph + Goals — Spec.pkl scenarios are nodes in a graph: each carries a stable id, lifecycle (reviewStatus draft/review/approved + deprecated), severity, edges (dependsOn / supersedes / replacedBy / parent for sub-specs), an append-only decision log, and a list of open questions. Goals are sibling user-value statements with no test of their own; scenarios point at them via contributes. Test.pkl implements scenarios via specRef. The runner prints (verifies AUTH-001) on each test line, and the default pkspec spec Markdown includes a per-spec implementation index so reviewers can scan Scenario.id back to active tests or implementedAt pointers. The review/CI surface is exposed as top-level commands:

pkspec check — CI gate: exit 1 on any non-draft non-deprecated spec without an implementing test
pkspec coverage — declared vs implemented, broken down by severity and review-status
pkspec graph — graphviz dot of the knowledge graph, including test/code/doc implementation backlinks
pkspec decisions — newest-first Markdown decision log
pkspec goals — Goals listed by priority with per-Goal coverage
pkspec milestones — release/planning Milestones with Goal progress rollups
pkspec next — unimplemented specs ranked by Goal priority + severity ("what to work on next")
pkspec implementations — the reverse index only: spec id → active tests / code / doc pointers
pkspec orphans — active tests that still need a specRef
pkspec lint — broken/deprecated spec links and authoring invariants
pkspec docs --audience X — audience-specific Markdown projection from audience { "X" } or audience:X tags, with implementation details hidden by default

// Spec.pkl
goals {
  new Goal {
    id = "GOAL-SECURE-AUTH"
    name = "users can authenticate securely"
    priority = 90
    reviewStatus = "approved"
  }
}

scenarios {
  new {
    id = "AUTH-001"
    name = "valid credentials"
    severity = "critical"
    reviewStatus = "approved"
    contributes { "GOAL-SECURE-AUTH" }
    dependsOn { "SESSION-001" }
    decisions {
      new Decision {
        date = "2026-03-01"
        author = "mizchi"
        summary = "lock the spec to cookie-based auth"
      }
    }
    tags { "spec" }
  }
  new {
    id = "AUTH-001a"
    name = "valid credentials happy path"
    parent = "AUTH-001"      // sub-spec refines the parent
    contributes { "GOAL-SECURE-AUTH" }
    tags { "spec" }
  }
}

// Test.pkl
new Test { name = "login_happy_path"; specRef { "AUTH-001" }; cmd = "..." }

Shared setup across scenarios uses the top-level prelude (Cucumber Background:). See docs/notes/spec.md / docs/notes/ai-assertion.md / docs/notes/spec-id.md / docs/notes/spec-graph.md and examples/spec-graph/. For project-level local gates and task-runner contracts, see docs/notes/project-gates.md. For advanced Goal progress methods and Milestone rollups, see docs/advanced/goals-and-milestones.md.

Property-based testing, fuzzing, differential testing

QuickCheck-style PBT — Test.iterations, Test.inputs (abstract Input schema with concrete IntInput, …), seed-deterministic generation in Pkl, input-space shrinking in Go. Works with every kind, so generated inputs can drive a shell cmd, an HTTP body, or a SQL parameter. See docs/notes/quickcheck.md.
Snapshot testing — reference bytes under .pkspec/snapshots/<name>.bytes, written on first run, committed to git. Inline snapshots (inlineStdout) get rewritten in-place via --update-inline-snapshots. Mid-port, the reference implementation IS the spec — pkspec runs it, captures the bytes, asserts every port matches. See docs/notes/snapshots.md.
Differential testing across language implementations — two or more impls of the same spec, the same input, the same expected bytes. Snapshots make this trivial: capture from the reference once, every port must match.

Lifecycle, hooks, backgrounds

Beyond the per-test plumbing:

Hooks — before { all { ... }; each { ... } } and after, scoped (all / each), LIFO for after, with stdout-capture into env vars. See docs/notes/hooks.md.
Backgrounds — long-running auxiliary processes with readyProbe and optional portEnv for dynamic-port allocation.
Ephemeral workdirs — Test.ephemeralWorkdir = true for an auto-temp dir cleaned at test exit.

Install

Nix (recommended)

nix run github:mizchi/pkspec/v0.3.0 -- init --dir pkspec
nix run github:mizchi/pkspec/v0.3.0 -- exec -f path/to/Test.pkl
nix profile install github:mizchi/pkspec/v0.3.0

The flake builds pkspec plus the built-in adapter shim binaries and wraps them so the bundled Pkl CLI is on PATH automatically. That Pkl CLI is the upstream native binary, not the Java/JAR build from nixpkgs. The Nix workflow on every push to main and every PR builds the flake on aarch64-darwin and x86_64-linux; the badge above tracks its status. The Go workflow also runs go test ./... and a go install smoke on both platforms.

In a home-manager flake:

{
  inputs.pkspec.url = "github:mizchi/pkspec/v0.3.0";
  inputs.pkspec.inputs.nixpkgs.follows = "nixpkgs";

  outputs = { self, nixpkgs, home-manager, pkspec, ... }:
  let
    system = "aarch64-darwin";
    pkgs = import nixpkgs { inherit system; };
  in {
    homeConfigurations.example = home-manager.lib.homeManagerConfiguration {
      inherit pkgs;
      modules = [
        pkspec.homeManagerModules.default
        {
          programs.pkspec.enable = true;
        }
      ];
    };
  };
}

programs.pkspec.enable = true installs both pkspec and pkl-native by default. Set programs.pkspec.installPkl = false if you only want the wrapped pkspec binary and do not want a standalone pkl command in home.packages.

Go

go install github.com/mizchi/pkspec/cmd/...@v0.3.0
pkspec init --dir pkspec

You also need the Pkl CLI on PATH — that's exactly the friction Nix removes.

After pkspec init, author test modules against the generated local schemas:

amends "./pkspec/Test.pkl"

tests {
  new {
    name = "smoke"
    cmd = "true"
  }
}

GitHub Actions

A setup-only composite action lives at the repo root:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: mizchi/pkspec@v0.3.0
        with:
          init-schema-dir: pkspec
      - run: pkspec exec -f Test.pkl

The action installs pkspec and the Pkl CLI, then adds both to PATH. init-schema-dir is optional; set it when the workflow should materialize local Test.pkl / Spec.pkl / QuickCheck.pkl / Adapter.pkl schemas and built-in adapter modules.

Inputs:

Input	Default	Notes
`version`	the action ref, falling back to latest release	Accepts `local`, `v0.3.0`, `0.3.0`, `v0`, or `latest`.
`pkl-version`	`0.31.1`	Set to `none` to skip Pkl install.
`setup-go`	`true`	Uses this action's `go.mod` to install the Go toolchain needed for `go install`.
`install-dir`	`${{ runner.temp }}/pkspec-bin`	Added to `PATH`.
`init-schema-dir`	empty	Optional target for `pkspec init --dir`.
`init-force`	`false`	Passes `--force` when initializing schemas.
`cache-pkl`	`false`	Set to `true` to cache `~/.pkl/cache`.
`pkl-cache-key`	`pkl-<hashFiles>`	Override the default Pkl cache key.
`github-token`	`${{ github.token }}`	Used only for `latest` / `v0` release lookup.

CLI

pkspec init --dir pkspec                  write Test.pkl / Spec.pkl / QuickCheck.pkl / Adapter.pkl schemas

pkspec exec -f Test.pkl                    run all tests in a module
pkspec exec -f Test.pkl --tag spec         filter by Test.tags (repeatable, OR)
pkspec exec -f Test.pkl --only login       filter by name substring (repeatable, OR)
pkspec exec -f Test.pkl --shard=K/N        run only the K-th shard of N (LPT)
pkspec exec -f Test.pkl --rerun-failed     only tests whose latest record is non-pass
pkspec exec -f Test.pkl --total-timeout=5m abort run after wall-clock cap
pkspec exec -f Test.pkl --junit-reports DIR write JUnit XML

pkspec run [pkl test args...]              wrap `pkl test` with a trustworthy exit code

pkspec adapter -f Adapter.pkl              run adapter discover/run protocol and collectors
pkspec adapter -f Adapter.pkl --dry-run    discover and print merged adapter cases

pkspec spec tests/**/*.pkl                 render Markdown SPEC.md from Scenario tags
pkspec spec tests/**/*.pkl --output SPEC.md
pkspec spec tests/**/*.pkl --tag spec
pkspec docs --audience pm Spec.pkl         render PM-facing docs from audience metadata
pkspec docs --audience end-user --output docs/USER.md Spec.pkl
pkspec check Spec.pkl Test.pkl             CI gate: declared specs vs implementing tests
pkspec coverage Spec.pkl Test.pkl          coverage report by severity / review-status
pkspec graph Spec.pkl Test.pkl             graphviz dot of spec edges + implementation backlinks
pkspec decisions Spec.pkl Test.pkl         newest-first Markdown decision log
pkspec goals Spec.pkl Test.pkl             user-facing Goals + their contributing-spec coverage
pkspec milestones Spec.pkl Test.pkl        release/planning Milestones + Goal progress
pkspec next Spec.pkl Test.pkl              unimplemented specs ranked by Goal priority + severity
pkspec implementations Spec.pkl Test.pkl   spec id -> tests/code/doc implementers
pkspec orphans Test.pkl...                 active tests with no specRef (spec-link backlog)
pkspec lint Spec.pkl Test.pkl...           convention checks: broken/deprecated refs, descriptions, ...
pkspec lint --lint-disable lint.X          suppress one or more rule ids (comma-separated)
pkspec spec --template scenario|goal|module  print a Pkl skeleton (no input files needed)
pkspec spec --discover                     auto-walk for Spec.pkl / Test.pkl / specs/*.pkl
pkspec check --strict                      verify implementedAt paths exist on disk
pkspec check --goal goal.X                 filter review commands to one Goal
pkspec check --severity critical           filter review commands to one severity

pkspec timings -f Test.pkl                 per-test runs / median / p90 / latest / kind
pkspec timings -f Test.pkl --failing       only tests whose latest record is non-pass
pkspec timings -f Test.pkl --shard=K/N     preview which tests would land in shard K/N

PKSPEC_TIMING_ENV=ci-linux pkspec exec ... tags timing records with an explicit environment so CI history doesn't poison local-machine shard balancing (or vice-versa).

For JUnit report semantics and CI publishing notes, see docs/notes/junit.md.

Development

Project maintenance tasks are defined in Taskfile.pkl and run with pkfire:

pkf list
pkf run test
pkf run build
pkf run init-smoke
pkf run release-check

nix develop includes pkf, go, pkl, and gopls. To create and push a release tag after release-check passes:

pkf run tag --version=0.3.0

Pushing v*.*.* tags triggers the Release workflow. It validates the Nix package on Linux and macOS, smoke-tests the wrapped pkl, then creates or updates the GitHub Release so latest resolves to the new tag.

Status

Active development, frequent API churn. v0.1.x is the first dogfooding line for Nix flakes, GitHub Actions, and go install; expect schema and CLI changes before a stability promise.

For decision history per phase, see findings.md; the time-ordered raw log. For thematic deep dives, see docs/notes/ and docs/advanced/.

If you are looking for a real task runner rather than a test runner, see mizchi/pkfire; pkspec is its testing-focused sibling.

Acknowledgements

The Scenario.openQuestions field and the open-questions-policy recipe were prompted by reading NyxFoundation/speca, a specification-anchored security-audit framework. pkspec borrows a small idea — keeping unresolved authoring questions first-class on the spec so they cannot be silently rolled over — and does not import speca's broader proof-attempt pipeline or security framing.

License

MIT.

Name		Name	Last commit message	Last commit date
Latest commit History 125 Commits
.github		.github
cmd		cmd
docs		docs
examples		examples
experiments		experiments
harness		harness
internal		internal
nix		nix
pkl		pkl
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SPEC.pkl		SPEC.pkl
TODO.md		TODO.md
Taskfile.pkl		Taskfile.pkl
action.yml		action.yml
findings.md		findings.md
flake.lock		flake.lock
flake.nix		flake.nix
go.mod		go.mod
go.sum		go.sum
schema.go		schema.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pkspec

Why Pkl

Generalized retry, sharding, and timing

Built-in test kinds

Adapter suites

Spec-driven authoring

Property-based testing, fuzzing, differential testing

Lifecycle, hooks, backgrounds

Install

Nix (recommended)

Go

GitHub Actions

CLI

Development

Status

Acknowledgements

License

About

Uh oh!

Releases 17

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pkspec

Why Pkl

Generalized retry, sharding, and timing

Built-in test kinds

Adapter suites

Spec-driven authoring

Property-based testing, fuzzing, differential testing

Lifecycle, hooks, backgrounds

Install

Nix (recommended)

Go

GitHub Actions

CLI

Development

Status

Acknowledgements

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 17

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages