Improve signal + feedback loop for SKILL.md / llms.txt quality #5

@alokit-bot

Continuously improving SKILL.md and llms.txt based on real-world agent feedback

Context

byok-relay ships a skills/byok-relay/SKILL.md file (and llms.txt) to help AI coding agents discover and integrate the relay. The quality of these files directly affects whether agents pick byok-relay over alternatives like OpenRouter or LiteLLM.

The question raised in PR #2 review: how do we know if these files are actually working, and how do we improve them over time?

The problem

Right now we have no signal on:

  • Whether agents are successfully discovering and using the skill
  • Which trigger phrases cause agents to pick (or skip) byok-relay
  • Whether the integration instructions produce working code on the first attempt
  • Where agents get confused or produce incorrect integrations

Possible approaches (to evaluate later)

  1. Usage telemetry in the relay itself: if the relay logs a User-Agent or a custom header set by the SKILL.md instructions, we can estimate how many integrations were agent-driven vs human-written

  2. Canary integration tests: a test suite that spins up an agent (Claude, GPT, Cursor), hands it the SKILL.md, asks it to integrate byok-relay into a sample app, and checks whether the output actually works. Run on each SKILL.md change.

  3. Community feedback loop: a #integrations discussion thread or a structured issue template asking users to report whether their agent integration worked or failed, and which agent/IDE they used

  4. A/B testing descriptions: try different frontmatter descriptions across a time window and measure skills.sh install counts as a proxy for agent discovery rate

  5. LLM self-evaluation: periodically ask a model to evaluate the SKILL.md against a rubric (clarity, completeness, trigger coverage) and flag regressions
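
To make approach 1 a bit more concrete, here is a rough sketch of how the relay's logs could be bucketed. Everything named here is an assumption for illustration: the `X-Integration-Source` header, the `skill-md` marker value, and both function names are hypothetical, and nothing like this exists in byok-relay today. It only shows the counting logic, not how the relay would capture headers.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical header that SKILL.md could instruct agents to set.
# Not an existing byok-relay feature.
SOURCE_HEADER = "x-integration-source"
# Hypothetical marker value identifying a SKILL.md-driven integration.
AGENT_MARKER = "skill-md"


def classify_request(headers: Mapping[str, str]) -> str:
    """Return 'agent' if the request carries the marker header, else 'unknown'."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get(SOURCE_HEADER, "").strip().lower()
    return "agent" if value == AGENT_MARKER else "unknown"


def agent_share(requests: Iterable[Mapping[str, str]]) -> float:
    """Fraction of requests classified as agent-driven (0.0 if there are none)."""
    counts = Counter(classify_request(headers) for headers in requests)
    total = sum(counts.values())
    return counts["agent"] / total if total else 0.0
```

Requests without the marker are labelled `unknown` rather than `human`, since a missing header does not prove a person wrote the integration (an agent may simply have dropped the header). The resulting share would be a lower bound on agent-driven traffic at best.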

Why parked for now

The active growth sprint is the current priority. This is worth revisiting once byok-relay has enough users that agent-driven integrations make up a meaningful share of traffic.
