Warning
Terca is pre-alpha software. It is almost certainly broken in all kinds of ways.
Terca is a test and evaluation runner for CLI-based coding agents. It allows you to define a suite of tests and run them against different configurations of your agent, helping you gain confidence in its capabilities.
npm i -g terca
terca -vYou should see a version number printed when terca -v is run.
-
Create a
terca.yamlfile in your eval project's root directory. -
Create test directories with
eval.terca.yamlfiles. -
Run Terca from the command line:
terca
Terca uses a hierarchical structure to organize tests and configuration.
project_dir/
terca.yaml # Global config (environments, experiments)
_base/ # Baseline files for all evals
001-some-eval/ # Eval directory
eval.terca.yaml # Eval-specific config
_base/ # Eval-specific file overlays
The _base directories are used to layer files into the test workspace.
- Project Base: Files in
project_dir/_baseare copied first. - Eval Base: Files in
eval_dir/_baseare copied next, overwriting any project-level files.
Each eval directory can contain an eval.terca.yaml file which defines the eval configuration. This file supports the same schema as the evals array in terca.yaml. If name is omitted, it defaults to the directory name.
Each invocation of the terca command creates a new folder in .terca/runs, with subfolders for each eval and environment/experiment/repetition combination. Each run has a results.json which aggregates the final results of each eval as well as log files, artifacts, and the workspace as it was after the task completed for each run.
.terca/runs/2025-10-28-001
results.json # the aggregate results
{eval_name}/
{environment}.{experiment}.{repetition}/
artifacts/ # detailed artifacts e.g. telemetry logs
workspace # the workspace the agent was running in
run.log # a log file of the run's progress# Required. The name of your test suite.
name: my-test-suite
# Optional. A description of your test suite.
description: A description of my test suite.
# Optional. Prefix all test prompts with this content.
preamble: |
You are a helpful assistant.
# Optional. Postfix all test prompts with this content.
postamble: |
Always format your output as JSON.
# Optional. The number of times to run each eval. Can be overridden by CLI flags.
repetitions: 1
# Optional. The number of evals to run in parallel. Can be overridden by CLI flags.
concurrency: 4
# Optional. The number of seconds to wait for an eval to complete before timing out.
timeoutSeconds: 300
# Optional. A list of actions to run before all evals.
before:
- command: npm install
- copy:
template.js: src/template.js
- files:
config.json: |
{
"api_key": "YOUR_API_KEY"
}
# Optional. A list of environments to run the evals in.
environments:
- name: default-agent
agent: gemini-cli
rules: default_rules.txt
# Optional. A list of experiments to run.
experiments:
- name: no-extra-context
- name: with-extra-context
command: setup_context.sh
postamble: Always use the provided context to answer questions.
# A list of evals to run. (Can also be defined in separate directories)
evals:
- name: my-eval
prompt: "Hello, world!"Each eval directory can contain an eval.terca.yaml file which defines the eval configuration.
# Required. The prompt to give to the agent.
prompt: "Hello, world!"
# Optional. The name of the eval. Defaults to the directory name.
name: my-eval
# Optional. A description of the eval.
description: A description of the eval.
# Optional. The number of times to run this eval.
repetitions: 1
# Optional. The number of seconds to wait for this eval to complete before timing out.
timeoutSeconds: 300
# Optional. A list of actions to run before this eval.
before:
- command: npm install
# Optional. A list of tests to run to verify the agent's work.
tests:
- name: check_output_file
fileExists: output.txt
- name: verify_script_runs
commandSuccess: node script.jsThe tests section allows you to define a list of tests to run after each eval.
tests:
# Check if a file or list of files exists.
- name: check_output_file
fileExists: output.txt
- name: check_multiple_files
fileExists:
- output1.txt
- output2.txt
# Check if a command runs successfully.
- name: verify_script_runs
commandSuccess: node script.js
# Check if a command runs successfully and its output contains a string.
- name: validate_content
commandSuccess:
command: grep "Expected output" output.txt
outputContains: "Expected output"terca [options]| Flag | Description |
|---|---|
-t, --test <test_name> |
Run only the specified test. |
-x, --experiment <exp_name> |
Run only the specified experiment. |
-n, --repetitions <n> |
Override the number of repetitions. |
-c, --concurrency <n> |
Override the concurrency level. |
-v, --version |
Print the current Terca CLI version. |