terse is a text-transformation tool: its inputs are strings and its outputs are strings. the security surface is small but real.
the following count as security issues:
- input that crashes the engine. any deterministic string that causes `compress()` to raise an uncaught exception, hang, or consume unbounded memory.
- preservation bypass. any input where code blocks, inline code, URLs, file paths, or error messages are mutated in the output instead of being preserved verbatim.
- marker leak. any case where the internal preservation markers appear in the final output.
- catastrophic regex backtracking. any input that makes the engine take longer than ~2 seconds on modern hardware for reasonable-sized text (under 500KB).
- supply-chain. the repo has zero runtime dependencies. any PR that adds one should be reviewed with care.
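to check a candidate input for the preservation-bypass and marker-leak items before reporting, a small harness like the sketch below can help. it is a sketch under assumptions: `find_violations` is a hypothetical helper, the patterns only approximate what terse treats as preserved (fenced code, inline code and URLs; file paths and error messages are left out), and the control-character range is a guess at the marker alphabet. it accepts any `compress` callable, so wrap the engine with your mode and level in a lambda.

```python
import re
from typing import Callable

# Assumption: these patterns approximate what "preserved" means;
# terse's real detection rules may differ.
PRESERVED_PATTERNS = [
    re.compile(r"```.*?```", re.DOTALL),  # fenced code blocks
    re.compile(r"`[^`\n]+`"),             # inline code
    re.compile(r"https?://\S+"),          # URLs
]

# Assumption: markers are built from ASCII control characters other than
# tab, newline and carriage return. This is a guess, not terse's alphabet.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_violations(compress: Callable[[str], str], text: str) -> list[str]:
    """Compress `text` and return a list of suspected problems."""
    out = compress(text)
    problems = []
    # Every span that should be preserved must appear verbatim in the output.
    for pattern in PRESERVED_PATTERNS:
        for match in pattern.finditer(text):
            if match.group(0) not in out:
                problems.append(f"preservation bypass: {match.group(0)!r}")
    # Control characters that were not in the input suggest a marker leak.
    if CONTROL_CHARS.search(out) and not CONTROL_CHARS.search(text):
        problems.append("marker leak: control characters in output")
    return problems
```

an empty list only means this harness found nothing; it is not proof that the engine handled the input correctly.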
the following are not security issues:
- compression that's suboptimal (a tuning question, not a bug).
- compression that's more aggressive than desired (that's controlled by the mode/level configuration).
- Claude producing different output from the deterministic engine (the engine is a conservative reference, not a Claude simulator).
to report an issue, file a GitHub issue with the `security` label or email the maintainer (see the repo profile). include:
- the exact input that triggered the issue
- the mode and level
- the output (if any)
- the expected behavior
for serious issues (crashes on adversarial input, preservation bypass), please allow 7 days before public disclosure so a fix can ship.
the engine includes these specific protections, verified by the test suite:
- input size cap: `compress()` raises `ValueError` on inputs over 500,000 characters.
- regex-complete-in-bounded-time: every pattern is a constant in the source; none is built from user input.
- deterministic markers with fallback: preservation markers use non-whitespace control characters; if the input already contains the primary marker pattern, the engine switches to a fallback marker pattern.
- no eval, no exec, no shell-out: the engine is pure string and regex operations.
- bounded regex patterns: no unbounded `.*` across unbounded text; every pattern has a natural boundary.
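one way the input cap, a bounded pattern and a marker fallback could fit together is sketched below. the marker characters, the inline-code-only scope and the names `protect`/`restore` are hypothetical, not terse's internals; the sketch only illustrates the technique of switching to a second marker alphabet when the input already contains the first.

```python
import re

MAX_INPUT_CHARS = 500_000  # matches the documented cap

# Hypothetical marker alphabets (non-whitespace control characters).
# terse's real marker characters are not specified in this document.
PRIMARY = ("\x01", "\x02")
FALLBACK = ("\x0e", "\x0f")

# Bounded pattern: inline code stops at the next backtick or newline.
INLINE_CODE = re.compile(r"`[^`\n]+`")


def protect(text: str) -> tuple[str, list[str], tuple[str, str]]:
    """Replace inline code spans with numbered markers; return the masked
    text, the saved spans, and the marker pair that was used."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input too large: {len(text)} > {MAX_INPUT_CHARS} characters")
    markers = PRIMARY
    # If user text already contains a primary marker character, switch to the
    # fallback pair so user text can't be mistaken for a marker on restore.
    if any(ch in text for ch in PRIMARY):
        markers = FALLBACK
        if any(ch in text for ch in FALLBACK):
            # This sketch simply refuses; terse's behavior here isn't documented.
            raise ValueError("input contains both primary and fallback marker characters")
    open_, close = markers
    saved: list[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"{open_}{len(saved) - 1}{close}"

    return INLINE_CODE.sub(stash, text), saved, markers


def restore(text: str, saved: list[str], markers: tuple[str, str]) -> str:
    """Swap each numbered marker back to its original span, verbatim."""
    open_, close = markers
    # Built only from the constant marker characters, never from user input.
    marker_re = re.compile(re.escape(open_) + r"(\d+)" + re.escape(close))
    return marker_re.sub(lambda m: saved[int(m.group(1))], text)
```

for example, `protect("run `make test` now")` masks the code span with the primary markers, and passing its three return values to `restore` yields the original string. because the inline-code pattern stops at the next backtick or newline, it never scans across unbounded text.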
the regex-safety property is tested in `tests/test_compress.py::TestSecurity::test_no_regex_catastrophic_backtracking`, which asserts that the engine completes in under 2 seconds on adversarial inputs (10,000 repeated characters, 500 code-fence starts, etc.).
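a standalone timing harness in the same spirit might look like this. the input list mirrors the shapes named above plus one extra assumed case (unmatched backticks); the real test's inputs and structure may differ, and `slowest_run` is a hypothetical helper.

```python
import time
from typing import Callable, Iterable

# Adversarial shapes mirroring the ones named above; the exact inputs in
# the real test may differ.
ADVERSARIAL_INPUTS = [
    "a" * 10_000,   # long run of one character
    "```\n" * 500,  # many unclosed code-fence starts
    "`" * 10_000,   # many unmatched backticks (assumed extra case)
]


def slowest_run(fn: Callable[[str], object], inputs: Iterable[str]) -> float:
    """Return the longest wall-clock time, in seconds, fn took on any input."""
    worst = 0.0
    for text in inputs:
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn(text)
        worst = max(worst, time.perf_counter() - start)
    return worst
```

against the engine, the check would be `slowest_run(lambda s: compress(s), ADVERSARIAL_INPUTS) < 2.0`, adding whatever mode and level arguments your version of `compress()` takes.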
out of scope:
- claims about LLM output safety: terse rewrites prose; it does not influence the content of Claude's answers.
- compliance claims (SOC 2, HIPAA, etc.): terse is an MIT-licensed experimental tool. use in regulated environments at your own discretion and responsibility.
- protection against prompt injection: terse is not a safety tool. the `references/anti-patterns.md` document lists contexts where compression should not be applied, but that's a style guide, not a security guarantee.