Tags: konjoai/squish

v9.13.0

ci(publish): switch to OIDC Trusted Publisher; drop PYPI_TOKEN

v1.1.0

v1.1.0 – Production Release

All waves 99–106 complete. Key improvements over v1.0.0:

- INT4 native Metal path (Wave 103): 1.5B stays ~0.9 GB, 8B stays ~4.4 GB (see the footprint sketch after this list)
- INT3 native Metal path (Wave 104): 1.5B stays ~0.8 GB, 8B stays ~2.1 GB
- Hot-path speed restoration (Wave 99): 3→1 Metal syncs/token; sub-200 ms TTFT
- Agent loop verified E2E (Wave 105): /v1/agent/run + 11 built-in tools (request sketch after this list)
- 15,354 tests passing
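As a sanity check on those INT4 figures, quantized weight memory is roughly params × bits / 8 plus overhead for quantization scales and unquantized layers. A back-of-the-envelope sketch in Python; the ~12% overhead factor is an illustrative assumption, not a measured constant:

    # Rough footprint check for the INT4 figures above; the 12% overhead
    # for group scales/zeros and unquantized layers is an assumption.
    def approx_weight_gb(params_billions: float, bits: int, overhead: float = 0.12) -> float:
        raw_bytes = params_billions * 1e9 * bits / 8
        return raw_bytes * (1 + overhead) / 1e9

    print(approx_weight_gb(1.5, 4))  # ~0.84 GB, consistent with the ~0.9 GB above
    print(approx_weight_gb(8.0, 4))  # ~4.48 GB, consistent with the ~4.4 GB above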
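The agent endpoint can be exercised with any HTTP client. A hedged sketch using only the standard library; the port and the request field names are assumptions in the style of OpenAI-compatible local servers, not Squish's documented schema:

    # Hypothetical call to the /v1/agent/run endpoint; "prompt",
    # "max_steps", and port 8000 are illustrative assumptions.
    import json
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:8000/v1/agent/run",
        data=json.dumps({"prompt": "Summarise README.md", "max_steps": 5}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))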

v9.0.0

Squish v9.0.0 - Cutting-Edge Attention Variants & Distributed Inference

v7.0.1

chore(release): bump version 1.0.1 → 7.0.1, add [7.0.1] CHANGELOG entry

- pyproject.toml: version 1.0.1 → 7.0.1 (matches CHANGELOG [7.0.0] series)
- CHANGELOG.md: add [7.0.1] entry summarising all pre-launch hardening changes
  (MLC stub, squish compress alias, Projected docs fix, hardware test harness,
   bench_eoe.py, MODULES.md, coverage fix, wave 23-26 benchmark docs,
   HF publish script, paper draft)
- docs/announcements.md: HN / r/LocalLLaMA / Twitter thread drafts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

v1.0.1

chore: post-release cleanup and paper artifacts

- Untrack demos/squish_demo.cast (binary, gitignored)
- Untrack eval_output_7b_full/ eval JSONs and logs
- Remove empty evals/ and bench/ directories
- Fix hardcoded personal paths in tests/test_int4_loader.py,
  tests/test_interface.py, and squish/server.py comment
- Use VECTRO_DIR env var (with ~/vectro fallback) in tests (see the sketch after this list)
- Add model weights note to README install section
- Add scripts/generate_paper.py (paper generation + figure tooling)
- Add docs/final/ with Squish_Paper_Final.docx + .pdf
- Add figures/ with 8 vector PDF + raster PNG charts
- Update .gitignore: docs/final/_media/, old draft docx files
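The VECTRO_DIR fallback mentioned above follows the usual env-var-with-default pattern; a minimal sketch (the helper name is illustrative, not necessarily the tests' actual code):

    # Resolve the weights directory from VECTRO_DIR, falling back to
    # ~/vectro, so tests carry no hardcoded personal paths.
    import os
    from pathlib import Path

    def vectro_dir() -> Path:
        return Path(os.environ.get("VECTRO_DIR", "~/vectro")).expanduser()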

v1.0.0

v1.0.0 – Initial public release

Sub-second local LLM inference on Apple Silicon.
54× faster cold loads, 6× less RAM, statistically equivalent accuracy.

Paper: coming soon on arXiv
ORCID: 0009-0002-9108-3704