GitHub - ARPAHLS/lc0_vic: Tiered filesystem retrieval (L0/L1/L2) with vic CLI, embeddings, optional skillware & HTTP bridge, Python reference stack.

LC-0 VIC

LC-0 VIC — Tiered filesystem retrieval: ask natural-language questions about your local files, get ranked results with snippets. A reference implementation for intelligent, queryable storage.

Why this exists • Mission • Architecture • Quick Start • Documentation • Contributing • References • Security • Contact

Why this exists

Kioxia Corporation research line: AiSAQ (All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval, arXiv:2404.06004) describes product-quantized, all-in-storage approximate nearest-neighbor search suited to DRAM-constrained and flash-resident indices. Open-source reference implementation: kioxia-jp/aisaq-diskann. Context and deployment notes: docs/KIOXIA_ECOSYSTEM.md.

LC-0 VIC (Logical Controller Zero / Virtual Intelligent Controller) is an open-source reference implementation with a real CLI and tests: pip install, then vic index, vic ask, vic demo, and optionally vic bridge. Architecturally it is tiered retrieval—L0 metadata, L1 vectors, L2 optional deep parsing (skillware)—orchestrated by a controller from a Librarian QueryPlan (offline stub or Ollama). Vectors default to LanceDB; Milvus Lite is optional. Code layout: controller/, librarian/, warehouse/, index/, skillware/, bridge/. It runs fully local-first: you control the model endpoint (Ollama local or remote), data boundary, TLS/exposure, quotas, and governance—while still being more than a paper design (CI-backed, integration-tested).

Landscape — why this repo exists alongside “computational storage landscape”: Retrieval over drives and tiers does not land in isolation. Computational storage landscape maps the broader space; LC-0 VIC is a small, runnable, contract-heavy slice: filesystem-first L0 → L1 → L2, optional Ollama planning, and a JSON HTTP bridge for tools and automation.

This codebase runs on the host today—it does not claim to ship inside any vendor’s SSD firmware binary. The research goal is to explore whether tiered retrieval and these API contracts could map to firmware- or device-adjacent runtimes (e.g. Samsung Magician–class host/device stacks), while this repository stays a portable reference for host tooling, bridges, and tests.

Mission

Unstructured data on disk is usually searched by paths and keywords. LC-0 VIC explores intent-aware retrieval: narrow candidates cheaply (L0/L1), then load to DRAM for evidence only when needed (L2). The design fits research and product narratives around queryable storage, local-first AI, and bridge-ready HTTP surfaces for management UIs.

Architecture

lc0_vic/
├── README.md
├── LICENSE
├── AUTHORS.md
├── pyproject.toml
├── requirements.txt
├── docs/
├── src/lc0_vic/
│   ├── controller/       # L0 → L1 → L2 orchestration
│   ├── librarian/        # Natural language → QueryPlan (stub / Ollama)
│   ├── warehouse/       # Filesystem abstraction
│   ├── index/           # LanceDB or Milvus Lite + manifest + jobs
│   ├── skillware/       # L2 modular parsers
│   ├── bridge/          # HTTP JSON service
│   ├── integrations/    # Ollama HTTP client helpers
│   ├── training/        # Synthetic plans helper (`vic-synthetic-plans`)
│   └── hardware_sim/    # Demo timing hooks only
├── training/
├── scripts/
├── tests/
├── data/mock_storage/
├── data/vector_db/
├── .controller/
└── .firmware/

Tiered retrieval

L0 — Path, size, mtime, type hints, tags.
L1 — Chunk embeddings after vic index --full: default LanceDB (vic_chunks); optional Milvus Lite with VIC_VECTOR_BACKEND=milvus and pip install -e ".[milvus]". Embed model default embeddinggemma via Ollama; override VIC_OLLAMA_EMBED_MODEL.
L2 — Skillware modules (PDF layout, OCR, logs) when the plan requests them.

Librarian (Ollama)

stub: default offline planner for CI and quick runs.
ollama: POST /api/chat to VIC_OLLAMA_BASE_URL for both local and remote Ollama-compatible endpoints. Set VIC_OLLAMA_MODEL to a tag from ollama list on that server.

Example families (verify tags locally): Qwen (e.g. qwen2.5:4b), Gemma (e.g. gemma2:2b or your pulled images). For remote inference, set privacy_mode=cloud_reasoning_ok when not using loopback; optional VIC_OLLAMA_API_KEY for Bearer auth on gated gateways. Never commit secrets; use .env (gitignored).

Quick start

Install (pip, all optional stacks used in CI and demos):

pip install -e ".[dev,index,bridge]"

Windows (PowerShell): same pip line; use $env:VAR = "value" instead of export, or set VAR=value in cmd.exe. Prefer copying .env.example to .env so you do not rely on shell-specific syntax.

vic --help

vic index --root ./data/mock_storage --db ./data/vector_db

vic ask --format human "Find PDFs related to contracts"

# One-shot demo (seed + index + sample asks; needs Ollama + `.[index]`)
vic demo

Ollama Librarian (Windows example):

set VIC_LIBRARIAN_BACKEND=ollama
set VIC_OLLAMA_MODEL=qwen2.5:4b
set VIC_OLLAMA_BASE_URL=http://127.0.0.1:11434
vic ask "Who signed the contract?"

On Unix: export VAR=value. Copy .env.example to .env for persistent configuration.

HTTP bridge (after pip install -e ".[bridge]"): vic bridge — GET /health, POST /v1/ask, POST /v1/index/start (202 + job_id), GET /v1/index/status. Optional VIC_BRIDGE_API_KEY (Bearer on /v1/*), VIC_BRIDGE_RATE_LIMIT_PER_MIN, VIC_JOB_DB_PATH for SQLite job persistence. See docs/API.md.

Documentation

Start here: Documentation index — links every Markdown file in the repo (root, docs/, training/, scripts/, assets/, data/, .controller/, .firmware/) so nothing is orphaned.

Highlights:

Architecture — component boundaries and data flow.
API / contracts — QueryPlan, bridge routes, environment variables.
Threat model — privacy modes and Ollama URL policy.
Roadmap & production deep dive — demo path (scripts/demo.sh / demo.ps1) and phased production work.
Terminal demo — vic demo, vic ask --format human, warehouse defaults.
Milvus & scale — Milvus Lite (VIC_VECTOR_BACKEND=milvus, optional [milvus]) plus AiSAQ / scale notes.
Testing — pyfakefs vs real tmp_path for Lance/native I/O.
Security policy — vulnerability reporting and bridge hardening reminders.
Changelog — release notes (PyPI/GitHub release steps are manual until you configure publishing).

Notebook: notebooks/lc0_vic_live_demo.ipynb — mirrors CLI demo steps (also listed from docs/README.md).

Contributing: CONTRIBUTING.md (links the doc index). Authors: AUTHORS.md.

Contributing

See CONTRIBUTING.md for scope, tests, and pull request expectations.

References

Full bibliography and implementation stance (JAX, TurboQuant, papers):
docs/REFERENCES_AND_RELATED_WORK.md

Algorithms & cited methodology (per-component): docs/METHODOLOGY_AND_ALGORITHMS.md — KIOXIA AiSAQ (L1/ANN scale), BitNet report (Librarian SFT/DPO), ExPAND (I/O topology), SolidAttention (KV/SSD inference lessons).

Local PDF copies (with open-access pointers): docs/papers/README.md — AiSAQ (arXiv:2404.06004), BitNet b1.58 2B4T (arXiv:2504.12285), ExPAND (arXiv:2505.18577), SolidAttention (USENIX FAST ’26).

Selected entry points

Computational storage landscape — strategic analysis (TinyLM, SSD-tier retrieval, industry context).
Milvus — AiSAQ — flash-friendly / all-in-storage ANN integration; code lineage: kioxia-jp/aisaq-diskann.
Software-Enabled Flash (Linux Foundation) — host-visible flash control; softwareenabledflash.org.
Google TurboQuant — KV / attention-side compression (inference memory); not a substitute for filesystem indexing in LC-0; see reference doc.
Patent US8780634B2 (CAM NAND) — landscape only.

License

Distributed under the MIT License. See LICENSE.

Security

See SECURITY.md for how to report vulnerabilities and what is in scope. Product-facing privacy and Ollama URL rules: docs/THREAT_MODEL.md.

Contact

Issues: github.com/arpahls/lc0_vic/issues

Organization: ARPA Hellenic Logical Systems

Built & Maintained by ARPA Hellenic Logical Systems & the Community

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LC-0 VIC

Why this exists

Mission

Architecture

Tiered retrieval

Librarian (Ollama)

Quick start

Documentation

Contributing

References

License

Security

Contact

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.controller		.controller
.firmware		.firmware
.github		.github
assets		assets
data		data
docs		docs
notebooks		notebooks
scripts		scripts
src/lc0_vic		src/lc0_vic
tests		tests
training		training
.env.example		.env.example
.flake8		.flake8
.gitignore		.gitignore
AUTHORS.md		AUTHORS.md
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

LC-0 VIC

Why this exists

Mission

Architecture

Tiered retrieval

Librarian (Ollama)

Quick start

Documentation

Contributing

References

License

Security

Contact

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages