A semantic filesystem that turns passive file storage into an intelligent, searchable knowledge base — built for the AI era.
Documentation · Quickstart · Architecture · MCP for agents · Roadmap
StrataFS watches your directories — local or cloud — parses files into semantic chunks, generates vector embeddings, and exposes everything through a hybrid search engine that combines full-text and semantic similarity. It speaks the Model Context Protocol, so any MCP-aware agent can use your filesystem as a structured knowledge resource. No SaaS. No lock-in. Read-only by design.
```bash
# 30 seconds to your first semantic search:
npm install -g stratafs && stratafs config init && stratafs serve &
stratafs search "where do we handle JWT refresh?"
```
| Method | Command |
|---|---|
| npm | `npm install -g stratafs` |
| PyPI | `pip install stratafs` |
| Homebrew | `brew tap neul-labs/stratafs && brew install stratafs` |
| macOS / Linux | `curl -fsSL https://raw.githubusercontent.com/neul-labs/stratafs/main/scripts/install.sh \| bash` |
| Docker | `docker run -d -p 8080:8080 -p 8081:8081 ghcr.io/neul-labs/stratafs:latest` |
| From source | `git clone https://github.com/neul-labs/stratafs.git && cd stratafs && make build` |
Then:
```bash
stratafs config init   # writes ~/.stratafs/config.json
stratafs serve         # REST on :8080, MCP on :8081
stratafs search "any natural language query"
```

Native installers (NSIS for Windows, signed .pkg for macOS, .deb / AppImage for Linux) are on the releases page.
```
$ stratafs search "rate limit middleware"

pkg/api/middleware/ratelimit.go ★ 0.94
────────────────────────────────────────
func RateLimit(rps int) gin.HandlerFunc {
    bucket := tokenbucket.New(rps, rps*2)
    return func(c *gin.Context) {
        if !bucket.Take(1) { c.AbortWithStatus(429) ...

docs/api/rate-limiting.md ★ 0.88
────────────────────────────────────────
Per-IP rate limits default to 100 requests/min...

internal/gateway/policy.yaml ★ 0.71
────────────────────────────────────────
policies:
  - name: api-default
    rate: 100/m
    burst: 200
```

The same query over the REST API:
```bash
curl "http://localhost:8080/search?q=rate+limit+middleware&limit=5" | jq
```

Or from an MCP-aware agent, with no glue code required:
```json
{
  "mcpServers": {
    "stratafs": { "command": "stratafs", "args": ["serve", "--mcp-only"] }
  }
}
```
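If an agent's configuration already lists other MCP servers, the entry above can be merged in rather than pasted by hand. The following is a minimal Python sketch, assuming the client keeps its servers under a top-level `mcpServers` object as in the snippet above. The helper name `add_stratafs_server` is hypothetical and not part of StrataFS.

```python
import copy
import json

# Entry copied from the configuration snippet above.
STRATAFS_ENTRY = {"command": "stratafs", "args": ["serve", "--mcp-only"]}


def add_stratafs_server(client_config: dict) -> dict:
    """Return a copy of an MCP client config with the stratafs server added.

    Other servers are left untouched; an existing "stratafs" entry is replaced.
    """
    merged = copy.deepcopy(client_config)
    servers = merged.setdefault("mcpServers", {})
    servers["stratafs"] = copy.deepcopy(STRATAFS_ENTRY)
    return merged


if __name__ == "__main__":
    # Hypothetical existing config with one unrelated server.
    existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
    print(json.dumps(add_stratafs_server(existing), indent=2))
```

Working on a copy keeps the caller's original config intact, which makes it easy to diff before writing anything to disk.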
A clean, layered design built around three first-class invariants: read-only sources, per-source isolation, and hybrid scoring in a single SQL query.
Every layer is an extension point. Add a parser, a backend, a chunker, or a ranking signal in a single Go file.
```bash
# 1. Install
pip install stratafs

# 2. Initialize
stratafs config init

# 3. Add a source (edit ~/.stratafs/config.json)
# {"id":"docs","type":"local","path":"/path/to/anything","enabled":true}

# 4. Start the daemon
stratafs serve &

# 5. Search: CLI, REST, or MCP
stratafs search "the thing I half-remember writing"
curl "http://localhost:8080/search?q=onboarding+flow"
```

The first scan runs at 50–100 files/sec. Searches return in under 100 ms once the index is warm. Everything lives under ~/.stratafs/: one directory, one filesystem, one source of truth.
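Step 3 above edits the config by hand. The same change can be scripted. The sketch below assumes the sources live in a top-level `sources` array inside `config.json`, which this README does not confirm, so inspect the file written by `stratafs config init` before relying on it. The function names are hypothetical.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".stratafs" / "config.json"


def add_source(config: dict, source_id: str, path: str) -> dict:
    """Add a local source entry, shaped like the one in step 3, to a config dict."""
    # ASSUMPTION: sources are stored in a top-level "sources" list.
    sources = config.setdefault("sources", [])
    if any(s.get("id") == source_id for s in sources):
        raise ValueError(f"source id already exists: {source_id}")
    sources.append({"id": source_id, "type": "local", "path": path, "enabled": True})
    return config


def add_source_to_file(source_id: str, path: str, config_path: Path = CONFIG_PATH) -> None:
    """Load the config file, add the source, and write the file back."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    add_source(config, source_id, path)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

Rejecting a duplicate `id` is a design choice of this sketch; it avoids silently registering the same source twice.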
```
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   REST API   │  │  MCP Server  │  │   CLI / UI   │
│    :8080     │  │    :8081     │  │              │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       └─────────────────┼─────────────────┘
                         │
              ┌──────────▼──────────┐
              │    Hybrid Search    │
              │    FTS5 + Vector    │
              │  (single SQL CTE)   │
              └──────────┬──────────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
  ┌─────▼─────┐    ┌─────▼─────┐    ┌─────▼─────┐
  │ SQLite +  │    │ FastEmbed │    │ Job Queue │
  │ sqlite-vec│    │  + ONNX   │    │ (SQLite)  │
  └───────────┘    └───────────┘    └─────┬─────┘
                                          │
                                ┌─────────▼─────────┐
                                │ Monitor (local +  │
                                │ remote scanner)   │
                                └─────────┬─────────┘
                                          │
                              ┌───────────▼───────────┐
                              │    Storage Factory    │
                              └───────────┬───────────┘
                                          │
              ┌───────────────────────────┼──────────────────────────┐
              │                           │                          │
       ┌──────▼──────┐            ┌───────▼───────┐          ┌───────▼───────┐
       │  Local FS   │            │  S3 / GCS /   │          │    Future     │
       │ (fsnotify)  │            │  Azure Blob   │          │   backends    │
       └─────────────┘            └───────────────┘          └───────────────┘
```
Four invariants do most of the work:
- Read-only sources: StrataFS never writes back. All state lives in .stratafs/.
- Per-source SQLite: no central registry, no shared bottleneck.
- Compression-aware schema: gzip above 512 bytes, transparent at query time. 40–60% disk savings.
- Soft delete: files disappear consistently, historical queries are free.
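Two of these invariants, threshold-based compression and soft delete, can be illustrated with Python's standard library. This is a sketch of the general technique, not StrataFS's actual schema: the table layout, column names, and function names are all hypothetical, and only the 512-byte threshold comes from the list above.

```python
import gzip
import sqlite3
import time
from typing import List, Tuple

COMPRESS_THRESHOLD = 512  # bytes; payloads above this size are gzipped


def encode(text: str) -> Tuple[bytes, int]:
    """Return (payload, compressed_flag); gzip only above the threshold."""
    raw = text.encode("utf-8")
    if len(raw) > COMPRESS_THRESHOLD:
        return gzip.compress(raw), 1
    return raw, 0


def decode(payload: bytes, compressed: int) -> str:
    """Undo encode(), so readers never need to know how a row was stored."""
    raw = gzip.decompress(payload) if compressed else payload
    return raw.decode("utf-8")


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: deleted_at stays NULL while the row is live.
    db.execute(
        "CREATE TABLE chunks ("
        " id INTEGER PRIMARY KEY,"
        " file_path TEXT NOT NULL,"
        " body BLOB NOT NULL,"
        " compressed INTEGER NOT NULL,"
        " deleted_at REAL)"
    )
    return db


def insert_chunk(db: sqlite3.Connection, file_path: str, text: str) -> None:
    body, flag = encode(text)
    db.execute(
        "INSERT INTO chunks (file_path, body, compressed) VALUES (?, ?, ?)",
        (file_path, body, flag),
    )


def soft_delete(db: sqlite3.Connection, file_path: str) -> None:
    """Mark a file's rows as deleted instead of removing them."""
    db.execute(
        "UPDATE chunks SET deleted_at = ? WHERE file_path = ? AND deleted_at IS NULL",
        (time.time(), file_path),
    )


def live_chunks(db: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Return (file_path, text) for rows that have not been soft-deleted."""
    rows = db.execute(
        "SELECT file_path, body, compressed FROM chunks"
        " WHERE deleted_at IS NULL ORDER BY id"
    )
    return [(path, decode(body, flag)) for path, body, flag in rows]
```

Because a soft-deleted row is only filtered out by the `deleted_at IS NULL` predicate, a query that drops that predicate can still see historical content.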
Long version: Architecture overview · Database internals.
```python
import requests

r = requests.get("http://localhost:8080/search",
                 params={"q": "feature flag rollout"})
for hit in r.json()["results"]:
    print(hit["file_path"], hit["relevance_score"])
```

```javascript
const res = await fetch("http://localhost:8081/mcp/tools/call", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    tool: "search",
    parameters: { query: "rate limiting", max_results: 5 }
  }),
});
```

```bash
stratafs search "deployment strategy" --mode hybrid --limit 5 --json
```

```go
import "github.com/neul-labs/stratafs/pkg/search"

eng, _ := search.NewEngine(cfg)
results, _ := eng.Hybrid(ctx, "circuit breaker pattern", search.Opts{Limit: 10})
```

Measured on consumer hardware (M-series Mac, NVMe SSD, BGE Base EN v1.5).
| Metric | Typical value |
|---|---|
| Indexing throughput | 50 – 100 files/sec |
| Search latency (10 k files) | < 100 ms |
| Disk overhead | ~1.5–2× original text (with compression) |
| Memory baseline | ~200 MB + model (~500 MB for BGE Base) |
| Cold start | < 1 s |
Performance tuning, model swaps, and benchmark methodology: Performance guide.
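The table figures support a quick back-of-envelope sizing before a first scan. The sketch below simply applies the 50–100 files/sec and 1.5–2× ranges from the table. It reads "disk overhead" as index size relative to the original text, which is an interpretation, and the names are hypothetical. The figures are typical values from one class of hardware, so treat the output as rough bounds.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    scan_seconds_min: float
    scan_seconds_max: float
    index_bytes_min: float
    index_bytes_max: float


def estimate(file_count: int, text_bytes: int) -> Estimate:
    """Rough sizing from the table: 50-100 files/sec, 1.5-2x disk overhead."""
    return Estimate(
        scan_seconds_min=file_count / 100,  # best case: 100 files/sec
        scan_seconds_max=file_count / 50,   # worst case: 50 files/sec
        index_bytes_min=text_bytes * 1.5,
        index_bytes_max=text_bytes * 2.0,
    )
```

For 10,000 files holding 1 GB of text, this gives a first scan of roughly 100 to 200 seconds and 1.5 to 2 GB of index data.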
Every moving part is a registry plus an interface. Adding things is intentionally boring.
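The registry-plus-interface shape can be illustrated in a few lines. StrataFS itself is written in Go, so this Python sketch only shows the pattern, and every name in it is hypothetical. The example implementation is a chunker that yields one chunk per top-level statement of a Python file, using the standard `ast` module.

```python
import ast
from typing import Callable, Dict, Iterator, Protocol


class Chunker(Protocol):
    """Interface: anything with a name and a chunk() method."""

    def name(self) -> str: ...

    def chunk(self, source: str) -> Iterator[str]: ...


class Registry:
    """Maps a strategy name to a factory that builds the implementation."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Chunker]] = {}

    def register(self, factory: Callable[[], Chunker]) -> None:
        # Build one instance to learn the name the factory registers under.
        self._factories[factory().name()] = factory

    def get(self, name: str) -> Chunker:
        return self._factories[name]()


class PythonASTChunker:
    """Yields one chunk per top-level statement of a Python source string."""

    def name(self) -> str:
        return "ast"

    def chunk(self, source: str) -> Iterator[str]:
        for node in ast.parse(source).body:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                yield segment


DEFAULT_REGISTRY = Registry()
DEFAULT_REGISTRY.register(PythonASTChunker)
```

Callers look up a strategy by name, for example `DEFAULT_REGISTRY.get("ast")`, so adding a strategy never requires touching the code that uses it.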
Add a new file parser
```go
// pkg/parsers/asciidoc.go
type AsciidocParser struct{}

func (p *AsciidocParser) Parse(r io.Reader) (string, error) { /* ... */ }

func (p *AsciidocParser) SupportedExtensions() []string {
	return []string{".adoc", ".asciidoc"}
}

func init() { DefaultRegistry.Register(NewAsciidocParserFactory()) }
```

Add a new storage backend
```go
// pkg/filesystem/dropbox.go
type DropboxFS struct{ /* ... */ }

func (fs *DropboxFS) Open(path string) (io.ReadCloser, error) { /* ... */ }
func (fs *DropboxFS) Walk(root string, fn WalkFunc) error { /* ... */ }

// pkg/storage/factory.go
case config.StorageTypeDropbox:
	return f.createDropboxFS(source)
```

Add a new chunking strategy
```go
// pkg/chunking/ast.go
type ASTChunker struct{}

func (c *ASTChunker) Name() string { return "ast" }

func (c *ASTChunker) ChunkStream(r io.Reader, o ChunkOptions) (<-chan Chunk, <-chan error) {
	// Yield one chunk per top-level AST node.
}
```

Swap the embedding model
```json
{
  "embedding": {
    "model": "bge-small-en-v1.5",
    "dimension": 384
  }
}
```

Any ONNX-compatible model works. Drop the weights in ~/.stratafs/fastembed_cache/ and point embedding.model at it.
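When swapping models, `embedding.dimension` has to match the size of the vectors the model actually produces. The following is a small pre-flight check, sketched in Python. The dimensions listed are the published output sizes of the BGE English v1.5 models; the function name is hypothetical, and how StrataFS itself reacts to a mismatch is not covered here.

```python
# Published output dimensions for the BGE English v1.5 family.
KNOWN_DIMENSIONS = {
    "bge-small-en-v1.5": 384,
    "bge-base-en-v1.5": 768,
    "bge-large-en-v1.5": 1024,
}


def check_embedding_config(config: dict) -> None:
    """Raise ValueError if a known model is paired with the wrong dimension.

    Models missing from KNOWN_DIMENSIONS are accepted as-is, since their
    output size cannot be verified here.
    """
    embedding = config["embedding"]
    expected = KNOWN_DIMENSIONS.get(embedding["model"])
    if expected is not None and embedding["dimension"] != expected:
        raise ValueError(
            f"{embedding['model']} produces {expected}-dimensional vectors, "
            f"but the config says {embedding['dimension']}"
        )


if __name__ == "__main__":
    check_embedding_config({"embedding": {"model": "bge-small-en-v1.5", "dimension": 384}})
    print("embedding config looks consistent")
```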
Add a new ranking signal
Hybrid scoring is a single SQL query with weighted CTEs. Add a CTE, expose a weight, ship a PR. Full walkthrough in the development guide.
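The "one CTE per signal, one weight per CTE" shape can be sketched with plain SQLite. This is not StrataFS's actual query: the two hit tables stand in for what would be an FTS5 match and a sqlite-vec similarity lookup, and the table names, column names, and default weights are all assumptions made for illustration.

```python
import sqlite3

HYBRID_SQL = """
WITH fts AS (
    SELECT chunk_id, score FROM fts_hits      -- stand-in for a full-text signal
),
vec AS (
    SELECT chunk_id, score FROM vec_hits      -- stand-in for a vector signal
),
candidates AS (
    SELECT chunk_id FROM fts
    UNION
    SELECT chunk_id FROM vec
)
SELECT c.chunk_id,
       :w_fts * COALESCE(fts.score, 0.0)
     + :w_vec * COALESCE(vec.score, 0.0) AS hybrid_score
FROM candidates AS c
LEFT JOIN fts ON fts.chunk_id = c.chunk_id
LEFT JOIN vec ON vec.chunk_id = c.chunk_id
ORDER BY hybrid_score DESC, c.chunk_id
LIMIT :limit
"""


def hybrid_search(db: sqlite3.Connection, w_fts: float = 0.4,
                  w_vec: float = 0.6, limit: int = 10) -> list:
    """Rank chunks by a weighted sum of both signals in one query."""
    params = {"w_fts": w_fts, "w_vec": w_vec, "limit": limit}
    return db.execute(HYBRID_SQL, params).fetchall()


def demo_db() -> sqlite3.Connection:
    """In-memory database with made-up, already-normalized scores."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fts_hits (chunk_id INTEGER PRIMARY KEY, score REAL)")
    db.execute("CREATE TABLE vec_hits (chunk_id INTEGER PRIMARY KEY, score REAL)")
    db.executemany("INSERT INTO fts_hits VALUES (?, ?)", [(1, 0.9), (2, 0.2)])
    db.executemany("INSERT INTO vec_hits VALUES (?, ?)", [(2, 0.9), (3, 0.5)])
    return db


if __name__ == "__main__":
    for chunk_id, score in hybrid_search(demo_db()):
        print(chunk_id, round(score, 3))
```

In this shape, a new ranking signal means one more CTE, one more `LEFT JOIN`, and one more weighted term in the `SELECT`. The `COALESCE` calls keep a chunk in the ranking even when only one signal matched it.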
- Enterprise security — RBAC for authentication and source-level permissions
- Streaming search results — chunked HTTP for very large result sets
- Custom embeddings — first-class support for any ONNX-compatible model on disk
- Cross-source ranking signals — per-source weight, recency boost, trusted-source pinning
- Encrypted source databases — SQLCipher-backed at-rest encryption
Already shipped: virtual FS export, FUSE/WinFsp mount, GNOME / Spotlight / Windows Search integration, Wails desktop UI, native installers for every desktop OS, enterprise connectors for SharePoint / Google Drive / Jira.
Full list: Roadmap.
The full docs live in documentation/ and are built with MkDocs Material.
| Topic | Where |
|---|---|
| Getting started | documentation/docs/getting-started/ |
| User guide (config, search, CLI, file types) | documentation/docs/user-guide/ |
| REST + MCP integration | documentation/docs/ai-integration/ |
| Storage backends | documentation/docs/user-guide/storage-backends.md |
| Deployment (Docker / systemd / launchd / K8s) | documentation/docs/deployment/ |
| Architecture | documentation/docs/architecture/ |
| Contributing & dev setup | documentation/docs/contributing/ |
Preview the docs locally:
```bash
cd documentation
pip install -r requirements.txt
mkdocs serve
```

- Issues: github.com/neul-labs/stratafs/issues
- Discussions: github.com/neul-labs/stratafs/discussions
- Contributing guide: documentation/docs/contributing/development.md
Pull requests welcome. For larger changes, open an issue first to align on the approach. Every PR runs the full test suite plus a Docker build in CI.
MIT. Do whatever you want with it. If StrataFS ends up powering something interesting, we'd love to hear about it.