File management for the lost.
Content-addressable storage with a SQLite database to track everything. Files are deduplicated automatically and can live in multiple locations (local + cloud).
Local:
storage/
├── a3/
│ ├── b5/
│ │ ├── a3b5c7d9e1f2a4b6... # actual file
│ │ └── a3b5c7d9e1f2a4b6....meta.json # optional sidecar backup
- Hash-based sharding (2 levels: 65,536 possible directories)
- ~15 files per directory on average with 1M files
- Same structure mirrored in cloud (S3/Backblaze B2)
Cloud: Same hash-based structure for consistency
CREATE TABLE files (
hash TEXT PRIMARY KEY, -- SHA-256 of file content
size INTEGER,
mime_type TEXT,
original_filename TEXT,
file_extension TEXT,
created_at TIMESTAMP,
modified_at TIMESTAMP,
imported_at TIMESTAMP,
-- Storage locations
local_path TEXT, -- storage/a3/b5/a3b5c7d9...
s3_url TEXT, -- s3://bucket/a3/b5/a3b5c7d9...
-- Origin tracking
original_paths TEXT, -- Projects/MyWebsite/images/logo.png
original_source TEXT, -- "Dropbox", "iCloud", "OldMacDrive"
-- Flexible data
tags TEXT, -- ['Projects', 'MyWebsite', 'vacation']
metadata TEXT -- EXIF, dimensions, AI analysis, etc.
);Automatic deduplication - same file stored only once (by hash)
Multiple locations - track local, cloud, and legacy (Dropbox/iCloud) locations
Preserves context - original paths saved, auto-extracted as tags
Flexible metadata - JSON columns for file-specific data
Simple deployment - just SQLite + Python, no servers
Core operations:
- Hash file → SHA-256
- Check if hash exists in DB (dedupe)
- Store in hash-sharded directory
- Extract tags from original path
- Record in database
- Optionally sync to cloud
- One person system - SQLite is perfect, no complexity
- Gradual migration - can index legacy locations first, move files later
- Extensible - JSON lets you add features (AI analysis, etc.) without schema changes
- Resilient - easy DB backups
- Scales to your needs - handles 1TB/1M files easily
Simple, clean, and you're in full control.