C++ library that indexes a local music collection and resolves cover art by track metadata. Written for discord-music-presence where lookups need to be fast enough to not be noticeable.
Scans configured directories on startup, watches for changes while running, and returns a path to the matching cover art file for a given artist/title/album query. If nothing matches, you get nothing back. That's the whole thing.
Benchmarked against a 4,470-track library on Windows 11 (SSD).
Reproducible via benchmark and scan_report in tools/.
Two modes: in-memory (default, no persistence) and LMDB (set db_path, persistent inverted index).
In-memory is lighter and faster for small collections. LMDB's value is persistence: on a warm start the index is already populated, so the constructor returns in a few milliseconds and a background thread reconciles any changes that happened while the app wasn't running. Where LMDB reports higher memory, that figure is the memory-mapped DB file, which the OS can page out under pressure.
External covers only: 277 tracks with cover art out of 4,470 total.
| | in-memory | lmdb |
|---|---|---|
| cold startup | 333 ms | 331 ms |
| warm startup | n/a | 4 ms |
| memory | 2.1 MB | 552 KB |
| lookup avg | 5.8 us | 38 us |
| lookup max | 16 us | 128 us |
| hit rate | 277/277 | 277/277 |
4,469 tracks with cover art (embedded or external) out of 4,470 total.
| | in-memory | lmdb |
|---|---|---|
| cold startup | 1.7 s | 1.8 s |
| warm startup | n/a | 4 ms |
| memory | 5.7 MB | 4.5 MB |
| lookup avg | 40 us | 330 us |
| lookup max | 132 us | 2.0 ms |
| hit rate | 4,464/4,469 | 4,464/4,469 |
The figures below are estimates for a much larger library, extrapolated from the 4,470-track embedded numbers above rather than measured. Tag reading (~0.4 ms/file) and per-entry memory scale linearly. Lookup time depends on vocabulary diversity, but real libraries have far more unique words than entries, so posting lists stay manageable.
| | in-memory | lmdb |
|---|---|---|
| cold startup | ~40 s | ~40 s |
| warm startup | n/a | ~5 ms |
| memory | ~45 MB | ~100 MB (mmap) |
| lookup avg | ~100-300 us | ~500 us - 1.5 ms |
Cold startup is dominated by tag reading at any scale. LMDB warm startup is roughly constant since it just opens the DB and starts the file watcher. A background thread reconciles any changes that happened while the app wasn't running, without blocking lookups.
Tags are read via TagLib.
By default the index lives in memory. Set db_path to persist it on disk via LMDB (using the lmdbxx C++17 wrapper). With LMDB, subsequent starts skip tag reads for unchanged files and the index survives restarts.
A file watcher (efsw) keeps the index current while running.
Lookups use bigram similarity (Sorensen-Dice) so minor tag inconsistencies between what the OS reports and what's in the file don't cause misses.
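To illustrate the bigram similarity mentioned above, here is a minimal sketch of a Sorensen-Dice score over character bigrams. The function names (`lower`, `bigrams`, `dice`), the lowercasing step, and the byte-wise treatment of text are assumptions made for the example; gindexer's own normalization and scoring may differ.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Lowercase ASCII letters so "Creep" and "creep" compare equal.
std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Sorted list of overlapping two-byte substrings of s.
std::vector<std::string> bigrams(std::string const& s) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i + 1 < s.size(); ++i) out.push_back(s.substr(i, 2));
    std::sort(out.begin(), out.end());
    return out;
}

// Sorensen-Dice coefficient: 2 * |shared bigrams| / (|A| + |B|), in [0, 1].
double dice(std::string const& a, std::string const& b) {
    std::string const la = lower(a), lb = lower(b);
    auto const x = bigrams(la), y = bigrams(lb);
    // Strings shorter than two bytes have no bigrams; fall back to equality.
    if (x.empty() || y.empty()) return la == lb ? 1.0 : 0.0;
    std::vector<std::string> common;
    std::set_intersection(x.begin(), x.end(), y.begin(), y.end(),
                          std::back_inserter(common));
    return 2.0 * static_cast<double>(common.size()) /
           static_cast<double>(x.size() + y.size());
}
```

For example, `dice("night", "nacht")` shares one bigram (`ht`) out of four on each side, so the score is 2 * 1 / 8 = 0.25, while `dice("Creep", "creep")` is 1.0.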
Artist splitting is caller-provided via config.splitter, which handles cases like "Artist A & Artist B" in a tag versus a single artist reported by the OS.
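A splitter only needs to match the signature used in the usage example (`std::string const&` in, `std::vector<std::string>` out). The sketch below is one possible implementation, not something shipped with the library: the name `split_artists` and the choice of separators (`&`, `,`, `feat.`, `ft.`) are assumptions.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split a combined artist string on " & ", ",", " feat. " and " ft. "
// (case-insensitive). Empty pieces are dropped.
std::vector<std::string> split_artists(std::string const& s) {
    static const std::regex sep(R"(\s+(?:&|feat\.|ft\.)\s+|\s*,\s*)",
                                std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> out;
    // Submatch index -1 iterates over the text between separator matches.
    for (std::sregex_token_iterator it(s.begin(), s.end(), sep, -1), end; it != end; ++it) {
        std::string part = it->str();
        if (!part.empty()) out.push_back(part);
    }
    return out;
}
```

A naive rule like this also splits names such as "Earth, Wind & Fire", which is a good reason for leaving the policy to the caller.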
Cover art is resolved in two ways: external files matched by regex (case-insensitive), or embedded pictures read from audio file tags.
The default pattern for external files is cover\.gif but you can pass any valid regex to match other names or formats.
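The following sketch shows how a case-insensitive pattern behaves against candidate filenames, using `std::regex`. Whether gindexer uses `std::regex` internally, and whether it requires the whole filename to match (as `std::regex_match` does here) rather than a substring, are assumptions; `is_cover` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// True when the whole filename matches the pattern, ignoring case.
// The default mirrors the documented default pattern, cover\.gif.
bool is_cover(std::string const& filename,
              std::string const& pattern = "cover\\.gif") {
    std::regex const re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(filename, re);
}
```

With the default pattern, `cover.gif` and `COVER.GIF` match but `cover.png` does not; passing `"cover\\.(gif|png|jpg)"` accepts all three extensions. The escaped dot matters: without it, `coverXgif` would match too.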
Set config.embedded = true to also index tracks with embedded album art.
```cpp
#include <gindexer/gindexer.h>

#include <string>
#include <utility>
#include <vector>

int main() {
    // Caller-provided artist splitter; this placeholder returns the input unchanged.
    auto split = [](std::string const& s) -> std::vector<std::string> {
        // your splitting logic here
        return {s};
    };

    gindexer::config cfg;
    cfg.dirs = {"/home/user/music"};
    cfg.splitter = split;
    // cfg.cover_pattern = "cover\\.(gif|png|jpg)"; // optional, defaults to "cover\\.gif"
    // cfg.embedded = true;                         // optional, index embedded album art too
    // cfg.threshold = 0.8f;                        // optional, defaults to 0.5
    // cfg.exact = true;                            // optional, defaults to false
    // cfg.db_path = "/tmp/gindexer";               // optional, enables LMDB persistent index

    gindexer::indexer idx(std::move(cfg));

    auto result = idx.lookup("Radiohead", "Creep", "Pablo Honey");
    if (result) {
        // result->cover    - path to the cover art (or audio file if embedded)
        // result->embedded - true when cover art is embedded in the audio file
        // result->artist   - matched track's artist tag
        // result->title    - matched track's title tag
        // result->album    - matched track's album tag
    }
}
```

Lookup returns the cover path and the matched track's tag metadata (artist, title, album). Lookups are sub-millisecond from a warm index. A miss means the track isn't in the index, not that it hasn't been indexed yet; the watcher keeps things current.
| field | type | default | description |
|---|---|---|---|
| dirs | vector<path> | | root directories to scan and watch (recursive) |
| splitter | function | | splits combined artist strings into individual names, e.g. "A & B feat. C" |
| cover_pattern | string | "cover\\.gif" | case-insensitive regex matched against filenames in each audio file's directory |
| embedded | bool | false | index tracks with embedded album art when no external cover file exists |
| threshold | float | 0.5 | minimum score for a fuzzy match (ignored when exact is true) |
| exact | bool | false | use normalized string equality instead of bigram similarity |
| db_path | path | | LMDB directory for persistent index; omit for in-memory only |
When embedded is true and a track has no external cover, cover in the result points to the audio file itself.
Check result->embedded to tell whether the path is a standalone image or an audio file with embedded art.
Requires CMake 3.27+ and a C++17 compiler. Dependencies are fetched automatically via FetchContent: TagLib, efsw, LMDB, lmdbxx, Catch2.
```sh
cmake -B build
cmake --build build
```

To run tests:

```sh
cmake -B build -DBUILD_TESTING=ON
cmake --build build
ctest --test-dir build --output-on-failure
```

mp3, flac, ogg, opus, m4a, aac, wav, aiff, wv, ape, mpc, wma.
Anything TagLib can read tags from.
File tags (ID3 or the format's equivalent, as read by TagLib) are the source of truth, not filenames or folder structure.
External cover art must be in the same directory as the audio file it belongs to.
With embedded = true, tracks with embedded album art are also indexed even without an external cover file.
Default matching is intentionally loose.
A composite score (title 50%, artist 35%, album 15%) is compared against a configurable threshold (default 0.5) to filter obvious mismatches.
Set exact = true in the config if you need strict matching instead.