NocFA/gindexer

gindexer

License: MIT

C++ library that indexes a local music collection and resolves cover art by track metadata. Written for discord-music-presence where lookups need to be fast enough to not be noticeable.

Scans configured directories on startup, watches for changes while running, and returns a path to the matching cover art file for a given artist/title/album query. If nothing matches, you get nothing back. That's the whole thing.

performance

Benchmarked against a 4,470-track library on Windows 11 (SSD). Reproducible via benchmark and scan_report in tools/.

Two modes: in-memory (default, no persistence) and LMDB (set db_path, persistent inverted index). In-memory is lighter and faster for small collections. LMDB's value is persistence: on a warm start the index is already populated, so the constructor returns in a few milliseconds while a background thread reconciles any changes made while the app wasn't running. Where LMDB shows higher memory (as in the 100k estimate), that is the memory-mapped DB file, which the OS can page out under pressure.

external cover art only (default)

277 tracks with cover art out of 4,470 total.

| | in-memory | lmdb |
|---|---|---|
| cold startup | 333 ms | 331 ms |
| warm startup | n/a | 4 ms |
| memory | 2.1 MB | 552 KB |
| lookup avg | 5.8 us | 38 us |
| lookup max | 16 us | 128 us |
| hit rate | 277/277 | 277/277 |

with embedded art (embedded = true)

4,469 tracks with cover art (embedded or external) out of 4,470 total.

| | in-memory | lmdb |
|---|---|---|
| cold startup | 1.7 s | 1.8 s |
| warm startup | n/a | 4 ms |
| memory | 5.7 MB | 4.5 MB |
| lookup avg | 40 us | 330 us |
| lookup max | 132 us | 2.0 ms |
| hit rate | 4,464/4,469 | 4,464/4,469 |

estimated scaling (100k tracks)

Extrapolated from the 4,470-track embedded-art numbers above. Tag reading (~0.4 ms/file) and per-entry memory scale linearly. Lookup time depends on vocabulary diversity, but real libraries have far more unique words than entries, so posting lists stay manageable.

| | in-memory | lmdb |
|---|---|---|
| cold startup | ~40 s | ~40 s |
| warm startup | n/a | ~5 ms |
| memory | ~45 MB | ~100 MB (mmap) |
| lookup avg | ~100-300 us | ~500 us - 1.5 ms |

Cold startup is dominated by tag reading at any scale. LMDB warm startup is roughly constant since it just opens the DB and starts the file watcher. A background thread reconciles any changes that happened while the app wasn't running, without blocking lookups.
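As a rough check on the cold-startup estimate, assuming tag reading is the only significant cost and that it scales linearly with the number of files N, the measured embedded-art run gives the per-file cost and the 100k figure follows from it:

```latex
t_{\text{tag}} \approx \frac{1.7\,\text{s}}{4{,}470} \approx 0.38\,\text{ms}
\qquad
t_{\text{cold}} \approx N \cdot t_{\text{tag}} \approx 100{,}000 \times 0.4\,\text{ms} = 40\,\text{s}
```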

how it works

Tags are read via TagLib. By default the index lives in memory. Set db_path to persist it on disk via LMDB (using the lmdbxx C++17 wrapper). With LMDB, subsequent starts skip tag reads for unchanged files and the index survives restarts. A file watcher (efsw) keeps the index current while running.
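The performance notes describe the LMDB store as an inverted index with posting lists. As a rough illustration of that general idea only, here is a minimal in-memory sketch. The class name word_index, its methods, and the whitespace tokenization are all assumptions made for the example and do not reflect gindexer's actual data layout.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical inverted index: each word maps to a posting list of track ids.
class word_index {
public:
    // Index every whitespace-separated word of `text` under `track_id`.
    void add(std::size_t track_id, const std::string& text) {
        std::istringstream in(text);
        std::string word;
        while (in >> word) {
            auto& list = postings_[word];
            // Skip the id if this track was just recorded for this word.
            if (list.empty() || list.back() != track_id) {
                list.push_back(track_id);
            }
        }
    }

    // Return the posting list for `word`, or an empty list if it was never seen.
    std::vector<std::size_t> candidates(const std::string& word) const {
        auto it = postings_.find(word);
        if (it == postings_.end()) {
            return {};
        }
        return it->second;
    }

private:
    std::map<std::string, std::vector<std::size_t>> postings_;
};
```

A query then only needs to score the tracks in the posting lists of its own words rather than every entry, which is why lookup cost tracks posting-list length more than library size.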

Lookups use bigram similarity (Sorensen-Dice) so minor tag inconsistencies between what the OS reports and what's in the file don't cause misses.
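For reference, the Sorensen-Dice coefficient over character bigrams can be sketched as follows. This is a generic textbook version, not gindexer's implementation: the function names are hypothetical, and no case folding or other normalization is applied here.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Collect overlapping two-character substrings, sorted so that
// std::set_intersection can count shared bigrams (duplicates included).
static std::vector<std::string> bigrams(const std::string& s) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        out.push_back(s.substr(i, 2));
    }
    std::sort(out.begin(), out.end());
    return out;
}

// Sorensen-Dice coefficient: 2 * |A n B| / (|A| + |B|), a value in [0, 1].
double dice_similarity(const std::string& a, const std::string& b) {
    if (a == b) {
        return 1.0;  // also covers two empty strings
    }
    const auto x = bigrams(a);
    const auto y = bigrams(b);
    if (x.empty() || y.empty()) {
        return 0.0;  // strings shorter than two characters have no bigrams
    }
    std::vector<std::string> shared;
    std::set_intersection(x.begin(), x.end(), y.begin(), y.end(),
                          std::back_inserter(shared));
    return 2.0 * static_cast<double>(shared.size()) /
           static_cast<double>(x.size() + y.size());
}
```

With this sketch, "Radiohead" against the misspelled "Radiohed" shares 6 of 15 total bigrams and scores 0.8, so a one-letter tag difference still clears a 0.5 threshold.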

Artist splitting is caller-provided via config.splitter, which handles cases like "Artist A & Artist B" in a tag versus a single artist reported by the OS.
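A splitter might look like the sketch below. The function name split_artists and the separator list are assumptions chosen for illustration; only the signature (string in, vector of strings out) follows the usage example in this README.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical splitter: breaks a combined artist string on a few common
// separators. The separator list is an assumption; adjust it to your tags.
std::vector<std::string> split_artists(std::string const& s) {
    static const std::vector<std::string> seps = {" feat. ", " & ", ", "};
    std::vector<std::string> parts;
    std::size_t pos = 0;
    while (pos <= s.size()) {
        // Find the earliest separator at or after pos.
        std::size_t best = std::string::npos;
        std::size_t best_len = 0;
        for (auto const& sep : seps) {
            const std::size_t hit = s.find(sep, pos);
            if (hit < best) {
                best = hit;
                best_len = sep.size();
            }
        }
        const std::size_t end = (best == std::string::npos) ? s.size() : best;
        if (end > pos) {
            parts.push_back(s.substr(pos, end - pos));
        }
        if (best == std::string::npos) {
            break;
        }
        pos = best + best_len;
    }
    if (parts.empty()) {
        parts.push_back(s);  // never return an empty list
    }
    return parts;
}
```

Assuming config.splitter accepts any callable with this signature, it would be wired in with cfg.splitter = split_artists;. For "A & B feat. C" the sketch yields "A", "B", and "C".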

Cover art is resolved in two ways: external files matched by regex (case-insensitive), or embedded pictures read from audio file tags. The default pattern for external files is cover\.gif but you can pass any valid regex to match other names or formats. Set config.embedded = true to also index tracks with embedded album art.
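To try a pattern before putting it in the config, a case-insensitive check with std::regex can be sketched as below. The helper name is_cover_file is hypothetical, and the README does not say whether gindexer requires the pattern to match the whole filename or only part of it; this sketch assumes a whole-name match.

```cpp
#include <regex>
#include <string>

// Returns true when `filename` matches `pattern`, ignoring case.
// Assumption: the pattern must match the entire filename (regex_match),
// which may differ from how gindexer applies it.
bool is_cover_file(const std::string& filename, const std::string& pattern) {
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(filename, re);
}
```

Under that assumption, the default pattern accepts "Cover.GIF" but rejects "cover.png", while "cover\\.(gif|png|jpg)" accepts both.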

usage

#include <gindexer/gindexer.h>

auto split = [](std::string const& s) -> std::vector<std::string> {
    // your splitting logic here
    return {s};
};

gindexer::config cfg;
cfg.dirs = {"/home/user/music"};
cfg.splitter = split;
// cfg.cover_pattern = "cover\\.(gif|png|jpg)";  // optional, defaults to "cover\\.gif"
// cfg.embedded = true;             // optional, index embedded album art too
// cfg.threshold = 0.8f;            // optional, defaults to 0.5
// cfg.exact = true;                // optional, defaults to false
// cfg.db_path = "/tmp/gindexer";   // optional, enables LMDB persistent index

gindexer::indexer idx(std::move(cfg));

auto result = idx.lookup("Radiohead", "Creep", "Pablo Honey");
if (result) {
    // result->cover    - path to the cover art (or audio file if embedded)
    // result->embedded - true when cover art is embedded in the audio file
    // result->artist   - matched track's artist tag
    // result->title    - matched track's title tag
    // result->album    - matched track's album tag
}

Lookup returns the cover path and the matched track's tag metadata (artist, title, album). Lookups against a warm index are typically sub-millisecond. A miss means the track isn't in the index, not that it hasn't been indexed yet; the watcher keeps the index current.

config

| field | type | default | description |
|---|---|---|---|
| dirs | vector<path> | | root directories to scan and watch (recursive) |
| splitter | function | | splits combined artist strings into individual names, e.g. "A & B feat. C" |
| cover_pattern | string | "cover\\.gif" | case-insensitive regex matched against filenames in each audio file's directory |
| embedded | bool | false | index tracks with embedded album art when no external cover file exists |
| threshold | float | 0.5 | minimum score for a fuzzy match (ignored when exact is true) |
| exact | bool | false | use normalized string equality instead of bigram similarity |
| db_path | path | | LMDB directory for persistent index; omit for in-memory only |

When embedded is true and a track has no external cover, cover in the result points to the audio file itself. Check result->embedded to tell whether the path is a standalone image or an audio file with embedded art.

building

Requires CMake 3.27+ and a C++17 compiler. Dependencies are fetched automatically via FetchContent: TagLib, efsw, LMDB, lmdbxx, Catch2.

cmake -B build
cmake --build build

To run tests:

cmake -B build -DBUILD_TESTING=ON
cmake --build build
ctest --test-dir build --output-on-failure

supported formats

mp3, flac, ogg, opus, m4a, aac, wav, aiff, wv, ape, mpc, wma.

Anything TagLib can read tags from.

notes

File tags (ID3 and the other tag formats TagLib reads) are the source of truth, not filenames or folder structure.

External cover art must be in the same directory as the audio file it belongs to. With embedded = true, tracks with embedded album art are also indexed even without an external cover file.

Default matching is intentionally loose. A composite score (title 50%, artist 35%, album 15%) is compared against a configurable threshold (default 0.5) to filter obvious mismatches. Set exact = true in the config if you need strict matching instead.
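The weighting described above can be written out as a short sketch. The struct and function names are hypothetical, the per-field scores are assumed to be similarities in [0, 1], and treating the threshold as inclusive is an assumption based on the "minimum score" wording in the config table.

```cpp
// Hypothetical container for per-field similarity scores, each in [0, 1].
struct field_scores {
    double title;
    double artist;
    double album;
};

// Weighted sum using the documented weights: title 50%, artist 35%, album 15%.
double composite_score(const field_scores& s) {
    return 0.50 * s.title + 0.35 * s.artist + 0.15 * s.album;
}

// Assumption: a score equal to the threshold counts as a match.
bool is_match(const field_scores& s, double threshold = 0.5) {
    return composite_score(s) >= threshold;
}
```

In this sketch a perfect title match with no artist or album agreement scores exactly 0.5 and passes the default threshold, while a perfect artist match alone scores 0.35 and does not.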

About

C++ library for indexing local music files and resolving animated cover art by track metadata
