fqix
fqix is an experimental FASTQ read-name index for ordinary .fastq.gz files.
It combines zran-style gzip restart checkpoints with a read-name lookup table.
fqix now has two explicit index modes:
- sparse: small v1-style anchor index; requires sorted read names.
- exact: larger v2-style MPHF index; works without any read-name order assumption.
The default is sparse to avoid accidentally creating very large exact indexes.
Use --mode exact when the FASTQ order has been shuffled, filtered, merged, or is otherwise unreliable.
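If you are unsure whether a file qualifies for sparse mode, a quick pre-check along the following lines may help. This is a minimal sketch, not part of fqix: the function names are hypothetical, the read name is assumed to be the header text between the leading @ and the first whitespace, and "sorted" is assumed to mean non-decreasing bytewise order. The comparison fqix actually applies may differ, so treat a passing result as a hint only.

```python
import gzip
from typing import Iterator


def read_names(path: str) -> Iterator[bytes]:
    """Yield read names from a gzip-compressed FASTQ, assuming four-line records."""
    with gzip.open(path, "rb") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:
                # Header line: drop the leading '@' and keep the text up to
                # the first whitespace (assumed definition of a read name).
                fields = line[1:].split(None, 1)
                yield fields[0] if fields else b""


def names_are_sorted(path: str) -> bool:
    """Return True if read names are in non-decreasing bytewise order (assumed ordering)."""
    previous = None
    for name in read_names(path):
        if previous is not None and name < previous:
            return False
        previous = name
    return True
```

If names_are_sorted returns False, the file has been reordered in some way and --mode exact is the safer choice.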
Links
Documentation
Status
This is a minimal prototype.
Known limitations:
- FASTQ records are framed as four lines. Wrapped multiline sequence or quality fields are not supported. fqix does not otherwise validate FASTQ semantics such as + line contents or sequence/quality length agreement.
- Sparse mode requires sorted read names.
- Exact mode is larger than sparse mode because it stores one addressable record candidate per FASTQ record.
- Some gzip files may have sparse deflate block boundaries, so zran checkpoints may be farther apart than requested.
- fqix check compares source file size and second-resolution modification time.
- Parallel lookup is not implemented yet.
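To make the staleness limitation concrete, the comparison described above can be sketched as follows. This is an illustration only, not fqix's implementation: SourceStamp, stamp_of, and looks_unchanged are hypothetical names, and how fqix records these values in its index is not shown here.

```python
import os
from typing import NamedTuple


class SourceStamp(NamedTuple):
    size: int           # source file size in bytes
    mtime_seconds: int  # modification time truncated to whole seconds


def stamp_of(path: str) -> SourceStamp:
    """Capture the two properties used for the comparison."""
    info = os.stat(path)
    return SourceStamp(size=info.st_size, mtime_seconds=int(info.st_mtime))


def looks_unchanged(path: str, recorded: SourceStamp) -> bool:
    """Return True if size and whole-second mtime both match the recorded stamp."""
    return stamp_of(path) == recorded
```

A check of this shape cannot see a rewrite that keeps the same byte length and lands within the same second as the recorded modification time.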
License
fqix is licensed under the MIT License.
The files under spec/support/ and the implementation in src/fqix/zran.cr are based on Mark Adler’s zran from zlib, and are distributed under the zlib License.