Most gzip libraries provide an easy to use API of the form:
decompress(void* state, const char* input, size_t in_len, char* output, size_t out_len)
It decompresses the input stream to the output buffer until it run either out of input data or out of output buffer space. The user fills/empties the buffers and make another call, until end of file is reached.
In a multi threaded implementation this synchronous interface in no longer possible. The code consuming the decompressed stream must be called back from each decompressor thread when a buffer is ready, not in any particular order.
A gzip file is sliced in sections, processed sequentially, which are themselves splited into chunks, processed in parallel, one for each thread. The first chunk is decompressed normally in CPU cache, and yields ~32KB decompressed buffers to the user callback. The other chunks are decompressed into larger buffers (~100MB) and require a second pass of "back references translation". To do this in a cache friendly manner, we propose to translate the buffer by segments fitting in the L1 cache and invoke the callback after each cache fill.
There is other designs matters, like the possibility to run decompression from your own custom thread pool, C FFI compatibility, etc. See the full discussion bellow.
Constraints:
-
Doing the translation the back-references on the whole buffer before yielding it to the client is costly since it trigger unnecessary TLB + LL cache misses on large memory region. We need a cache efficient translation/consumption step.
-
The user callbacks will be called at random time from arbitrary threads with decompressed content from arbitrary positions in the stream. The user should be able to reorder them, either synchronously (eg. blocking reordering four writing to stdout) or asynchronously (eg. parser with rests).
-
Support different threading implementations. (, OpenMP, custom work stealing, etc)
-
Static library with minimal headers. Smallish machine code blob. Support multiple language, wit h C as the common denominator.)
-
Able to generate, load custom indexes.
Implementation:
- We need to provide:
- Support different threading mechanisms.
Most gzip libraries provide an easy to use API of the form:
It decompresses the
inputstream to theoutputbuffer until it run either out of input data or out of output buffer space. The user fills/empties the buffers and make another call, until end of file is reached.In a multi threaded implementation this synchronous interface in no longer possible. The code consuming the decompressed stream must be called back from each decompressor thread when a buffer is ready, not in any particular order.
A gzip file is sliced in sections, processed sequentially, which are themselves splited into chunks, processed in parallel, one for each thread. The first chunk is decompressed normally in CPU cache, and yields ~32KB decompressed buffers to the user callback. The other chunks are decompressed into larger buffers (~100MB) and require a second pass of "back references translation". To do this in a cache friendly manner, we propose to translate the buffer by segments fitting in the L1 cache and invoke the callback after each cache fill.
There is other designs matters, like the possibility to run decompression from your own custom thread pool, C FFI compatibility, etc. See the full discussion bellow.
Constraints:
Doing the translation the back-references on the whole buffer before yielding it to the client is costly since it trigger unnecessary TLB + LL cache misses on large memory region. We need a cache efficient translation/consumption step.
The user callbacks will be called at random time from arbitrary threads with decompressed content from arbitrary positions in the stream. The user should be able to reorder them, either synchronously (eg. blocking reordering four writing to stdout) or asynchronously (eg. parser with rests).
Support different threading implementations. (, OpenMP, custom work stealing, etc)
Static library with minimal headers. Smallish machine code blob. Support multiple language, wit h C as the common denominator.)
Able to generate, load custom indexes.
Implementation:
(#section, #chunk, [#window flush])std::mutex+std::condition_variable),std::thread.(stream position, content position, context)is enough to generate an index. (overlaps with 2.)