This is currently just an experiment:
Instead of ingesting full blocks only (128 KB),
make an a-priori analysis of the data,
and infer a position to split, if useful.
This leads to some small yet non-trivial compression gains, sometimes equivalent to a full compression level, yet for a correspondingly small speed cost.
The benefit is higher when there is no post-splitter, but even when there is one (as in `btultra` and above), some small compression ratio benefit remains, making it desirable for higher compression modes.
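To make the idea of an a-priori analysis concrete, here is a minimal sketch of one way to infer a split position. It is not the analysis implemented in this PR: it simply compares byte histograms chunk by chunk and proposes a split where the statistics change sharply. The names `infer_split`, `distance`, `CHUNK` and the `threshold` value are all hypothetical choices made for illustration.

```python
from collections import Counter

BLOCK_MAX = 128 * 1024  # full block size mentioned above
CHUNK = 8 * 1024        # hypothetical analysis granularity (assumption)


def distance(h1: Counter, n1: int, h2: Counter, n2: int) -> float:
    """L1 distance between two normalized byte distributions (0.0 to 2.0)."""
    if n1 == 0 or n2 == 0:
        return 0.0
    return sum(abs(h1.get(b, 0) / n1 - h2.get(b, 0) / n2) for b in range(256))


def infer_split(block: bytes, threshold: float = 0.5) -> int:
    """Return a split position inside `block`.

    Returns len(block) when no split looks useful, meaning the block
    should be ingested whole. `threshold` is an arbitrary illustrative value.
    """
    block = block[:BLOCK_MAX]
    past = Counter()  # statistics of everything seen so far
    past_len = 0
    for pos in range(0, len(block), CHUNK):
        chunk = block[pos:pos + CHUNK]
        cur = Counter(chunk)
        # Split as soon as the new chunk looks unlike the data before it.
        if past_len and distance(past, past_len, cur, len(chunk)) > threshold:
            return pos
        past.update(cur)
        past_len += len(chunk)
    return len(block)
```

With this sketch, a block made of 32 KB of a single repeated byte followed by 32 KB of uniformly distributed bytes yields a split at the 32 KB boundary, while a homogeneous block is left unsplit. The extra pass over the data is also why such an analysis has a speed cost.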
However, the dynamic analysis is not free, so it's currently reserved for higher compression strategies (currently `btlazy2` and above), where the speed cost can be considered "negligible" (< 5%).
For other modes, it is replaced by a static block size strategy, which is no longer limited to 128 KB. In tests, a 92 KB block size appears preferable, achieving a higher compression ratio at a very small compression speed loss (due to the increased number of blocks, hence of block headers).
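The trade-off described above can be illustrated with a short sketch. It assumes the strategy ordering of zstd's `ZSTD_strategy` enum and a 3-byte block header; the helper names (`uses_dynamic_split`, `static_blocks`, `header_overhead`) are hypothetical and do not correspond to functions in the library.

```python
# Strategy order, following zstd's ZSTD_strategy enum.
STRATEGIES = ["fast", "dfast", "greedy", "lazy", "lazy2",
              "btlazy2", "btopt", "btultra", "btultra2"]

KB = 1024
FULL_BLOCK = 128 * KB    # previous fixed block size
STATIC_BLOCK = 92 * KB   # static block size preferred in the tests above
BLOCK_HEADER_BYTES = 3   # size of one block header (assumed here)


def uses_dynamic_split(strategy: str) -> bool:
    """True when the strategy is `btlazy2` or above (dynamic analysis)."""
    return STRATEGIES.index(strategy) >= STRATEGIES.index("btlazy2")


def static_blocks(src_size: int, block_size: int) -> list[tuple[int, int]]:
    """Cut `src_size` bytes into (offset, length) blocks of at most `block_size`."""
    return [(start, min(block_size, src_size - start))
            for start in range(0, src_size, block_size)]


def header_overhead(src_size: int, block_size: int) -> int:
    """Total bytes spent on block headers for a given static block size."""
    return len(static_blocks(src_size, block_size)) * BLOCK_HEADER_BYTES
```

For a 1 MiB input, `static_blocks` produces 8 blocks at 128 KB and 12 blocks at 92 KB, so the header overhead grows from 24 to 36 bytes. This is the "increased number of blocks" cost that the compression ratio gain has to outweigh.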
Benchmark tables compare `dev` against this PR on `silesia.tar` and `calgary.tar`.
It might be preferable to consider this parameter as yet another knob that could be enabled by the trainer.
But this would first require a refactor, to make this parameter independent.
Todo :