memlz

memlz is a compression library for special use cases where speeds approaching memcpy() are needed.

Benchmark

Fast libraries like Snappy, FastLZ and LZO have better compression ratios, but their compression speeds are well below 1000 MB/s, so they are not directly comparable. Only LZ4 with its acceleration parameter set to around 32 to 64, and a few other libraries, come close.

Benchmark of the files enwik8, Silesia and employees_50MB.json on an Intel i7 with a non-cached memcpy() speed of 14000 MB/s:

(benchmark chart)

Decompression speed is less competitive and varies with the data type (see the decompression benchmark chart in the repository). Also check out lzbench, which is easy to compile and run on your own data and includes libraries with even faster decompression speeds.

User friendly

It's a header-only library. Simply include it and call memlz_compress():

    #include "memlz.h"
    ...
    size_t len = memlz_compress(destination, source, size);
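
For example, compressing a memory block into a freshly allocated destination. The destination sizing below is an assumption, since no worst-case bound is documented here, and send() stands in for whatever consumes the result:

    char* destination = (char*)malloc(size * 2 + 64); // assumed slack for incompressible input
    size_t len = memlz_compress(destination, source, size);
    send(destination, len);
    ...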

With streaming mode you can increase the compression ratio if you receive data in small packets. Simply create a state variable and call memlz_stream_compress() repeatedly:

    memlz_state* state = (memlz_state*)malloc(sizeof(memlz_state));
    memlz_reset(state);
    while(...) {
        ...
        size_t len = memlz_stream_compress(destination, source, size, state);
        ...
    }

Each call to memlz_stream_compress() compresses the entire payload passed to it, and each compressed payload can be fully decompressed by a single call to memlz_stream_decompress().
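
On the receiving side the packets are then decompressed one by one. A sketch, assuming memlz_stream_decompress() takes a state of its own that is initialized the same way (the exact prototype is an assumption):

    memlz_state* state = (memlz_state*)malloc(sizeof(memlz_state));
    memlz_reset(state);
    while(...) {
        ...
        size_t len = memlz_stream_decompress(destination, source, state);
        ...
    }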

The data format also includes a header from which the compressed and decompressed sizes can be read:

    size_t memlz_compressed_len(source)
    size_t memlz_decompressed_len(source)
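
This makes it possible to allocate an exactly sized destination buffer before decompressing (the malloc usage here is illustrative):

    size_t packed = memlz_compressed_len(source);     // bytes occupied by the compressed block
    size_t unpacked = memlz_decompressed_len(source); // bytes the decompressed output needs
    char* destination = (char*)malloc(unpacked);
    ...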

Safety

Decompression of corrupted or manipulated data comes with two guarantees: 1) it always returns within bounded time, and 2) no memory outside the source and destination buffers is accessed, as delimited by what memlz_compressed_len() and memlz_decompressed_len() report.
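
In practice this means untrusted input can be screened before decompression by checking the header values against the buffers you actually own. The limits below are hypothetical application-side checks, not part of the library:

    if (memlz_compressed_len(source) > received_bytes) {
        ... // truncated or corrupted packet
    }
    if (memlz_decompressed_len(source) > MAX_ALLOWED_OUTPUT) {
        ... // refuse pathological expansion
    }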

No-copy

LZ4 and most other libraries need to maintain an internal payload queue when using streaming mode, which adds an extra memcpy() operation. The memlz algorithm eliminates this need.

Let's test the effect by integrating memlz into the eXdupe file archiver in two different ways. eXdupe first performs deduplication and then emits small packets of a few kilobytes to a traditional compression library.

If we queue packets with memcpy() until they reach 1 MB and then compress them in a single call, we get:

    F:\eXdupe>exdupe -x1k "f:\vm\25\Ubuntu 64-bit-s003.vmdk" -stdout > NUL
    Input:                       4,012,638,208 B in 1 files
    Output:                      2,148,219,886 B (53%)
    Speed w/o init overhead:     4,368 MB/s

If instead we use streaming mode on each individual packet as it arrives, we get:

    F:\eXdupe>exdupe -x1k "f:\vm\25\Ubuntu 64-bit-s003.vmdk" -stdout > NUL
    Input:                       4,012,638,208 B in 1 files
    Output:                      2,145,241,775 B (53%)
    Speed w/o init overhead:     4,616 MB/s

Beta

memlz has so far been tested on ARM64 and x86-64 with clang, gcc and Visual Studio. Note that format compatibility may break between these early versions!
