
WORDCOMPRESSION

Let's test different approaches for compressing and decompressing a dictionary of English words (uppercase, no accents or punctuation, 1-5 letters long) in JavaScript.

The goal is not to find the smallest encoded string, but the string that compresses best once passed through RoadRoller.js and gzip.

Previous work:

  • words.txt: raw data (14915 words, 80.8 kb)
  • words.js: json data (109 kb)
  • words.txt zipped: ~24 kb
  • txt + RoadRoller.js + zip: ~14 kb
  • json + MiniPrefixRemover.js + RoadRoller.js + zip: 12.4kb (12741b)

New approaches:

  • prefix.html:
    Alphabetical ordering + smart prefix handling + one magic number to represent "last word + s"
    encoded json + RoadRoller.js + zip: 11.3kb (11585b)

  • prefix_remapped.html:
    same as above, but with the encoded alphabet remapped so that the most used letters come first
    encoded json + RoadRoller.js + zip: 11.2kb (11447b)

  • prefix_remapped_shuffled.html:
    same as above, but with a more zip-friendly order for the dictionary (see shuffler.html)
    encoded json + RoadRoller.js + zip: 11.1kb (11335b)
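
As a rough illustration of the prefix idea used by the approaches above, here is a minimal sketch of front coding with a "previous word + S" marker. It is not the format used in prefix.html: the `!` marker, the single-digit prefix length, and the function names `encodeWords` and `decodeWords` are all assumptions made for this example. It relies on words being uppercase and at most 5 letters long, so the shared prefix length always fits in one digit.

```javascript
// Hypothetical front-coding sketch; NOT the repository's actual format.
const PLURAL = "!"; // assumed marker meaning "previous word + S"

// Number of leading characters that a and b have in common.
function sharedPrefixLength(a, b) {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

// Sort the words, then store each one as either the plural marker or
// "<shared prefix length><remaining letters>" relative to the previous word.
function encodeWords(words) {
  const sorted = [...new Set(words)].sort();
  let prev = "";
  let out = "";
  for (const word of sorted) {
    if (prev !== "" && word === prev + "S") {
      out += PLURAL;
    } else {
      const n = sharedPrefixLength(prev, word);
      out += n + word.slice(n);
    }
    prev = word;
  }
  return out;
}

// Rebuild the sorted word list from the encoded string.
function decodeWords(encoded) {
  const words = [];
  let prev = "";
  // Each token is the plural marker, or one digit followed by uppercase letters.
  for (const token of encoded.match(/!|\d[A-Z]*/g) || []) {
    const word =
      token === PLURAL
        ? prev + "S"
        : prev.slice(0, Number(token[0])) + token.slice(1);
    words.push(word);
    prev = word;
  }
  return words;
}

console.log(encodeWords(["CAT", "CATS", "CAR", "CARD", "DOG"]));
// prints "0CAR3D2T!0DOG"
```

Decoding the resulting string gives the words back in alphabetical order. Whether a given marker or digit scheme actually helps after RoadRoller.js and zip has to be measured, since the goal here is the final packed size rather than the length of the encoded string.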
