Let's test different approaches for compressing and decompressing a dictionary of English words (uppercase, no accents or punctuation, 1-5 letters long) in JavaScript.
The goal is not to find the smallest encoded string, but the string that compresses best once passed through RoadRoller.js and gzip.
- words.txt: raw data (14915 words, 80.8 kb)
- words.js: json data (109 kb)
- words.txt zipped: ~24 kb
- txt + RoadRoller.js + zip: ~14 kb
- json + MiniPrefixRemover.js + RoadRoller.js + zip: 12.4 kb (12741 bytes)
- prefix.html:
  alphabetical ordering + smart prefix handling + one magic number to represent "previous word + s"
  encoded json + RoadRoller.js + zip: 11.3 kb (11585 bytes)
- prefix_remapped.html:
  same as above, but with the encoded alphabet remapped so that the most used letters come first
  encoded json + RoadRoller.js + zip: 11.2 kb (11447 bytes)
- prefix_remapped_shuffled.html:
  same as above, but with a more zip-friendly order for the dictionary (see shuffler.html)
  encoded json + RoadRoller.js + zip: 11.1 kb (11335 bytes)
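
To make the prefix.html idea more concrete, here is a minimal sketch of one way to combine alphabetical ordering, shared-prefix removal and a single marker for "the previous word plus S". It is not the code from prefix.html: the function names `encodeWords` and `decodeWords`, the digit-plus-suffix layout and the choice of `5` as the marker are all assumptions made for illustration. The marker works here because two distinct words of at most 5 letters can share at most 4 leading letters, so the digit 5 is never needed as a prefix length.

```javascript
// Hypothetical marker: prefix lengths only range from 0 to 4 for distinct
// words of at most 5 letters, so "5" is free to mean "previous word + S".
const PLURAL = "5";

// Encode a list of uppercase A-Z words as one string.
// Each entry is a digit (shared prefix length, or the marker)
// followed by the letters that differ from the previous word.
function encodeWords(words) {
  const sorted = [...new Set(words)].sort(); // deduplicate, then alphabetical order
  let prev = "";
  let out = "";
  for (const word of sorted) {
    if (prev && word === prev + "S") {
      out += PLURAL; // no letters needed at all
    } else {
      let shared = 0;
      while (
        shared < prev.length &&
        shared < word.length &&
        prev[shared] === word[shared]
      ) {
        shared++;
      }
      out += shared + word.slice(shared);
    }
    prev = word;
  }
  return out;
}

// Rebuild the sorted word list from the encoded string.
function decodeWords(encoded) {
  const words = [];
  let prev = "";
  // Digits act as entry delimiters, so no separator character is required.
  for (const [, digit, suffix] of encoded.matchAll(/(\d)([A-Z]*)/g)) {
    prev = digit === PLURAL
      ? prev + "S"
      : prev.slice(0, Number(digit)) + suffix;
    words.push(prev);
  }
  return words;
}
```

With this sketch, `encodeWords(["CATS", "CAT", "CAR", "DOG"])` returns `"0CAR2T50DOG"`, and `decodeWords` turns that string back into `["CAR", "CAT", "CATS", "DOG"]`. Whether a given layout actually ends up smaller after RoadRoller.js and zip has to be measured, since a shorter encoded string does not always compress better.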