Last updated: June 28, April 30, 2014 (site layout+comparisons)

NanoZip

NanoZip is an experimental file archiver. It combines several original file compression algorithms in a single archiver program that aims for high compression efficiency. It has many experimental features, such as fine-grained (i.e. not block-based) parallel compression algorithms.

The latest NanoZip (2011) for 32-bit Windows includes a graphical user interface and a command-line interface. The other versions include only the command-line interface.
nanozip-0.09a.win32.zip d7b43d2f fecb19e2 ba36c5e5 da8cca6e c254f07b 3c132779 c3b82a2a 2bd8f114 (sha256)
nanozip-0.09a.win64.zip 0bc7cfd0 5dec5417 d04d397c 937baf44 d0e4897d 97ae16c1 7ed8b130 c754d276
nanozip-0.09a.linux32.zip db94977e 80bbab02 d122e7bd 6f6f83c7 767931a4 014bde8e ef1b2a5a 9ae7dde1
nanozip-0.09a.linux64.zip e66dd3f9 e0a1da91 37a1483c 36c9979f 3ff1dbf1 175876a0 47504ee4 ff4b2cd7
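A downloaded archive can be checked against the digests published here. The following is a minimal sketch in Python. The local file name is an assumption, the helper names are illustrative, and it assumes the digests without the "(sha256)" label are also SHA-256, as their length suggests.

```python
import hashlib
from pathlib import Path


def sha256_hex(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file as lowercase hex, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published):
    """Compare a file's digest with a digest written as space-separated groups."""
    expected = "".join(published.split()).lower()
    return sha256_hex(path) == expected


if __name__ == "__main__":
    # Digest published on this page for nanozip-0.09a.win32.zip.
    published = "d7b43d2f fecb19e2 ba36c5e5 da8cca6e c254f07b 3c132779 c3b82a2a 2bd8f114"
    archive = Path("nanozip-0.09a.win32.zip")  # assumed local download path
    if archive.exists():
        print("OK" if matches_published(archive, published) else "MISMATCH")
    else:
        print("archive not found; download it first")
```

Reading the file in chunks keeps memory use constant regardless of archive size.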

Thanks for helping me with the website bills.

NanoZip (nz) has good compression performance over a wide range of file types. For example, compressing a 500 MB Linux binary distribution:

compressor          compressed size (MB)  compression time (s)  decompression time (s)
nz 0.09 -cc         73                    724                   722
nz 0.09 -cO         83                    164                   37
nz 0.09 -co         91                    67                    10
uharc 0.6b -mx      96                    448                   363
7-zip 9.12 -mx      99                    187                   9
nz 0.09 -cD         117                   23                    2
rar 4.2 -m5         117                   41                    4
nz 0.09 -cd         131                   6.5                   1.9
gzip 1.3.3 -9       159                   84                    5
nz 0.09 -cf         169                   1.9                   2
gzip 1.3.3 -3       170                   15                    5
The '-cO' algorithm in NanoZip is the strongest known asymmetric compression algorithm. No other algorithm decompresses faster at this compression ratio. See the thorough compression comparison with other file compressors.

Audio compression

NanoZip has special algorithms for compressing audio data. An example case compressing 211 MB WAV files:
compressor          compressed size (MB)  compression time (s)  decompression time (s)
nz 0.09 -cd         123                   9.3                   7.2
flac 1.2.1 -8       124                   28                    3.4
nz 0.09 -cf         128                   3.1                   3.1
flac 1.2.1 -1       134                   4.1                   3.2
The above results are with NanoZip multithreading disabled.

Text compression

Much of the original work in NanoZip is built around text compression. An example compressing a 100 MB text file:
compressor          compressed size (MB)  compression time (s)  decompression time (s)
nz 0.09 -cc         19.6                  112                   111
nz 0.09 -cO         21.0                  8.3                   5.3
nz 0.09 -co         21.7                  4.5                   2.5
7-zip 9.12 -mx      27.5                  117                   2.2
bzip2 1.0.5         28.2                  11.8                  5.3
nz 0.09 -cD         28.5                  2.9                   0.5
rar 4.2 -m5         31.1                  29.7                  0.9
gzip 1.3.3 -9       37.9                  12.7                  1.16

Chess compression

NanoZip understands chess game notation (1. e4 e5...) and outperforms other compressors on such data. A 100 MB example file:
compressor          compressed size (MB)  compression time (s)  decompression time (s)
nz 0.09 -cO         11.8                  10.7                  3.1
nz 0.09 -co         12.7                  5.7                   1.9
7-zip 9.12 -mx      19.5                  94.2                  1.8
bzip2 1.0.5         19.9                  14.8                  4.5
rar 4.2 -m5         23.9                  30.8                  0.7
nz 0.09 -cd         24.0                  1.1                   0.5
gzip 1.3.3 -9       29.8                  13.3                  1.0
Move legality is not checked and there is no integrated chess engine, so the results could still be improved.

Multimedia compression

NanoZip algorithms handle linear sequences (e.g. 'abcdef...', 'x0u1a2p3...') embedded in heterogeneous data. 700 MB example:
compressor          compressed size (MB)  compression time (s)  decompression time (s)
nz 0.09 -cc         126                   1213                  1174
nz 0.09 -cO         128                   507                   120
uharc 0.6b -mx      161                   698                   572
nz 0.09 -co         162                   117                   18
7-zip 9.12 -mx      162                   154                   16
nz 0.09 -cDP        165                   52                    5
nz 0.09 -cD         188                   21                    2.9
rar 4.2 -m5         198                   51.5                  6.7
nz 0.09 -cd         213                   10.5                  2.6
bzip2 1.0.5         240                   91                    36
gzip 1.3.3 -9       247                   162                   7.3
nz 0.09 -cf         255                   2.88                  2.77
lzop 1.03 -1        343                   6.27                  2.16
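The two sample sequences given above ('abcdef...', 'x0u1a2p3...') share one property: bytes taken at a fixed stride advance by a constant step. How NanoZip detects or models such sequences is not described here, so the following Python sketch only illustrates that property. The function names are hypothetical.

```python
def is_linear(values: bytes) -> bool:
    """True if consecutive bytes differ by one constant step (mod 256)."""
    if len(values) < 3:
        return False
    step = (values[1] - values[0]) % 256
    return all((b - a) % 256 == step for a, b in zip(values, values[1:]))


def linear_phases(data: bytes, stride: int) -> list[int]:
    """Offsets p (0 <= p < stride) for which data[p::stride] is a linear sequence."""
    return [p for p in range(stride) if is_linear(data[p::stride])]


if __name__ == "__main__":
    print(linear_phases(b"abcdef", 1))    # every byte is the previous one plus 1
    print(linear_phases(b"x0u1a2p3", 2))  # only the digits at odd offsets are linear
```

In the second sample the letters at even offsets are arbitrary while the digits at odd offsets count upward, which is why only offset 1 is reported.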

Parallel compression algorithms and space use

NanoZip algorithms are memory-frugal and recognize similarities between two data blocks even if the distance between them is large. This effect is amplified in the parallel compression algorithms (-cdP below and -cDP in the earlier example). No such algorithms are known in the computer science literature. They do not split the data into blocks, but compress the input as a continuous stream while utilizing multiple processors. An 800 MB example (compiler binaries) using 500 MB of memory:
compressor          compressed size (MB)  compression time (s)  decompression time (s)
nz 0.09 -cO         43                    351                   32
nz 0.09 -cdP        65                    23.7                  2.7
7-zip 9.12 -mx      97                    222                   9.8
nz 0.09 -cF         99                    7.8                   5.7
uharc 0.6b -mx      101                   539                   401
rar 4.2 -m5         143                   50.5                  5.2
gzip 1.3.3 -9       228                   113                   8.1
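To see why compressing a continuous stream matters, it helps to measure what block splitting costs when similar data lies far apart. The sketch below is only an illustration using Python's standard lzma module as a stand-in codec. It does not reproduce any NanoZip algorithm, and the data sizes are arbitrary assumptions.

```python
import lzma
import random


def compressed_size(data: bytes) -> int:
    """Size in bytes of the data after compression with the stand-in codec."""
    return len(lzma.compress(data))


rng = random.Random(0)
# A random block is incompressible on its own; its second copy is pure redundancy.
block = bytes(rng.getrandbits(8) for _ in range(200_000))
filler = bytes(200_000)  # zeros that put distance between the two copies
data = block + filler + block

# One continuous stream: the second copy can refer back to the first.
whole = compressed_size(data)
# Independent blocks, as a block-based parallel compressor would produce.
split = sum(compressed_size(part) for part in (block, filler, block))

print("continuous stream:", whole, "bytes")
print("independent blocks:", split, "bytes")
```

The independent blocks come out much larger because each one is compressed without knowledge of the others, so the repeated data has to be stored twice.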

Archiver architecture

NanoZip outperforms other file archivers even when it uses only a single thread. The file archiver architecture allows parallel processing on multiple levels:
1) Independent threads handle file reading and writing.
2) The compression algorithms are designed in such a way that parts of the compression can be done ahead of time, while other threads finish or complement the compression that was begun earlier.
3) Some algorithms (depending on the data content) run with full CPU utilization regardless of the number of processors available.
4) The high-level archiver architecture allows the entire process to be run in multiple branches (controlled by the '-p' switch) by splitting the input task into an arbitrary number of blocks.
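The first level above, independent threads for reading and writing, can be sketched as a small pipeline. This is a generic illustration in Python, not NanoZip's implementation: zlib stands in for the compression algorithm, and the chunk size, queue depth, and function names are assumptions.

```python
import io
import queue
import threading
import zlib

CHUNK = 64 * 1024
DONE = object()  # sentinel marking the end of the stream


def reader(src, out_q):
    """Read fixed-size chunks and hand them to the compressor."""
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        out_q.put(chunk)
    out_q.put(DONE)


def writer(dst, in_q):
    """Write compressed pieces in the order they arrive."""
    while True:
        piece = in_q.get()
        if piece is DONE:
            break
        dst.write(piece)


def compress_stream(src, dst):
    """Compress src into dst while reading and writing run on their own threads."""
    raw_q = queue.Queue(maxsize=8)
    packed_q = queue.Queue(maxsize=8)
    threads = [
        threading.Thread(target=reader, args=(src, raw_q)),
        threading.Thread(target=writer, args=(dst, packed_q)),
    ]
    for t in threads:
        t.start()
    codec = zlib.compressobj()  # stand-in codec, not a NanoZip algorithm
    while True:
        chunk = raw_q.get()
        if chunk is DONE:
            break
        packed_q.put(codec.compress(chunk))
    packed_q.put(codec.flush())
    packed_q.put(DONE)
    for t in threads:
        t.join()


if __name__ == "__main__":
    source = io.BytesIO(b"example data " * 100_000)
    target = io.BytesIO()
    compress_stream(source, target)
    print(len(source.getvalue()), "->", len(target.getvalue()), "bytes")
```

The bounded queues keep the reader from running far ahead of the compressor, so memory use stays limited even for large inputs.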

NanoZipLTCB

NanoZipLTCB is a subset of the NanoZip compression library, intended to highlight its performance when compressing plain text with a large (multi-gigabyte) memory model. It compresses at a rate of 16 MB/s and decompresses at 32 MB/s on modern hardware, with compression ratios over 6.2:1. No other file compressor compresses or decompresses faster at these compression ratios. It accepts only the files from the Large Text Compression Benchmark.
nanozipltcb-0.09.linux64.zip (2010) 0a587667 2c9a497c 2b61338a 87ac98a0 80f484f7 eaa1317e 7fe6d995 3cd29e21

Suffix sorting

NanoZip has an original high-performance algorithm for computing the Burrows-Wheeler Transform (BWT).
file             Archon4r0  Deep-Shallow  MSufSort3  divsufsort2  R08
chr22.dna        6.030      7.514         7.132      5.362        5.985
etext99          22.160     34.264        24.106     18.064       13.823
gcc-3.0.tar      13.856     35.822        14.952     10.084       14.533
howto            5.806      8.288         5.672      5.320        4.034
jdk13c           18.106     32.182        11.314     9.010        8.268
linux-2.4.5.tar  18.174     25.912        19.890     14.290       18.121
rctail96         32.490     62.502        21.060     17.914       15.225
rfc              20.736     29.666        17.936     15.658       16.728
sprot34.dat      22.832     32.096        23.352     17.404       15.735
w3c2             27.264     54.682        17.090     13.486       12.750
total seconds    187.454    322.928       162.504    126.592      125.202
The table shows approximated Manzini Corpus results based on Yuta Mori's timings [4] for the latest suffix array construction algorithms. With the exception of MSufSort3 all algorithms work with similar space requirements. The R08 timings are adjusted by the ratio of MSufSort3 timings done with the same hardware as R08.
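For readers unfamiliar with the transform being timed, the following Python sketch shows a naive BWT and its inverse. It sorts whole rotations, so it is far slower and more memory-hungry than any of the suffix sorting algorithms in the table, and it is not the R08 algorithm. The function names are illustrative.

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive BWT: sort all rotations, return the last column and the row of the input."""
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data  # rotation i is doubled[i:i + n]
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the transform by following the first-column to last-column mapping."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by byte value maps each row of the sorted
    # first column to the position of the same byte in the last column.
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return bytes(out)


if __name__ == "__main__":
    transformed, primary = bwt(b"banana")
    print(transformed, primary)  # b'nnbaaa' 3
    print(inverse_bwt(transformed, primary))  # b'banana'
```

The transform groups bytes that share a following context, which is why the output of "banana" clusters its repeated letters together.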

BWMonstr

BWMonstr has the highest compression ratio among pure Burrows-Wheeler compression algorithms. It compresses book1 from the Calgary corpus to 203476 bytes (2.1174 bpb), which is better than most PPM and CM compression algorithms. The program has the lowest known space requirements (~0.6N) for computing both the BWT and the post-transform compression. This is an unoptimized demo compressor (with a command-line interface only). It is not intended for practical file compression purposes.
bwmonstr.002.win32.zip (2009) 77895735 0d7f1cc7 367ce2c8 154389b4 87b0e7d6 2046bff9 3273b538 b0b9b891
See detailed compression comparison.
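The bpb (bits per byte) figure follows directly from the compressed and original sizes. The short Python check below assumes that book1 is 768,771 bytes long, which is the commonly quoted size for the Calgary corpus file and is not stated on this page.

```python
def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Average number of output bits spent per input byte."""
    return 8 * compressed_bytes / original_bytes


BOOK1_SIZE = 768_771  # assumed size of book1 in bytes (not given in the text)

if __name__ == "__main__":
    print(round(bits_per_byte(203_476, BOOK1_SIZE), 4))  # 2.1174
```

With that assumed size, the result reproduces the 2.1174 bpb quoted for BWMonstr.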
file         BS99   F07    D05    R08
bib          1.91   1.926  1.887  1.795
book1        2.27   2.356  2.264  2.147
book2        1.96   2.012  1.953  1.840
geo          4.16   4.268  4.129  3.967
news         2.42   2.464  2.397  2.268
obj1         3.73   3.765  3.692  3.584
obj2         2.45   2.433  2.411  2.226
paper1       2.41   2.439  2.390  2.274
paper2       2.36   2.387  2.329  2.230
pic          0.72   0.753  0.714  0.688
progc        2.45   2.476  2.422  2.307
progl        1.68   1.697  1.660  1.576
progp        1.68   1.702  1.666  1.579
trans        1.46   1.488  1.451  1.354
average bpb  2.26   2.298  2.240  2.131
BS99: The best results of Balkenhol. [1]
F07: Fenwick's best results. [2]
D05: The best Deorowicz results. [3] Fenwick (2007) describes this as "the best Burrows Wheeler result to date."
R08: This work (from 2008) is part of BWMonstr and NanoZip. The current versions of both BWMonstr and NanoZip outperform R08.

[1] B. Balkenhol, Y. M. Shtarkov, "One attempt of a compression algorithm using the BWT", Faculty of Mathematics, University of Bielefeld, 1999
[2] P. Fenwick, "Burrows Wheeler Compression: Principles and Reflections", Theoretical Computer Science, Vol 387, No 3, pp 200-219, 2007
[3] S. Deorowicz, "Context exhumation after the Burrows Wheeler transform", Information Processing Letters, Vol 95, No 1, pp 313-320, 2005
[4] Y. Mori, http://code.google.com/p/libdivsufsort/wiki/SACA_Benchmarks

Write to sami runsas at google's gmail. -----BEGIN PGP SIGNATURE----- iQEcBAEBCAAGBQJTrqpUAAoJEPNTTkedZlMjHXMH/j+C59HdUjUTHJ72Mu2YJ457 yKtvfitYejiKSehf84pB68xuy83xUly/dlW/sr5AvsEl3H5oV96sT6tPWRBWY79B k1+joFD5/qnLk2b605sphDQ7fe8YraTpt4Wb38R1yRS2iTJjtVF4ll7swBfXhDOs Jl1fmO+o1GD8161+Xo4TD4QQYT/w+GSSMLTeBDL742cjdHFEN00GsImVVRv3hwgH pXMNlNYYmk5WzDypcTl/0ig0E+dnp3dMkZ536l585TZsJhybR37W8QTPcPnHFXJj l29b/LHhilX6tGr+kM7hpFF0SE3tAzE7a6qXnZ6zDmiy8fNfL4LkDO7A6laHnM4= =yGyd -----END PGP SIGNATURE----- -----BEGIN PGP PUBLIC KEY BLOCK----- mQENBFM1WCUBCADAg+qyNZy0+H58x5PjMyhp8wjBT2Qj6x1zoY268YMtnYTL6dcZ g6ah904awjwA6stFmX+tbDF11qnMnzcC2VSqNpRvARmNyAryekZU9oH9FdXYCa/M QMfDY+bDWJs0QSNlFL76IzmB3J86DMXcfINQN8ikwsL046F4rP0OStDkUqWRIuHQ Vig1Q8rM4DxlZZoevL3uRXFAQ4SQdkG8D5SmPRYVXvscQcv5jdX/zie3SvwymQw1 pIUf5ugCvLzMv+7NA3G5ds6pnZH99WA/nvHKHYUqvIhkNkvEH5LfJ2oc1hRZuWml wDgxghjW13nQzSXGVa3THgixeDgjvQpv5acDABEBAAG0BFNhbWmJATgEEwECACIF AlNW/OYCGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJEPNTTkedZlMj4lAH /iUioqcK5mLFZewF8MTYEP/PmZ7nWkY0ORWkFUrtwSucjsaIbQL3BEaN2u/gPNqi Ag18SctxOJbEct3EtXbQbp3PQgvxiQoUIuPzJGLjwouD2od60koSj7CorN69Lu0m /6r7Cl9AUuE0oij1k53kbznfEzsVOtDg5XaL/TP1PygcrueLDvROCX2sjN/LhWRh Nsqh6LAefckzWEiieD6iMPQvKvcodLqgORFYS13OdOOjsdsQn4x+qZ7UmSMKRylW Tk/QbouT3LGreE+VA8OCVMM0Bi9rgYiCW7oiPxjHPG6HUNPopK/j7Y0HCWAlFqvT Nqu5wbWOynUxNaGHaGysjtW5AQ0EUzVYJQEIANVjTuxE4vEg01MjOOWoyHkHAeNL I3/OsH85gmKrSQijMz8xcCykhMZeQk6odcr1A6ANfjUNfC9/qK6bZaCUIQgDmU21 EdVwXaRWL3mqNOkTv4IyiyzWPYTSTxK9INWk6SD1W7HoVGOk8s1qDb1uLXx9zMl1 kvStsJS0buHwkQSNPOk32AiR4uBZsfO5ETIQOBee0ae4hHg4e6SWa7wayi8rlbF2 STxdiZg/ArOG2d90gX1koibbnYw27MJwigdy6W8ICYnD3EVNUBNb82eC0CE304y7 8GsEnhcBPHZ2vtjOlur4h/F6F5EIDCIBY5q3lM7i/GIiHiFHP5mfXiZo79cAEQEA AYkBHwQYAQIACQUCUzVYJQIbDAAKCRDzU05HnWZTI7sEB/9YgKWwmQciVTsQmAK+ 73U0lb5PM+46p8LvCs9LTnhv6894asTIRFijHEI0rAvcR4Xwo/RpBrDWh6vu1QYJ uPyIFh5GV2/UeZYmLdv+ZAe8W8Qhq4hgMdEu2OqtMl2L++WgNqdcft3ttHdauc8K 3cMWJKvEpXhQODj2jSr5YCAAVe/DvZgsu3SgKacjtXDxFklrs5qOXZTXvTZc8jab 
EcdAalt5zlFIUY5oXetk62BYy6nXw0BWimLM5hl9YY7ddunBxi4IkYykCQ/EfllW lEj0gWW75XrTSNc/3SoNs9asFC+WlnYAx0wtrwh3v7dq39veik3iMRvefk2HEf8j UXjK =z0+J -----END PGP PUBLIC KEY BLOCK-----