CFOG's PIP, November 1989, Volume 8 No. 4, Whole No. 72, page 39

File Compression Techniques for CP/M Users

by Benjamin H. Cohen

copyright Benjamin H. Cohen 1989

When I first started out with my Osborne 1, the file compression method in use was squeeze. Once a file was squeezed, a library utility (LU) would be used to put it in a library file (.LBR), so that files would be grouped together and additional space could be saved. Squeeze has been supplanted (at least in some circles) by Crunch. LU has been largely supplanted by NULU (New Library Utility), and for some folks by VLU (Visual Library Utility) and a few others. In the MS-DOS world, ARC, ZIP, and other compression techniques have been developed, most of which use multiple algorithms (selecting the most efficient one for each target file) and gather multiple files together in a single ARC or ZIP or whatever. As these MS-DOS creations appear they are usually followed by CP/M utilities to open them up, so we have UNARC, ZIPDIR, and UNZIP.
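The "selecting the most efficient one for each target file" step can be sketched in a few lines. What follows is a minimal illustration in Python, not a CP/M program, and the three compressors are stand-ins from Python's standard library rather than the methods ARC or ZIP actually use. The names METHODS and pick_best are made up for the example.

```python
import bz2
import lzma
import zlib

# Stand-in methods from Python's standard library. These are NOT the
# algorithms ARC or ZIP used; they only illustrate the selection step.
METHODS = {
    "stored": lambda data: data,  # keep the file uncompressed
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def pick_best(data):
    """Try every method and return (name, packed bytes) for the smallest."""
    results = {name: pack(data) for name, pack in METHODS.items()}
    best = min(results, key=lambda name: len(results[name]))
    return best, results[best]


sample = b"the quick brown fox jumps over the lazy dog\r\n" * 200
name, packed = pick_best(sample)
print(f"{name}: {len(sample)} bytes -> {len(packed)} bytes")
```

The "stored" entry matters for very small files, where every compressor adds more overhead than it saves. The price of trying every method is time, since each file is compressed several times over.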

Recently two new contenders have appeared on the CP/M scene: ARK (the CP/M equivalent of ARC) and LZH compression.

ARK compression is essentially similar to the ARC compression used by MS-DOS systems. By convention, ".ARC" files are for MS-DOS files and ".ARK" files are for CP/M files. ARK, like its MS-DOS counterparts, allows you to compress several files at one time while placing them in a single ".ARK" file, but unlike the LBR utilities the CP/M ARK utility has no ability to do any ARK maintenance: if you want to replace one file in an ARK you must get the group of files together and compress them all over again.

LZH is a new compression algorithm that is more efficient than the CRUNCH algorithm. LZH compression has been grafted onto Steven Greenberg's CRUNCH and UNCRUNCH Version 2.4, with the distribution file being called CRLZH11.LBR, and onto C.B. Falconer's Library Typer, in LT29.LBR. These work essentially the same way as CRUNCH 2.4 and LT 2.8, but the LZH algorithm has been added (or in the case of CRUNCH, it's a replacement).

The new ZIP algorithm seems still more efficient, but there isn't a CP/M utility to make ZIP files, yet. On the other hand, UNZIP version 1.0 does seem to work well.

I performed a comparison to see how long it would take to squeeze (using NSWP 2.07 -- it's the only program I have around that squeezes), crunch, crunch with LZH, and ARK a 109 record (about 14K) text file, and how efficiently these programs would compress the text file. Table Three shows the relative efficiency in compressing the files, with a "Squeeze Rating" showing the compressed file size ratio to that achieved by LZH, the most efficient compressor.
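The size figures in Table Three can be reproduced with a little arithmetic. The sketch below is in Python, used only for illustration, and takes its record counts straight from the table. The function names are made up for the example.

```python
# Record counts copied from Table Three.
ORIGINAL_RECORDS = 429
COMPRESSED_RECORDS = {"SQ": 297, "CR": 189, "LZH": 169, "ARC": 205}


def percent_of_original(records, original=ORIGINAL_RECORDS):
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * records / original


def squeeze_rating(records, best):
    """Compressed size relative to the smallest compressed file."""
    return records / best


smallest = min(COMPRESSED_RECORDS.values())  # LZH, at 169 records

for method, records in COMPRESSED_RECORDS.items():
    print(f"{method:<5}{records:>4}"
          f"{percent_of_original(records):8.2f}%"
          f"{squeeze_rating(records, smallest):7.2f}")
```

Note that the percentage is the compressed size as a share of the original, so a smaller figure is better, and a Squeeze Rating of 1.00 marks the smallest file.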

Table One shows the timing results when done on a Kaypro 1 with a RAM disk, along with the Time Rating (elapsed time relative to CRUNCH, the fastest compressor) and my "Overall Rating", which is the Time Rating multiplied by the Squeeze Rating. Table Two shows the results when done on the same system on floppy disk.

The overall rating gives equal weight to elapsed time and file size saving. CRUNCH comes out on top because it is a lot faster than any of the other CP/M alternatives. The LZH algorithm saves about 7% more space, but takes more than twice as long. LZH comes out better on floppy disk (an overall rating of 2.69, against 3.38 on the RAM disk) because it makes the smallest file and the time to write the file is more significant on floppy disk than on the RAM disk.
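The figures in the Overall Rating column match the Time Rating multiplied by the Squeeze Rating, which is one way of giving the two equal weight. The sketch below, in Python and for illustration only, recomputes all three ratings from the raw times in Tables One and Two and the record counts in Table Three. The function and variable names are made up for the example.

```python
# Raw times copied from Tables One and Two; record counts from Table Three.
RAM_DISK_TIMES = {"SQ": 52.8, "CR": 31.1, "LZH": 105.0, "ARC": 64.3}
FLOPPY_TIMES = {"SQ": 81.7, "CR": 46.4, "LZH": 124.6, "ARC": 99.8}
RECORDS = {"SQ": 297, "CR": 189, "LZH": 169, "ARC": 205}


def overall_ratings(times, records):
    """Return {method: (time rating, squeeze rating, overall rating)}.

    Each rating is relative to the best method in its column, and the
    overall rating is the product of the two, so lower is better.
    """
    fastest = min(times.values())
    smallest = min(records.values())
    ratings = {}
    for method in times:
        time_rating = times[method] / fastest
        squeeze_rating = records[method] / smallest
        ratings[method] = (time_rating, squeeze_rating,
                           time_rating * squeeze_rating)
    return ratings


for label, times in (("RAM disk", RAM_DISK_TIMES), ("Floppy", FLOPPY_TIMES)):
    print(label)
    ranked = sorted(overall_ratings(times, RECORDS).items(),
                    key=lambda item: item[1][2])
    for method, (t, s, overall) in ranked:
        print(f"  {method:<4}{t:6.2f}{s:6.2f}{overall:6.2f}")
```

Sorting by the overall rating puts CRUNCH first on both disks. On the RAM disk LZH ranks last, while on floppy it moves ahead of squeeze.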

Time saved for single files may be lost if you have to deal with multiple files, crunching them and then assembling them in a LBR file. ARK gains here, because it not only compresses the files but puts them in an ARK file at the same time. VLU (Visual Library Utility) will allow you to tag a series of files which VLU will then crunch and put in a LBR file all in one step. There are versions for vanilla CP/M systems and Z-System. For LBR file maintenance, however, NULU is still the king.

The compatibility factor may, however, indicate that ARK should be used. Files that may be transferred to MS-DOS computers can readily be UNARCed, but MS-DOS users may have problems with CRUNCHed files. My tests of UNCR232 (for MS-DOS) indicate that it is unreliable. I don't know yet whether CRLZH files can be un-LZHed by any MS-DOS programs.

For personal use of single files, CRUNCH may be best: the files are only 12% larger than LZH files with a time saving that is considerable. But only the second half of the old saw, 'ya pays yer money and ya takes yer choice', applies here: all the programs are public domain.

TABLE ONE:  In RAM disk

             Raw     Time   Squeeze   Overall
Method      Time   Rating    Rating    Rating
==============================================
SQ          52.8     1.70      1.76      2.98
CR          31.1     1.00      1.12      1.12
LZH        105.0     3.38      1.00      3.38
ARC         64.3     2.07      1.21      2.51

TABLE TWO:  On floppy disk

             Raw     Time   Squeeze   Overall
Method      Time   Rating    Rating    Rating
==============================================
SQ          81.7     1.76      1.76      3.09
CR          46.4     1.00      1.12      1.12
LZH        124.6     2.69      1.00      2.69
ARC         99.8     2.15      1.21      2.61

TABLE THREE:  Space Saved

           Size in  Percent of   Squeeze
           Records    Original    Rating
==========================================
ORIGINAL       429       --.--      -.--
SQ             297      69.23%      1.76
CR             189      44.06%      1.12
LZH            169      39.39%      1.00
ARC            205      47.79%      1.21