Using Compressed Data in Linux
Linux has a variety of tools for working with compressed data. This article will describe how to use them, and why.
Compressing a file makes it take up less disk space. The catch is that it takes CPU time to compress or uncompress a file, so compression is really a way to trade CPU power for disk space. For files you use constantly, this may not be a good trade, but we strongly encourage you to compress any data sets you are not using on a regular basis. The SSCC's current disk space was quite costly, and we hope to avoid adding to it any sooner than necessary. This article will not attempt to cover all the available compression tools or everything they can do, just the most common usage. Full details are available by typing man followed by the name of the command in Linux (e.g. man compress).
Compression Types
compress/uncompress
The compress and uncompress commands are very easy to use:
compress file will replace file with the compressed file, file.Z (think zipped).
uncompress file replaces the compressed file with the original. uncompress doesn't care whether you include the .Z at the end or not; it will find the file either way.
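A quick way to see these two commands in action is a round trip on a throwaway file. This sketch assumes a POSIX shell and that compress is installed. On many Linux distributions it is packaged separately (often as ncompress), so the script checks for the command first. The file name survey.txt is made up for the example. The test data is deliberately repetitive because compress leaves a file unchanged if compressing it would not save space.

```shell
#!/bin/sh
# compress/uncompress round trip on a throwaway file (survey.txt is made up).
# Assumes compress is installed; many distributions package it as ncompress.
rm -f survey.txt survey.txt.Z
# Repetitive 1000-line test file, so compress actually saves space.
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i ",50000,north" }' > survey.txt

if command -v compress >/dev/null 2>&1; then
    compress survey.txt     # survey.txt is replaced by survey.txt.Z
    ls survey.txt*          # only survey.txt.Z is listed
    uncompress survey.txt   # finds survey.txt.Z even without the .Z
    ls survey.txt*          # only survey.txt is listed again
else
    echo "compress is not installed; skipping the demo" >&2
fi
```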
gzip/gunzip
gzip and gunzip work exactly like compress and uncompress: gzip file will replace file with the compressed file.gz, and gunzip file will replace the compressed file with the original.
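Here is a gzip/gunzip round trip on a throwaway file in the current directory. The name demo.txt is only illustrative. This is a minimal sketch for a POSIX shell with gzip installed, which is standard on Linux.

```shell
#!/bin/sh
# gzip/gunzip round trip on a throwaway file (demo.txt is a made-up name).
rm -f demo.txt demo.txt.gz
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i ",50000,north" }' > demo.txt

gzip demo.txt           # demo.txt is replaced by demo.txt.gz
ls demo.txt*            # only demo.txt.gz is listed
gunzip demo.txt.gz      # demo.txt.gz is replaced by demo.txt
ls demo.txt*            # only demo.txt is listed again
```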
bzip2/bunzip2
bzip2 and bunzip2 are another variation on the same theme. bzip2 file will replace file with the compressed file.bz2. bunzip2 file.bz2 will replace the compressed file with the original. Note that in this case you must type the .bz2 at the end of the name of the file to be uncompressed.
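A bzip2/bunzip2 round trip looks like this. Notice that the bunzip2 line spells out the full scores.txt.bz2 name. The file name scores.txt is only illustrative, and the sketch assumes bzip2 (which also provides bunzip2) is installed.

```shell
#!/bin/sh
# bzip2/bunzip2 round trip on a throwaway file (scores.txt is a made-up name).
rm -f scores.txt scores.txt.bz2
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i ",50000,north" }' > scores.txt

bzip2 scores.txt            # scores.txt is replaced by scores.txt.bz2
ls scores.txt*              # only scores.txt.bz2 is listed
bunzip2 scores.txt.bz2      # type the full name, including .bz2
ls scores.txt*              # only scores.txt is listed again
```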
zip/unzip
zip works slightly differently in that it asks you to name the compressed file: zip compressedFile file will create compressedFile.zip (the .zip is added automatically), containing a compressed version of file. The original file is not removed. unzip compressedFile will recreate the original file, and the compressed file is not removed.
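zip keeps the original file, so the sketch below deletes report.txt by hand before running unzip. That way you can see unzip recreate it. Removing it first also matters in practice: if report.txt already exists, unzip stops and asks whether to overwrite it. The sketch assumes zip and unzip are installed, which is not true of every minimal Linux install, so it checks for them first. The names report.txt and archive are illustrative.

```shell
#!/bin/sh
# zip/unzip round trip (report.txt and archive are made-up names).
rm -f report.txt archive.zip
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i ",50000,north" }' > report.txt

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    zip archive report.txt   # creates archive.zip; report.txt is kept
    rm report.txt            # remove the original so unzip can restore it
    unzip archive            # finds archive.zip and recreates report.txt
else
    echo "zip/unzip not installed; skipping the demo" >&2
fi
```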
Which Command Should I Use?
Unfortunately, which command works best depends on the exact properties of the file you're working with. bzip2 will usually give the best compression, while zip files are easier to use on Windows.
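Because the answer depends on the file, the practical approach is to try the candidates and compare. The -c option makes gzip and bzip2 write the compressed data to standard output instead of replacing the file, so wc -c can count the bytes while the file itself is left untouched. In this sketch sample.txt is a hypothetical stand-in for one of your own data sets, and the generated data is illustrative only. Real data sets will give different numbers.

```shell
#!/bin/sh
# Compare gzip and bzip2 on the same file without modifying it.
# sample.txt is a hypothetical stand-in for one of your own data sets.
rm -f sample.txt
awk 'BEGIN { for (i = 1; i <= 5000; i++) print i "," (i * 7) % 1000 ",north" }' > sample.txt

original=$(wc -c < sample.txt)
gz=$(gzip -c sample.txt | wc -c)     # -c sends compressed data to stdout,
bz=$(bzip2 -c sample.txt | wc -c)    # so sample.txt itself is untouched

echo "original: $original bytes"
echo "gzip:     $gz bytes"
echo "bzip2:    $bz bytes"
```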
How Do I Uncompress this File?
Suppose you've obtained a file, perhaps via email or from the web, and you know it's compressed but you don't know what program was used to compress it. Look at the last letters of the file name, following the period:
Last Letters of the File Name... | Program it was probably compressed with... |
---|---|
.Z | compress |
.gz | gzip |
.bz2 | bzip2 |
.zip | zip (possibly a Windows program like WinZip) |
Note that unzip will handle .zip files created by Windows programs just fine. (gunzip can also read a .zip file, but only if it contains a single compressed file.) Feel free to experiment: if you try to uncompress a file using a program that can't read its format, it will just give you an error message and quit.
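The table above can be turned into a small shell function that picks the matching program from the file name's suffix. This is only a sketch. decompress_any is a hypothetical helper, not a standard Linux command, and it trusts the suffix rather than checking what is actually inside the file.

```shell
#!/bin/sh
# Hypothetical helper: choose a decompression program from the file's suffix,
# following the suffix table. It does not inspect the file's contents.
decompress_any() {
    case "$1" in
        *.Z)   uncompress "$1" ;;
        *.gz)  gunzip "$1" ;;
        *.bz2) bunzip2 "$1" ;;
        *.zip) unzip "$1" ;;    # unlike the others, leaves the .zip in place
        *)     echo "decompress_any: unrecognized suffix: $1" >&2
               return 1 ;;
    esac
}

# Usage: build a gzip file, then let the helper pick gunzip for it.
rm -f notes.txt notes.txt.gz
echo "hello" > notes.txt
gzip notes.txt
decompress_any notes.txt.gz
cat notes.txt      # prints: hello
```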
zcat/bzcat
The zcat command reads a compressed file and sends the results to standard output (use bzcat for files compressed with bzip2). Just typing zcat file, where file is a compressed file, will display the contents of the file on the screen. But the real point is to use the results in other programs. For example, to see the results one page at a time, pipe the output to the more command, as in zcat file | more.

Both SAS and Stata can read directly from the output of the zcat command. For instructions see Using Compressed Data in SAS or Using Stata on Linux. Note that SAS has compression built in as a data set option. Stata users should consider using the user-written gzsave and gzuse commands. These act just like the regular save and use commands, but the file on disk is compressed just as if you had used gzip on it.
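The pipelines below show zcat and bzcat feeding other programs directly, so the uncompressed data never has to be written to disk. counts.csv is a made-up example file. The script creates it on the spot with gzip -c and bzip2 -c so that both compressed copies exist. The awk summing line is just one example of a program reading from the pipe.

```shell
#!/bin/sh
# Read compressed files through pipes instead of uncompressing them on disk.
# counts.csv is a made-up example file with a header and 100 data rows.
rm -f counts.csv counts.csv.gz counts.csv.bz2
awk 'BEGIN { print "id,count"; for (i = 1; i <= 100; i++) print i "," i * i }' > counts.csv
gzip -c counts.csv > counts.csv.gz     # -c writes to stdout, keeping counts.csv
bzip2 -c counts.csv > counts.csv.bz2
rm counts.csv                          # only the compressed copies remain

zcat counts.csv.gz | head -n 3         # prints id,count then 1,1 then 2,4
bzcat counts.csv.bz2 | wc -l           # prints: 101
# Sum the second column without creating an uncompressed copy.
zcat counts.csv.gz | awk -F, 'NR > 1 { total += $2 } END { print total }'   # prints: 338350
# For interactive paging:  zcat counts.csv.gz | more
```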