ccomp.sh - script for compressing files
Download ccomp.sh
Usage explained by Hikaru: ccomp.sh is designed around the idea that if it's going to screw up, it's best to give up immediately. So if any error occurs while it's running, it gives up right away, even if that leaves things in an inconsistent state. Generally speaking, this means you'll be left with either the original file untouched, or a gzipped version of the file plus a partially created bzip2'd version. If the second scenario does happen, your original data is okay: just gzip -d the .gz file and delete the .bz2 version.
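The cleanup for that second scenario could be wrapped in a small function. This is only a sketch: ccomp_recover is a made-up name, not part of ccomp.sh, and it assumes the leftovers are named after the original file with .gz and .bz2 appended.

```shell
# Hypothetical helper (not part of ccomp.sh): undo an aborted run for one file.
# Assumption: the leftovers are "$1.gz" (complete) and "$1.bz2" (partial).
ccomp_recover() {
    f=$1
    # Nothing to recover unless a gzipped copy was left behind.
    [ -f "$f.gz" ] || return 1
    # Restore the original from the complete .gz copy...
    gzip -d "$f.gz" || return 1
    # ...then remove the partially written .bz2, if there is one.
    rm -f "$f.bz2"
}

# Usage: ccomp_recover /home/tm/foo/bigfile
```

It stops without deleting anything if the .gz file is missing or fails to decompress, so the partial .bz2 only goes away once the original is back.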
Okay, forgive me for not remembering how to do this before. One of the big limitations of ccomp.sh is that it does not have the ability to handle multiple files at once - luckily, both bash and find are designed to work with programs that have this exact problem.
An example of how to use find+ccomp to compress *all* files in the current directory and all subdirectories:
find . -type f -exec /path/to/ccomp.sh \{\} \;
Another example: this one starts searching for files to compress from a specific location, in this case /home/tm/foo/
find /home/tm/foo/ -type f -exec /path/to/ccomp.sh \{\} \;
To decompress all of the files in those directories later, you'd need to run both of these:
find /home/tm/foo/ -type f -name \*.bz2 -exec bzip2 -d \{\} \;
find /home/tm/foo/ -type f -name \*.gz -exec gzip -d \{\} \;
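If you'd rather walk the tree only once, the two commands above can probably be folded into a single find. A rough sketch, where decompress_tree is just a name I made up for illustration:

```shell
# Sketch: decompress every .bz2 and .gz file under a directory in one pass.
# decompress_tree is a hypothetical name, not something ccomp.sh provides.
decompress_tree() {
    find "$1" -type f \( \
        -name '*.bz2' -exec bzip2 -d {} \; -o \
        -name '*.gz' -exec gzip -d {} \; \)
}

# Usage: decompress_tree /home/tm/foo/
```

The -o means "or": each file is checked against the .bz2 pattern first and only falls through to the .gz pattern if the first one didn't match.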
This bash example compresses every file in the current directory, but does not descend into subdirectories. Unfortunately, it's slightly dumb and will also pass any subdirectories as arguments to ccomp.sh. ccomp.sh will report an error and abort that argument; this is *harmless* and bash will carry on with the rest, but it will cause ccomp.sh to complain. I just want you to be aware of this.
for i in *; do /path/to/ccomp.sh "$i"; done
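If the complaints bother you, a variation of the loop can test each name first and only hand regular files to ccomp.sh. A minimal sketch, assuming the CCOMP variable (my own placeholder) points at your copy of the script:

```shell
# Sketch: same loop as above, but subdirectories are skipped.
# CCOMP is a placeholder variable; set it to wherever ccomp.sh lives.
CCOMP=${CCOMP:-/path/to/ccomp.sh}

compress_files_here() {
    for i in *; do
        # [ -f ] is true only for regular files, so directories are skipped
        if [ -f "$i" ]; then
            "$CCOMP" "$i"
        fi
    done
}

# Usage: cd /home/tm/foo && compress_files_here
```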
In any case, this should give you plenty of examples of how to use it for multiple files.
Let me explain when you'd want to use ccomp.sh on multiple files by first giving an example of when NOT to:
Let's say you have plenty of free disk space to make a tarball of all of the files - that is to say, the free space is at least 3* the space the files take up. In this case it's much easier *and* gives you better compression ratios to make a tarball of the files, delete the files, and then compress the tarball using ccomp.sh.
Hmm... Maybe I should explain how to get this magic value and why it's important before I continue:
First, let me use a real example on my hard disk:
du -h ~/mid
29M mid
^^^ in the above example the 'mid' subdirectory, all the files in it and all of the files in all of the subdirectories in it take up 29MB.
df -h shows me how much disk space I have free
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/md0    226G  215G  11G    96%   /
                        ^^^^^
As you can see, I clearly have more than 3*29MB free. (1GB is 1024MB)
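The same comparison can be done by a script instead of by eye. This is a rough sketch, not something ccomp.sh does for you: has_triple_space and check_room are made-up names, and it assumes your du accepts -sk and your df accepts -Pk (the POSIX output format, where the fourth column is the available space in KB).

```shell
# Sketch: succeed when avail_kb is at least 3 * used_kb.
# Both function names here are hypothetical, chosen for illustration.
has_triple_space() {
    used_kb=$1
    avail_kb=$2
    [ "$avail_kb" -ge $((used_kb * 3)) ]
}

# Compare what a directory uses with what its filesystem has free.
check_room() {
    # du -sk: one summary line, size in KB in the first column
    used=$(du -sk "$1" | awk '{print $1}')
    # df -Pk: header line, then one line with available KB in column 4
    avail=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    has_triple_space "$used" "$avail"
}

# Usage: check_room ~/mid && echo "enough room for a tarball"
```

With the numbers from my disk, 29M is 29696 KB and 11G is 11534336 KB, so has_triple_space 29696 11534336 succeeds.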
Now for an explanation of where I get the 3* value:
When ccomp.sh compresses a file, it first compresses it using gzip. Until gzip is completely finished, the original file still exists, so assuming worst-case behavior, we want at least 2* the size of the file free on the disk. It's *possible*, though *extremely unlikely*, for gzip and bzip2 to create output files larger than the original, and while bzip2 is compressing, the gzip'd file exists at the same time as the bzip2'd one, so it's best to make sure there's 3* the size of the file free. The worst case scenario honestly doesn't happen very often, but it's important to be *aware* it exists. (FYI: ccomp.sh is smart about this: after creating the gzip'd and bzip2'd versions, if it notices the original file was smaller than either of the compressed versions, it'll restore the original file and remove the compressed ones.)
(to create mid.tar.gz I did:
tar cvf mid.tar mid
ccomp.sh mid.tar
)
Anyway, a demonstration of how the tarball compresses better:
4.0M mid.tar.gz
vs, using ccomp to compress the individual files:
6.7M mid
Also, you may not be aware of this, but directories take up space too and aren't counted in the totals: on ext3, each directory generally takes up 4K.
ls -lad ~/mid <-- this just shows the specific directory.
drwxr-xr-x 24 tm users 4096 2006-10-01 04:14 mid/
So, now you're aware of the fact that making a tarball of all of the files would give you much better compression ratios, as well as the fact that once you rm'd the files off the filesystem you'd be saving even *more* just from the directories not being around...
So... Why would you ever *want* to use ccomp.sh recursively?
In my situation, I needed to create an ISO image before I could burn it - my hardware was flaky and couldn't handle making the ISO while burning it to a CD. The ideal ISO size was ~650MB ... Anyway, I didn't have that much space free!
So, I had a catch-22 - I needed to burn files to the CD to free space, but I couldn't create the image because there wasn't enough space in the first place.
Long story short, I temporarily compressed a slew of files using find+ccomp.sh and magically got enough space free to create the CD image...