Compressing files prior to archiving to CD or DVD doesn't increase the chance of corruption, but could dramatically increase the impact, if it happens.

You've mentioned CD-Rs and DVD-Rs more than once as excellent ways to back up data. Now I have about ten gigabytes of data to back up. If I compress the files to ZIP format, I can reduce them down to under four gigabytes--small enough to burn to a DVD-R. But I am scared to do this, because I fear my important files might eventually get corrupted or damaged if compressed. I've had many bad experiences using compressed file formats (ZIP, RAR, 7z, etc.). It seems that any compressed file I leave alone for too long ends up damaged or corrupted at some point. My question is, will burning my compressed files to a finalized, non-rewriteable DVD prevent them from getting corrupted? (I would assume that data on a finalized DVD cannot be changed?)

There's nothing about compression that increases the likelihood of corruption. It doesn't matter what format you pick or how well the compression is performed: the chances of corruption are completely unrelated to whether the data is compressed.

The impact of corruption, on the other hand, is an entirely different story.

The kind of corruption we're talking about here is likely caused by a bad sector on the media - be it CD, DVD or even a hard disk. Some data in the middle of one of your files becomes unreadable. Depending on exactly what that file contains, the impact could be negligible (some easily cleaned-up noise in the middle of a document, perhaps) or catastrophic (the entire file is rendered useless).

If you have lots of files archived or backed up in an uncompressed form then a single bad sector is likely only to affect a single file. Depending on the file, as I mentioned above, this might be benign or catastrophic but it's limited to that single file. In particular, you might never even notice if that happens to be a file you never attempt to recover.

Compression utilities most often do two things:

  • Compress each file's data so as to take up less space.

    Compression is simply a mathematical algorithm that analyzes the raw bytes within the file, and uses alternate ways to represent the same information in less space. For one over-simplified example, a series of ten asterisks (**********) might be replaced by an indicator that what follows is compressed data, a single asterisk, and a count of 10. 10 bytes have been reduced to 3, with no data loss. On decompression the encoded data is expanded back to the 10 asterisks.

  • Bundle a number of compressed files together into a single file, so that the aggregate will take up less space.

    Files are stored on hard disks in "clusters" or "sectors", which have a minimum size. For example, a file containing our 10 asterisks will take up at least 512 bytes on a disk - the size of one sector. A file containing 513 bytes will take up at least 1024 bytes - two sectors, and so on.

    By collecting all the compressed files into a single container file, all this storage inefficiency is avoided. Files take up whatever compressed space they need in the container, and no more.
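
To make both ideas concrete, here's a rough sketch in Python of the over-simplified asterisk example and of the sector rounding described above. The marker byte, the minimum run length and the function names are all my own assumptions for illustration; real compression tools use far more sophisticated algorithms than this.

```python
# Assumed "compressed data follows" indicator byte (hypothetical choice).
MARKER = 0x1B


def rle_encode(data: bytes) -> bytes:
    """Replace runs of a repeated byte with (MARKER, byte, count)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        # Count how many times this byte repeats (a count must fit in one byte).
        while i + run < len(data) and data[i + run] == byte and run < 255:
            run += 1
        # Short runs are cheaper left alone; a literal MARKER byte must always
        # be encoded so the decoder cannot mistake it for an indicator.
        if run >= 4 or byte == MARKER:
            out += bytes([MARKER, byte, run])
        else:
            out += bytes([byte]) * run
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand every (MARKER, byte, count) triple back into the original run."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == MARKER:
            byte, count = encoded[i + 1], encoded[i + 2]
            out += bytes([byte]) * count
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return bytes(out)


def size_on_disk(length: int, sector: int = 512) -> int:
    """Round a file length up to a whole number of sectors."""
    return -(-length // sector) * sector


packed = rle_encode(b"*" * 10)
print(len(packed))                      # 3
print(rle_decode(packed) == b"*" * 10)  # True
print(size_on_disk(10))                 # 512
print(size_on_disk(513))                # 1024
```

Three separate 10-byte files would occupy three sectors in this model, while the same 30 bytes inside one container would fit in a single sector.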

Here's the problem: we've said that in the worst case a single bad sector could render an entire file unrecoverable.

And we've just placed all our "files" into a single container file.

The bad news that you often hear about corruption in archives has nothing to do with increased corruption at all. It has to do with the fact that everything was placed into a single container file, and after a corruption that container file was rendered completely inaccessible.

And all the files within it, lost.

So as you can see, by using a compressed archive format, you haven't really increased the likelihood of corruption on the disk, but you have increased the impact of such corruption should it happen - perhaps dramatically.
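
Here's a small experiment, sketched in Python with the standard zlib module, that illustrates the difference. It damages one byte in a set of individually compressed files, and one byte in a single compressed stream holding all of them. The sample file names and contents are made up, and the single stream is only a stand-in for a "solid" style archive; real formats differ (ZIP, for instance, compresses each member separately), so treat this as an illustration of the worst case rather than a model of any particular tool.

```python
import zlib


def make_files() -> dict:
    # Hypothetical sample data standing in for three backed-up files.
    return {
        name: "".join(f"{name} record {i}\n" for i in range(500)).encode()
        for name in ("letters.txt", "budget.csv", "notes.txt")
    }


def flip_byte(blob: bytes, pos: int) -> bytes:
    """Simulate a bad spot on the media by inverting one byte."""
    damaged = bytearray(blob)
    damaged[pos] ^= 0xFF
    return bytes(damaged)


def try_decompress(blob: bytes):
    """Return the decompressed data, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def survivors_individual(files: dict, victim: str) -> list:
    """Compress each file on its own, damage one, list what still restores."""
    blobs = {name: zlib.compress(data) for name, data in files.items()}
    blobs[victim] = flip_byte(blobs[victim], len(blobs[victim]) // 2)
    return [name for name, blob in blobs.items()
            if try_decompress(blob) == files[name]]


def survivors_solid(files: dict) -> list:
    """Compress everything as one stream, damage it, list what still restores."""
    container = zlib.compress(b"".join(files.values()))
    restored = try_decompress(flip_byte(container, len(container) // 2))
    if restored is None:
        return []
    intact, offset = [], 0
    for name, data in files.items():
        if restored[offset:offset + len(data)] == data:
            intact.append(name)
        offset += len(data)
    return intact


files = make_files()
print(survivors_individual(files, "budget.csv"))
print(survivors_solid(files))
```

With the files compressed separately, only the damaged one fails to restore. With the single stream, this simple one-shot decompression gives up on everything, although a dedicated recovery tool might still salvage whatever came before the damaged spot.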

So, what should you do?

  • Choose and use good media.

  • Test your media by making sure that once written it can be read on other machines - preferably more than one.

  • Make multiple copies - Even quality CDs and DVDs are cheap.

  • Consider compressing individual files, rather than creating compressed archives. While the combined result is larger than a single archive would be, this works best on larger files, and it reduces your exposure to corruption back to the single-file level.

  • Consider not compressing at all. Perhaps create a set of DVDs with your data, rather than trying to get it all on one.
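
One way to test a freshly written disc is to read every file back and compare it against the original. The sketch below does that in Python by comparing SHA-256 hashes; the function names and the report format are my own inventions, and the folder paths in the usage note are placeholders you'd replace with your own.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_dir, copy_dir) -> list:
    """Compare every file under source_dir with its counterpart in copy_dir.

    Returns a list of problem descriptions; an empty list means every
    file was readable and matched.
    """
    source_root, copy_root = Path(source_dir), Path(copy_dir)
    problems = []
    for source_file in sorted(p for p in source_root.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_root)
        try:
            if sha256_of(copy_root / relative) != sha256_of(source_file):
                problems.append(f"MISMATCH {relative.as_posix()}")
        except OSError:
            # A missing file or a read error from the media both land here.
            problems.append(f"UNREADABLE {relative.as_posix()}")
    return problems
```

You'd call it with something like verify_copy("C:/MyData", "D:/"), where both paths are hypothetical, and ideally run it again with the disc in a different machine's drive.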

I've explicitly avoided talking about specific compression tools, like Zip, WinZip, 7-Zip, Rar, Gzip and others, simply because the characteristics of each, and the availability of recovery tools for each, vary widely. And while choosing a good one is important (I'm a fan of 7-Zip), I think it's less important than the choices you make above to avoid or sidestep the problem in the first place.

Article C3694 - April 2, 2009

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.

6 Comments
Richard FDisk
April 7, 2009 9:55 AM

Good Info;
as always "Back up, Back up, Back up..."
I never rely on a single copy of anything whether it's documents, images, audio or video files etc.
and when I do archiving I make a duplicate of the disc. CDs and DVDs are rather inexpensive now; even if it costs $1 per disc, that's still cheaper than replacing or trying to replace some irreplaceable data.

Roger
April 7, 2009 1:36 PM

I believe you've mentioned using True Image to do backups. I've been using that as well. After reading your article, I started thinking... A dangerous thing for me. When you use True Image, doesn't that mean you're putting all your stuff in one file - increasing the potential problems with corruption? How much does having True Image verify data when backing up help reduce the chances of corruption?

All backup solutions that create a single image file do indeed fall into this bucket. Verifying after write helps, as do the additional techniques mentioned in the article (multiple copies, good media, etc.)
- Leo
08-Apr-2009

Michael
April 7, 2009 2:20 PM

Why compress at all? I back up very frequently on to a 420gig external hard drive that I purchased for about 70. A good backup programme handles everything for you and the files stay uncompressed. You will then have no issues over compression.

David Martin
April 7, 2009 6:12 PM

I'm with Michael on this one. Use an external drive (and keep it somewhere else - not next to your computer or in the cupboard - take it to work or something). And make copies to DVDs (although even they can, IMHO, let you down). Finally, consider free/cheap offline/online backup services such as http://www.topshareware.com/Backup2Net-download-40601.htm

Mark
April 7, 2009 6:24 PM

I have dozens of backup DVDs around; I'm constantly backing up data, copying and moving data. I also store things regularly on flash drives. It's not a good idea to put all files in a single compressed container, for the reasons stated above. And with today's cheap, large storage, there's no real reason to compress.

Terry Hollett
April 8, 2009 4:19 AM

I have avoided compression of files for the same reasons. If I had a thousand files on a CD uncompressed and a few bad sectors developed I could still recover some files. If they were all in one and it became corrupted - they are all gone. That has always been my reasoning.

Except for programs/games. If even one program file becomes corrupt, it's screwed anyway. So I don't mind compressing games and program files.

And I always make 2 copies of my backup CD/DVDs.

http://www.geocities.com/terryhollett2003/
