
If I empty the contents of a file and then save it, is the data really deleted?

Question:

You’ve recently spent some time on deleting files. I understand that just
hitting Shift+Delete doesn’t rid the hard disk of the file, but I’ve long
wondered about some other things: Suppose I have an Excel or Word file that
contains personal info (say a list of passwords or other sensitive
information) and I decide that’s not such a good idea. If I delete all of the
information, then save the file, is that information gone forever? Likewise,
suppose this file is called “password.xls,” and I create a new (even
blank) spreadsheet, save it as the same file (password.xls), and click
‘Yes’ to “Replace existing file?” Have I successfully hidden those passwords
(or whatever) forever? Are they off my disk now? Any chance that life could
be this simple?

Let me put it this way: when it comes to computers, life is rarely
simple.

This situation is no exception.

The short answer to your question is of course not – the data might still
be recoverable.

The longer answer is all about why.


Overwritten files aren’t overwritten

The assumption in your question is that when you update a file with new data, or even remove everything within the file, the new file will be written into the exact same place on the hard disk as the original.

That’s a bad assumption. In fact, it’s not really even what you want the programs to do.

The steps to update a file

I’m going to run with your “password.xls” example; it’s possibly one of the most common user-generated files in existence as we all try to manage our online world.

This is just a conceptual example – I’m not claiming that Excel, any other spreadsheet program, or any specific program for that matter works in exactly the way that I’m going to describe. However, I know that many do (or do so in very similar ways).

Let’s say that you make a change to password.xls – it could be either a small change or a large change, like deleting all of the entries.

Here’s how Excel might operate:

  • You open password.xls.

  • You make your changes.

  • You click Save.

  • Excel opens a temporary file in the same folder as the original.

  • Excel writes the updated spreadsheet to the temporary file.

  • Excel closes the temporary file.

  • Excel deletes the original password.xls.

  • Excel renames the temporary file to password.xls.

See what happened there? In the next-to-last step, “Excel deletes the original password.xls.” That’s a plain old delete, not a secure one. The areas on the hard disk that the original password.xls occupied were not overwritten, and their contents could potentially be recovered.
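Here is a rough Python sketch of those save steps. It is only an illustration: the function name save_by_replacing is made up for this example, and I am not claiming that Excel or any other program is written this way.

```python
import os
import tempfile


def save_by_replacing(path: str, data: bytes) -> None:
    """Save `data` to `path` using the write-temp, delete, rename steps."""
    folder = os.path.dirname(os.path.abspath(path))
    # Open a temporary file in the same folder as the original.
    fd, temp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "wb") as temp_file:
        # Write the updated contents; leaving the block closes the file.
        temp_file.write(data)
    # Delete the original. This is an ordinary delete: nothing here
    # overwrites the data that the old file occupied on disk.
    if os.path.exists(path):
        os.remove(path)
    # Rename the temporary file to the original name.
    os.rename(temp_path, path)
```

Calling save_by_replacing("password.xls", b"") leaves an empty password.xls behind, but at no point does the code write over the old file's contents.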

It’s all about error handling

You might be tempted to ask why any program would go through such a convoluted series of steps to just update the contents of a file.

It’s something software engineers have to think of constantly: what if something goes wrong?

What if the program crashes? What if writing to the disk fails? What if, what if, what if?

If the program were updating the file in-place and crashed during the operation, the result might well be a corrupt, garbled, or destroyed original file with nothing to back it up.

By writing to a new file and not deleting the original until that new file has been successfully written, the program ensures that the original remains intact for as long as possible.

In fact, many programs don’t delete the file at all. They simply rename the original to something like original.bak.
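A minimal Python sketch of that backup-keeping approach might look like the following. The function name save_keeping_backup and the choice to swap the extension for .bak are assumptions made for illustration; real programs name their backups in different ways.

```python
import os
import tempfile


def save_keeping_backup(path: str, data: bytes) -> str:
    """Write `data` to `path`, keeping the previous version as a .bak file."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(data)
    except BaseException:
        # The write failed: discard the partial temporary file.
        # The original has not been touched yet.
        os.remove(temp_path)
        raise
    # Hypothetical naming rule: password.xls becomes password.bak.
    backup_path = os.path.splitext(path)[0] + ".bak"
    if os.path.exists(path):
        # os.replace overwrites any earlier backup with the same name.
        os.replace(path, backup_path)
    os.rename(temp_path, path)
    return backup_path
```

The original is only renamed after the new contents have been written successfully, so a failure partway through leaves the old file in place. It also means the old data now sits on disk under a different name.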

It’s not just saving in the application

Almost anything that looks like you’re about to overwrite the file has a high probability of operating as I’ve just described:

  • Using Save As… to overwrite an existing file.

  • Using Windows Explorer to copy a file “on top of” your original.

  • Using most command-line copy utilities to copy a file on top of your original.

  • And probably several other ways to overwrite a file that I can’t think of right now.

Even “Replace existing file?” is really talking about the file name and doesn’t imply that the old data is being physically overwritten.

The bottom line

If it matters to you, a secure delete or free-space wipe remains the technique of choice to make sure that the contents of the files that you delete – whether by actually deleting them or overwriting them – are actually overwritten on the hard drive.
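To show the difference between a plain delete and an overwrite, here is a simplified Python sketch that writes zeros over a file's bytes before deleting it. The function names are made up for this example, and it is not a substitute for a real secure-delete tool: whether the zeros land on the same physical locations as the old data depends on the file system and the drive.

```python
import os


def overwrite_with_zeros(path: str, chunk_size: int = 64 * 1024) -> None:
    """Overwrite every byte of an existing file with zeros, in place."""
    remaining = os.path.getsize(path)
    # "r+b" opens the existing file for writing without truncating it.
    with open(path, "r+b") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(b"\x00" * n)
            remaining -= n
        f.flush()
        # Ask the operating system to push the zeros out to the device.
        os.fsync(f.fileno())


def overwrite_then_delete(path: str) -> None:
    """Overwrite the file's contents, then do an ordinary delete."""
    overwrite_with_zeros(path)
    os.remove(path)
```

Note that this only touches the one file it is given. Copies left behind by earlier saves are not affected.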

About password.xls

I said above that password.xls is perhaps one of the most popular user-created files in existence, for the obvious reason that spreadsheets are a pretty useful way to keep track of passwords.

Yes, I even have password.xls.

The problem is that it’s also one of the least secure ways to store passwords if not done properly. Even if the spreadsheet is itself password-protected, it still isn’t considered really secure; there are apparently ways to crack password-protected spreadsheets.

I keep mine in a TrueCrypt volume. When the update process that I outlined above happens, the new file and the old file both exist only in the encrypted partition. While file recovery tools could be used to access the old file, they would work only if the encrypted volume were actually mounted using the passphrase – at which point the latest and greatest is right there anyway.

By putting it into an encrypted volume, the need for multiple-pass secure delete is also eliminated as the information that’s actually written to my hard disk is always encrypted. What could perhaps be recovered from the hard disk is useful if and only if you already have the volume’s encryption passphrase.

One final consideration

I bring up password.xls and the use of an encrypted volume to raise another possibility.

The program that you’re using to edit the file could work like this:

  • You click Save.

  • The program opens a temporary file in the system temp folder.

  • The program writes to the temporary file.

  • The program closes the temporary file.

  • The program deletes the original file.

  • The program copies the temporary file to the original file.

  • The program deletes the temporary file in the system temp folder.

So even though the file exists on a secure encrypted drive, it is possible that the program being used to modify it may have made a copy in another temporary location – a temporary location that is not encrypted and from which the data could be recovered.
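As a hedged Python sketch of that second pattern (the function name save_via_system_temp is invented for this example, and no particular program is known to work exactly this way):

```python
import os
import shutil
import tempfile


def save_via_system_temp(path: str, data: bytes) -> str:
    """Save `data` to `path` by way of a file in the system temp folder.

    Returns the location the temporary copy occupied, for illustration.
    """
    # With no `dir` argument, mkstemp uses the system temp folder,
    # which may be on a different, unencrypted drive.
    fd, temp_path = tempfile.mkstemp(suffix=".tmp")
    with os.fdopen(fd, "wb") as temp_file:
        temp_file.write(data)
    if os.path.exists(path):
        os.remove(path)
    # Copy the temporary file to the original's location.
    shutil.copyfile(temp_path, path)
    # Ordinary delete: the temporary copy's data is not overwritten.
    os.remove(temp_path)
    return temp_path
```

The returned path shows where the extra copy lived. If `path` points into an encrypted volume but the system temp folder does not, the data was written outside the encrypted volume for a moment and then deleted in the ordinary way.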

Once again, if it matters (and it may not), a secure delete or free-space wipe is the answer.


9 comments on “If I empty the contents of a file and then save it, is the data really deleted?”

  1. Leo, what if you saved a new file to external storage media, such as a flash drive? You make some changes to the file, and then save the changes to the same file on the flash drive. Is there any way these actions on the flash drive could have also caused a temporary copy of the file to be saved somewhere on the PC (hard disk, etc.), which in turn now makes possible the recovery of the data from the PC? (Yikes!) Thanks…

    It depends on the software being used to edit the file, but yes – temporary files could be created elsewhere.

    Leo
    17-Mar-2012
  2. to ‘Yeppers’
    Considering Windows is ALWAYS using some portion of the hard drive as ‘memory’, the answer is a definite “maybe” – regardless of where it actually stores the working version of the file.

  3. Many advanced document formats like MS Office support versions within each file (not to mention OS versions of each file). So changes may be remembered between saves.

  4. Very good example, Leo. I do have a bit of an exception to your sample, though.
    At least with MS Office products, the temporary file is created upon opening the original. That way, all changes are made to the temp file. The original remains unchanged. Pressing the SAVE button causes the sequence you described: delete original, rename temp.
    A difference would occur if the SAVE AS option were chosen. In this case, the temp file is named as indicated by the SAVE AS name and a NEW temp file is opened. In fact, the original file retains all of its characteristics including the created/last modified date. This sequence can continue ad nauseam.
    Please correct me if I’m wrong.

  5. –geek rant–
    From an IT perspective, the “modify file” permission in Windows has always been a source of frustration despite the fact that it’s logically necessary. When a user modifies a file and the original finally gets deleted, that’s an actual “DELETE” permission, and when you try to set permissions on Windows folders to not allow deletes, you inadvertently set the permissions to not allow modifies either.
    I get that the original file needs to be preserved and that’s more important than what I’m complaining about but you’d think the tech geeks at MS would have considered the security permissions during this process.
    –/geek rant–

    Leo…were you part of this original think tank?!?!

    …just curious ;-)

    I’m innocent! My *guess* is that we got here by needing to be compatible with MS-DOS for some period of time. I could be wrong.

    Leo
    17-Mar-2012
  6. “Pressing the SAVE button causes the sequence you described: delete ORIGINAL, rename TEMP.”

    I suggest that the use of “delete original” here is misleading, although I agree that the action is so-called, mainly for simplicity.

    Surely what really happens is:-

    “Rename and Note that ORIGINAL (File) Disk Space is available for (Later) Over-Writing, if required at some Indeterminate Time in the future.”

    Theoretically, there is a chance that the Contents of ORIGINAL and any other file are NEVER OVER-WRITTEN, depending upon how intensively the HDD and PC are used.

    And that Indexing Information is stored in the (Reserved) Disk INDEX part of the HDD etc, thus the Working Contents on the rest of the HDD are effectively NOT ALTERED until a subsequent Over-Write occurs from the Saving of any File that needs the space, whether that later file has any conventional relationship to ORIGINAL or not.

    The whole process is very much more complex than normally presented.

    And those complexities also apply to Folders/Directories as well.

  7. Another thing to take into consideration with programs like MS Office and many others is that they also save a copy of your document every so many minutes as a backup in case the program crashes. So even if you only save the file once and use a secure wipe on it, there can be one or more backup copies of that file on your computer which were simply deleted in a non-secure manner.

  8. I used to have a “password.xls” file but realised that it would be the first thing anyone getting access to my computer (or coming across the backup DVDs ‘filed’ in the shed at the bottom of the garden) would look at. So now the same file has a misleading name, is in .xlsx (slightly more secure) format and is password protected itself.

  9. Leo, in the above section entitled “The Steps to Update a File”, it sounds like if I save a file 10 times, I can end up having 10 copies of that file: 9 temporary copies and 1 current copy. If that is the case (or even if I really end up with only 1 temporary copy and 1 current copy), will secure deletion software find ALL the temporary copies and the current copy of that file? If not, do you think a more effective approach is to just do a regular delete, followed by a free-space wipe? This will take a lot of time, maybe too much, but it seems to produce a more secure deletion.

    (This article was truly eye-opening! Count me as being part of the paranoid group, as I’m beginning to think that the only real secure deletion tool is a sledgehammer.)

    Thanks…

    A secure delete only operates on the one file that you tell it to securely delete; it has no concept of that file’s temporary copies. You’ll want to run a free-space wipe.

    Leo
    17-Apr-2012
