Ask Leo! by Leo A. Notenboom

How can an anti-malware program possibly scan all my files in a reasonable amount of time?

Summary: Anti-malware software is amazingly tuned and optimized for doing what it does. On top of that, scanning all your files might not always be needed.

How in the world can my antivirus/antispyware/antimalware program possibly scan all of my files for the thousands of trojans/signatures out there without taking an eon to do so? Don't they have to scan every file on your computer (or at the very least the exes, zips, dlls and registry) sequentially for a trojan-name and/or each signature? I can only presume they must do this one trojan-name/signature at a time, and then repeat. I can't fathom how it can be done so quickly, relatively speaking, given the task at hand. Heck - just a manual search for one or two obscure files on my computer can take me almost as long to find them - if I even do!

And here I was thinking that the virus scans take forever, and you're wondering how they can be so fast! It's all a matter of perspective, I suppose.

The short answer is that sometimes it does take a really long time. But there are techniques that scanners use to dramatically speed up the process, or at least make it look that way.

In addition, not everything is, in fact, a scanner.

Time for some explanation of how anti-malware software typically works.

I want to throw out an entire class of anti-malware software before we even start: those that don't scan files at all.

Much of what we classify as "anti-spyware" software doesn't scan files. In fact, that's one of the semi-accurate rules of thumb that differentiate anti-spyware tools from anti-virus tools (though the line continues to blur over time).

Rather than scanning files, these tools monitor behavior. For example, they might watch for attempts to reset your browser home page as they happen. If things look kosher, they allow the change; if not, they alert you. No scanning is involved: they just hook into places where spyware-like behavior is likely to happen, and keep an eye out.

I'm not saying that no anti-spyware software scans (or that no anti-virus tools watch behavior). I'm just saying that a large portion of what anti-spyware software does often doesn't involve scanning at all.
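To make the "hook instead of scan" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the GuardedSettings class, the policy rule, and the requester names are assumptions, not how any real anti-spyware product or browser works.

```python
class GuardedSettings:
    """A settings store that consults a policy before applying each change."""

    def __init__(self, initial, is_allowed):
        self._values = dict(initial)
        self._is_allowed = is_allowed
        self.alerts = []  # refused changes, recorded as (requester, key, value)

    def get(self, key):
        return self._values[key]

    def set(self, key, value, requester):
        # The check runs at the moment of the change; no files are scanned.
        if self._is_allowed(key, value, requester):
            self._values[key] = value
            return True
        self.alerts.append((requester, key, value))
        return False


def policy(key, value, requester):
    # Hypothetical rule: only the user may change the home page.
    return key != "home_page" or requester == "user"


settings = GuardedSettings({"home_page": "https://example.com"}, policy)
settings.set("home_page", "https://example.org", "user")         # allowed
settings.set("home_page", "http://bad.example", "toolbar.exe")   # refused, alert recorded
```

The work happens only when something tries to make a change, which is why this kind of protection costs so little compared with reading every file.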

So I'll focus on anti-virus software, which typically does scan.

As the question outlines, you would think that a complete scan would involve two things:

  • Read the contents of every file on your hard drive (or whatever media is being scanned).

  • Compare the entire contents of each file against the pattern of every known virus.

In other words, one heck of a lot of work.

Fortunately there are several shortcuts that anti-virus software can take.

  • Full scans can happen in the background. In reality, a full scan typically happens in the background as you're doing other things. As a result, you might not realize just how long the scan is taking. A good scanner will prioritize its work so as not to impact what you're doing, but still get its work done. It's not uncommon for this type of scan to take hours, and if it's any good, you'd never notice.

  • Full scans can be scheduled for when you're not using your computer. Once again, in the "so you'd never notice it" category, a full scan could be scheduled to happen automatically in the middle of the night (assuming you leave your computer on), or at some other time that's appropriate. It could take a long time, but if you don't see it, did it matter?

  • Full scans might not be needed after the first. After you install your anti-virus software, it typically does one full scan shortly thereafter. Theoretically, as long as it then monitors all the files that arrive on your machine as they come in, and any changes to the files that are already on your machine, another full scan isn't really necessary. There's often no need to re-scan an old file that's never changed. Many anti-virus products use exactly this model in their default configuration.

  • It might not scan every file. In reality, not all file types can carry viruses. ".exe" and ".dll" files are typical targets, but ".dat", ".chm", or even ".leo" files are not. That's not to say that they couldn't contain a virus, just that there's typically no way for that virus to be run. Virus scanners can take advantage of that and not bother scanning many types of files at all. Once again, this is typically an option that is set by default.
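The last two shortcuts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SCANNABLE set and the files_needing_scan function are hypothetical, and the "has it changed?" test here compares size and modification time, whereas a real product would more likely be notified by the operating system when a file is written.

```python
import os

# Hypothetical list of extensions worth scanning; real products use their
# own, much longer lists.
SCANNABLE = {".exe", ".dll"}


def files_needing_scan(root, seen):
    """Return scannable files under root that are new or changed.

    `seen` maps path -> (size, mtime_ns) as recorded by the previous call,
    and is updated in place.
    """
    to_scan = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            # Shortcut 1: skip file types that are not on the list.
            if os.path.splitext(name)[1].lower() not in SCANNABLE:
                continue
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            fingerprint = (st.st_size, st.st_mtime_ns)
            # Shortcut 2: skip files that look unchanged since last time.
            if seen.get(path) != fingerprint:
                to_scan.append(path)
                seen[path] = fingerprint
    return sorted(to_scan)
```

The first call with an empty `seen` dictionary plays the role of the initial full scan; later calls return only what is new or different, which is usually a tiny fraction of the drive.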

The other part of this scenario is that the actual algorithms used to perform the scan aren't as brute force as we might think.

Let's say there are 100,000 virus definitions that your anti-virus software needs to look for. The scan is most certainly not 100,000 cases of "is it this one?", repeated for each file being scanned. That really would take forever.

In reality, the data and the patterns that make up the various virus signatures are optimized and stored in such a way that, in a sense, the scanner's actually looking for almost all the viruses at once. It's difficult to describe without getting all geeky, or even computer science-y, but I'm sure there's a lot of math, organization and optimization around setting up anti-virus databases in such a way as to optimize for the fastest and most complete scan possible. I'd bet it rivals the complexity of encryption in many ways.
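For the curious, here is one well-known way to look for many patterns "at once": the Aho-Corasick algorithm, which merges all the patterns into a single lookup structure so the data is read just once no matter how many patterns there are. I'm not claiming any particular anti-virus product works this way; this is a minimal Python sketch, and the signature names and byte patterns are made up.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton from {name: non-empty bytes pattern}."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # state to fall back to on a mismatch
    out = [[]]    # names of the patterns that end at each state
    for name, pattern in patterns.items():
        state = 0
        for b in pattern:
            nxt = goto[state].get(b)
            if nxt is None:
                nxt = len(goto)
                goto[state][b] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(name)
    # Breadth-first pass to fill in the fallback links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(data, automaton):
    """Return the names of all patterns found in data, in a single pass."""
    goto, fail, out = automaton
    found = set()
    state = 0
    for b in data:
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        found.update(out[state])
    return found


# Made-up signatures, purely for illustration.
signatures = {"Demo.A": b"EVIL", "Demo.B": b"VILE", "Demo.C": b"xyz"}
automaton = build_automaton(signatures)
hits = scan(b"some bytes ... EVILE ... more bytes", automaton)
```

Building the structure takes some work up front, but that happens once when the definitions are loaded. After that, each byte of each file is examined once, whether there are three signatures or 100,000.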

I'd also bet it's one of the key differentiators in anti-virus software.

So, in a nutshell: not everything's a scanner, you might not notice full scans, full scans might not even be needed, and the actual technology of a scan is much faster than you might think.

Me, I'm just glad there are smart people in the world who are writing these critical pieces of our security infrastructure.


Article C3740 - May 22, 2009

Recent Comments
2 Comments

Could fast scans be a bad thing? NOD32 does a full scan 5 times faster than Avast.

But...

I scanned Omea Pro.exe (an insanely powerful free RSS reader) with NOD32 v4. The log shows it tried to scan various .dll, .txt, and .exe objects, but against each it said something like "OmeaSetup-2.2.1.exe » NSIS » StartMenu.dll - decompression could not complete (possible reasons: insufficient free memory or disk space, or a problem with temp folders)"

I have gigs of free disk space, and 1 GB of my RAM was free when I ran the scan. So could it be part of the complex algorithm? How do I ensure that a particular file is actually scanned? There seems to be no way I could make NOD32 scan that file.

Posted by: umair at May 22, 2009 11:28 PM

A simple principle which explains how so many things can be done very fast is putting things in alphabetical order. Suppose I have a book written in a foreign language which uses our alphabet, and I want to know if it contains any English words. (An English word is any of the 100,000 or so on a list.) I look at each word in the foreign book, and for each the question is: is it on the English list? That does not mean comparing it with 100,000 others. My English list is a dictionary. Finding out whether "reciept", say, is in the dictionary (which we might have to do if we are not sure how to spell "receipt") just means finding it if it is in, or finding the place where it would be if it isn't. This takes far fewer lookups: 17, in fact, which is about the log to base 2 of the number of words in the dictionary.

Posted by: Philip Roe at May 26, 2009 10:31 AM
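The dictionary lookup described in the comment above is a binary search, and the figure of 17 is easy to check. A minimal Python sketch follows; the lookup function and the made-up word list are assumptions for illustration only.

```python
import math


def lookup(sorted_words, target):
    """Binary search. Returns (found, number_of_words_examined)."""
    lo, hi = 0, len(sorted_words)
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_words[mid] == target:
            return True, steps
        if sorted_words[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return False, steps


# 100,000 made-up "words", already in alphabetical order.
words = [f"w{i:06d}" for i in range(100_000)]

found, steps = lookup(words, "w012345")    # a word that is on the list
missing, steps2 = lookup(words, "zzz")     # a word that is not
limit = math.ceil(math.log2(len(words)))   # 17 for 100,000 words
```

Neither lookup examines more than 17 words, compared with up to 100,000 for a start-to-finish comparison.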
