Helping people with computers... one answer at a time.

Software design is as much an art as a science. As a result of various design decisions and requirements, similarly tasked software can vary greatly.

Why does one stand-alone anti-spyware program (just as an example) take up 10MB on my machine while a similar anti-spyware program takes up 150MB? They both have a GUI; have real-time scans; scan for spyware, adware, trojans, keyloggers, and rootkits; have automatic signature definition updates; and allow the user to configure what gets scanned, when it gets scanned, and so on. And they're both free.

Other than one program having a fancier dashboard, they seem to be almost identical in functionality and they've been rated just about equal by various reviewers. Does the larger program hog more CPU resources simply because it is larger?

I'm sure it's kinda strange to the non-technical person, but what you describe is very, very common. Programs that do similar things are often dramatically different in both size and speed.

The answer's actually fairly complex, since there are many things that factor in - everything from a variety of choices made by the software's designers to the age of the product.

The most obvious difference that comes to mind is the choice of programming language in which the software is written. Different languages are more or less efficient in converting human readable instructions into the sequence of actual machine instructions that the CPU executes. Sometimes those differences can be quite large.


Now, of course, one might ask "well, then why don't they all choose the one that generates the smallest/fastest program?" - and of course it's not that simple. The trade-off is typically development time. Different programming languages typically require different amounts of work to write the same program. At one extreme, one could write a program in assembly language, and it could be extremely small and fast - and it would probably take twice as long, if not longer, to write than if the program were written in a higher-level programming language like C++, Java or others. In today's business climate, when you can deliver is often as important as what you can deliver, and the amount of time it takes just to write the software is a serious consideration.

Choice of programming language goes well beyond simple development speed. Things like staff training, appropriateness to the task, availability of development and diagnostic tools for that language can all play a part. Even personal taste - since many consider programming as much an art as a science - can play an important role in this fundamental choice.

Which leads to another difference: software design. For any problem that can be solved with software, there are as many ways to write that solution as there are programmers who would write it. Many solutions are simple, fast and elegant - as I said, almost an art. But there are two important things to note: not all programmers are alike, and not all solutions are the obvious choice.

There's a rule of thumb in software development that the best engineers are generally 10 times better than the average, and the worst are 10 times worse than the average. That's quite a wide spectrum of ability within the programming community. Now "best" and "worst" are fuzzy terms, but a good way to illustrate the difference might be to put it this way: what a great programmer might be able to write in a single "line" of programming language, a less talented individual might solve with significantly more code. And of course more code means a bigger program.
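Here's a small, hypothetical Python sketch of that idea (the function names and the task are mine, purely for illustration). Both functions do exactly the same thing - collect the even numbers from a list - but one takes a single line while the other takes ten:

```python
# One programmer's solution: a single line.
def evens_concise(numbers):
    return [n for n in numbers if n % 2 == 0]

# Another programmer's solution: the same result, with considerably more code.
def evens_verbose(numbers):
    result = []
    index = 0
    while index < len(numbers):
        value = numbers[index]
        remainder = value % 2
        if remainder == 0:
            result.append(value)
        index = index + 1
    return result
```

Multiply a difference like that across tens of thousands of routines and the cumulative effect on program size can be substantial.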

It's also not always the case that a single line of code is the right choice; it depends on the overall design goals of the software. Two different solutions, perhaps very different in size, may solve the same problem but do so in ways that expose other significant differences. For example, the smaller solution might be exceptionally difficult to understand and very fragile, while the larger could be very simple to understand and difficult to break. A design decision might be made to favor stability over size, with the result that the larger solution is the one implemented.

Another difference that's often overlooked is something called runtime libraries or runtime support.

Programmers rarely write everything from scratch; they rely on libraries of software that already exist. As a simple example, a programmer should never have to write software to concatenate (join) two strings of text (say "ABC" and "XYZ") into a single string ("ABCXYZ"). Library routines exist that a programmer can use to do exactly that without having to write (and test and debug) that code him or herself.
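As a hypothetical Python sketch of the difference, here's that concatenation done the library way, followed by the kind of loop the library saves a programmer from writing, testing and debugging:

```python
# The library way: the language runtime does the work.
joined = "ABC" + "XYZ"

# Roughly what we'd have to write (and maintain) ourselves without it:
def concat(left, right):
    chars = []
    for ch in left:
        chars.append(ch)
    for ch in right:
        chars.append(ch)
    return "".join(chars)
```

Every routine a program leans on this way is code the programmer didn't write - but as described next, it's still code the program may have to carry around.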

In an ideal world, such a library of existing software would be very granular, picking up only those routines actually needed by the program being written. Alas, this is not an ideal world, and one of the differences between programming languages and systems used is, in fact, the size of the runtime library that they drag along. In one language, using a string concatenation function might cause exactly and only that function to be included. In another language, it might cause that, and several other string functions to also be included in your program, whether or not you actually use them.
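Python happens to make this easy to observe directly. In this small sketch, importing a single function from a module still loads the entire module into the running program, whether or not the rest of it is ever used:

```python
import sys

# Ask for exactly one function from the textwrap module...
from textwrap import shorten

# ...and the whole module comes along for the ride.
assert "textwrap" in sys.modules

# The one function we asked for works as expected...
summary = shorten("a very long line of text indeed", width=15)

# ...but every other routine in textwrap is now loaded too.
```

Compiled languages differ in how aggressively their linkers strip unused library code, which is one reason otherwise similar programs end up with very different footprints.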

At a slightly higher level this is actually why things like the .NET framework exist. The .NET framework includes many higher level functions that can be used to make accessing Windows and other services easier for programmers. By programming to it, applications can assume that everything in the .NET framework is already on the machine - which means that they don't have to write, or include, any of that functionality in their software. The result is that their programs appear smaller. (.NET is just an example; there are actually many approaches, both similar and different, from the Visual Basic runtime to the Java virtual machine.)

A final aspect I want to address is software age. Left to itself, software only gets larger. By that I mean that as software progresses from version to version and reacts to the changing landscape of both operating system changes and changing user requirements, it only grows in size.

The term "ball of mud" has been used to describe software growth. Each new feature, each new request, each new demand on the software just causes a little bit more mud to get slapped on the ball until the ball becomes enormous (and occasionally collapses under its own weight).

There are a couple of ways this can happen: new features of course require additional code. That much is fairly obvious; more work means more instructions to the computer on how to perform that work. Changed features, sometimes even removal of features, can also often cause the program size to expand in unexpected ways. As one simple example, it's often more expedient (and safer) to disable a feature rather than remove it. The code to implement the feature remains, but is never accessed because code is added to prevent that access from occurring. Particularly in already large and complex programs this, and similar side effects of code changes, often cause growth.
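A minimal, hypothetical Python sketch of that "disable, don't remove" pattern (the feature and function names are mine, not from any real product):

```python
# Flipped off in some later version, rather than deleting the code.
FEATURE_LEGACY_EXPORT = False

def legacy_export(data):
    # In a real product this might be hundreds of lines.
    # It still ships with the program, but is never reached.
    return "LEGACY:" + ",".join(data)

def modern_export(data):
    return "CSV:" + ",".join(data)

def export_report(data):
    if FEATURE_LEGACY_EXPORT:
        return legacy_export(data)  # dead path: guarded off, still present
    return modern_export(data)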

In fact, often the only way to make an elderly program smaller is to chuck it and rewrite it from scratch. This is typically a very expensive proposition, but depending on the age, stability and ease of maintenance of the older software, it can be a very legitimate and lucrative choice. When starting over, every design decision, from the choice of programming language on up, is open for review.

Finally, size is not an indicator of efficiency or expected CPU usage. Size and speed are, essentially, two independent things. While software designers strive for small and fast, in fact the complexities of software design often include many tradeoffs that may, or may not, relate the two. Small programs can be fast or slow. Large programs can be fast or slow.
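One last hypothetical Python sketch makes the point concrete: the shorter of these two Fibonacci functions is dramatically slower, because the naive recursion recomputes the same values over and over, while the longer, memoized version remembers results it has already computed:

```python
# Tiny, but slow: runtime grows exponentially with n.
def fib_small(n):
    if n < 2:
        return n
    return fib_small(n - 1) + fib_small(n - 2)

# More code, but fast: each value is computed only once.
def fib_larger(n, _cache={}):
    if n < 2:
        return n
    if n not in _cache:
        _cache[n] = fib_larger(n - 1) + fib_larger(n - 2)
    return _cache[n]
```

Same answers, very different speed - and the faster one is the bigger one. Size and speed really are independent axes.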

Article C3820 - July 27, 2009

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions. More about Leo.


6 Comments
John Williams
July 28, 2009 8:41 AM

Good article and something I often wondered about. I read on www.grc.com that Steve Gibson always writes in assembly language.

Hugh
July 28, 2009 8:42 AM

Compiling can make a huge difference in the end product too - e.g. GCC -G; Linux of course, but there is a Cygwin equivalent.

Michael Horowitz
July 28, 2009 7:33 PM

There is still another possibility: inclusion of other programs as part of a software "suite". Anti-malware software may be bundled with other things such as a firewall, anti-spam software, etc. The vendor may have one large package and depending on which parts you buy only those parts are activated.

I saw this with the free ZoneAlarm firewall, which used to be very small. But, at one point the size of the downloaded EXE grew enormously. I can only surmise that it includes anti-malware software too, even though that code is disabled in the free version of ZoneAlarm.

Catmoves
July 29, 2009 7:22 AM

Yes, Steve Gibson writes in assembly. And so do his staff. And their programs are small, and fast, and great. No bloat.
My feeling is that many software companies become so enamored of the "sell, sell , sell" attitude that they constantly add bells and whistles to their software until it becomes unwieldy and unwanted. Especially when there are free versions of the same type available, that don't need to bloat their product. Sometmes they decide to follow me around the internet. They don't even have the courtesy to ask my permission. But I guess thats why I have a recycle bin.
Bloat is why I no longer use Symantec. In fact, it is why I no longer use many programs from old established companies. Tough luck companies, but it is my dough, you know.

Chris
August 6, 2009 1:41 PM

As a 67 yr old pc user from about 1986 this article explains things I have been wondering about for years. Thanks Leo - you deserve a latte (again)
Chris
ps I have no idea what a coffee costs in the US - with the exchange rates as they are I will have to guess

Jim de Graff
April 9, 2010 8:42 PM

Program design can greatly affect performance. My first major program for a company I worked at for 29 years was written in PL/1. It ran on an IBM mainframe and was a complete rewrite of a program originally created by my predecessor to verify database definition files for a hierarchical database. His version used many temporary disk files and cost over $20 per run. My version used linked lists instead of temporary files. The cost per run of my version was 18 cents. Outwardly there was little difference between the two programs.

Comments on this entry are closed.

If you have a question, start by using the search box up at the top of the page - there's a very good chance that your question has already been answered on Ask Leo!.

If you don't find your answer, head out to http://askleo.com/ask to ask your question.