Helping people with computers... one answer at a time.

Software design is as much an art as a science. As a result of various design decisions and requirements, similarly tasked software can vary greatly.

Why does one stand-alone anti-spyware program (just as an example) take up 10MB on my machine while a similar anti-spyware program takes up 150MB? They both have a GUI; have real-time scans; scan for spyware, adware, trojans, keyloggers, and rootkits; have automatic signature definition updates; and allow the user to configure what gets scanned, when it gets scanned, and so on. And they're both free.

Other than one program having a fancier dashboard, they seem to be almost identical in functionality and they've been rated just about equal by various reviewers. Does the larger program hog more CPU resources simply because it is larger?

I'm sure it's kinda strange to the non-technical person, but what you describe is very, very common. Programs that do similar things are often dramatically different in both size and speed.

The answer's actually fairly complex, since there are many things that factor in - everything from a variety of choices made by the software's designers to the age of the product.

The most obvious difference that comes to mind is the choice of programming language in which the software is written. Different languages are more or less efficient in converting human readable instructions into the sequence of actual machine instructions that the CPU executes. Sometimes those differences can be quite large.


Now, of course, one might ask "well, then why don't they all choose the one that generates the smallest/fastest program?" - and of course it's not that simple. The trade-off is typically development time. Different programming languages typically require different amounts of work to write the same program. At one extreme, one could write a program in assembly language, and it could be extremely small and fast - and it would probably take twice as long, if not longer, to write than if the program were written in a higher-level programming language like C++, Java or others. In today's business climate, when you can deliver is often as important as what you can deliver, and the amount of time it takes just to write the software is a serious consideration.

Choice of programming language goes well beyond simple development speed. Things like staff training, appropriateness to the task, availability of development and diagnostic tools for that language can all play a part. Even personal taste - since many consider programming as much an art as a science - can play an important role in this fundamental choice.

Which leads to another difference: software design. For any problem that can be solved with software, there are as many ways to write that solution as there are programmers who would write it. Many solutions are simple, fast and elegant - as I said, almost an art. But there are two important things to note: not all programmers are alike, and not all solutions are the obvious choice.

There's a rule of thumb in software development that the best engineers are generally 10 times better than the average, and the worst are 10 times worse than the average. That's quite a wide spectrum of ability within the programming community. Now "best" and "worst" are fuzzy terms, but a good way to illustrate the difference might be to put it this way: what a great programmer might be able to write in a single "line" of programming language, a less talented individual might solve with significantly more code. And of course more code means a bigger program.
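Here's a small, hypothetical Python sketch of that idea (the function names and the task are mine, purely for illustration). Both functions do exactly the same thing - collect the even numbers from a list - but one takes a single line while the other takes ten:

```python
# One programmer's solution: a single line.
def evens_concise(numbers):
    return [n for n in numbers if n % 2 == 0]

# Another programmer's solution: the same result, with considerably more code.
def evens_verbose(numbers):
    result = []
    index = 0
    while index < len(numbers):
        value = numbers[index]
        remainder = value % 2
        if remainder == 0:
            result.append(value)
        index = index + 1
    return result
```

Multiply a difference like that across tens of thousands of routines and the cumulative effect on program size can be substantial.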

It's also not always the case that a single line of code is the right choice; it depends on the overall design goals of the software. Two different solutions, perhaps very different in size, may solve the same problem but do so in ways that expose other significant differences. For example, the smaller solution might be exceptionally difficult to understand and very fragile, while the larger could be very simple to understand and difficult to break. A design decision might be made to favor stability over size, with the result that the larger solution is the one implemented.

Another difference that's often overlooked is something called runtime libraries or runtime support.

Programmers rarely write everything from scratch; they rely on libraries of software that already exist. As a simple example, a programmer should never have to write software to concatenate (join) two strings of text (say "ABC" and "XYZ") into a single string ("ABCXYZ"). Library routines exist that a programmer can use to do exactly that without having to write (and test and debug) that code him or herself.
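As a hypothetical Python sketch of the difference, here's that concatenation done the library way, followed by the kind of loop the library saves a programmer from writing, testing and debugging:

```python
# The library way: the language runtime does the work.
joined = "ABC" + "XYZ"

# Roughly what we'd have to write (and maintain) ourselves without it:
def concat(left, right):
    chars = []
    for ch in left:
        chars.append(ch)
    for ch in right:
        chars.append(ch)
    return "".join(chars)
```

Every routine a program leans on this way is code the programmer didn't write - but as described next, it's still code the program may have to carry around.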

In an ideal world, such a library of existing software would be very granular, picking up only those routines actually needed by the program being written. Alas, this is not an ideal world, and one of the differences between programming languages and systems used is, in fact, the size of the runtime library that they drag along. In one language, using a string concatenation function might cause exactly and only that function to be included. In another language, it might cause that, and several other string functions to also be included in your program, whether or not you actually use them.
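Python happens to make this easy to observe directly. In this small sketch, importing a single function from a module still loads the entire module into the running program, whether or not the rest of it is ever used:

```python
import sys

# Ask for exactly one function from the textwrap module...
from textwrap import shorten

# ...and the whole module comes along for the ride.
assert "textwrap" in sys.modules

# The one function we asked for works as expected...
summary = shorten("a very long line of text indeed", width=15)

# ...but every other routine in textwrap is now loaded too.
```

Compiled languages differ in how aggressively their linkers strip unused library code, which is one reason otherwise similar programs end up with very different footprints.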

At a slightly higher level this is actually why things like the .NET framework exist. The .NET framework includes many higher level functions that can be used to make accessing Windows and other services easier for programmers. By programming to it, applications can assume that everything in the .NET framework is already on the machine - which means that they don't have to write, or include, any of that functionality in their software. The result is that their programs appear smaller. (.NET is just an example; there are actually many approaches, both similar and different, from the Visual Basic runtime to the Java virtual machine.)

A final aspect I want to address is software age. Left to itself, software only gets larger. By that I mean that as software progresses from version to version and reacts to the changing landscape of both operating system changes and changing user requirements, it only grows in size.

The term "ball of mud" has been used to describe software growth. Each new feature, each new request, each new demand on the software just causes a little bit more mud to get slapped on the ball until the ball becomes enormous (and occasionally collapses under its own weight).

There are a couple of ways this can happen: new features of course require additional code. That much is fairly obvious; more work means more instructions to the computer on how to perform that work. Changed features, sometimes even removal of features, can also often cause the program size to expand in unexpected ways. As one simple example, it's often more expedient (and safer) to disable a feature rather than remove it. The code to implement the feature remains, but is never accessed because code is added to prevent that access from occurring. Particularly in already large and complex programs this, and similar side effects of code changes, often cause growth.
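A minimal, hypothetical Python sketch of that "disable, don't remove" pattern (the feature and function names are mine, not from any real product):

```python
# Flipped off in some later version, rather than deleting the code.
FEATURE_LEGACY_EXPORT = False

def legacy_export(data):
    # In a real product this might be hundreds of lines.
    # It still ships with the program, but is never reached.
    return "LEGACY:" + ",".join(data)

def modern_export(data):
    return "CSV:" + ",".join(data)

def export_report(data):
    if FEATURE_LEGACY_EXPORT:
        return legacy_export(data)  # dead path: guarded off, still present
    return modern_export(data)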

In fact, often the only way to make an elderly program smaller is to chuck it and rewrite it from scratch. This is typically a very expensive proposition, but depending on the age, stability and ease of maintenance of the older software, it can be a very legitimate and lucrative choice. When starting over, every design decision, from the choice of programming language on up, is open for review.

Finally, size is not an indicator of efficiency or expected CPU usage. Size and speed are, essentially, two independent things. While software designers strive for small and fast, in fact the complexities of software design often include many tradeoffs that may, or may not, relate the two. Small programs can be fast or slow. Large programs can be fast or slow.
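One last hypothetical Python sketch makes the point concrete: the shorter of these two Fibonacci functions is dramatically slower, because the naive recursion recomputes the same values over and over, while the longer, memoized version remembers results it has already computed:

```python
# Tiny, but slow: runtime grows exponentially with n.
def fib_small(n):
    if n < 2:
        return n
    return fib_small(n - 1) + fib_small(n - 2)

# More code, but fast: each value is computed only once.
def fib_larger(n, _cache={}):
    if n < 2:
        return n
    if n not in _cache:
        _cache[n] = fib_larger(n - 1) + fib_larger(n - 2)
    return _cache[n]
```

Same answers, very different speed - and the faster one is the bigger one. Size and speed really are independent axes.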

Article C3820 - July 27, 2009

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions. More about Leo.


6 Comments
John Williams
July 28, 2009 8:41 AM

Good article and something I often wondered about. I read on www.grc.com that Steve Gibson always writes in assembly language.

Hugh
July 28, 2009 8:42 AM

Compiling can make a huge difference in the end product too - e.g. GCC -G; Linux of course, but there is a Cygwin equivalent.

Michael Horowitz
July 28, 2009 7:33 PM

There is still another possibility: inclusion of other programs as part of a software "suite". Anti-malware software may be bundled with other things such as a firewall, anti-spam software, etc. The vendor may have one large package and depending on which parts you buy only those parts are activated.

I saw this with the free ZoneAlarm firewall, which used to be very small. But, at one point the size of the downloaded EXE grew enormously. I can only surmise that it includes anti-malware software too, even though that code is disabled in the free version of ZoneAlarm.

Catmoves
July 29, 2009 7:22 AM

Yes, Steve Gibson writes in assembly. And so do his staff. And their programs are small, and fast, and great. No bloat.
My feeling is that many software companies become so enamored of the "sell, sell , sell" attitude that they constantly add bells and whistles to their software until it becomes unwieldy and unwanted. Especially when there are free versions of the same type available, that don't need to bloat their product. Sometmes they decide to follow me around the internet. They don't even have the courtesy to ask my permission. But I guess thats why I have a recycle bin.
Bloat is why I no longer use Symantec. In fact, it is why I no longer use many programs from old established companies. Tough luck companies, but it is my dough, you know.

Chris
August 6, 2009 1:41 PM

As a 67 yr old pc user from about 1986 this article explains things I have been wondering about for years. Thanks Leo - you deserve a latte (again)
Chris
ps I have no idea what a coffee costs in the US - with the exchange rates as they are I will have to guess

Jim de Graff
April 9, 2010 8:42 PM

Program design can greatly affect performance. My first major program for a company I worked at for 29 years was written in PL/1. It ran on an IBM mainframe and was a complete rewrite of a program originally created by my predecessor to verify database definition files for a hierarchical database. His version used many temporary disk files and cost over $20 per run. My version used linked lists instead of temporary files. The cost per run of my version was 18 cents. Outwardly there was little difference between the two programs.

Comments on this entry are closed.

If you have a question, start by using the search box up at the top of the page - there's a very good chance that your question has already been answered on Ask Leo!.

If you don't find your answer, head out to http://askleo.com/ask to ask your question.