Why are emailed attachments larger than the original file?

Email attachments are always larger than the original file being attached. I'll look at why, how it works, and why it matters.

Why are email attachments in general much larger than the actual attachment sent? Is something added to them by the email client?

I wouldn't say that anything is "added" to the attachment other than perhaps some administrative data like its name - the attachment is still just the attachment.

However, something is done to the attachment that will most definitely make it larger.

And it all has to do with the fact that the technology behind email is, basically, older than dirt. (In internet terms, of course.)

First we need to explain how data is represented, and the difference between "text" and "binary" data.

Data that is text (and I'm restricting myself to "plain old ASCII text" here) - the traditional letters, numbers and a specific set of symbols - is represented by numbers from 0 to 127. (I won't get into the actual 1's and 0's representation, but the reason it's 127 is that 2 to the 7th power is 128, and 128 values counted from 0 end at 127. A lot of the magic numbers you'll see related to computers are related to powers of two.)


Binary data, on the other hand, is represented by numbers from 0 to 255. You might recall that a "byte" is also a number from 0 to 255, and that all the data stored in your computer is stored this way. We measure file sizes, memory sizes and disk sizes all in bytes, be it megabytes, gigabytes, or any of several other shorthands for large quantities of bytes.

Text data, when stored on your computer, is still stored in bytes which could hold values greater than 127, but the characteristic of plain old text data is that all the actual values will be less than 128.
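To make the 7-bit idea concrete, here's a minimal Python sketch that checks whether a chunk of data is plain ASCII, meaning every byte is below 128. The function name `is_seven_bit_clean` is just an illustrative choice, not something email software actually calls.

```python
def is_seven_bit_clean(data: bytes) -> bool:
    """Return True if every byte is in the plain ASCII range 0-127."""
    # Iterating over a bytes object yields integers from 0 to 255.
    return all(b < 128 for b in data)


print(2 ** 7 - 1)                                            # → 127, the highest 7-bit value
print(is_seven_bit_clean(b"Hello, world!"))                  # → True: plain text
print(is_seven_bit_clean(bytes([0x89, 0x50, 0x4E, 0x47])))   # → False: the first bytes of a PNG file
```

Python 3.7 and later also offer the built-in `bytes.isascii()` method, which performs the same check.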

Why does this matter?

Email is primarily (or at least originally) a text-only medium.

And yet, attachments, by definition, are binary. An attachment can be anything: text, a program, a video, a music file - these are all by definition binary data.

So the question becomes: how do you send data that can have values between 0 and 255 through a medium that itself can only handle values from 0 to 127?

Answer: you encode it. You come up with a way to represent the binary data as text.

If you've ever seen something like this:

iVBORw0KGgoAAAANSUhEUgAAAQQAAABOCAIAAABJ3v/jAAAACXBIWXMAABJ0AAASdAHeZh94
AAAKT2lDQ1BQaG90b3Nob3AgSUNDIHByb2ZpbGUAAHjanVNnVFPpFj333vRCS4iAlEtvUhUI
IFJCi4AUkSYqIQkQSoghodkVUcERRUUEG8igiAOOjoCMFVEsDIoK2AfkIaKOg6OIisr74Xuj
...

That's binary data, encoded as text. That particular example shows the first three of 390 lines that are a "base64" encoding of my logo from the top of this page - a binary file. "Base64" is one of several possible encoding mechanisms.

And now to your point: that image (a PNG file) is 20,985 bytes of binary data. The base64-encoded text version? 28,760 bytes - a 7,775-byte or 37% increase in size.
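For the curious, here's a minimal sketch of what an email program does to an attachment, written in Python with the standard `base64` module. The helper name `mime_base64` is hypothetical, and the sketch assumes CRLF line endings and the 76-character maximum line length that MIME allows; real email software may wrap lines differently (the sample lines above are 72 characters long).

```python
import base64


def mime_base64(data: bytes, line_length: int = 76) -> bytes:
    """Encode binary data as base64 text split into CRLF-terminated lines."""
    encoded = base64.b64encode(data)  # every 3 input bytes become 4 ASCII characters
    lines = [encoded[i:i + line_length] for i in range(0, len(encoded), line_length)]
    return b"".join(line + b"\r\n" for line in lines)


original = bytes(range(256)) * 10          # 2,560 bytes covering every value 0-255
text = mime_base64(original)

print(all(b < 128 for b in text))          # → True: the result is plain 7-bit ASCII
print(len(original), len(text))            # → 2560 3506, roughly 37% larger
# b64decode() discards the CR/LF line breaks by default, so decoding restores the original.
print(base64.b64decode(text) == original)  # → True
```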

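Representing binary data as text makes it bigger. How much bigger depends mostly on the encoding scheme used. Base64, for example, turns every 3 bytes of binary data into 4 text characters, a 33% increase before counting the line breaks inserted every few dozen characters; those line breaks account for the rest of the 37% in my example.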
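The growth can be predicted with a little arithmetic. Here's a hedged Python sketch of the calculation, assuming base64 with lines of a fixed length, each followed by a two-byte CRLF line break; the function name and its defaults are illustrative, not taken from any particular email program.

```python
def base64_encoded_size(n: int, line_length: int = 76, newline_bytes: int = 2) -> int:
    """Bytes needed to carry n bytes of binary data as line-wrapped base64."""
    chars = 4 * ((n + 2) // 3)                        # 3 bytes in -> 4 characters out, rounded up
    lines = (chars + line_length - 1) // line_length  # number of wrapped lines
    return chars + lines * newline_bytes              # plus a line break after each line


print(base64_encoded_size(20_985))                  # → 28718 with 76-character lines
print(base64_encoded_size(20_985, line_length=72))  # → 28758 with 72-character lines
```

Both results land close to the 28,760 bytes measured above; the exact figure depends on the line length and line-ending convention the encoder uses.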

Why do we have to do this?

Remember when I said that email technology is "older than dirt"? It's actually one of the oldest technologies on the internet. And originally binary data could not be transmitted via email at all. Email was restricted to text data with values less than 128. When people decided that sending binary files around was a good idea, they had to come up with this approach of encoding that data as text to allow it to go through.

Today there are ways to transmit many forms of 8-bit data directly; Asian and other non-Latin character sets, in particular, rely on them. But the basic problem remains: while most email programs, email servers and other email-related software might handle it quite well, we can't guarantee that all will. Hence email is often transmitted using the "lowest common denominator": the basic encoding that, by definition, all email programs must support.

Why Care?

That's pretty simple, actually. A limit on the size of an email doesn't imply you can send attachments that approach that size. Because of the expansion caused by text encoding, the size of the attachments you can include is typically much smaller.

Let's say your ISP imposes a 10-megabyte limit per message. If, as in my example, your email program uses an encoding method that increases the size of your attachments by 37%, then the largest file you could attach and send successfully would be at most around 7.3 megabytes (10 divided by 1.37), and in practice a little less, since the message's headers and body text count against the limit too.
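Working in the other direction, here's a sketch that finds the largest attachment whose encoded form fits under a given limit. It repeats the size formula so it stands alone and then does a binary search. It assumes a limit of exactly 10,000,000 bytes, 76-character base64 lines with CRLF endings, and it ignores the headers and message text that also count against a real limit; the function names are hypothetical.

```python
def base64_encoded_size(n: int, line_length: int = 76, newline_bytes: int = 2) -> int:
    """Bytes needed to carry n bytes of binary data as line-wrapped base64."""
    chars = 4 * ((n + 2) // 3)                        # 3 bytes in -> 4 characters out
    lines = (chars + line_length - 1) // line_length  # number of wrapped lines
    return chars + lines * newline_bytes              # plus a line break per line


def max_attachment_size(limit: int, line_length: int = 76, newline_bytes: int = 2) -> int:
    """Largest file size (in bytes) whose base64 encoding fits within `limit` bytes."""
    lo, hi = 0, limit  # encoding never shrinks data, so the answer is at most `limit`
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if base64_encoded_size(mid, line_length, newline_bytes) <= limit:
            lo = mid       # mid still fits; search higher
        else:
            hi = mid - 1   # mid is too big; search lower
    return lo


print(max_attachment_size(10_000_000))  # → 7307691
```

That works out to about 7.3 MB, or roughly 73% of the limit.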

If your email's bouncing because of size limitations, that's important to know.

Article C4215 - March 12, 2010

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.


5 Comments

Hi Leo - Clarity must be your middle name as you always make your points in a way most people can understand. Nice work!

As to mapping an 8-bit world onto a 7-bit system, the same problem arises when mapping a decimal world onto a binary world: the fit is not exact. One approach in the old days was to approximate, and on the early IBM PCs this could lead to compounding errors. For instance, I took the number 3 and squared it in a loop many times (20?), then took the square root of the result the same number of times and did NOT get back to 3!

Another approach, much preferred, was to represent the number in a binary equivalent to, say, 15 decimal places, as was done on the WANG T-CPU in the late '70s (I cut my teeth on this Z-80 machine), but this used up precious memory address space in the interest of precision. On the WANG system, it was possible to take the number 3 and square it any number of times (within reason, I guess), then reverse the process and, voilà, get back to exactly 3. Neat.

Me, I'll take precision...

Posted by: Chucko at March 16, 2010 10:22 AM

Finally! I've wondered about this for years.

Posted by: Naomi at March 30, 2010 12:29 PM

I was out sick for a week, and when I logged back into my email, the file sizes of some of the attachments had increased by up to 10 times. For example, what was once a 25 KB .pdf is now 10 MB. Not sure why. Help?

Posted by: CANYMO at May 10, 2010 9:10 AM

Question: when archiving emails to a .pst file, the original byte size of the email increases. For example, if it's 30 KB in the inbox, it now shows as 35 KB in the archive .pst. If I grab another email, it increases in size by 10 KB; if I grab another, it increases by 20 KB or 32 KB, or even 45 KB for another set. So if it was originally 45 KB in size, it's now 90 KB in the .pst file. The padding starts small but then grows to the point where it's double the original size of the email. What is causing the emails to be padded to the point where, when I try to open an archived email, I get an "out of virtual memory" error? I would greatly appreciate some help.

Posted by: richard vanderwal at April 12, 2011 3:06 PM

Hi,
I just wanted to know: is there any way I can get the size of the message body or the size of the encoded part?

Posted by: Vivek saurabh at August 4, 2011 12:21 AM