It's possible that a PDF created from a document may be larger, perhaps much larger, than the original. I'll look at a few reasons why this might be.

I publish a newsletter for two Ham Radio Clubs; I use Open Office Writer to format and save eventually to PDF to send out. I noticed that the file before converting (ODT to PDF) is quite a bit smaller than the PDF file is. Especially when I have a page of photos. What is added to my newsletters to make the file swell up a bit? I use Adobe for the PDF.

PDF (Portable Document Format) files are a common and popular way to distribute documents. Their primary "feature" is simply that they look pretty much the same on just about any computer.

And of course a PDF file typically mimics the layout and feel of an actually printed document, only in electronically displayed form.

Why might it be larger than the original word processing or other original document? I can think of a few possibilities.

Compress me Once...

Adobe has compression options that control how aggressively - or not - it compresses images in PDF documents it creates.

That actually makes sense since an uncompressed image can be large, and a good compression algorithm can reduce the size required to represent the image significantly, even more significantly if you're willing to trade off some of the image quality.

A potential problem, however, is that attempting to compress something that's already efficiently compressed can make it larger.

It's possible that if a document contains a large number of images, perhaps ".jpg" formatted photos which are by definition already compressed, the process of creating the PDF might actually cause those photographs to become somewhat larger. From what you say, that might well be the issue that you're facing.
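If you'd like to see the effect for yourself, here's a minimal sketch in Python. It is not what Adobe or any other PDF tool actually does internally. It simply runs the standard zlib (Deflate) compressor over a block of pseudo-random bytes, which I'm using as a stand-in for already-compressed photo data. The function names and the 100,000-byte size are my own choices for illustration.

```python
import random
import zlib


def simulated_photo_bytes(size, seed=0):
    """Stand-in for an already-compressed photo (an assumption for this
    sketch): seeded pseudo-random bytes with no patterns left to squeeze out."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


def recompression_report(data):
    """Return (size before, size after) one more lossless compression pass."""
    return len(data), len(zlib.compress(data, 9))


original, recompressed = recompression_report(simulated_photo_bytes(100_000))
print(f"before: {original} bytes, after: {recompressed} bytes")
print(f"change: {recompressed - original:+d} bytes")
```

The second pass can't find anything to remove, so all it contributes is its own bookkeeping overhead. A PDF tool that re-encodes photos at a different quality setting is doing something more involved than this, but the underlying point is the same: compressing twice doesn't help.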

Recommendation: check and experiment with the compression settings of your PDF creation utility.

Fonts: Here, but not There

Fonts and typefaces can be fairly confusing. We're all familiar with nearly ubiquitous fonts like Times New Roman, Arial, and even (dare I say it?) Comic Sans.

But what happens if you use a font in your document that most people don't have? When you print it out on paper it looks great, because that all happens on your computer where the font is present. On someone else's machine, things might look quite different if that font's not present. Use an obscure font and take your original document to a machine where that font isn't present, and you'll see what I mean - it'll look different.

PDF attempts to solve this problem by including fonts within the document. My belief is that it embeds only non-standard fonts - those which can't be assumed to be on most machines - however the rules may be more complex than that.

As a test, I created a small Microsoft Word document consisting of two sentences, 25 words total, all in the default font Times New Roman. Changing one word in the document to the font "Algerian" took the generated PDF from around 2,000 bytes to over 10,000.

Recommendation: examine your font usage, and see if you can reduce the number of non-standard fonts in your document.

Size Doesn't Matter (or So They Say)

PDF is relatively efficient, but creating a small file actually isn't its primary goal. That, as its name implies, is to be a Portable Document - one that looks pretty much the same everywhere, and one that can be viewed on a wide variety of machines. If achieving that goal means the file gets bigger, then so be it.

One of the apparent design decisions in the format is that a lot of information in the document is stored as "plain text", which presumably is easier for that "wide variety of machines" to understand.

If you ever open a .pdf file in Notepad, or just "type" it at the Windows Command Prompt, you'll see a lot of plain text - text you can read and make some sense of (even if what it's saying is obscure).
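Here's a rough sketch, in Python, of what that looks like. The sample bytes are a tiny hand-written fragment in PDF syntax, not a complete or valid PDF, and the function name and minimum run length are my own inventions for illustration. The code pulls out the runs of printable characters and skips the binary portion, which is roughly what your eye does when scanning the file in Notepad.

```python
import re

# Hand-written fragment in PDF syntax (illustrative only, not a full PDF):
# a header, a small dictionary object, and a short compressed binary stream.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Length 8 /Filter /FlateDecode >>\nstream\n"
    b"\x78\x9c\x03\x00\x00\x00\x00\x01"
    b"\nendstream\nendobj\n"
)


def readable_runs(data, min_len=6):
    """Return the runs of printable ASCII at least min_len characters long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]


runs = readable_runs(SAMPLE)
for run in runs:
    print(run)
```

To try it on a real file, you could replace SAMPLE with the contents of your own PDF, read in binary mode.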

Now, plain text isn't the most efficient way to store information from a space perspective. If you want proof, go grab a large plain text document and zip it. I'll use the Project Gutenberg copy of Tolstoy's War and Peace as an example. The plain text version of this book, known for its length, weighs in at a little over 3 megabytes. After zipping it with 7-Zip, the result is less than a third the size of the original. That smaller version contains the exact same information, albeit in an unreadable form. All you need to do is decompress it to recover an exact copy of the original.
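You can run a small version of that experiment without downloading anything. The sketch below is in Python and uses the standard zlib module rather than 7-Zip, so that's a substitution on my part. The "book" is just words picked at random from a short made-up list, which is more repetitive than real prose, so don't expect the ratio to match the War and Peace numbers above.

```python
import random
import zlib

# Made-up vocabulary standing in for a real book (an assumption for this sketch).
WORDS = ["the", "prince", "said", "war", "peace", "and", "of", "to",
         "army", "was", "he", "in", "moscow", "that", "with", "his"]


def sample_text(word_count, seed=0):
    """Build a block of plain ASCII text from randomly chosen words."""
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(word_count)).encode("ascii")


text = sample_text(50_000)
packed = zlib.compress(text, 9)
restored = zlib.decompress(packed)

print(f"plain text: {len(text)} bytes")
print(f"compressed: {len(packed)} bytes ({len(packed) / len(text):.0%} of original)")
print("round trip is exact:", restored == text)
```

The last line is the important one: the compressed version is smaller, yet decompressing it gives back every byte of the original.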

Recommendation: try zipping your PDF. Yes, you might be re-compressing compressed or even doubly-compressed pictures, per the earlier point, but it's worth experimenting with. In a text-heavy document zipping the file for distribution might make a fair amount of sense.

There's Probably More

I've probably just scratched the surface of reasons that a PDF file might end up being larger than its original. The big takeaway from my perspective is that small size isn't really a primary goal for PDF, and as a result some of the things it needs to do might well end up increasing the size of the result.

And zipping the file is always a quick and easy thing to try, often with good results.

Article C4465 - September 26, 2010

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.

Recent Comments
11 Comments
Ron
September 28, 2010 12:01 PM

Open Office has functions to control PDF size. In the PDF Options dialog (displayed when choosing Export to PDF), the file size will be largest when you choose Lossless in the General\Images tab. Choosing lower than 100% JPEG compression greatly reduces the file size. If you don't expect the PDF to be printed, and the resolution of your images is higher than 72dpi, choosing Reduce image resolution and using the 75dpi setting will greatly reduce file size. The images will then be at a resolution appropriate for screen display.

Phillip
September 28, 2010 12:34 PM

The size and compression of the picture may be quite irrelevant. Many programs do not save the image in the document - merely a link to the image file which is read to be displayed. My DTP PagePlus allows me to choose embedding the image or saving it as a link.

The PDF, however, has to be entirely self-contained - so as discussed elsewhere, it has to contain all images and unusual fonts - and as a result can end up much larger than the source document.

And my printer-based PDF generator does not convert everything to image form - plain text remains as text, but text with effects can be converted to images.

David
September 28, 2010 4:52 PM

Slightly off topic but touched on here - as a 99%-of-the-time rule, never use more than two font families. If you go crazy with fancy fonts or non-common fonts, most competent programs will embed the files for the fonts in the document. Apart from that, more than two fonts is visually unpleasant and gives a 'ransom note' appearance. The old saying "give a person fonts and they'll use them" is meant to be a basic typesetting guide.

Peter
September 28, 2010 9:26 PM

I have found that when scanning documents to PDF, the size is extremely large (this could be a function of the printer/scanner software). I have discovered a way to reduce the size drastically by loading a PDF converter into the printer options. I then select print to PDF file, and it saves a much smaller version of the file.

James M
September 29, 2010 1:14 AM

I doubt that image files (JPEG, etc.) are the problem. I often convert Excel files, etc., with embedded images to PDF format, and the files are almost always drastically reduced in size: sometimes to 1/10 of the original size! I suspect that strange fonts may be a more likely explanation.

The few times that PDF conversion has not reduced the size of my files have been when there was something in the file that I didn't realize was there.
