It's possible that a PDF created from a document may be larger, perhaps much larger, than the original. I'll look at a few reasons why this might be.

I publish a newsletter for two Ham Radio Clubs; I use Open Office Writer to format and save eventually to PDF to send out. I noticed that the file before converting (ODT to PDF) is quite a bit smaller than the PDF file is. Especially when I have a page of photos. What is added to my newsletters to make the file swell up a bit? I use Adobe for the PDF.

PDF (Portable Document Format) files are a common and popular way to distribute documents. Their primary "feature" is simply that they look pretty much the same on just about any computer.

And of course a PDF file typically mimics the layout and feel of an actually printed document, only in electronically displayed form.

Why might it be larger than the original word processing or other original document? I can think of a few possibilities.

Compress me Once...

Adobe has compression options that control how aggressively - or not - it compresses images in PDF documents it creates.

That actually makes sense since an uncompressed image can be large, and a good compression algorithm can reduce the size required to represent the image significantly, even more significantly if you're willing to trade off some of the image quality.

A potential problem, however, is that attempting to compress something that's already efficiently compressed can make it larger.

It's possible that if a document contains a large number of images, perhaps ".jpg" formatted photos which are by definition already compressed, the process of creating the PDF might actually cause those photographs to become somewhat larger. From what you say, that might well be the issue that you're facing.
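
The general effect is easy to reproduce outside of PDF. Here's a minimal sketch in Python. It makes two assumptions of mine: zlib's DEFLATE stands in for whatever compression a PDF tool actually applies, and seeded random bytes stand in for an already-compressed photo. It is not a description of what Adobe's software does internally.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Return the number of bytes zlib needs to store `data` at its highest level."""
    return len(zlib.compress(data, 9))


# Ordinary prose is full of repetition, so it compresses well.
text = b"It was the best of times, it was the worst of times. " * 2000

# Assumption: seeded random bytes stand in for an already-compressed photo,
# since well-compressed data looks statistically like noise.
rng = random.Random(42)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

print("text: ", len(text), "->", compressed_size(text))
print("noise:", len(noise), "->", compressed_size(noise))
```

The text shrinks dramatically, while the noise should come out slightly larger than it went in: the compressor finds nothing to squeeze and still has to add its own bookkeeping.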

Recommendation: check and experiment with the compression settings of your PDF creation utility.

Fonts: Here, but not There

Fonts and typefaces can be fairly confusing. We're all familiar with nearly ubiquitous fonts like Times New Roman, Arial, and even (dare I say it?) Comic Sans.

But what happens if you use a font in your document that most people don't have? When you print it out on paper it looks great, because that all happens on your computer where the font is present. On someone else's machine, things might look quite different if that font's not present. Use an obscure font and take your original document to a machine where that font isn't present, and you'll see what I mean - it'll look different.

PDF attempts to solve this problem by including fonts within the document. My belief is that it embeds only non-standard fonts - those which can't be assumed to be on most machines - however the rules may be more complex than that.

As a test, I created a small Microsoft Word document consisting of two sentences, 25 words total, all in the default font Times New Roman. Changing one word in the document to the font "Algerian" took the generated PDF from around 2,000 bytes to over 10,000.

Recommendation: examine your font usage, and see if you can reduce the number of non-standard fonts in your document.

Size Doesn't Matter (or So They Say)

PDF is relatively efficient, but creating a small file actually isn't its primary goal. That, as its name implies, is to be a Portable Document - one that looks pretty much the same everywhere, and one that can be viewed on a wide variety of machines. If achieving that goal means the file gets bigger, then so be it.

One of the apparent design decisions in the format is that a lot of information in the document is stored as "plain text", which presumably is easier for that "wide variety of machines" to understand.

If you ever open a .pdf file in Notepad, or just "type" it at the Windows Command Prompt, you'll see a lot of plain text mixed in with the binary data - text you can read and make some sense of (even if what it's saying is obscure).
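
If you'd rather not scroll past the binary parts, a few lines of Python can pull out just the readable text. This is only a sketch: the `readable_lines` helper and the sample bytes are my own illustration, not part of any PDF tool, and the sample is a hand-written fragment rather than a complete PDF.

```python
import re


def readable_lines(data: bytes, min_length: int = 4) -> list[str]:
    """Return the runs of printable ASCII found in `data`, one string per run."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [run.decode("ascii") for run in re.findall(pattern, data)]


# Hypothetical fragment: a few lines of PDF-style text followed by binary bytes.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n"
    b"<< /Type /Catalog /Pages 2 0 R >>\n"
    b"endobj\n"
    b"\x00\x9c\xed\xfa\x01\x02"
)

for line in readable_lines(sample):
    print(line)
```

To try it on a real file, read the file in binary mode and pass the bytes to `readable_lines`.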

Now, plain text isn't the most efficient way to store information from a space perspective. If you want proof, go grab a large plain text document and zip it. I'll use the Project Gutenberg copy of Tolstoy's War and Peace as an example. The plain text version of this book, known for its length, weighs in at a little over 3 megabytes. Zipping it with 7-Zip produces a file less than a third the size of the original. That smaller version contains the exact same information, albeit in an unreadable form. All you need to do is decompress it to recover an exact copy of the original.

Recommendation: try zipping your PDF. Yes, you might be re-compressing compressed or even doubly-compressed pictures, per the earlier point, but it's worth experimenting with. In a text-heavy document zipping the file for distribution might make a fair amount of sense.
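
If you want to check before you bother, the experiment can be scripted. The sketch below uses Python's standard zipfile module and builds the archive in memory, so nothing is written to disk. The function names are my own, and the result will vary with the zip tool and settings you actually use.

```python
import io
import zipfile
from pathlib import Path


def zipped_size(path: str) -> int:
    """Return the size in bytes of a zip archive holding only the file at `path`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(path, arcname=Path(path).name)
    return len(buffer.getvalue())


def report(path: str) -> str:
    """Describe how much zipping would shrink (or grow) the file at `path`."""
    original = Path(path).stat().st_size
    zipped = zipped_size(path)
    return f"{path}: {original} bytes -> {zipped} bytes zipped"
```

Call `report` with the path to your own PDF and compare the two numbers. If the zipped size is about the same as the original, or bigger, zipping isn't buying you anything for that document.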

There's Probably More

I've probably just scratched the surface of reasons that a PDF file might end up being larger than its original. The big takeaway from my perspective is that small size isn't really a primary goal for PDF, and as a result some of the things it needs to do might well end up increasing the size of the result.

And zipping the file is always a quick and easy thing to try, often with good results.

Article C4465 - September 26, 2010

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.

11 Comments
Bob
September 27, 2010 1:16 AM

Another method with pictures is to edit each one to the size you need before including it in the document.
I have found that including large pictures 'scaled down' in a document before turning it into a PDF stores the unmodified image.

Steve
September 28, 2010 9:30 AM

Most PDF creation programs have check-box options to embed entire fonts in the final PDF document, embed only those font characters that are actually used in the document or to embed only those fonts which are not generally found on most systems.

Gabe
September 28, 2010 9:51 AM

Adobe Acrobat has a "PDF Optimizer" utility that helps you fine-tune all the areas that take up space and it includes an auditor that shows a percentage breakdown of what is taking up the most space. I assume other programs have something similar, I just didn't see Leo mention anything about them.

Good point though, that "Portable" doesn't mean smaller. I'll be using that on my boss the next time he forgets why his PDFs "seem so big".

Jeremy Smee
September 28, 2010 9:52 AM

Assuming you created the original document in Word, did you compress ALL the pictures and other jpg inserts in the Word doc? Very easy to do and reduces file size considerably while still making it quite suitable for printing. I also prepare a quarterly newsletter and have quite a few pics. PDF file is VERY large unless I compress the pics in the Word doc.

Bob
September 28, 2010 11:10 AM

This doesn't really apply to this user, but if you use a printer-based PDF converter to make your PDFs, your entire page will be a picture and there will be no text at all in the document. This will of course make the PDF much larger and, if it's going on a web site, slower to download; it also cannot be indexed by search engines.

That may be true for some of the print-to-pdf utilities, but not all. PDFCreator, which I use, does not.
Leo
29-Sep-2010

Jim McMillen
September 28, 2010 11:51 AM

PDF files for short documents may be larger, but book-length PDF files can be much smaller than their word processing equivalent. WordPerfect X4, which has its own PDF file converter, has, for example, converted three WordPerfect files to PDF as follows: (1) 17.2MB WP to 6.8MB PDF, (2) 12.4MB to 9.3MB, and (3) 29.3MB to 7.0MB. The difference in conversion ratios is largely due to the number and size of illustrations, the complexity of the formatting, and the number of fonts used, all of which affected the book (2) conversion ratio. All conversions are with the X4 default settings and the ratios could probably be improved by tweaking the settings.

These are genealogy books with the usual book front matter; headers; foot- or endnotes; everyname indexes; maps, photos, and genealogy charts; etc. The PDF files were sent to lulu.com for printing and turned out well.

Ron
September 28, 2010 12:01 PM

Open Office has functions to control PDF size. In the PDF Options dialog (displays when choosing Export to PDF), the file size will be largest when you choose Lossless in the General\Images tab. Choosing lower than 100% JPEG compression greatly reduces the file size. If you don't expect the PDF to be printed, and the resolution of your images is larger than 72dpi, choosing Reduce image resolution and using the 75dpi setting will greatly reduce file size. The images will be at a resolution appropriate for screen display.

Phillip
September 28, 2010 12:34 PM

The size and compression of the picture may be quite irrelevant. Many programs do not save the image in the document - merely a link to the image file which is read to be displayed. My DTP PagePlus allows me to choose embedding the image or saving it as a link.

The PDF, however, has to be entirely self-contained - so as discussed elsewhere, it has to contain all images and unusual fonts - and as a result can end up much larger than the source document.

And my printer-based PDF generator does not convert everything to image form - plain text remains as text, but text with effects can be converted to images.

David
September 28, 2010 4:52 PM

Slightly off topic but touched on here - as a 99%-of-the-time rule, never use more than two font families. If you go crazy with fancy fonts or non-common fonts, most competent programs will embed the files for the fonts in the document. Apart from that, more than two fonts is visually unpleasant and gives a 'ransom note' appearance. The old saying "give a person fonts and they'll use them" is meant to be a basic typesetting guide.

Peter
September 28, 2010 9:26 PM

I have found that, when scanning documents to PDF, the size is extremely large (could be a function of the printer/scanner software). I have discovered a way to reduce its size drastically by loading a PDF converter into the printer options. I then select to print to a PDF file and it saves a much smaller version of the file.

James M
September 29, 2010 1:14 AM

I doubt that image files (JPEG, etc.) are the problem. I often convert Excel files, etc., with embedded images to PDF format, and the files are almost always drastically reduced in size: sometimes to 1/10 of the original size! I suspect that strange fonts may be a more likely explanation.

The few times that PDF conversion has not reduced the size of my files have been when there was something in the file I wasn't aware of.
