Helping people with computers... one answer at a time.

Scanners are a handy way to get information from paper into your computer. However, spreadsheets are highly structured, which makes converting a scan into one very difficult.

Can you recommend a good software program for scanning (I have an Epson 1650) and converting documents to Excel 2007? My Epson software will not do this.

No, I can't. And it's not because I don't want to. What you're asking for is very, very difficult to do well, if it can be done at all. It all comes down to exactly what it means to scan something, and what it means to be an Excel spreadsheet.

The most important thing to realize is that scanning a document is almost exactly the same as taking a picture of it. What you end up with is the equivalent of a ".jpg" picture of the document, even if the scanning software never actually saves it as a file before performing other tasks.

[Figures: a representation of a 'picture' of the letter 'A', and of a light green letter 'A']

A digital picture of a letter is just a collection of individual pixels that are turned on in the appropriate color in the pattern of the original letter.
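To make that concrete, here's a minimal Python sketch. The 7x5 grid and the colors are made-up illustrations; a real scan has far more pixels, but the idea is the same: a grid of color values and nothing more.

```python
# A tiny, hypothetical 7x5 bitmap of the letter "A".
# "#" marks a pixel that is turned on; "." marks one that is off.
LETTER_A = [
    "..#..",
    ".#.#.",
    "#...#",
    "#...#",
    "#####",
    "#...#",
    "#...#",
]

WHITE = (255, 255, 255)        # background color (RGB)
LIGHT_GREEN = (144, 238, 144)  # example "ink" color for a light green "A"

def to_pixels(rows, ink, background):
    """Turn the text bitmap into a grid of RGB color values, one per pixel."""
    return [[ink if ch == "#" else background for ch in row] for row in rows]

pixels = to_pixels(LETTER_A, LIGHT_GREEN, WHITE)
height, width = len(pixels), len(pixels[0])
lit = sum(pixel == LIGHT_GREEN for row in pixels for pixel in row)
print(f"{width}x{height} image, {lit} pixels turned on")
# Nothing in `pixels` says "this is the letter A"; it's just colors in a grid.
```

Running it prints "5x7 image, 16 pixels turned on". Notice that the computer has no idea the pattern is a letter at all.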

The second most important thing to realize is that applications such as Word or Excel don't operate on pictures. They operate on documents in specific formats, with data arranged in specific ways. So rather than operating on a picture of the letters ABCD, Word operates on an internal representation of the letter A, the letter B, and so on. That's what makes editing easy in Word and applications like it: you can replace or delete one of the letters, and the application reformats the entire document as needed to reflect the change.

In Word and Excel and countless other applications, the letter "A" is represented by the number 65.
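Here's a quick Python illustration. The value 65 comes from the ASCII character code, which Unicode keeps for the same letters; Python's built-in `ord()` and `chr()` convert between a character and its number.

```python
# Each character is stored as a number; in ASCII and Unicode, "A" is 65.
print(ord("A"))   # 65
print(chr(65))    # A

codes = [ord(ch) for ch in "ABCD"]
print(codes)      # [65, 66, 67, 68]

# Editing is just changing those numbers: swap "B" (66) for "X" (88).
text = "ABCD"
edited = text.replace("B", "X")
print(edited)     # AXCD
```

Compare this with the pixel grid: here each letter is a single number the program can look up, replace, or delete.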

So the first problem is converting those pictures of letters into the computer's internal representation of the equivalent letter.

Software to solve this problem exists, and is called "optical character recognition" or OCR software. These programs, when given a picture of a page with text, will analyze the image and produce a document with the program's best guess as to what those letters are and how they were organized. Many scanners nowadays actually come with basic OCR software included.
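The core idea behind character recognition can be sketched with a toy technique called template matching: compare the picture of a character against a set of known letter shapes and pick the closest one. This is a deliberately simplified illustration, not how any particular OCR product works (modern OCR engines are far more sophisticated), and the 5x5 templates below are made up. It does show why the result is only a "best guess".

```python
# Hypothetical 5x5 templates: the "known shapes" this toy recognizer compares against.
TEMPLATES = {
    "I": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "#####"],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}

def mismatches(a, b):
    """Count pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognize(bitmap):
    """Return the template letter with the fewest differing pixels, plus that count."""
    best = min(TEMPLATES, key=lambda letter: mismatches(bitmap, TEMPLATES[letter]))
    return best, mismatches(bitmap, TEMPLATES[best])

# A slightly "smudged" scan of a T: one stray pixel at the bottom left.
smudged_t = ["#####",
             "..#..",
             "..#..",
             "..#..",
             "#.#.."]
print(recognize(smudged_t))  # ('T', 1)
```

The smudged T is still closest to the T template. Smudge the bottom row a little more, to "###.#", and the same code answers "I" instead, which is exactly the kind of error that needs cleanup afterward.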


The problem with OCR is that it's never 100% accurate. In fact, it's often far less than that, meaning that any OCR'd document will still require a fair amount of cleanup once it's been converted. OCR also has a particularly difficult time with formatting: it can confuse the software, or it can simply be lost.

And, of course, the quality of OCR also depends on the quality of the original document scan.

Still, when all's said and done, OCR is often a good first step toward converting a scanned document back into some kind of editable form.

But what about spreadsheets like Excel?

The problem with spreadsheets is that they contain more than just text. The rows and columns represent structure. OCR software typically can't determine the structure of a document from a scanned image alone. Some programs do try, for example by detecting paragraphs in a more traditional text document.

Unfortunately, scanning and converting the letters and numbers is one thing, but understanding just how that data should be placed into which cells on a spreadsheet is beyond the ability of most OCR software. In fact, I'm not currently aware of any that can do so. (Check the comments to this article, as I'm sure readers will add suggestions.)

Even as OCR technology continues to improve, I fully expect there will always be spreadsheet-based documents that even the best software can't "understand".
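To see why placing data into cells is the hard part, here's a minimal Python sketch (standard library only). It assumes OCR has already produced plain text in which columns happen to be separated by runs of two or more spaces; that assumption is ours, not something OCR software guarantees. The sample text is hypothetical.

```python
import csv
import io
import re

# Hypothetical OCR output for a small table. The OCR step has already turned
# pixels into characters; the remaining question is which cell each piece goes in.
ocr_text = """Item          Qty    Price
Paper clips   100    2.50
Stapler       1      12.00
Ink, black    2      30.00"""

def split_columns(line):
    """Naive heuristic: treat two or more whitespace characters as a column break."""
    return re.split(r"\s{2,}", line.strip())

rows = [split_columns(line) for line in ocr_text.splitlines()]

# Write CSV, a format Excel can open; csv.writer quotes "Ink, black" automatically.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

If the OCR step collapses a gap to a single space, "Paper clips 100 2.50" comes back as one cell; if a label itself contains a wide gap, it gets split across two. Guessing the layout is exactly the part that requires understanding the document, so every row still needs checking by hand.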

My recommendation today is to OCR the document into the software's best guess at a regular word processing document, and then spend the time to create your new spreadsheet by hand, copying and pasting the data into the layout you know the spreadsheet should have. It is work, but you'll get the spreadsheet you expect.

Article C2966 - March 19, 2007

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.


Comments

How do I scan a document into my computer?

Posted by: Alva Lichtenstein at January 5, 2010 12:51 PM

Abbyy Scan to Office is a great little program, but you can't buy it anywhere. Abbyy, in their wisdom, discontinued a perfectly usable program for everyday people and replaced it with another program that has no trial period and costs $50 more.

Posted by: TONI KLEMKO at March 31, 2010 12:03 PM

OCR programs are everywhere nowadays. I prefer online ones because they don't need installation and most of them are free, like this one: Free OCR.

Posted by: wonderhowto at August 22, 2010 7:58 PM

I have a TIFF image with text in the French Script MT font, and I've tried many programs. Can you help me?

Posted by: Nayandeep at January 21, 2011 6:44 AM

Post a comment on "How do I scan a document into Excel?":