
How do I scan a document into Excel?

Question:

Can you recommend a good software program for scanning (I have an Epson
1650) and converting documents to Excel 2007? My Epson software will not do
this.

No, I can’t. And it’s not because I don’t want to. What you’re asking for is
very difficult to do well, if at all. It all has to do with
exactly what it means to scan something, and what it means to be an Excel
spreadsheet.


The most important thing to realize is that a scan of a document is almost
exactly the same as taking a picture of the document. So what you end up with
is the equivalent of a “.jpg” picture of the document – even if the scanning
software doesn’t actually save it before performing other tasks.

A representation of a ‘picture’ of a light green letter ‘A’

A digital picture of a letter is just a collection of individual pixels that
are turned on in the appropriate color in the pattern of the original
letter.
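To make that concrete, here is a minimal Python sketch of a letter stored as a picture. The 5x7 grid and the `render` helper are made up for illustration; a real scan has far more pixels and real colors, but the idea is the same: nothing in the data says "this is an A", only which dots are on.

```python
# A tiny 5x7 "picture" of the letter A: 1 = pixel on, 0 = pixel off.
LETTER_A = [
    "01110",
    "10001",
    "10001",
    "11111",
    "10001",
    "10001",
    "10001",
]


def render(bitmap, on="#", off="."):
    """Turn rows of 0/1 characters into a printable picture."""
    return "\n".join(
        "".join(on if pixel == "1" else off for pixel in row)
        for row in bitmap
    )


print(render(LETTER_A))
```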

The second most important thing to realize is that applications such as Word
or Excel don’t operate on pictures. They operate on documents in specific
formats and with data arranged in specific ways. So rather than operating on a
picture of the letters ABCD, Word operates on an internal representation of the
letter A, the letter B and so on. This way Word, and applications like it, make
it easy to edit – say replacing or deleting one of the letters, and
reformatting the entire document if needed to reflect the effects of that
change.

In Word and Excel and countless other applications, the letter “A” is
represented by the number 65.
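You can see this numeric representation for yourself. The short Python sketch below (the variable names are just for illustration) converts letters to their numbers and back, and shows why editing is easy once text is stored this way rather than as a picture.

```python
text = "ABCD"

# Each letter is stored as a number: A is 65, B is 66, and so on.
codes = [ord(ch) for ch in text]
print(codes)  # [65, 66, 67, 68]

# Editing is simple on this representation: drop the "C" and rebuild the text.
edited = "".join(chr(code) for code in codes if code != ord("C"))
print(edited)  # ABD
```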

So the first problem is converting those pictures of letters into the
computer’s internal representation of the equivalent letter.

Software to solve this problem exists, and is called “optical character
recognition” or OCR software. These programs, when given a picture of a page
with text, will analyze the image and produce a document with the program’s
best guess as to what those letters are and how they were organized. Many
scanners nowadays actually come with basic OCR software included.


The problem with OCR is that it’s never 100% accurate. In fact, it’s often
far less than that, meaning that any OCR’d document will still require a fair
amount of cleanup once it’s been converted. OCR also has a particularly
difficult time with formatting, which can confuse it or simply be lost.
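Here is a rough way to picture what "less than 100%" means in practice. The Python sketch below compares a line of text with a made-up OCR result containing the kind of look-alike mix-ups OCR tends to make (a capital I read as a lowercase l, a zero read as a capital O). Both strings are invented for illustration and are not the output of any particular OCR program.

```python
original = "Invoice total: 1050"
ocr_guess = "lnvoice tota1: 1O5O"  # made-up OCR output with look-alike mix-ups

# Compare position by position (this simple check assumes equal lengths).
errors = [
    (position, good, bad)
    for position, (good, bad) in enumerate(zip(original, ocr_guess))
    if good != bad
]
accuracy = 1 - len(errors) / len(original)

print(f"{len(errors)} wrong characters out of {len(original)}")
print(f"accuracy: {accuracy:.0%}")
```

Even a handful of wrong characters matters a great deal when the text is numeric: a "1O5O" made of letters is not a number a spreadsheet can add up.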

And, of course, the quality of OCR is also dependent on the quality of the
original document scan.

But, after all’s said and done, OCR is often a good first step to converting
a scanned document back into some kind of editable form.

But what about spreadsheets like Excel?

The problem with spreadsheets is that they contain more than just text. The
rows and columns represent structure. OCR software is typically not
able to determine the structure of a document from just a scanned image. Some
programs do try, for example by working out where paragraphs begin and end in
a more traditional text document.

Unfortunately, scanning and converting the letters and numbers is one thing,
but understanding just how that data should be placed into which cells on a
spreadsheet is beyond the ability of most OCR software. In fact, I’m not
currently aware of any that can do so. (Check the comments to this article, as
I’m sure readers will add suggestions.)

Even as OCR technology continues to improve, I’d fully expect there to be
spreadsheet-based documents that even the best software would never be able to
“understand”.

My recommendation today is to OCR the document into the software’s best
guess at a regular word processing document, and then spend the time to create
your new spreadsheet by hand, copy/pasting the data in the ways that you know
the spreadsheet should be organized. It is work, but you’ll get the spreadsheet
you expect.
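If the OCR'd text happens to keep its columns lined up with spaces, part of that hand work can sometimes be scripted. The Python sketch below is only an illustration under that assumption: the sample text, the `to_cell` and `parse_rows` helpers, and the "two or more spaces means a new column" rule are all hypothetical, and real OCR output is often messier. It produces CSV text, a format Excel can open.

```python
import csv
import io
import re

# Hypothetical OCR output: columns survived only as runs of spaces.
ocr_text = """\
Item        Qty   Price
Widget      4     2.50
Large bolt  12    0.75
"""


def to_cell(value):
    """Convert numeric-looking text to a number; leave other text alone."""
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value


def parse_rows(text):
    """Split each non-empty line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            cells = re.split(r"\s{2,}", line.strip())
            rows.append([to_cell(cell) for cell in cells])
    return rows


rows = parse_rows(ocr_text)

# Write the rows out as CSV text.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

You would still need to check every row against the original page, since a single misread space or character puts data in the wrong cell.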


12 comments on “How do I scan a document into Excel?”

  1. ABBYY Finereader Professional Edition (I have used v8) does a good job of scanning and converting columnar data automatically into Excel format where the rows and columns are well defined (i.e., with horizontal/vertical lines in the original document). If they are not well defined, the software enables you to put in “separators” yourself, something that can be done quite quickly. You can also load .pdf documents into it, OCR them, and output to Excel format. However, regardless of the source, once loaded in Excel all numbers are treated as text, and one of Excel’s text-to-number features is needed to convert the numerical information into numbers.

    Hope this helps

  2. I am interested in David’s comment on 3/23/07. I want to scan an Excel document with vertical and horizontal data that is calculated by formulae in the document. Will the scanned-in document include the formulae?

  3. I had an Excel spreadsheet that I had printed out years ago but I couldn’t find the file on my computer. I did have a copy of the printout and I just tried using ABBYY “Scan to Office” program and it worked great! It’s a lot cheaper than the ABBYY Finereader Professional, too!

  4. Another tip I have used successfully is to draw the vertical lines with a ruler and a pen in the document (the original or a copy). Then, after scanning it into Word, it is easily recognizable as a table, and you can paste it into Excel.

  5. Abbyy Scan to Office is a great little program but you are not able to buy it anywhere. Abbyy in their wisdom have discontinued a perfectly usable program for everyday people and have replaced it with another program that does not have a trial period and costs $50 more.

  6. Progress has been made here, but it’s not cheap.
    ReadIRIS Pro is about $140, but it has a “table” function. (not affiliated, just found it, tried the free trial – it works!) After scanning, it gives you an image with everything it ‘thinks’ you want – you can change that by deleting its selections, choosing the table icon (pink) and re-scanning, and saving in Excel. Rows and columns, baby !!!
