Helping people with computers... one answer at a time.
Copying a web page to archive or read later is not terribly difficult, but getting everything to copy as you see it can be a challenge.
How do I copy an entire web page? I copy and paste, but not everything appears as I see it. For example I'm copying and pasting a bank statement to Word, but portions of the page appear empty.
It depends a little on exactly what you're trying to do. I know you're trying to copy a web page, but why? So you can modify it, or just save a copy for your archives?
There are several approaches. None of them are what I'd call really clean, but depending on your goal, one or more of them might work for you.
Print to PDF
If all you're attempting to do is save a copy of the page for your records, this is my number one recommendation. I do it myself for almost all of my banking records. I visit my bank's web site every month, display the statement, and then "print" it to a PDF file which I then save.
PDF files are great for several reasons. With the right software installed they're easy to produce, and PDF has become so ubiquitous that finding a PDF reader is almost trivially easy. Chances are you already have one installed on your machine.
If you're running Windows XP or older, I recommend PDFCreator. This is a free, open source utility for creating PDFs. Install it, and you'll get a virtual printer driver: you simply print to it, and it produces a PDF as your "printed" result. PDFCreator appears to have difficulty in Windows Vista. Scan the discussion forums for some hints if you'd like to try to get it working.
Alternatively, Foxit Software, makers of the free Foxit Reader, also makes its own PDF Creator. It's not free, but it does apparently work under Vista. The highly regarded screen capture utility SnagIt also includes a PDF capture printer driver. And of course there's always Adobe Acrobat itself; it happens to be what I use on my Vista laptop since it came bundled.
Print to Paper
It's probably not what you were looking for, but it had to be said. Quite often for archival purposes actual hard copy is the way to go.
Side note: some HTML pages will print differently than they appear on screen. This is actually under the control of the web page designer. If you print this page, for example, items such as the advertisements and menu bar will not be printed. Ideally printing will give you useful but not necessarily identical results.
OK, so if you still want to take the copy/paste route there are approaches, but there's almost no chance of getting exactly what you see in your browser. Depending on the page design and the program you're pasting into, there are many things that will not copy over or will copy over slightly differently. Consider that the exact same page viewed in two different browsers, for example Internet Explorer and Firefox, will look slightly different. You'll see the exact same page, and yet not the exact same results.
If different browsers which are specifically designed for viewing web pages can't get it the same, then the chances of other programs such as Word doing so are basically zero.
To start with, in your browser copy the document by doing this:
Type CTRL+A - this selects everything on the page. It's much more reliable than trying to select everything with the mouse. (I always miss something :-).
Type CTRL+C - this copies the selection to the clipboard.
Now in Word, type CTRL+V to paste. If you do this with, say, the Ask Leo! home page you'll see it looks quite different from the original.
The content is there, but the formatting is gone. In fact, it appears that Word did not get the stylesheet that is associated with my pages. Stylesheets can control a tremendous amount of the content and formatting of web pages. In my case the results are still somewhat usable, but I can easily see that other sites which rely even more heavily on stylesheets might be more seriously affected.
In general, copy/paste is a reasonable approach when you want to save only a portion of text that you see on a web page. Various limitations make it less than ideal for trying to save the entire page.
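If text is all you're after, the idea can be sketched in a few lines of Python. This is only an illustration, not what your browser or Word actually does: the names TextExtractor and page_text are made up for this example, and it uses only Python's standard html.parser module. It keeps the visible text of a page and drops the tags, scripts, and stylesheet rules, which is roughly what survives a copy/paste.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, ignoring script and style blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # greater than 0 while inside <script> or <style>
        self.chunks = []       # pieces of visible text, in page order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not part of a script or stylesheet.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Feed it the HTML of a page and you get back the words with none of the layout, which is why this approach suits a paragraph or two better than an entire statement.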
Most people miss the fact that there's a "Save" item on the File menu in their browser. While viewing a web page you want to save, click on the File menu, and then Save or Save As.... Make sure that the save type is ".htm" or ".html", and you'll get a true copy of the web page saved to your local machine.
Naturally, there are caveats here also.
The web page may be saved as only the HTML, meaning that the images or other files referenced within the HTML page may not be saved. Depending on your browser, when you then view that saved page later, these items may not display, or they may be fetched automatically from the web, assuming that they're still on the original web site.
The web page may be saved with all the images and additional files. This is handy because it's as close to a snapshot of the web page as you can get. The problem is that it's not saved in a single file. You may find "mysavedfile.html" as your saved file, if that's what you called it, but then you'll also find a sub-folder called "mysavedfile_files" where all of the images and other components have also been downloaded. You'll need to keep both that ".html" file as well as the files that came with it to accurately save a copy of the page.
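To make the difference between the two cases concrete, here's a rough Python sketch of what a "complete" save has to keep track of. It's an assumption-laden illustration, not how any particular browser is implemented: ResourceFinder and plan_complete_save are names invented for this example, and the "_files" folder naming simply follows the example above. It lists the images, scripts, and stylesheets a page refers to, and works out the name of the companion folder that would sit next to the saved ".html" file.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Hypothetical helper: collects URLs of images, scripts, and stylesheets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Resolve relative addresses against the page's own address.
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def plan_complete_save(html, base_url, save_as):
    """Return (html file, companion folder, referenced URLs) without downloading anything."""
    finder = ResourceFinder(base_url)
    finder.feed(html)
    finder.close()
    html_path = Path(save_as)
    # Assumed naming convention: "mysavedfile.html" pairs with "mysavedfile_files".
    files_dir = html_path.with_name(html_path.stem + "_files")
    return html_path, files_dir, finder.resources
```

An HTML-only save keeps just the first item of that result. Everything in the list of referenced URLs is what either ends up in the companion folder or has to be fetched from the original site when you open the page later.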