Ask Leo!

How do I copy an entire web page?


Summary: Copying a web page to archive or read later is not terribly difficult, but getting everything to copy as you see it can be a challenge.

How do I copy an entire web page? I copy and paste, but not everything appears as I see it. For example, I'm copying and pasting a bank statement into Word, but portions of the page appear empty.

It depends a little on exactly what you're trying to do. I know you're trying to copy a web page, but why? So you can modify it, or just save a copy for your archives?

There are several approaches. None of them are what I'd call really clean, but depending on your goal, one or more of them might work for you.

Print to PDF

If all you're attempting to do is save a copy of the page for your records, this is my number one recommendation. I do it myself for almost all of my banking records. I visit my bank's web site every month, display the statement, and then "print" it to a PDF file which I then save.

PDF files are great for several reasons. With the right software installed, they're easy to produce, and PDF has become so ubiquitous that finding a PDF reader is almost trivially easy. Chances are you already have one installed on your machine.

If you're running Windows XP or older, I recommend PDFCreator, a free, open source utility for creating PDFs. Install it and you'll get a virtual printer driver: you simply print to it, and it produces PDFs as your "printed" results. PDFCreator appears to have difficulty in Windows Vista; scan its discussion forums for some hints if you'd like to try to get it working.

Alternatively, Foxit Software, makers of the free Foxit Reader, also makes its own PDF Creator. It's not free, but it does apparently work under Vista. The highly regarded screen capture utility SnagIt also includes a PDF capture printer driver. And of course there's always Adobe Acrobat itself; it happens to be what I use on my Vista laptop, since it came bundled.

Print to Paper

It's probably not what you were looking for, but it had to be said. Quite often, for archival purposes, actual hard copy is the way to go.

Side note: some HTML pages will print differently than they appear on screen. This is actually under the control of the web page designer, who can supply separate styling that applies only when the page is printed. If you print this page, for example, items such as the advertisements and menu bar will not be printed. Ideally, printing will give you useful, but not necessarily identical, results.
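For the technically curious: designers usually do this with a print stylesheet, either a `<link rel="stylesheet" media="print">` tag or an `@media print` block in the page's CSS. Here's a rough Python sketch, using only the standard library's `html.parser`, that scans a page's HTML for those two hints. The names `PrintStyleFinder` and `find_print_styles` are made up for illustration, and it's deliberately simple: it won't see rules inside external CSS files, and its pattern misses variants like `@media screen, print`.

```python
import re
from html.parser import HTMLParser


class PrintStyleFinder(HTMLParser):
    """Look for signs that a page styles itself differently when printed.

    Hypothetical helper for illustration, not a full CSS parser.
    """

    def __init__(self):
        super().__init__()
        self.print_links = []         # hrefs of <link rel="stylesheet" media="...print...">
        self.inline_print_rules = 0   # "@media print" blocks found inside <style> tags
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "stylesheet" in (a.get("rel") or "").lower():
            if "print" in (a.get("media") or "").lower():
                self.print_links.append(a.get("href"))
        elif tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            # Simple pattern: matches "@media print" but not "@media screen, print".
            self.inline_print_rules += len(re.findall(r"@media\s+print", data, re.IGNORECASE))


def find_print_styles(html):
    """Return (print stylesheet links, count of inline @media print blocks)."""
    finder = PrintStyleFinder()
    finder.feed(html)
    finder.close()
    return finder.print_links, finder.inline_print_rules
```

If either result is non-empty, you can expect the printed (or "printed to PDF") copy to differ from what's on screen.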

Copy/Paste


OK, if you still want to take the copy/paste route, there are ways to do it, but there's almost no chance of getting exactly what you see in your browser. Depending on the page design and the program you're pasting into, many things will not copy over at all or will copy over slightly differently. Consider that the exact same page viewed in two different browsers, for example Internet Explorer and Firefox, will look slightly different: the same page, and yet not the same results.

If different browsers, which are specifically designed for viewing web pages, can't display it identically, then the chances of other programs such as Word doing so are basically zero.

To start, copy the page in your browser:

  • Type CTRL+A - this selects everything on the page. It's much more reliable than trying to select everything with the mouse; I always miss something. :-)

  • Type CTRL+C - this copies the selection to the clipboard.

Now, in Word, type CTRL+V to paste. If you do this with, say, the Ask Leo! home page, you'll see that it looks quite different from the original:

[Figure: the Ask Leo! home page copied into Microsoft Word]

The content is there, but the formatting is gone. In fact, it appears that Word did not get the stylesheet associated with my pages, and stylesheets can control a tremendous amount of a web page's layout and formatting. In my case the results are still somewhat usable, but sites that rely even more heavily on stylesheets could easily be more seriously affected.

In general, copy/paste is a reasonable approach when you want to save only a portion of text that you see on a web page. Various limitations make it less than ideal for trying to save the entire page.
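To show that a page's words and its formatting really are separate things, here's a minimal Python sketch that keeps the visible text of some HTML and throws away the tags, scripts and styling, roughly what you end up with when you paste as plain, unformatted text. It assumes reasonably well-behaved HTML and uses only the standard library; `TextOnly` and `html_to_text` are hypothetical names, and this is not how Word or any browser actually does it internally.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect a page's visible text, discarding tags, scripts and styles.

    Hypothetical helper for illustration only.
    """

    SKIP = {"script", "style", "title"}   # content that isn't shown as page text
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")       # block elements start on a new line
        elif tag in ("td", "th"):
            self.parts.append(" ")        # keep table cells from running together

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return plain text: one line per block, runs of whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Everything the stylesheet contributed (colors, fonts, positioning) simply isn't in the result, which is much the same loss you see after pasting into Word.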

File Save

Most people miss the fact that there's a "Save" item on the File menu in their browser. While viewing the web page you want to save, click the File menu, and then Save (or Save As...). Make sure that the save type is ".htm" or ".html", and you'll get a copy of the web page saved to your local machine.

Naturally, there are caveats here also.

The web page may be saved as only the HTML, meaning that the images or other files referenced within the HTML page may not be saved. Depending on your browser, when you view that saved page later these items may not display, or they may be fetched automatically from the web, assuming that they're still on the original web site.
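If you want to know which referenced files a saved page is actually missing, a short script can check. This Python sketch, under simple assumptions, reads a saved .html file, lists the images, scripts and stylesheets it references, and sorts them into files found on disk, local files that are missing, and full web addresses the browser would have to fetch from the original site. `check_saved_page` and `ResourceLister` are hypothetical names, and the sketch ignores files pulled in by CSS or JavaScript.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class ResourceLister(HTMLParser):
    """Collect the images, scripts and stylesheets a page asks the browser to load.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("img", "script") and a.get("src"):
            self.refs.append(a["src"])
        elif tag == "link" and "stylesheet" in (a.get("rel") or "").lower() and a.get("href"):
            self.refs.append(a["href"])


def check_saved_page(path):
    """Sort a saved page's references into (present, missing, remote) lists."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        lister = ResourceLister()
        lister.feed(fh.read())
        lister.close()
    folder = os.path.dirname(os.path.abspath(path))
    present, missing, remote = [], [], []
    for ref in lister.refs:
        parsed = urlparse(ref)
        if parsed.scheme == "data":
            continue                      # embedded in the page itself; nothing to find
        if parsed.scheme in ("http", "https") or parsed.netloc:
            remote.append(ref)            # fetched from the web, if it still exists there
        elif os.path.exists(os.path.join(folder, unquote(parsed.path))):
            present.append(ref)           # relative path resolved next to the saved page
        else:
            missing.append(ref)           # likely a broken image or lost styling
    return present, missing, remote
```

Anything in the "remote" list is what disappears from your archived copy once the original site changes or goes away.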

The web page may be saved with all of its images and additional files. This is handy because it's as close to a snapshot of the web page as you can get. The problem is that it's not saved as a single file. You may find "mysavedfile.html" as your saved file, if that's what you called it, but you'll also find a sub-folder called "mysavedfile_files" into which all of the images and other components have been downloaded. To keep an accurate copy of the page, you'll need to keep both that ".html" file and the folder of files that came with it.
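Behind the scenes, this kind of "complete" save works roughly like this: the browser finds each file the page uses, downloads it into the "_files" folder, and points the saved HTML at the local copy. The Python sketch below does only the planning step, and only for images, under my own assumptions: it resolves each image address against the page's URL and picks a local name inside "mysavedfile_files", adding a number when two images share a name. `plan_complete_save` and the "(1)" naming scheme are hypothetical; every browser has its own rules.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Gather the src attribute of every <img> tag, in page order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag == "img" and src:
            self.srcs.append(src)


def plan_complete_save(html, page_url, saved_name):
    """Map each image src to (absolute URL, path inside the <name>_files folder).

    Hypothetical helper for illustration; real browsers name files differently.
    """
    folder = posixpath.splitext(saved_name)[0] + "_files"
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    plan, used = {}, set()
    for src in collector.srcs:
        if src in plan:
            continue                              # same image used twice: save it once
        absolute = urljoin(page_url, src)         # resolve relative addresses
        name = posixpath.basename(urlparse(absolute).path) or "image"
        stem, ext = posixpath.splitext(name)
        candidate, n = name, 1
        while candidate in used:                  # don't let two images overwrite each other
            candidate = f"{stem}({n}){ext}"
            n += 1
        used.add(candidate)
        plan[src] = (absolute, f"{folder}/{candidate}")
    return plan
```

To actually fetch the files, each (URL, local path) pair could be handed to `urllib.request.urlretrieve` after creating the folder; that part is left out so the sketch runs without a network connection. It also shows why the folder matters: the saved HTML only works if those local paths still exist next to it.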


Article 11521 | Posted May 24, 2007

Recent Comments

Now that I downloaded PDF Creator, I just tried to save this page using it. All it got was down to the first paragraph of the copy/paste portion. It also didn't save ads, but that didn't hurt my feelings. However the lost information would have bothered me. I think saving to the archive is safer!

Posted by: Judith Currier at May 26, 2007 07:26 AM

A great subject, a mini-tutorial reply by Leo, and useful, informed comments. Another simple consideration when using copy and paste from a web page, especially at your bank's site: highlight the material you want to copy and type Ctrl+C, then go to the page where you're going to place the info and click Edit > Paste Special > Unformatted Text. This places the copied info on your page without any of the annoying formatting from the web page, which lets you format it to match your existing font, color, size, etc., or just leave the copied results as they are.

Posted by: Frank at May 26, 2007 10:24 AM

Another great PDF creator is Primo. It is a free tool and easy to install and use.

Posted by: Neville Turbit at May 27, 2007 01:19 AM

For those mentioning "mht", or any other bundled-archive format: my concern is future compatibility. What tools can be used to read those formats, will they really be around, and will they be available everywhere, on every platform?

PDF has become such a de facto standard for document production and archival that it seems the safest. I can read it on pretty much any machine and any OS, even my phone. And I expect it to remain viable for a long, long time.

That being said, MHT and others are certainly viable alternatives as well if you're comfortable with them.

Leo

Posted by: Leo A. Notenboom at May 27, 2007 10:50 AM

I'm surprised no one has mentioned HTTrack: "HTTrack is a free (libre/open source) and easy-to-use offline browser utility. Source code is available for Windows and Linux/Unix/BSD."

The website is here: http://www.httrack.com/

Regards
Stuzz

Posted by: Stuzz at May 28, 2007 04:25 PM

Thanks Leo. Very informative article.
Nevertheless, I seldom save to PDF. I find it too restrictive.
It's also as proprietary to Adobe as the .doc format is to Microsoft.
Personally, I don't believe that the PDF format will survive, nor will the M$ .doc format.
There is a movement afoot to get away from proprietary formats for which royalties have to be paid.
One example would be the OpenDocument format.
Anyway, my preferred format for now is MHT most of the time.
Then HTML, if the page contains elements (pictures etc.) that I may want to save separately.
It's easy to "lift" them out of an HTML file.
Another advantage is that most browsers open an HTML or MHT file.
They will not open a PDF unless you have the plugin.
Additionally, most word processors from various companies can open HTM, HTML and MHT formats.

Some programs were mentioned for saving web content.
I'm sure the posters wrote this in all innocence, but NetSnippets is no longer available, and FastStone Capture went shareware 2 days before the comment was posted.
http://www.faststone.org/index.htm
Regardless, Capture is still a great screen capture program.
I also use easyWebSave from
http://www.easywebaction.com/en/
This is a great, low-cost utility for saving web content.
As always, just my point of view.
(or 2 cents if you will :)

Posted by: Peter at May 28, 2007 10:31 PM

I personally use the ScrapBook add-on with Firefox. It's an excellent add-on that can capture the current level of the current page (text, images, et al.), multiple levels of the current page, or a selected portion of the current page, and, last but not least, it can save all pages currently open in tabs. The import/export function for saved pages is neat. All in all, it's a very handy tool for research work, where you need to save numerous web pages.

Posted by: Cyber Dude at June 8, 2007 11:01 AM

How can I copy the following web site into Word using IE 7.0?

http://www.castis.com/english/ch0202.htm

Something is disabling CTRL-A.
Is there any way to enable it again?

Posted by: Steve at April 10, 2008 08:34 AM

I often want to save only a portion of the text that I see on a web page, but on some web pages I can't select the exact portion; the selection includes either the section before or the section after. Do you know why that is?

Posted by: Jo at June 26, 2008 10:17 AM

Just use the "Print Screen" key, then paste the result into Paint.

Posted by: fiona at July 4, 2008 03:45 PM
