
Copying a web page to archive or read later is not terribly difficult, but getting everything to copy as you see it can be a challenge.

How do I copy an entire web page? I copy and paste, but not everything appears as I see it. For example, I'm copying and pasting a bank statement to Word, but portions of the page appear empty.

It depends a little on exactly what you're trying to do. I know you're trying to copy a web page, but why? So you can modify it, or just save a copy for your archives?

There are several approaches. None of them are what I'd call really clean, but depending on your goal, one or more of them might work for you.

Print to PDF

If all you're attempting to do is save a copy of the page for your records, this is my number one recommendation. I do it myself for almost all of my banking records. I visit my bank's web site every month, display the statement, and then "print" it to a PDF file which I then save.

PDF files are great for several reasons. With the right software installed they're easy to produce, and PDF has become so ubiquitous that finding a PDF reader is almost trivially easy. Chances are you already have one installed on your machine.

If you're running Windows XP or older, I recommend PDFCreator. This is a free, open source utility for creating PDFs. Install it and you'll get a virtual printer: anything you "print" to it is saved as a PDF file instead of going to paper. PDFCreator appears to have difficulty in Windows Vista; if you'd like to try to get it working there, scan its discussion forums for hints.

Alternatively, Foxit Software, makers of the free Foxit Reader, also make their own PDF Creator. It's not free, but it apparently works under Vista. The highly regarded screen capture utility SnagIt also includes a PDF capture printer driver. And of course there's always Adobe Acrobat itself; it's what I use on my Vista laptop, since it came bundled with the machine.

Print to Paper

It's probably not what you were looking for, but it had to be said: quite often, for archival purposes, actual hard copy is the way to go.

Side note: some HTML pages print differently than they appear on screen. This is under the control of the web page designer, who can supply a separate stylesheet that is used only when printing. If you print this page, for example, items such as the advertisements and menu bar will not be printed. Ideally, printing gives you useful, but not necessarily identical, results.

Copy/Paste


OK, if you still want to take the copy/paste route, there are approaches, but there's almost no chance of getting exactly what you see in your browser. Depending on the page design and the program you're pasting into, many things will not copy over at all, or will copy over slightly differently. Consider that the same exact page viewed in two different browsers, for example Internet Explorer and Firefox, will look slightly different: the same page, yet not the same results.

If different browsers, which are specifically designed for viewing web pages, can't render a page identically, then the chances of other programs such as Word doing so are basically zero.

To start, copy the page in your browser:

  • Type CTRL+A - this selects everything on the page. It's much more reliable than trying to select everything with the mouse. (I always miss something :-).

  • Type CTRL+C - this copies the selection to the clipboard.

Now in Word, type CTRL+V to paste. If you do this with, say, the Ask Leo! home page, you'll see that it looks quite different from the original:

[Image: Ask Leo! home page copied into Microsoft Word]

The content is there, but the formatting is gone. In fact, it appears that Word did not get the stylesheet that is associated with my pages. Stylesheets can control a tremendous amount of the content and formatting of web pages. In my case the results are still somewhat usable, but I can easily see that other sites which rely even more heavily on stylesheets might be more seriously affected.

In general, copy/paste is a reasonable approach when you want to save only a portion of text that you see on a web page. Various limitations make it less than ideal for trying to save the entire page.

File Save

Most people miss the fact that there's a "Save" item on the File menu in their browser. While viewing a web page you want to save, click the File menu, then Save or Save As.... Make sure that the save type is ".htm" or ".html", and you'll get a copy of the web page saved to your local machine.

Naturally, there are caveats here also.

The web page may be saved as only the HTML, meaning that the images or other files referenced within the page are not saved. Depending on your browser, when you view that saved page later these items may not display, or they may be fetched automatically from the web, assuming that they're still on the original web site.

The web page may be saved with all the images and additional files. This is handy because it's as close to a snapshot of the web page as you can get. The problem is that it's not saved in a single file. You may find "mysavedfile.html" as your saved file, if that's what you called it, but you'll also find a sub-folder called "mysavedfile_files" into which all of the images and other components have been downloaded. To keep an accurate copy of the page, you'll need to keep both the ".html" file and that folder.
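To see why a "complete" save produces two things instead of one, here is a minimal Python sketch of the job it has to do: find every image, script and stylesheet the page references, download each one into a "_files" folder, and point the saved HTML at those local copies. It's an illustration under simplifying assumptions, not how any particular browser actually implements Save As. The function names `plan_local_copies` and `save_complete` are hypothetical, the page is assumed to be UTF-8, and images referenced from inside stylesheets or loaded later by JavaScript are ignored, which is one more reason a saved page can look different from the live one.

```python
# Sketch only: mimics the "mysavedfile.html" + "mysavedfile_files" layout.
# Hypothetical helper names; not any browser's real implementation.
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceFinder(HTMLParser):
    """Collects (original attribute value, absolute URL) for images, scripts, stylesheets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        value = None
        if tag in ("img", "script"):
            value = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            value = attrs.get("href")
        if value:
            # Relative links like "logo.png" only make sense relative to the page's address.
            self.resources.append((value, urljoin(self.base_url, value)))


def plan_local_copies(html, base_url, folder_name):
    """Return (rewritten HTML, [(absolute URL, local relative path), ...])."""
    finder = ResourceFinder(base_url)
    finder.feed(html)
    downloads = []
    for i, (original, absolute) in enumerate(finder.resources):
        # Number each file so two different "logo.png" files don't collide.
        basename = os.path.basename(urlparse(absolute).path) or "resource"
        local = f"{folder_name}/{i}_{basename}"
        downloads.append((absolute, local))
        # Crude rewrite: only catches double-quoted attribute values.
        html = html.replace(f'="{original}"', f'="{local}"')
    return html, downloads


def save_complete(url, name):
    """Save NAME.html plus a NAME_files folder holding the page's resources."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")  # assumes UTF-8
    folder = name + "_files"
    os.makedirs(folder, exist_ok=True)
    # Links inside the HTML are relative to the .html file, so use the bare folder name.
    html, downloads = plan_local_copies(html, url, os.path.basename(folder))
    for absolute, relative in downloads:
        try:
            urllib.request.urlretrieve(absolute, os.path.join(os.path.dirname(folder), relative))
        except (OSError, ValueError):
            pass  # a missing image shouldn't stop the rest of the save
    with open(name + ".html", "w", encoding="utf-8") as f:
        f.write(html)
```

For example, `save_complete("https://example.com/", "mysavedfile")` would write `mysavedfile.html` plus a `mysavedfile_files` folder, the same two-part layout described above. Move the .html file without its folder and the rewritten links point at nothing.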

Article C3034 - May 24, 2007

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18 year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.


44 Comments
Dan Ullman
May 24, 2007 12:13 PM

Vista and up-to-date XP machines have a third option: the XPS printer. Click File, Print, Printer Setup (how you get there depends on the program) and select the XPS printer as your printer. This is MS's answer to PDF.

That said, I haven't used it myself.

Mat
May 25, 2007 2:01 AM

Leo, you missed out taking a screenshot - Ctrl+Shift+Print Screen. That gets a faithful representation of (most of) what's on the screen. Problematic, of course, if the webpage goes off the bottom or side of the screen, but nothing's perfect.

Carl R. Goodwin
May 25, 2007 6:05 PM

The easiest thing to do is use Firefox, and then install the extension that allows you to do it (I forget the name right now, but I know there is one). Besides, Firefox is a million times better than IE will EVER be. :)

David Hernandez
May 25, 2007 6:41 PM

Another way:

Copy the web site address (URL), then open a word processor (e.g. MS Word or Writer from OpenOffice.org; it may work with other word processors).

In the File menu, select Open, and paste the URL in the line where you type the name of the file that you want to open.

Press Enter or Accept and wait.

The word processor will open the site.

In MS Word you have to "break the links" or something like that. I think that option is in the Edit menu.

I believe this works especially with .html files.

Thanks Leo for your site!

David
May 25, 2007 8:17 PM

Excellent tip regarding the "save to pdf". One I didn't know...lol Thanks!

Patty
May 25, 2007 11:23 PM

Try the program Net Snippets. I use it a lot, in particular to save copies of web pages showing my receipts of things I have just purchased off the web. The program even makes a cute "snatching" sound as it snatches the content right off the screen and into a nice format that can be easily categorized, organized, searched, etc. It comes with a toolbar, and one of the buttons on the toolbar is "Add Entire Page" which, as you guessed, copies the entire page for you. Check it out.

Roger Turner
May 26, 2007 2:20 AM

The best way to save a web page is to "save as" "Web Archive, single file, *.mht". This is much better than using .htm or .html, as it does not need to create sub-directories and is much more compact. Firefox has an add-on called "Mozilla Archive Format" which saves as *.maf, which can be opened in either Firefox or IE. However, it is only available for Firefox 1.5.x; it will work in version 2.x, but it has to be modified, which is a bit tricky to do.

John Baldry
May 26, 2007 4:10 AM

You can obtain a screenshot right down to the bottom of a web page, even if it goes off the screen: use FastStone Capture, a great piece of freeware from www.faststone.org. SnagIt also does scrolling screenshots, but you have to pay for it.

Judith Currier
May 26, 2007 6:49 AM

I enjoyed the article and downloaded the pdf creator which I had not known about -- sounds handy, and I might go to paperless bank statements now also.

However, I was curious why you didn't mention the Save as Archive in IE (mht extension). That seems to get an exact copy of the page without the fuss of a separate file that contains the graphics, etc. I used to use it a lot, though now that I am using Firefox I don't have that ability anymore.

Judith Currier
May 26, 2007 7:26 AM

Now that I downloaded PDF Creator, I just tried to save this page using it. All it got was down to the first paragraph of the copy/paste portion. It also didn't save ads, but that didn't hurt my feelings. However the lost information would have bothered me. I think saving to the archive is safer!

Frank
May 26, 2007 10:24 AM

A great subject, a mini-tutorial reply by Leo, and useful, informed comments. Another simple consideration when using Copy & Paste from a web page, especially at your bank's site: highlight the material/info you want to save, type Ctrl+C, then go to the page where you're going to place the info and click Edit > Paste Special > Unformatted Text. This places the copied info on your page without any of the annoying formatting from the web page, so you can format it to match the existing font, color, size, etc., or just leave the copied results as they are.

Neville Turbit
May 27, 2007 1:19 AM

Another great PDF creator is Primo. It is a free tool and easy to install and use.

Leo A. Notenboom
May 27, 2007 10:50 AM


For those mentioning "mht", or any other bundled-archive format: my concern is future compatibility. What tools can be used to read those formats, and will they really be around everywhere and on every platform?

PDF has become such a de facto standard for document production and archival that it seems the safest. I can read it on pretty much any machine and any OS, even my phone. And I expect it to remain viable for a long, long time.

That being said, MHT and others are certainly viable alternatives as well if you're comfortable with them.

Leo


Stuzz
May 28, 2007 4:25 PM

I'm surprised no one's mentioned HTTrack: "HTTrack is a free (libre/open source) and easy-to-use offline browser utility. Source code is available for Windows and Linux/Unix/BSD."

The website is here http://www.httrack.com/.

Regards
Stuzz

Peter
May 28, 2007 10:31 PM

Thanks Leo. Very informative article.
Nevertheless, I seldom save to PDF. I find it too restrictive.
It's also as proprietary to Adobe as the .doc format is to Microsoft.
Personally, I don't believe that the PDF format will survive, nor will the M$ doc format.
There is a movement afoot to get away from proprietary formats for which royalties have to be paid.
One example would be the Open Document format.
Anyway, my preferred format for now is mht most of the time.
Then html, if the page contains elements (pictures etc.) that I may want to save separately.
It's easy to "lift" them out of an html file.
Another advantage is that most browsers open an html or mht file.
They will not open a pdf unless you have the plugin.
Additionally, most word processors from various companies can open htm, html and mht formats.

Some programs were mentioned for saving web content.
I'm sure the posters wrote this in all innocence,
but NetSnippets is no longer available and FastStone Capture went shareware 2 days before the comment was posted.
http://www.faststone.org/index.htm
Regardless, Capture is still a great screen capture program.
I also use easyWebSave from
http://www.easywebaction.com/en/
This is a great, low cost utility for saving web content.
As always, just my point of view.
(or 2 cents if you will :)

Cyber Dude
June 8, 2007 11:01 AM

I personally use the ScrapBook add-on with Firefox. It's an excellent add-on to capture the current level of the current page (text, images et al), multiple levels of the current page, a selected portion of the current page, and last but not least, all pages currently open in tabs. The import/export function for saved pages is neat. All in all, it's a very neat and handy tool for research work, where you need to save numerous web pages.

Steve
April 10, 2008 8:34 AM

How can I copy the following web site into Word using IE 7.0?

http://www.castis.com/english/ch0202.htm

Something is disabling CTRL-A.
Is there any way to enable it again?

Jo
June 26, 2008 10:17 AM

I often want to save only a portion of text that I see on a web page, but on some web pages I can't select the exact portion; the selection includes either the section before or the section after. Do you know why that is?

fiona
July 4, 2008 3:45 PM

just use the "print screen" key, then paste it into Paint

Brian Kusumawan
July 8, 2008 7:23 AM

I've tried to use PDF Creator as mentioned, but it does not print the entire web page into the PDF. Can anyone help me solve this? Is there some additional setting needed for this?

Regards,

Brian

rajender
August 12, 2008 4:35 AM

"This web page could not be saved"? I also want to apply this code to my web page, but I don't have the code for my web page. Please give me a solution and source code.

Leon Soles
September 8, 2008 5:58 PM

With XP Home and XP Pro I use the Print Screen key. That copies the web page, or whatever you are trying to save or print a copy of. Then you can paste it into WordPad or whatever word processor you have. I just tried it with Leo's home page and came out with what looked to me like a perfect copy.

Ed
October 21, 2008 12:33 AM

PDF is not a proprietary standard; it is an open standard that was officially published on July 1, 2008 by the ISO as ISO 32000-1:2008.
As for printing to pdf on Windows Vista I recommend Smart PDF Converter. One can manipulate the output pdf the same way as he/she can change printer settings for 'usual' paper printer

MANDER
November 4, 2008 10:19 AM

I have to do mark-ups on new web page designs and use multiple monitors. The best way I've found of copying a page and putting it into a PDF or PowerPoint is to make the web page the active window, press Alt+Print Screen, then press Ctrl+V to paste it into your destination.

Scot
February 28, 2009 4:49 AM

So now I understand how to save one page on a website, but what if I want to save the entire website? I am going to use File > Save All (including images).
Thanks

Bdubs
May 19, 2009 1:20 PM

I am having a brain fart here. Years ago I used to use a SAVE feature that let me select the number of levels deep to go... I used it to work offline and save me dial-up speed and time. I had to be careful that I didn't do too many levels, as it would download pages from the links etc. (it got exponentially larger).... BUT I COULD SWEAR that was built into Internet Explorer??? Version 5 or ??? Was it not?

I can say for sure that it wasn't a paid program; it was free and easy to do. I can't for the life of me remember where or what it was if not IE 5. PLEASE HELP

Bdubs
May 19, 2009 2:25 PM

Found it... (it seems to be under FAVORITES, at least in IE 6; I haven't checked IE7)

Bdubs
May 19, 2009 2:27 PM

It cut my last comment off? Anyway, go to Favorites and find the site (you have to have already bookmarked it), then right-click it, select "Make available offline", and then select how many levels, etc. GREAT!

Roger
October 4, 2009 8:03 PM

If you're actually trying to copy the entire web page, even the part that is not displayed on the screen unless you scroll down, there are programs [ie FireShot (IE and Firefox) and Screengrab (Firefox)] that will allow you to do this very easily.

Matt
November 4, 2009 4:14 PM

Use UnMHT with Firefox. It works EVERY time, unlike the ever-frustrating IE with its frequent "this webpage could not be saved" messages with no reason why. I like IE, but MS is forcing me to use FF after not having solved this basic feature after all these YEARS.

Akash
February 12, 2010 8:12 AM

Hi Leo,

Is there any free software that allows me to save webpages as a virtual book? Which I can organize by chapters etc. Instead of copy pasting everything to word etc.

Thanks
Akash

Not that I'm aware of.
Leo
13-Feb-2010

Neville Franks
March 24, 2010 10:27 PM

Hi Leo,
The methods you mention here work, but leave a lot to be desired for anyone who is serious about capturing web content and using that information to research specific topics. For this you really need tools that not just let you capture and save content, but let you add notes, edit the content, organize it (trees, tags etc.), quickly find it again and potentially share it with friends and colleagues.

There are a variety of applications available which do this very well, including our product Surfulater. I won't start rattling off all of its features here, but I suggest that anyone who is interested grab our free trial from http://www.surfulater.com. And please do contact me if I can help with anything.

Neville Franks

Garvey Liu
May 9, 2010 1:11 AM

Hi Leo,
Our product CaptureSaver is going to change all that. In fact, CaptureSaver will do you one better.

If you can see it on the internet, you can use CaptureSaver to store it in your own personal, off-line information storage area -- or shall we say, library!

Please check it out at: http://www.capturesaver.com/

Shawn
May 25, 2010 1:13 PM

I am trying to copy a page I made myself so that I can read and edit the text areas later.

I made an interactive web page that has tons of text areas and boxes that can be edited from the same page. Is there any way to actually copy the exact content?

P.S. I was successful in "saving as" with Firefox, but the place I created it for uses Internet Explorer, where I haven't gotten it to work yet.

Casey
May 26, 2010 11:09 AM

Thanks so much for the "print to pdf" tip. Much better than "Shift Print Screen". Thanks!

Charlie, Inside Deepest Central Maryland...
July 13, 2010 12:45 PM

After reading Leo's answers to an awesomely bewildering array of questions for a couple of years now, it seems to be a miracle that the machines we call "computers" actually work at all. These infinite variations of just about everything so far conceived.

I accept that timing the competition, turf guarding, copyright and such all enter into this complexity, but I wonder if there isn't a better way out of, and away from, all of these endless questions that are created each time something "new" is created?

These are such a Babel.

Yes, I've read the article at the top of the page, that's what has provoked this question.

Jim
September 12, 2010 2:52 AM

PicPick--Google it--It is a screen capture software. If all you are trying to do is save a statement for archival purposes you can use this to capture the page as an image. It will even auto scroll the page for you. You can save in multiple image formats as well. I use it regularly for bank statements, bills paid online, etc.

Zurdo
December 12, 2010 1:12 PM

thank you, the following worked for me:
CTRL A
CTRL C
CTRL V

robal
January 11, 2011 7:53 AM

tried the control + a on page i wanted to copy and THIS page and NEITHER time did it work - i have never ever worked on any computer where those types of commands actually ... like the oxymoron they are called ... function.

Arthur
January 23, 2011 1:41 PM

Not just helpful responses but in clear easy to follow language. Giving an idea of how things work. Even if the info is a bit off for a specific situation, you can now google for a better search and find what you exactly need. A very smart site. I'm very savvy, but couldn't match Leo.
Yet I have to add something to this post. Since 1995 I've been downloading everything on a site. css, pics, everything. Locally it looks the same and has the same code. Nothing works perfectly but these site down-loaders are a necessity. Do a search like "Save entire website", and find software like HTTrack or one of the many others.
Also this site would be friendlier if "preview" of a comment wasn't wiped. Not a biggie.
- Arthur

Leah
January 31, 2012 10:14 AM

I tried the save as pdf option and it worked great. I was flubbing around, wondering how to use the information stored somewhere/somehow for the purpose of printing to produce an image and since I never make PDFs, I wouldn't have thought of this. You're pretty much my hero. Thanks!

China
March 1, 2012 1:12 PM

I just found my entire website in PDF form on a site called Printfu.com. WTF? Isn't this a blatant invitation to copy my intellectual property?

Nocutename
March 7, 2012 5:26 PM

I needed to save 100+ pages of a website before they pulled the plug. A quick internet search brought me here. What a simple and elegant solution to print to PDF. I downloaded the driver and it worked like a charm.

Thank you!

Sunita
August 4, 2012 9:24 PM

Thanks Leo, I wanted to copy content from an html page and after googling a bit, found your advice and saved what I wanted!
