Summary: Copying a web page to archive or read later is not terribly difficult, but getting everything to copy as you see it can be a challenge.
It depends a little on exactly what you're trying to do. I know you're trying to copy a web page, but why? So you can modify it, or just save a copy for your archives?

There are several approaches. None of them are what I'd call really clean, but depending on your goal, one or more of them might work for you.

• Print to PDF

If all you're attempting to do is save a copy of the page for your records, this is my number one recommendation. I do it myself for almost all of my banking records. I visit my bank's web site every month, display the statement, and then "print" it to a PDF file which I then save.

PDF files are great for several reasons. With the right software installed they're easy to produce, and PDF has become so ubiquitous that finding a PDF reader is almost trivially easy. Chances are you already have one installed on your machine.

If you're running Windows XP or older I recommend PDFCreator. This is a free, open source utility for creating PDFs. Install it, and you'll get a virtual printer driver that you simply print to, which produces PDFs as your "printed" results.

PDFCreator appears to have difficulty in Windows Vista. Scan the discussion forums for some hints if you'd like to try to get it working.

Alternately, Foxit Software, makers of the free Foxit Reader, also make their own PDF Creator. It's not free, but it does apparently work under Vista. The highly regarded screen capture utility SnagIt also includes a PDF capture printer driver. And of course there's always Adobe Acrobat itself; it happens to be what I use on my Vista laptop, since it came bundled.

• Print to Paper

It's probably not what you were looking for, but it had to be said. Quite often, for archival purposes, actual hard copy is the way to go.

Side note: some HTML pages will print differently than they appear on screen. This is actually under the control of the web page designer. If you print this page, for example, items such as the advertisements and menu bar will not be printed. Ideally, printing will give you useful but not necessarily identical results.

• Copy/Paste
OK, so if you still want to take the copy/paste route there are approaches, but there's almost no chance of getting exactly what you see in your browser. Depending on the page design and the program you're pasting into, there are many things that will not copy over or will copy over slightly differently.

Consider that the very same page viewed in two different browsers, for example Internet Explorer and Firefox, will look slightly different. If browsers, which are specifically designed for viewing web pages, can't render it identically, then the chances of other programs such as Word doing so are basically zero.

To start with, in your browser, click anywhere on the page, type CTRL+A to select all of it, and then type CTRL+C to copy it.
Now in Word, type CTRL+V to paste. If you do this with, say, the Ask Leo! home page, you'll see it looks quite different from the original.
The content is there, but the formatting is gone. In fact, it appears that Word did not get the stylesheet that is associated with my pages. Stylesheets can control a tremendous amount of the content and formatting of web pages. In my case the results are still somewhat usable, but I can easily see that other sites which rely even more heavily on stylesheets might be more seriously affected.

In general, copy/paste is a reasonable approach when you want to save only a portion of text that you see on a web page. Various limitations make it less than ideal for trying to save the entire page.

• File Save

Most people miss the fact that there's a "Save" item on the File menu in their browser. While viewing a web page you want to save, click on the File menu, and then Save or Save As.... Make sure that the save type is ".htm" or ".html", and you'll get a true copy of the web page saved to your local machine.

Naturally, there are caveats here also.

The web page may be saved as only the HTML, meaning that the images or other files referenced within the HTML page may not be saved. Depending on your browser, when you then view that saved page later, these items may not display, or they may be fetched automatically from the web, assuming that they're still on the original web site.

The web page may be saved with all the images and additional files. This is handy because it's as close to a snapshot of the web page as you can get. The problem is that it's not saved in a single file. You may find "mysavedfile.html" as your saved file, if that's what you called it, but then you'll also find a sub-folder called "mysavedfile_files" where all of the images and other components have also been downloaded. You'll need to keep both that ".html" file and the files that came with it to accurately save a copy of the page.
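As a rough illustration of why a complete save ends up as a file plus a folder, the sketch below lists the images, scripts, and stylesheets that a saved HTML file refers to. Those are the pieces that have to travel with it. It uses only Python's standard library. The names `find_resources` and `companion_folder` are made up for this example, and the "_files" suffix simply follows the naming described above, which may differ from browser to browser.

```python
from html.parser import HTMLParser
from pathlib import Path


class ResourceFinder(HTMLParser):
    """Collect the files an HTML page refers to: images, scripts, stylesheets."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Images and external scripts point at their file with "src".
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # Only <link> tags that load a stylesheet count as page resources.
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.resources.append(attrs["href"])


def find_resources(html_text):
    """Return the referenced resource URLs in the order they appear."""
    finder = ResourceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.resources


def companion_folder(saved_name):
    """Hypothetical helper: folder name assumed to sit next to the saved file."""
    return Path(saved_name).stem + "_files"


if __name__ == "__main__":
    sample = (
        '<html><head><link rel="stylesheet" href="style.css"></head>'
        '<body><img src="logo.gif"><p>Hello</p></body></html>'
    )
    print(find_resources(sample))                # ['style.css', 'logo.gif']
    print(companion_folder("mysavedfile.html"))  # mysavedfile_files
```

If a page saved as "HTML only" shows broken images later, running something like `find_resources` over the saved file is one way to see which items were left behind on the original site.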
• Recent Comments
Now that I downloaded PDF Creator, I just tried to save this page using it. All it got was down to the first paragraph of the copy/paste portion. It also didn't save ads, but that didn't hurt my feelings. However, the lost information would have bothered me. I think saving to the archive is safer!
Posted by: Judith Currier at May 26, 2007 07:26 AM

A great subject, mini-tutorial reply by Leo and useful informed comments. Another simple consideration when using Copy & Paste from a web page, especially at your bank's site: highlight the material/info you want to save/copy, type Ctrl+C, then go to the page where you're going to place the info, click 'Edit' > 'Paste Special', and click 'Unformatted Text'. This will place the saved info on your page without any of the annoying formatting from the web page. This allows you to format it to match existing font, color, size, etc., or just leave the copied results as they are.
Posted by: Frank at May 26, 2007 10:24 AM

Another great PDF creator is Primo. It is a free tool and easy to install and use.
Posted by: Neville Turbit at May 27, 2007 01:19 AM

For those mentioning "mht", or any other bundled-archive format: my concern is
PDF has become such a de facto standard for document production and archival
That being said, MHT and others are certainly viable alternatives as well if
Leo

I'm surprised no one's mentioned HTTrack. "HTTrack is a free (libre/open source) and easy-to-use offline browser utility. Source code is available for Windows and Linux/Unix/BSD." The website is here: http://www.httrack.com/. Regards

Thanks Leo. Very informative article. Some programs were mentioned for saving web content. I personally use the ScrapBook Addon with Firefox. It's an excellent Addon to capture the current level of the current page (text, images et al), multiple levels of the current page, a selected portion of the current page, and last but not least, all pages currently open in tabs. The import/export function of saved pages is neat. All in all, it's a very neat and handy tool for research work, where you need to save numerous web pages.
Posted by: Cyber Dude at June 8, 2007 11:01 AM

How can I copy the following web site into Word using IE 7.0? http://www.castis.com/english/ch0202.htm Something is disabling CTRL-A.

I often want to save only a portion of text that I see on a web page, but on some web pages I can't select the exact portion; it includes either the section before or the section after. Do you know why that is?
Posted by: Jo at June 26, 2008 10:17 AM

Just use the "print screen" key to paste it to Paint.
Posted by: fiona at July 4, 2008 03:45 PM