Web browsers can save the web page that you're currently viewing to your computer in a handful of formats. I'll look at each and what I do instead.
I have been using Opera as my web browser for about two months. I am running Windows 2000 on a Gateway P4-1300 and I am generally happy with it. Before Opera, I used Internet Explorer 6. I like to keep files with interesting text and graphics available for reference, thus I have downloaded a lot of ".htm" files. What IE6 downloads as .htm, Opera downloads as ".mht". When I send these to friends by attaching them to email, frequently my friends are unable to read them; that is, unable to read .mht files. At that point, I am not always able to send them a URL instead of the file. Moreover, when I am offline and I attempt to read one of these .mht files, I invoke Opera, which I do not want to do. I want to know two things: 1. Is there a way to set up Opera to download .htm instead of .mht? 2. Is there a way to convert all of my .mht to .htm and if so, what is it? I have nothing against using third-party software, but I prefer that it be free. Because of this problem, I will soon give up Opera and go on to something else. But, even if I buy a new machine, I will still want to convert .mht to .htm.
.mht and .htm files are two related, yet quite different things. Both contain the web page that you might be viewing, but only one contains all of the web page that you're viewing.
To understand why that is, and from that, understand what you might want to do, we need to look at how web pages are constructed and what happens when you try to save one.
A quick note about Opera: I'm afraid that I can't speak to getting Opera to "File" -> "Save as..." a different file format. I don't run Opera, but I suspect that if it's not an option on the Save As... dialog box, then a) I'd be surprised, and b) it wouldn't appear to be a supported option. Perhaps a reader will fill us both in with a comment.
I will, however, talk about conversion below.
What most people don't realize is that web pages are made up of many files.
It starts with the fairly obvious base page, the file that you asked your browser to fetch. For example, if you visit the article:
the URL for that is:
which causes this single file to be fetched by your browser:
That base page is text only. In fact, right-click on either of those links above and choose Save As, and you'll get a text file that you can open in Notepad to see all of the gory HTML that makes up the basic page instructions.
But there's a lot that's missing. There are no pictures, such as my logo. There is additional style and scripting information that's also missing, which makes the page look funny when it's viewed.
Those are stored in other files which are referenced by, but not kept in, the base file. When your browser opens that file for display, it actually runs through what's coded therein and fetches the additional files as needed. It'll see the reference to http://img.askleomedia.com/askleonew.png, the site logo image, and go download that additional file. Similarly, it'll also see references to a file http://med.askleomedia.com/al.css, the style sheet which controls the look and feel of the site, and download that additional file as well.
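To make that a little more concrete, here's a minimal Python sketch of the "see a reference, go fetch it" step. It only lists the files a page refers to; it doesn't download anything. The sample page, the `ResourceFinder` class, and the `find_resources` helper are made up for illustration, and a real browser looks at far more than these three tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect the URLs of images, scripts, and style sheets a page refers to."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Relative references are resolved against the page's own URL.
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def find_resources(html, base_url):
    finder = ResourceFinder(base_url)
    finder.feed(html)
    return finder.resources


# A made-up base page that refers to the two files mentioned above.
SAMPLE_PAGE = """
<html><head>
<link rel="stylesheet" href="http://med.askleomedia.com/al.css">
</head><body>
<img src="http://img.askleomedia.com/askleonew.png" alt="logo">
<p>Some article text.</p>
</body></html>
"""

for url in find_resources(SAMPLE_PAGE, "http://ask-leo.com/"):
    print(url)
```

Each URL printed is one more file the browser would have to download before the page looks right.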
A web page is much more than just a single file.
But Save As... is designed to save one single file. So what should it do?
As revealed by Internet Explorer's own "Save as type:" dropdown, there are several options: "Webpage, HTML only", "Webpage, complete", "Web Archive, single file", and "Text File".
Internet Explorer's Save-As Formats
Saving as "Webpage, HTML only" saves only the base file as I described it above.
The result will be a single file, named with .htm or .html (the two are equivalent).
If you later open this saved .htm file in your browser, it will see the references to all of the other supporting files and attempt to connect to the internet to get them. If they're no longer available, or you have no internet connection, then the attempts will fail, and the page will probably be displayed improperly in one way or another.
Saving as "Webpage, complete" saves the base file, as I described it above, but it also saves all of the files that are referenced by it. For example, saving "Internet Safety: How do I keep my computer safe on the internet?" in this format results in:
A File: "Internet Safety How do I keep my computer safe on the internet.htm", which is the base HTML file of the page.
A Folder: "Internet Safety How do I keep my computer safe on the internet_files", which contains all of the supporting files referenced by the page.
In the base file, all of the references to the supporting files have been changed to refer to the files located in the folder that was created.
That base file and the files contained in the accompanying folder together are the saved page.
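The rewriting of references can be pictured with a small Python sketch. This is an illustration of the idea under simple assumptions, not how any particular browser does it: the `local_reference` and `rewrite_references` helpers are hypothetical, and a real browser also has to deal with name collisions, odd characters in file names, and references buried in style sheets.

```python
import posixpath
from urllib.parse import urlparse


def local_reference(page_name, resource_url):
    """Return the relative path a saved copy of resource_url might get."""
    filename = posixpath.basename(urlparse(resource_url).path)
    return f"{page_name}_files/{filename}"


def rewrite_references(html, page_name, resource_urls):
    """Point each listed URL in the base page at its local copy."""
    for url in resource_urls:
        html = html.replace(url, local_reference(page_name, url))
    return html


base_page = '<img src="http://img.askleomedia.com/askleonew.png">'
print(rewrite_references(
    base_page,
    "Internet Safety",
    ["http://img.askleomedia.com/askleonew.png"],
))
# prints: <img src="Internet Safety_files/askleonew.png">
```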
The problem, of course, is that there are multiple files in a specific layout. In order to share them with someone, you'd have to share all of the files, or perhaps zip them into a single archive and then share that, which would then have to be unzipped in order to be seen.
That's inconvenient, to say the least.
A Web Archive takes everything in a "Webpage, complete" save and places it into a single .mht file that's then easy to share.
It's actually a pretty interesting solution.
The file is a pure text file. You can open it in Notepad if you're curious as to its contents. If you do so, I think you'll be surprised. I know I was:
From: <Saved by Windows Internet Explorer 7>
Subject: Internet Safety: How do I keep my computer safe on the internet?
Date: Sun, 24 Apr 2011 14:54:26 -0700
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="----=_NextPart_000_0000_01CC028F.81907A50"
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5994

This is a multi-part message in MIME format.

------=_NextPart_000_0000_01CC028F.81907A50
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: http://ask-leo.com/internet_safety_how_do_i_keep_my_computer_safe_on_the_internet.html

=EF=BB=BF<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" =
"http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML xmlns=3D"http://www.w3.org/1999/xhtml"><HEAD><TITLE>Internet =
Safety: How do I keep my computer safe on the internet?</TITLE>
<META content=3D"" name=3Dkeywords>
. . .
Wait... what? "From:"? "Subject:"? What are those doing there?
Indeed. An MHT file is nothing more than a multi-part MIME email message.
MHT is short for MHTML, which stands for MIME HTML.
HTML formatted email has long been able to include all of the resources needed within a single message. Saving from your browser into MHT simply leverages this.
Try it sometime: rename a ".mht" file to ".eml", and double-click on it. Chances are that it will open in your default email program rather than your browser.
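Because an MHT file is just a MIME message, ordinary email-handling code can read it. Here's a minimal sketch using Python's standard `email` module to list what's packed inside. The sample file contents and the `list_parts` helper are made up for illustration; a real .mht file will have many more parts.

```python
import email
from email import policy

# A tiny, made-up .mht file: one HTML part and one image part.
SAMPLE_MHT = b"""From: <Saved by a browser>
Subject: Example page
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="----=_Part_0"

This is a multi-part message in MIME format.

------=_Part_0
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: http://example.com/page.html

<html><body><img src=3D"http://example.com/logo.png"></body></html>

------=_Part_0
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-Location: http://example.com/logo.png

iVBORw0KGgo=

------=_Part_0--
"""


def list_parts(mht_bytes):
    """Return (content type, original URL) for each file stored in the archive."""
    message = email.message_from_bytes(mht_bytes, policy=policy.default)
    return [
        (part.get_content_type(), str(part["Content-Location"]))
        for part in message.walk()
        if not part.is_multipart()
    ]


for content_type, location in list_parts(SAMPLE_MHT):
    print(content_type, location)
```

The base page and each supporting file show up as separate parts of the same message, each labeled with the URL it originally came from.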
Saving a page as a text file removes all of the HTML, graphics, and other formatting whatnot and saves only the text on the page. Nothing more, nothing less.
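As a rough picture of what "only the text" means, here's a small Python sketch that throws away the tags, scripts, and style rules and keeps what's left. The `TextOnly` class and `page_text` helper are hypothetical, and the result is cruder than what a browser produces (every run of text ends up on its own line).

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep the visible text of a page; drop tags, scripts, and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "\n".join(parser.chunks)


sample = (
    "<html><head><style>p {color: red}</style><title>Hi</title></head>"
    "<body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>"
)
print(page_text(sample))
```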
You'd asked about converting; it's either easy or impossible, depending on what you have.
For whichever of the two formats that you have on hand, if you can open that file in a browser so that it looks correct, you can then use Save As... again in a different format without trouble.
On the other hand, if you have just the base page as a .htm file and the browser cannot load the additional files that it references, then there's little you can do.
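For the .mht-to-.htm direction specifically, a script can also pull the base page out of the archive without a browser. Here's a minimal Python sketch under simple assumptions: the `mht_to_html` helper and the sample data are made up for illustration, and the result is the equivalent of "Webpage, HTML only", so pictures and style sheets still point at their original URLs.

```python
import email
from email import policy

# A tiny, made-up .mht file so the sketch can run on its own.
SAMPLE_MHT = b"""From: <Saved by a browser>
Subject: Example page
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="----=_Part_0"

------=_Part_0
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: http://example.com/page.html

<html><body><p class=3D"intro">Hello</p></body></html>

------=_Part_0--
"""


def mht_to_html(mht_bytes):
    """Return the decoded HTML of the first text/html part in an MHT archive."""
    message = email.message_from_bytes(mht_bytes, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the quoted-printable encoding and the charset.
            return part.get_content()
    raise ValueError("no text/html part found")


print(mht_to_html(SAMPLE_MHT))
```

To convert a real file, you'd read its bytes (for example with `pathlib.Path("page.mht").read_bytes()`, where the file name is just a placeholder), pass them to `mht_to_html`, and write the returned text out as a .htm file.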
I don't do any of the above.
Instead, I have a "print to PDF" printer driver installed on my computer. I happen to use CutePDF right now.
After you install a print-to-PDF utility, a new printer driver appears on your computer. Instead of using Save As..., click Print and select the PDF printer as the destination. You'll be prompted to provide a filename, and the result will be a single .pdf file that contains the webpage contents.
I much prefer .pdf files for sharing things like this. They don't require browser support, and .pdf readers are plentiful, free and available for just about any platform that you might want to use.
It's simple and it just works.