
Web browsers can save the web page that you're currently viewing to your computer in a handful of formats. I'll look at each and what I do instead.

I have been using Opera as my web browser for about two months. I am running Windows 2000 on a Gateway P4-1300 and I am generally happy with it. Before Opera, I used Internet Explorer 6. I like to keep files with interesting text and graphics available for reference, thus I have downloaded a lot of ".htm" files. What IE6 downloads as .htm, Opera downloads as ".mht". When I send these to friends by attaching them to email, frequently my friends are unable to read them; that is, unable to read .mht files. At that point, I am not always able to send them a URL instead of the file. Moreover, when I am offline and I attempt to read one of these .mht files, I invoke Opera, which I do not want to do. I want to know two things: 1. Is there a way to set up Opera to download .htm instead of .mht? 2. Is there a way to convert all of my .mht to .htm and if so, what is it? I have nothing against using third-party software, but I prefer that it be free. Because of this problem, I will soon give up Opera and go on to something else. But, even if I buy a new machine, I will still want to convert .mht to .htm.

.mht and .htm files are two related, yet quite different things. Both contain the web page that you might be viewing, but only one contains all of the web page that you're viewing.

To understand why that is, and from that, understand what you might want to do, we need to look at how web pages are constructed and what happens when you try to save one.

A quick note about Opera: I'm afraid that I can't speak to getting Opera to "File" -> "Save as..." a different file format. I don't run Opera, but I suspect that if it's not an option on the Save As... dialog box, then a) I'd be surprised, and b) it wouldn't appear to be a supported option. Perhaps a reader will fill us both in with a comment.

I will, however, talk about conversion below.

Web Pages

What most people don't realize is that web pages are made up of many files.

It starts with the fairly obvious base page, the file that you asked your browser to fetch. For example, if you visit the article:

Internet Safety: How do I keep my computer safe on the internet?

the URL for that is:

http://ask-leo.com/internet_safety_how_do_i_keep_my_computer_safe_on_the_internet.html

which causes this single file to be fetched by your browser:

internet_safety_how_do_i_keep_my_computer_safe_on_the_internet.html

That base page is text only. In fact, right-click on either of those links above and choose Save As, and you'll get a text file that you can open in Notepad to see all of the gory HTML that makes up the basic page instructions.

But there's a lot that's missing. There are no pictures, such as my logo. There is additional style and scripting information that's also missing, which makes the page look funny when it's viewed.

Those are stored in other files which are referenced by, but not kept in, the base file. When your browser opens that file for display, it actually runs through what's coded therein and fetches the additional files as needed. It'll see the reference to http://img.askleomedia.com/askleonew.png, the site logo image, and go download that additional file. Similarly, it'll also see references to a file http://med.askleomedia.com/al.css, the style sheet which controls the look and feel of the site, and download that additional file as well.

Repeat that process a number of times for additional images, Javascript files, advertisements and more, and suddenly, it becomes very clear:

A web page is much more than just a single file.
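
To make that concrete, here's a rough Python sketch of the first thing a browser has to do: scan the base page for references to other files. It's only an illustration, not how any particular browser works. The sample HTML is made up for the example (the /js/site.js script and the page.html address are hypothetical), and it only looks at img, script, and link tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects the URLs of supporting files referenced by a base HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            url = attrs["src"]
        elif tag == "link" and attrs.get("href"):
            # Simplification: every <link href> is treated as a supporting file.
            url = attrs["href"]
        else:
            return
        # Relative references are resolved against the address of the base page.
        self.resources.append(urljoin(self.base_url, url))


def find_resources(html_text, base_url):
    collector = ResourceCollector(base_url)
    collector.feed(html_text)
    return collector.resources


# Made-up sample page; only the logo and style sheet URLs come from the article.
sample = """
<html><head>
<link rel="stylesheet" href="http://med.askleomedia.com/al.css">
<script src="/js/site.js"></script>
</head><body>
<img src="http://img.askleomedia.com/askleonew.png">
<p>Article text lives in the base file itself.</p>
</body></html>
"""

for url in find_resources(sample, "http://ask-leo.com/page.html"):
    print(url)
```

Each URL printed is one more file the browser would have to download before the page looks the way you expect.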

But Save As... is designed to save one single file. So what should it do?

As revealed by Internet Explorer's own "Save as type:" dropdown, there are several options.

Internet Explorer's Save-As Formats

Webpage, HTML only (*.htm, *.html)

Saving in this format saves only the base file as I described it above.

The result will be a single file, named with .htm or .html (the two are equivalent).

If you later open this saved .htm file in your browser, it will see the references to all of the other supporting files and attempt to connect to the internet to get them. If they're no longer available, or you have no internet connection, then the attempts will fail, and the page will probably be displayed improperly in one way or another.

Webpage, complete (*.htm, *.html)

Saving in this format saves the base file, as I described it above, but it also saves all of the files that are referenced by it. For example, saving Internet Safety: How do I keep my computer safe on the internet? in this format results in:

  • A File: "Internet Safety How do I keep my computer safe on the internet.htm", which is the base HTML file of the page.

  • A Folder: "Internet Safety How do I keep my computer safe on the internet_files", which contains all of the supporting files referenced by the page.

In the base file, all of the references to the supporting files have been changed to refer to the files located in the folder that was created.

That base file and the files contained in the accompanying folder together are the saved page.
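
The reference rewriting can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not Internet Explorer's actual logic: the function names and the "page_files" folder are hypothetical, it uses a plain text replacement, and it ignores details such as two files that share the same name.

```python
import posixpath
from urllib.parse import urlparse


def local_name(url, folder):
    """Map a remote URL to a path inside the companion folder."""
    filename = posixpath.basename(urlparse(url).path)
    return folder + "/" + filename


def rewrite_references(html_text, resource_urls, folder):
    """Point each known reference at its saved copy in the folder."""
    for url in resource_urls:
        html_text = html_text.replace(url, local_name(url, folder))
    return html_text


page = '<img src="http://img.askleomedia.com/askleonew.png">'
print(rewrite_references(
    page,
    ["http://img.askleomedia.com/askleonew.png"],
    "page_files",
))
```

After the rewrite, the base file no longer needs the internet to find the logo, but it does need that folder to stay right beside it.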

The problem, of course, is that there are multiple files in a specific layout. In order to share them with someone, you'd have to share all of the files, or perhaps zip them into a single archive and then share that, which would then have to be unzipped in order to be seen.

That's inconvenient, to say the least.

Web Archive, single file (*.mht)

A Web Archive takes everything in the "Webpage, complete", and places it into a single file that's then easy to share.

It's actually a pretty interesting solution.

The file is a pure text file. You can open it in Notepad if you're curious as to its contents. If you do so, I think you'll be surprised. I know I was:

From: <Saved by Windows Internet Explorer 7>
Subject: Internet Safety: How do I keep my computer safe on the internet?
Date: Sun, 24 Apr 2011 14:54:26 -0700
MIME-Version: 1.0
Content-Type: multipart/related;
        type="text/html";
        boundary="----=_NextPart_000_0000_01CC028F.81907A50"
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5994

This is a multi-part message in MIME format.

------=_NextPart_000_0000_01CC028F.81907A50
Content-Type: text/html;
        charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: http://ask-leo.com/internet_safety_how_do_i_keep_my_computer_safe_on_the_internet.html

=EF=BB=BF<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" =
"http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML xmlns=3D"http://www.w3.org/1999/xhtml"><HEAD><TITLE>Internet =
Safety: How do I keep my computer safe on the internet?</TITLE>
<META content=3D"" name=3Dkeywords>
.
.
.

Wait... what? "From:"? "Subject:"? What are those doing there?

Indeed. An MHT file is nothing more than a multi-part MIME email message.

MHT is short for MHTML, which stands for MIME HTML.

HTML formatted email has long been able to include all of the resources needed within a single message. Saving from your browser into MHT simply leverages this.

Try it sometime: rename a ".mht" file to ".eml", and double-click on it. Chances are that it will open in your default email program rather than your browser.
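
Because the format really is just an email message, ordinary email tools can read it. Here's a small Python sketch that uses the standard library's email package to list what's packed inside an MHT file and to pull out the base HTML. Treat it as an illustration only: the sample message is a tiny one that I made up (the example.com addresses are hypothetical), and the function names are my own.

```python
import email
from email import policy


def read_mht(mht_bytes):
    """Return (subject, parts), where parts lists each file stored in the archive."""
    message = email.message_from_bytes(mht_bytes, policy=policy.default)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # the outer container holds the files; it isn't one itself
        location = part.get("Content-Location")
        parts.append((part.get_content_type(),
                      str(location) if location is not None else None))
    return str(message["Subject"]), parts


def base_html(mht_bytes):
    """Return the decoded text of the first text/html part, or None."""
    message = email.message_from_bytes(mht_bytes, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            return part.get_content()  # undoes quoted-printable and the charset
    return None


# A tiny, made-up archive with one HTML part and one image part.
sample = (
    b"Subject: Example page\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; type="text/html"; boundary="XYZ"\r\n'
    b"\r\n"
    b"This is a multi-part message in MIME format.\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"Content-Location: http://example.com/page.html\r\n"
    b"\r\n"
    b'<html><body><img src=3D"logo.png"></body></html>\r\n'
    b"--XYZ\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: http://example.com/logo.png\r\n"
    b"\r\n"
    b"iVBORw0KGgo=\r\n"
    b"--XYZ--\r\n"
)

subject, parts = read_mht(sample)
print(subject)
for content_type, location in parts:
    print(content_type, location)
```

For a real file, you would read it in binary mode and pass the bytes to read_mht. The Content-Location of each part is the original address of that file, which is how the page's references are matched up with the copies stored in the archive.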

Text File (*.txt)

Saving a page as a text file removes all of the HTML, graphics, and other formatting whatnot and saves only the text on the page. Nothing more, nothing less.
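
As a rough idea of what "text only" means, here's a Python sketch that throws away the tags, scripts, and styles and keeps the words. It's a simplification of my own and not what any browser actually does; the class and function names are hypothetical, and a real browser makes smarter choices about spacing and line breaks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps the visible text of a page and drops everything else."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html_text):
    extractor = TextExtractor()
    extractor.feed(html_text)
    extractor.close()
    return "\n".join(extractor.chunks)


print(page_text(
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p>"
    "<script>var x=1;</script></body></html>"
))
```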

Conversions

You'd asked about converting; it's either easy or impossible, depending on what you have.

For whichever of the two formats that you have on hand, if you can open that file in a browser so that it looks correct, you can then use Save As... again in a different format without trouble.

On the other hand, if you have just the base page as a .htm file and the browser cannot load the additional files that it references, then there's little you can do.

An Alternative for Sharing Web Pages

I don't do any of the above.

Instead, I have a "print to PDF" printer driver installed on my computer. I happen to use CutePDF right now.

After you install a print-to-PDF utility, a new printer driver appears on your computer. Instead of using Save As..., click Print and select the PDF printer as the destination. You'll be prompted to provide a filename, and the result will be a single .pdf file that contains the webpage contents.

I much prefer .pdf files for sharing things like this. They don't require browser support, and .pdf readers are plentiful, free and available for just about any platform that you might want to use.

It's simple and it just works.

Article C4804 - April 27, 2011

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.

Recent Comments
19 Comments
John King
March 12, 2012 11:35 PM

I think Cliff is wanting to know if the PDF copy of the website is form only (image) or will retain the website links which can be clicked from within the PDF.

I personally want to retain the text, formatting, images AND clickable links.

John King
March 13, 2012 4:51 AM

CutePDF Writer 2.8 has (2) major limitations:

1. Right-hand text is cut off on Websites printed to PDF. (major issue)

"Text characters are wrong or missing in generated PDF file" in FAQ does not resolve missing text.

Scaling down does not resolve the issue.

2. Website links are not clickable on web pages printed to PDF.

Screengrab 0.96.3 (discontinued) in Firefox 3.6.27 Portable captures the ENTIRE image in JPEG/PNG, but it is not PDF, nor are the links clickable. At least the page is not cut off.

I'm still looking for a tool to capture an ENTIRE web page, preferably in PDF with clickable links.


CutePDF works for me in the conditions you mention - I print websites all the time. More often it's a problem with the website itself that causes printing to PDF to mess up. (And you are correct, click-able links are NOT preserved through any kind of print solution.)
Leo
17-Mar-2012

Mark J
March 13, 2012 2:34 PM

@John
I believe that hyperlink preservation is a function of which pdf printer program you use. You'll have to check what the program's website says about it. I think very few or possibly none of the free pdf creators have this feature.

Armstrong
April 22, 2012 5:23 PM

I would like to convert the .mht files that show up in my Pictures folder when I try to save pictures from Facebook, but when I click on them, I am returned to Facebook. What is the answer to this?

Derek Sleno
August 27, 2012 7:09 AM

The relationship between MHT and EML is very close, yet often file attachments in an .eml file may become inaccessible. This was a problem for me. There's an MHT file viewer you can download at http://www.mhtviewer.com that can make it easy to browse, search and export a directory of .eml or .mht or .mhtml files. It's commercial software, but if renaming the .eml file and opening it in a browser isn't working for you, it might be a good solution for getting to those .eml email file attachments.