Web browsers can save the web page that you're currently viewing to your computer in a handful of formats. I'll look at each and what I do instead.

I have been using Opera as my web browser for about two months. I am running Windows 2000 on a Gateway P4-1300 and I am generally happy with it. Before Opera, I used Internet Explorer 6. I like to keep files with interesting text and graphics available for reference, thus I have downloaded a lot of ".htm" files. What IE6 downloads as .htm, Opera downloads as ".mht". When I send these to friends by attaching them to email, frequently my friends are unable to read them; that is, unable to read .mht files. At that point, I am not always able to send them a URL instead of the file. Moreover, when I am offline and I attempt to read one of these .mht files, I invoke Opera, which I do not want to do. I want to know two things: 1. Is there a way to set up Opera to download .htm instead of .mht? 2. Is there a way to convert all of my .mht to .htm and if so, what is it? I have nothing against using third-party software, but I prefer that it be free. Because of this problem, I will soon give up Opera and go on to something else. But, even if I buy a new machine, I will still want to convert .mht to .htm.

.mht and .htm files are two related, yet quite different things. Both contain the web page that you might be viewing, but only one contains all of the web page that you're viewing.

To understand why that is, and from that, understand what you might want to do, we need to look at how web pages are constructed and what happens when you try to save one.

A quick note about Opera: I'm afraid that I can't speak to getting Opera to "File" -> "Save as..." a different file format. I don't run Opera, but I suspect that if it's not an option on the Save As... dialog box, then a) I'd be surprised, and b) it wouldn't appear to be a supported option. Perhaps a reader will fill us both in with a comment.

I will, however, talk about conversion below.

Web Pages

What most people don't realize is that web pages are made up of many files.

It starts with the fairly obvious base page, the file that you asked your browser to fetch. For example, if you visit the article:

Internet Safety: How do I keep my computer safe on the internet?

the URL for that is:

http://ask-leo.com/internet_safety_how_do_i_keep_my_computer_safe_on_the_internet.html

which causes this single file to be fetched by your browser:

internet_safety_how_do_i_keep_my_computer_safe_on_the_internet.html

That base page is text only. In fact, right-click on either of those links above and choose Save As, and you'll get a text file that you can open in Notepad to see all of the gory HTML that makes up the basic page instructions.

But there's a lot that's missing. There are no pictures, such as my logo. There is additional style and scripting information that's also missing, which makes the page look funny when it's viewed.

Those are stored in other files which are referenced by, but not kept in, the base file. When your browser opens that file for display, it actually runs through what's coded therein and fetches the additional files as needed. It'll see the reference to http://img.askleomedia.com/askleonew.png, the site logo image, and go download that additional file. Similarly, it'll also see references to a file http://med.askleomedia.com/al.css, the style sheet which controls the look and feel of the site, and download that additional file as well.

Repeat that process a number of times for additional images, Javascript files, advertisements and more, and suddenly, it becomes very clear:

A web page is much more than just a single file.
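To make that concrete, here is a minimal Python sketch of the first thing a browser does with the base page: scan it for references to other files. It is only an illustration, not how any particular browser works. The sample page is made up for the example; the logo and style sheet URLs are the ones mentioned above, while `/js/site.js` is a hypothetical script path.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceLister(HTMLParser):
    """Collect URLs of files that a page references but does not contain."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Images and scripts point at their file with a "src" attribute.
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        # Style sheets are pulled in with <link rel="stylesheet" href="...">.
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower()
            if "stylesheet" in rel:
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def list_resources(html, base_url):
    """Return the absolute URLs of images, scripts and style sheets in html."""
    parser = ResourceLister(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


# A made-up page; "/js/site.js" is a hypothetical path used only for illustration.
SAMPLE = """<html><head>
<link rel="stylesheet" href="http://med.askleomedia.com/al.css">
<script src="/js/site.js"></script>
</head><body>
<img src="http://img.askleomedia.com/askleonew.png" alt="logo">
<p>Just text.</p>
</body></html>"""

if __name__ == "__main__":
    for url in list_resources(SAMPLE, "http://ask-leo.com/"):
        print(url)
```

Each URL it lists is a separate file that has to be downloaded before the page looks the way its author intended.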

But Save As... is designed to save one single file. So what should it do?

As revealed by Internet Explorer's own "Save as type:" dropdown, there are several options.

Internet Explorer's Save-As Formats

Webpage, HTML only (*.htm, *.html)

Saving in this format saves only the base file as I described it above.

The result will be a single file, named with .htm or .html (the two are equivalent).

If you later open this saved .htm file in your browser, it will see the references to all of the other supporting files and attempt to connect to the internet to get them. If they're no longer available, or you have no internet connection, then the attempts will fail, and the page will probably be displayed improperly in one way or another.

Webpage, complete (*.htm, *.html)

Saving in this format saves the base file, as I described it above, but it also saves all of the files that are referenced by it. For example, saving Internet Safety: How do I keep my computer safe on the internet? in this format results in:

  • A File: "Internet Safety How do I keep my computer safe on the internet.htm", which is the base HTML file of the page.

  • A Folder: "Internet Safety How do I keep my computer safe on the internet_files", which contains all of the supporting files referenced by the page.

In the base file, all of the references to the supporting files have been changed to refer to the files located in the folder that was created.

That base file and the files contained in the accompanying folder together are the saved page.
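The reference rewriting can be sketched in a few lines of Python. This is my own simplified illustration of the idea, not Internet Explorer's actual logic: the function names are hypothetical, and it assumes the page refers to each supporting file by the exact absolute URL given.

```python
import posixpath
from urllib.parse import urlsplit


def local_name(url):
    """File name a saved copy of url would get inside the supporting folder."""
    return posixpath.basename(urlsplit(url).path) or "index"


def localize(html, resource_urls, folder):
    """Point each listed resource URL in html at a copy inside folder."""
    for url in resource_urls:
        html = html.replace(url, f"{folder}/{local_name(url)}")
    return html


if __name__ == "__main__":
    page = '<img src="http://img.askleomedia.com/askleonew.png">'
    print(localize(page, ["http://img.askleomedia.com/askleonew.png"], "page_files"))
```

A real browser has more to worry about, such as two supporting files that share the same name, but the principle is the same: download the file, then change the reference to point at the local copy.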

The problem, of course, is that there are multiple files in a specific layout. In order to share them with someone, you'd have to share all of the files, or perhaps zip them into a single archive and then share that, which would then have to be unzipped in order to be seen.

That's inconvenient, to say the least.

Web Archive, single file (*.mht)

A Web Archive takes everything in the "Webpage, complete", and places it into a single file that's then easy to share.

It's actually a pretty interesting solution.

The file is a pure text file. You can open it in Notepad if you're curious as to its contents. If you do so, I think you'll be surprised. I know I was:

From: <Saved by Windows Internet Explorer 7>
Subject: Internet Safety: How do I keep my computer safe on the internet?
Date: Sun, 24 Apr 2011 14:54:26 -0700
MIME-Version: 1.0
Content-Type: multipart/related;
        type="text/html";
        boundary="----=_NextPart_000_0000_01CC028F.81907A50"
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5994

This is a multi-part message in MIME format.

------=_NextPart_000_0000_01CC028F.81907A50
Content-Type: text/html;
        charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: http://ask-leo.com/internet_safety_how_do_i_keep_my_computer_safe_on_the_internet.html

=EF=BB=BF<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" =
"http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML xmlns=3D"http://www.w3.org/1999/xhtml"><HEAD><TITLE>Internet =
Safety: How do I keep my computer safe on the internet?</TITLE>
<META content=3D"" name=3Dkeywords>
.
.
.

Wait... what? "From:"? "Subject:"? What are those doing there?

Indeed. An MHT file is nothing more than a multipart MIME email message.

MHT is short for MHTML, which stands for MIME HTML.

HTML formatted email has long been able to include all of the resources needed within a single message. Saving from your browser into MHT simply leverages this.

Try it sometime: rename a ".mht" file to ".eml", and double-click on it. Chances are that it will open in your default email program rather than your browser.
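Because an MHT file is a MIME message, ordinary email tools can read it. As a rough sketch, Python's standard `email` package can list what's inside one. The function name `describe_mht` is my own, and I'm assuming the file follows the layout shown above, where each part carries a Content-Location header.

```python
from email import message_from_bytes


def describe_mht(raw_bytes):
    """Return (content type, Content-Location) for each part of an MHT file."""
    message = message_from_bytes(raw_bytes)
    described = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the container itself; only the leaf parts hold files
        location = part["Content-Location"]
        described.append(
            (part.get_content_type(), str(location) if location is not None else None)
        )
    return described
```

Calling `describe_mht(open("saved_page.mht", "rb").read())` on a saved page (the file name is just an example) should show one text/html part for the base page, followed by one part for each image, style sheet, and script that was saved with it.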

Text File (*.txt)

Saving a page as a text file removes all of the HTML, graphics, and other formatting whatnot and saves only the text on the page. Nothing more, nothing less.
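For the curious, here is a rough Python sketch of what "text only" means. It is an approximation for illustration, not what Internet Explorer actually does: it keeps the text between tags and skips the contents of script and style blocks. The names `TextOnly` and `page_text` are my own.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep the visible text of a page and drop markup, scripts and styles."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skipping = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skipping += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skipping:
            self._skipping -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skipping:
            self.chunks.append(text)


def page_text(html):
    """Return the text of html, one run of text per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

A browser does a nicer job of laying out the resulting text, but the outcome is the same in spirit: the words survive and everything else is gone.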

Conversions

You'd asked about converting; it's either easy or impossible, depending on what you have.

For whichever of the two formats that you have on hand, if you can open that file in a browser so that it looks correct, you can then use Save As... again in a different format without trouble.

On the other hand, if you have just the base page as a .htm file and the browser cannot load the additional files that it references, then there's little you can do.
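If you have a whole folder of .mht files and re-saving each one by hand is impractical, the fact that MHT is a MIME message suggests another route: unpack the parts with a script. The following Python sketch is one way that might be done. It is not something I've used on real archives, the function name `mht_to_htm` is my own, and it assumes each supporting part has a Content-Location header that matches the URL used in the page's HTML exactly. Pages whose HTML uses relative or escaped URLs will come out with some references still pointing at the internet.

```python
import posixpath
from email import message_from_bytes
from pathlib import Path
from urllib.parse import urlsplit


def mht_to_htm(mht_path, out_dir):
    """Unpack one .mht file into <name>.htm plus a <name>_files folder."""
    mht_path, out_dir = Path(mht_path), Path(out_dir)
    message = message_from_bytes(mht_path.read_bytes())
    parts = [p for p in message.walk() if not p.is_multipart()]

    files_dir = out_dir / f"{mht_path.stem}_files"
    files_dir.mkdir(parents=True, exist_ok=True)

    html_part, replacements = None, {}
    for index, part in enumerate(parts):
        if html_part is None and part.get_content_type() == "text/html":
            html_part = part  # the first HTML part is taken to be the base page
            continue
        location = str(part["Content-Location"] or "")
        if not location:
            continue
        name = posixpath.basename(urlsplit(location).path) or "part"
        name = f"{index}_{name}"  # the prefix avoids clashes between same-named files
        # decode=True undoes the base64 or quoted-printable transfer encoding.
        (files_dir / name).write_bytes(part.get_payload(decode=True))
        replacements[location] = f"{files_dir.name}/{name}"

    if html_part is None:
        raise ValueError(f"no text/html part found in {mht_path}")

    # Work on bytes so the page keeps whatever character encoding it had.
    html = html_part.get_payload(decode=True)
    for location, local in replacements.items():
        html = html.replace(location.encode("utf-8", "replace"), local.encode("utf-8"))

    out_file = out_dir / f"{mht_path.stem}.htm"
    out_file.write_bytes(html)
    return out_file
```

To convert a folder in one go, you could loop over it with something like `for p in Path(".").glob("*.mht"): mht_to_htm(p, "converted")`. The result has the same shape as a "Webpage, complete" save: one .htm file plus a folder of supporting files. Keep the original .mht files until you've checked the output in a browser.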

An Alternative for Sharing Web Pages

I don't do any of the above.

Instead, I have a "print to PDF" printer driver installed on my computer. I happen to use CutePDF right now.

After you install a print-to-PDF utility, a new printer driver appears on your computer. Instead of using Save As..., click Print and select the PDF printer as the destination. You'll be prompted to provide a filename, and the result will be a single .pdf file that contains the webpage contents.

I much prefer .pdf files for sharing things like this. They don't require browser support, and .pdf readers are plentiful, free and available for just about any platform that you might want to use.

It's simple and it just works.

Article C4804 - April 27, 2011

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.

19 Comments
Opera User
May 2, 2011 11:33 AM

I am an Opera user (ver. 11.10).
Yes, Opera will give the user 4 options:
2 are html.
1 - mht
1 - text

Chris
May 3, 2011 12:48 PM

Opera has a "save as" in the file menu. When the save dialogue box opens you have four choices. I just saved a page using Opera (which deserves greater success) and then opened it in Firefox.

Jim de Graff
May 3, 2011 2:37 PM

PDFCreator is also excellent (and free) and is available at SourceForge at

{URL removed}

I've retracted my recommendation of PDFCreator because they install additional un-asked for software now. More here.
Leo
03-May-2011

Jim de Graff
May 3, 2011 2:42 PM

I'd like to make another recommendation. Often, that online, interesting article is displayed in a small font, surrounded by a multitude of links to similar articles, ads, etc. Arc90 has a free plugin called Readability. Once you have the page displayed, click on the Readability icon and the page will be reformatted in a manner that is easier to read with the distracting stuff removed. Printing to a PDF printer saves the article in a nicer format. Currently, there is a minor problem printing via Firefox (inserts a mostly blank page at the start). Printing works much better from Chrome. Download Readability at

https://www.readability.com/addons

Andrew
May 3, 2011 6:03 PM

Microsoft Word can natively open mht files and save them as pdfs with no extra software.

Jonathan
May 3, 2011 9:48 PM

I, too, use Opera (it loads more quickly than Firefox these days). The problem might be that the "save as" function has "Web archive" as default (which is MHTML).

Terry Hollett
May 4, 2011 2:43 AM

I'm an Opera fan. When you save a file, just click on the "Save as type" option to switch to "html" or "html with images".

Opera unfortunately defaults to mht. I tried to open an mht file in Firefox at one time and the download dialog box opened. Even opening them in Opera gives mixed results, like opening it as a text version.

I hate this format and never use it.

Mark
May 4, 2011 4:39 AM

I also use Opera. You can save pages like Terry says, by changing the drop-down option box selection.

You can also just drag-and-drop MHT files into an open IE window to open them, or use the right-click Open With option and use IE again. (Firefox seems to lack the native ability to open MHT files.)

Leo Morgan
May 4, 2011 8:09 AM

The answer to the question is to use your .mht file as you currently do. On your friends' computers, right-click on the .mht file, and choose the 'open with' option. This will produce the list of programs able to open the file. Their browser should easily open the file. I just double checked on my machine, opening .mht saved from Opera with Internet Explorer.

Vivek Wilfred
May 5, 2011 7:49 AM

You could try this for a mass conversion of MHT to HTM. This may not work out smoothly if Opera is using its own proprietary implementation of the MHTML format rather than the Microsoft proprietary implementation. Anyway, you could give it a try.

Install Firefox 4, or an older version if Firefox 4 is not an option on Win 2000. Start Firefox and click 'Tools' > 'Add-ons'. In the add-ons window, use the search box in the top right corner to find 'Mozilla Archive Format' and install it. After installing and restarting (don't worry about the Welcome Page options; they don't matter), go to Tools > Add-ons again, click 'Extensions' on the left side, and click the options for 'Mozilla Archive Format'. Click the 'Actions' tab on top and then 'Convert Saved Pages'. You'll have different options to try, including different file types to convert from and to, and mass conversion of files in folders and subfolders in one go. (These steps should be almost the same between the different Firefox versions except for minor UI changes.) With this add-on, Firefox can also be used to open MHT files.

PS: I agree with Leo that PDF is definitely a more universal way of exchanging electronic documents, and it also has security options such as restricting copying or even viewing. But another thing is that an MHTML or a MAFF archive (Mozilla Archive Format) is much smaller than a PDF or zipped PDF of the same web page, while being clean, comprehensive, and faithful to the original. Of course, this wouldn't matter if you are not saving or archiving a large number of web pages.

John C.
May 9, 2011 8:57 AM

I would never email either an HTML page or a PDF of a web page. That's just way too much data to be putting on the network. The best choice should be to send just the URL to the web page. Do this by clicking on File -> Send Link.

Cliff
May 22, 2011 3:34 AM

I love PDFs, but how do you get links (like: "Internet Safety: How do I keep my computer safe on the internet?") to work? I have yet to find a PDF printer that allows these types of links to work.

I'm not really sure what you're after - a PDF printer *creates* a PDF from an application that prints. A link to a PDF is already a PDF - no PDF printer is required.
Leo
24-May-2011

Carlos R Coquet
June 2, 2011 8:20 PM

I don't have time to check my hypothesis, but I suspect that if the page being saved contains sub-windows with panning controls (horizontal or vertical bars), the PDF file will NOT, and the receiver will not see all the data.
As for the guy that suggested sending the URL, you are missing the fact that a URL is highly volatile. What you see now may not be what you see a minute from now. Case in point, a CraigsList ad. It may be taken down by the time someone uses the link you sent.
PRESERVING a point in time is often a reason to save a Web page on your own computer.

John King
March 12, 2012 11:22 PM

Thanks for renaming .mht to .eml and "print to PDF" hints. I will use EML for past saves and PDF for future saves. Thank you very much.

My Problem:
I save websites in MHT for all programs I download and then move everything to a different data hard drive. Unfortunately, MHT will not load after changing drive locations. Can anyone explain WHY? I do not see any reference to location when opening the .mht in text format.

John King
March 12, 2012 11:35 PM

I think Cliff is wanting to know if the PDF copy of the website is form only (image) or will retain the website links which can be clicked from within the PDF.

I personally want to retain the text, formatting, images AND clickable links.

John King
March 13, 2012 4:51 AM

CutePDF Writer 2.8 has (2) major limitations:

1. Right-hand text is cut off on Websites printed to PDF. (major issue)

"Text characters are wrong or missing in generated PDF file" in FAQ does not resolve missing text.

Scaling down does not resolve the issue.

2. Website links are not clickable on web pages printed to PDF.

Screengrab 0.96.3 (discontinued) in Firefox 3.6.27 Portable captures the ENTIRE image in JPEG/PNG, but it is not PDF, nor are the links clickable; at least the page is not cut off.

I'm still looking for a tool to capture an ENTIRE web page, preferably in PDF with clickable links.


CutePDF works for me in the conditions you mention - I print websites all the time. More often it's a problem with the website itself that causes printing to PDF to mess up. (And you are correct, clickable links are NOT preserved through any kind of print solution.)
Leo
17-Mar-2012

Mark J
March 13, 2012 2:34 PM

@John
I believe that hyperlink preservation is a function of which pdf printer program you use. You'll have to check what the program's website says about it. I think very few or possibly none of the free pdf creators have this feature.

Armstrong
April 22, 2012 5:23 PM

I would like to convert mht files that show up when I try to save pictures from facebook in my pictures folder, but when I click on them I am returned to facebook. What is the answer to this?

Derek Sleno
August 27, 2012 7:09 AM

The relationship between MHT and EML is very close, yet often file attachments in an .eml file may become inaccessible. This was a problem for me. There's an mht file viewer you can download at http://www.mhtviewer.com that can make it easy to browse, search and export a directory of .eml or .mht or .mhtml files. It's commercial software, but if renaming the .eml file and opening it in a browser isn't working for you, it might be a good solution for getting to those .eml email file attachments.

Comments on this entry are closed.
