
Domain names are simple in concept and yet can be constructed in ways that might fool you. I'll look at some examples, and discuss what's important.

Security when clicking onto a web site confounds me. Some sites put the section of the site you want ahead of the web address, for example http://photos.kodak.com, and some put the section after, for example http://kodak.com/photos. These examples are just made up, but I hope you understand what I'm saying. How do I know if I'm on the secure website I'm supposed to be on? At times I see other addresses flashing by on the toolbar that are not the site I clicked on before the actual site appears. I've never seen anyone bring up this question.

This simple little question opens up a veritable Pandora's box when it comes to URLs, and understanding what is and is not safe to click on.

The concepts are actually very simple, but the complexity in how those concepts can be combined is staggering. Particularly if someone is attempting to deceive you.

I'll try to make some sense of it all.

The Three Basic URL Components

"URL" is short for Uniform Resource Locator. The most common one we know of is the web address - something like "http://ask-leo.com/how_do_i_know_that_this_web_address_is_safe.html".

There are three primary components to a URL; let's start by looking at what those are. We'll use this URL as our example for discussion:

http://www.somerandomservice.com/folder/page.html?parameter1=value2&parameter2=value2
  • http://www.somerandomservice.com - Server. This identifies the protocol (http - the language of web pages) and the server to contact. www.somerandomservice.com identifies a specific server on the internet from which what follows will be requested.

  • folder/page.html - Page. The page specifies exactly what it is you are requesting from the server. Typically it's a web page - perhaps within a folder on that server, but it might be a program to run on the server or a file to be downloaded.

  • parameter1=value2&parameter2=value2 - Parameters. Parameters are information that is being supplied to the page. Since "pages" can often be small (or large) computer programs, information from the parameters part of a URL can be given to those programs as items for them to act on.

URL Safety Rule #1: The Server specification ends at the first "/" that occurs after the "http://" start of the URL, and the Page specification ends at the first question mark after that. This rule is important to understanding whether a URL is valid, bogus or misleading.
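For the more technically inclined, here is a minimal sketch of that rule using Python's standard urllib.parse module. It simply splits the example URL from above into its pieces; nothing here contacts the server.

```python
from urllib.parse import urlsplit

# The example URL used throughout this article.
url = "http://www.somerandomservice.com/folder/page.html?parameter1=value2&parameter2=value2"
parts = urlsplit(url)

print(parts.scheme)  # the protocol: http
print(parts.netloc)  # the server: www.somerandomservice.com
print(parts.path)    # the page, with its leading slash: /folder/page.html
print(parts.query)   # the parameters: parameter1=value2&parameter2=value2
```

The server ends at the first "/" after "http://", and the page ends at the first question mark, which is exactly where the library makes its cuts.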

What Matters is the Server - Part 1

I'll restate the first part of that rule to focus on what we care about:

The server being contacted begins after the "http://" and ends at the next "/".

Or, in this URL, the "www.somerandomservice.com" part:

http://www.somerandomservice.com/folder/page.html?parameter1=value2&parameter2=value2

That's the part that matters, because that's the part that tells your browser what server to connect to. Everything else is secondary. Important, yes, but not nearly as important.

Let's look at one of the ways that phishing attempts often try to fool you. Check out this URL:

http://www.somerandomservice.com/www.paypal.com

It might be tempting to look at that quickly and say "oh, that ends in paypal.com, therefore it's Paypal!"

No it's not. Look again:

http://www.somerandomservice.com/www.paypal.com

Actually that URL loads a page called "www.paypal.com" (a valid page name) from the server www.somerandomservice.com.

Now, my example is probably pretty lame, as "www.somerandomservice.com" is big and obvious at the front of that URL. But scammers use all sorts of variations on this theme to make it look like you're going to some place trusted when, if you look closely, you're not.
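If you'd rather not trust your eyes, the same check can be sketched in a few lines of Python. The helper name server_of is my own invention for illustration; it relies only on the standard urllib.parse module.

```python
from urllib.parse import urlsplit

def server_of(url):
    """Return the server name a URL points at (hypothetical helper)."""
    return urlsplit(url).hostname

tricky = "http://www.somerandomservice.com/www.paypal.com"

print(server_of(tricky))      # www.somerandomservice.com
print(urlsplit(tricky).path)  # /www.paypal.com  (just a page name)
```

The "www.paypal.com" text lands in the page portion, not the server portion, so it has no say in which server gets contacted.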

What Matters is the Server - Part 2

For this we need to pick apart the way server names are created and used.

Server names are built from right to left, and the individual components are separated by periods. Consider "www.somerandomservice.com".

  • ".com" is the top level domain, and indicates which registry service is used to register the domain initially.
  • "somerandomservice" is the domain name. This is the part that you purchase (or rather "lease") when you buy and register a domain name.
  • "www." is the subdomain. No further registration is required; once you have the domain you can create as many of these subdomains as you like.
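As a rough illustration of that right-to-left reading, this small Python sketch splits a server name on its periods and reverses the order. The function name is hypothetical, and it assumes a simple single-label top level domain like ".com"; names under something like ".co.uk" need more care than this.

```python
def labels_right_to_left(server_name):
    """Split a server name into its components, most significant first."""
    return list(reversed(server_name.rstrip(".").split(".")))

print(labels_right_to_left("www.somerandomservice.com"))
# ['com', 'somerandomservice', 'www']
```

Read in that order, you get the top level domain first, then the registered domain name, then whatever subdomains the owner chose to create.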

In general, fully qualified domain names like "www.somerandomservice.com" identify a server on the internet. "photos.somerandomservice.com" would typically be a different server, though it doesn't have to be.

The choice between using something like "photos.somerandomservice.com" versus "somerandomservice.com/photos" is purely one of site design and has no security implications. That's just how the person building the website chose to do it. There are geeky pros and cons to each, but for you as a typical web user it doesn't really matter.

What does matter is how subdomains can be abused. For example, it's perfectly possible for this to be a valid domain:

http://www.paypal.com.somerandomservice.com

Once again, with only a quick glance, you might think it was actually paypal.com since it starts with "http://www.paypal.com".

In that example "www.paypal.com." is just a subdomain created by the owner of "somerandomservice.com".

Here's a worse example:

http://www.paypal.com------------------------------------------------------------.somerandomservice.com

Once again, it's designed to look like paypal.com, but in fact it's not - and it's especially convincing if your browser happens to show you only the first part of the URL in your status bar because it's so long.

And once again, scammers often use many different variations on this technique to trick you.
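Here's a minimal Python sketch of the check you would want to make: does the server name actually end with the domain you trust? The function name belongs_to and the "trusted domain" argument are my own assumptions for illustration, not part of any standard tool.

```python
from urllib.parse import urlsplit

def belongs_to(url, trusted_domain):
    """True if the URL's server is the trusted domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    trusted = trusted_domain.lower().rstrip(".")
    # Compare from the right: an exact match, or a match preceded by a period.
    return host == trusted or host.endswith("." + trusted)

print(belongs_to("http://www.paypal.com/", "paypal.com"))                       # True
print(belongs_to("http://www.paypal.com.somerandomservice.com", "paypal.com"))  # False
```

Requiring the period in front of the trusted domain matters: without it, a name like "notpaypal.com" would slip through.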

A Slash is a Slash is a ... %2F?

This was brought up by a comment on this article (thanks Ken!), and is important enough to warrant an update.

Characters in URLs can be "encoded" with a special representation that acts the same as the character it encodes. The format is a percent sign followed by a two digit hexadecimal number (individual digits will be 0-9 or A-F). A space character, for example, is %20, and you'll actually see that in legitimate URLs from time to time since an actual space character cannot be used.

%2F is the slash character "/".
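If you want to see the encoding for yourself, Python's standard urllib.parse module can translate in both directions. This is just a quick sketch of the notation, nothing more.

```python
from urllib.parse import quote, unquote

print(unquote("%20") == " ")  # True: %20 decodes to a space
print(unquote("%2F"))         # /
print(quote(" "))             # %20
print(quote("/", safe=""))    # %2F  (safe="" asks for the slash to be encoded too)

# The two hex digits are simply the character's code number.
print(format(ord("/"), "02X"))  # 2F
```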

So this rule:

The server being contacted begins after the "http://" and ends at the next "/".

Still applies, but %2F could be seen in place of "/". More correctly:

The server being contacted begins after "http:" followed by two slashes, each of which may be written as "/" or "%2F", and ends at the next "/" or "%2F".

It gets ugly, but the thing to remember is just this: %2F is exactly the same as "/".

Here's an example of how it might be abused:

http://www.somerandomservice.com%2Fwww.paypal.com/

That is NOT Paypal. Replace the %2F with "/" and you'll see instead:

http://www.somerandomservice.com/www.paypal.com/

Clearly it goes to somerandomservice.com.

As Ken points out in his comment, any URL with a % notation in the server portion is suspect. % notation after the server portion (in the page or more commonly the parameters) is typically OK.
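That rule of thumb is easy to sketch in Python. The function name below is hypothetical, and the sketch doesn't try to decode anything; it only reports whether a percent sign shows up in what the standard library considers the server portion of the URL.

```python
from urllib.parse import urlsplit

def server_has_percent(url):
    """True if the server portion of the URL contains % notation (suspect)."""
    return "%" in urlsplit(url).netloc

print(server_has_percent("http://www.somerandomservice.com%2Fwww.paypal.com/"))  # True
print(server_has_percent("http://www.somerandomservice.com/page.html?q=a%20b"))  # False
```

The second URL also contains % notation, but only in the parameters, which is the typically harmless case.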

You Said Secure Website

All of the above is unrelated to what we normally think of as a "secure" website: namely the use of https (note the "s") as the protocol. Https does two important things:

  • It encrypts the data flowing between your computer and the server.

  • It validates that the server you connect to is, in fact, the server you requested.

Note that https doesn't validate you're connecting to the server you think you are, it validates that you're connecting to the server you requested. Those are two different things.

For example, let's say you fall for one of my lame examples above and click on a link like this:

https://www.paypal.com.somerandomservice.com

That's an https connection. It is very possible - not even all that hard actually - for the owner of somerandomservice.com to purchase and install a completely valid https certificate for www.paypal.com.somerandomservice.com.

Thus when you click on that link your browser will confirm that you are indeed connecting to what you asked for: www.paypal.com.somerandomservice.com. That might not be what you think you asked for, if you fell for a scammer's trick, but that's all that https can validate for you: you got what you asked for.
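To make the "two different things" concrete, here's a small Python sketch that answers both questions separately. It only inspects the text of the link; it does not check any certificate, and the function name and the "intended domain" argument are assumptions of mine for illustration.

```python
from urllib.parse import urlsplit

def check_link(url, intended_domain):
    """Return (uses_https, is_intended_server) for a link, as two separate answers."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    uses_https = parts.scheme == "https"
    is_intended = host == intended_domain or host.endswith("." + intended_domain)
    return uses_https, is_intended

print(check_link("https://www.paypal.com.somerandomservice.com", "paypal.com"))
# (True, False)
print(check_link("https://www.paypal.com/", "paypal.com"))
# (True, True)
```

The first link gets a "yes" for https and a "no" for being the server you meant, which is exactly the situation a scammer is hoping you won't notice.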

Staying Safe

It's unfortunate that something that's fairly simple is actually quite complex once you assume that people will attempt to deceive you.

I'll sum it up with this:

Pay close attention to the domain name in any URL you are about to click on: that's everything between "http://" and the next "/". Remember that domain names build from the right, so if it ends in, for example, ".paypal.com", you can be assured that it's a domain or sub-domain owned by paypal.com.

Article C4399 - August 15, 2010

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18 year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.


Recent Comments
20 Comments
Jenny G.
August 21, 2010 6:31 PM

I've found the WOT (Web of Trust) Firefox plug in very useful, and it's saved me from some bad sites.

The only problem is that sometimes a really nasty site can be rated with a green symbol in your Google search results. You have to hover over the green rating symbol and click through to see the user ratings. Sometimes a site with lots of "red" (danger) ratings still shows up as a "green" (safe) site.

This is actually my biggest concern with validation sites that rely on user feedback for their ratings - they can be gamed so as to provide misleading results.
Leo
23-Aug-2010

William S. Shelton
August 22, 2010 12:14 PM

Hi. I saw an input from a reader that made this comment: "when I clicked on the link to this article from your page I got a warning from Avira, Malware found, "HTML/Spoofing.Gen" was found in a file. It happens every time. The page loads, despite. I click on remove, but concerned I may have something on my computer. This last time the warning popped back up again when I posted this comment." I tried and my Kaspersky warned me off; you may have a Trojan: Trojan-Spy.HTML.Fraud.gen. Best wishes. WSS

I don't have a trojan.

In order to discuss this topic this page has on it examples of misleading URLs. Some overly-sensitive anti-malware tools are throwing a false positive because of that.

Again, there is no malware here.
Leo
26-Aug-2010

DuLe
August 22, 2010 4:41 PM

Good article. Some of both the article and comments are over my head. But, bottom line, I never, EVER, click on a link in an email when going to my bank, PayPal account, or any other site which requires name/rank/serial number/credit card/password.

I was hoping this article would answer a question I've had for some time but, unless I missed something, it didn't. My question isn't whether a particular website is "safe" but, rather, to which website will I be taken if I click on a link?

Admittedly, most of these links in question are in known spam emails. But I, occasionally, get curious and (knowing I am not going to ever enter any personal information at the site) will click on them.

Here is a made-up example. www.xyz.com/abc. Clicking on that link takes me to, say, adultfriendfinder.com. If my assumption is correct, "xyz" is the server and "abc" redirects to adultfriendfinder.

My question is: how can one determine the destination website in advance of clicking the link?

A concrete example might be hotmailtips.com which takes you to a completely different site.

I actually know of no way for average users to determine the final destination of a URL that is being redirected without actually going to it. I tend to use a very geeky command line tool called "curl" which lists the domains it's accessing as they're redirected, but I don't expect the average user to want that. Perhaps someone knows of such a service and will leave a comment here.
Leo
26-Aug-2010

veenav
October 14, 2010 5:12 PM

i always use google cache for entering a website i don't know and don't accept any cookies from unfamiliar websites or website which i am not part of.

Dennis
February 17, 2012 5:34 PM

ESET 5 works to stop you going to bad sites. It terminates that site. If you try to download bad stuff it will auto clean.