The best way to stay safe online is to consider that anything you post can be cached, seen, and even copied. Don't post anything you want kept private.

I just learned the phrase "Deep Web" today, and I'm reading an excellent Wikipedia article explaining that it's not just for criminals and terrorists. A few years ago, for personal reasons, I chose to write a memoir-style blog on what was perhaps the most popular social networking site at the time for people over the age of 50. I chose this site not only because it was even more popular than Facebook, but specifically because its content was dynamically generated. Finally, I chose it because I knew I had not only ready-made readers waiting but also an accessible list of those who'd be interested in what I wrote.

I contacted the website company, where I was a paid member, to check whether any search engine could cache what I wrote. I was told no. What I wrote was a very personal, Festivus-style airing of grievances. I received tons of email responses from my readers about the various grievances I aired, all positive and supportive.

Now, Leo, learning about the Deep Web, I'm wondering whether what that customer service rep told me, that the site was cache-proof, was valid. Are there search engines on the Deep Web? If so, are those search engines tailored to find criminals and terrorists, or even regular people?

In this excerpt from Answercast #21, I look at the idea of a "Deep Web" and what information you might find about yourself there.

Deep Web

So there are a number of interesting issues here. I too had not heard the phrase "Deep Web," although I've certainly been familiar with several of the concepts.

Ultimately, here's my take on it. If an individual can view your web page without jumping through any special hoops (say, connecting to a special network or entering a username and password), then I'm convinced there's a search engine that has already spidered it and potentially cached it.

There's nothing to protect it from that kind of activity.

Limiting website spidering

What your admin may have been talking about is this thing called robots.txt. It's a file that website owners can place on their website that tells search engines what can and cannot be spidered, or can and cannot be cached. There's also information you can place on an HTML page that basically says the same thing. And that will keep the pages out of the popular search engines.

You will not find those pages in Google; you will not find those pages in Bing.
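To make the robots.txt idea a little more concrete, here's a rough sketch of how a well-behaved spider might check that file before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt contents, the `example.com` URLs, the `ExampleBot` name, and the `is_allowed` helper are all made up for illustration; I don't know what the site in question actually published.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /blogs/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/blogs/my-memoir",
                "https://example.com/about"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

A spider that plays by the rules would skip the `/blogs/` page and fetch the `/about` page.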

The problem is that techniques like robots.txt and the information placed in the HTML file are purely voluntary. They're not a technological solution. They basically tell a search engine, "Hey, please don't index me. Please don't cache me."
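The in-page version of that request is usually a robots meta tag, where `noindex` roughly means "please don't index me" and `noarchive` means "please don't cache me." Here's a minimal sketch of how a spider might read those directives with Python's standard `html.parser` module. The sample page and the `RobotsMetaParser` and `robots_directives` names are my own inventions for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks not to be indexed or cached.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, noarchive">
<title>My memoir</title>
</head><body>...</body></html>"""

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

Notice that all the code does is read the request. What the spider does with it afterward is entirely up to whoever wrote the spider.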

Good search engines will respect that. Others may not. There's actually no requirement that they do so. It is simply a gentleman's agreement.
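Here's a small sketch of what "gentleman's agreement" means in practice. The only difference between a polite spider and a rude one is a choice made in the spider's own code; nothing on the website's side enforces it. The `urls_to_crawl` function, its `respect_robots` flag, the robots.txt contents, and the `example.com` URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every spider to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

URLS = [
    "https://example.com/",
    "https://example.com/private/diary",
]


def urls_to_crawl(urls, robots_txt, user_agent="ExampleBot", respect_robots=True):
    """Return the URLs a spider would fetch under the given policy."""
    if not respect_robots:
        # A rude spider simply never looks at robots.txt.
        return list(urls)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    print("polite:", urls_to_crawl(URLS, ROBOTS_TXT))
    print("rude:  ", urls_to_crawl(URLS, ROBOTS_TXT, respect_robots=False))
```

The polite spider leaves the `/private/` page alone. The rude one fetches everything, and the website never gets a say in the matter.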

Because your website, or your content, is available without any special hoops to jump through, I'm convinced that there are search engines, caching utilities, and spiders out there that have spidered your site and cached its content, and that will continue to do so.

Programs crawl the web constantly

There's really not a lot we can do about that.

We normally think of there only being a handful of search engines; you know, things like Google and Bing and maybe Yahoo back in the day. In reality, there are thousands and thousands of search engines and spiders crawling the web almost constantly.

Everything from:

  • Competitive search engines to
  • Government sponsored data collection utilities to
  • University research projects that are out there just sort of surfing everything they can find.

So, in reality, the fundamental rule of the internet still applies.

Once you post something that is publicly accessible, for all practical purposes, you have lost control over it. You can remove it, but you cannot know whether somebody has already cached it, made a copy of it, or reposted it somewhere else. That's just the nature of how the public internet works; dark or otherwise, that's just how it works.

Article C5398 - May 28, 2012

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.

3 Comments
Carol Putman
May 29, 2012 12:04 PM

It was a surprise to me when I Googled my email address that comments and questions I had posed in the Google sites forums (which required that I sign in with a password) were showing up in the search results. It seems to me if you must sign in with an email address and a password, that comments and questions would be protected, but apparently they aren't.

If anyone can come along and read them without signing in, then the search engines can find them too. (Typically you need to sign in to comment.)
Leo
29-May-2012
Pollyanna
May 29, 2012 3:58 PM

@Carol Putman
Even if a group or site is called private and everyone has to sign in to view content, what's to keep another member from copying what he reads there and re-posting it anywhere else on the internet? It doesn't have to be with nefarious intent. Maybe he re-posts your stuff as an example of writing he considers noteworthy - it's still out there now for anyone or anything else to find.

Robin Clay
June 4, 2012 3:27 PM

*Always* regard *anything* you send via the Internet (or via mobile telephone) as being as secure as a postcard you send by mail. Except (I suppose) anything sent through an https site after log-in (bank details, etc.). But I use https for Facebook, so I guess even https is not necessarily secure.

Comments on this entry are closed.
