Helping people with computers... one answer at a time.
The best way to stay safe online is to consider that anything you post can be cached, seen, and even copied. Don't post anything you want kept private.
I just learned the phrase "Deep Web" today and I'm reading an excellent Wikipedia article explaining that it's not just for criminals and terrorists. A few years ago, for personal reasons, I chose to write a memoir-style blog on what was perhaps the most popular social networking site then used by people over the age of 50. I chose this site not only because it was even more popular than Facebook, but specifically because its content was dynamically generated. Finally, I chose it because I knew I had not only ready-made readers waiting but also an accessible list of those who'd be interested in what I wrote.
I contacted the website company at which I was a paid member and checked to see if any search engine could cache what I wrote. I was told no. What I wrote was a very personal, Festivus-style airing of grievances. I received tons of email responses from my readers about the various grievances I aired, all positive and supportive.
Now, Leo, learning about the Deep Web, I'm wondering whether what that customer service rep told me, that the site was cache-proof, was valid. Are there search engines on the Deep Web? If so, are those search engines tailored to find criminals and terrorists, or even regular people?
In this excerpt from Answercast #21, I look at the idea of a "Deep Web" and what information you might find about yourself there.
So there are a number of interesting issues here. I too had not heard the phrase "Deep Web," although I've certainly been familiar with several of the concepts.
Ultimately, here's my take on it. If an individual can view your web page without jumping through any special hoops (say, connecting to a special network, or entering a username and password, or anything like that), then I'm convinced there's a search engine that has already spidered it and potentially cached it.
There's nothing to protect it from that kind of activity.
What your admin may have been talking about is this thing called robots.txt. It's a file that website owners can place on their website that tells search engines what can and cannot be spidered, or can and cannot be cached. There's also information you can place in an HTML page (a robots meta tag) that basically says the same thing. And that will keep the pages out of the popular search engines.
You will not find those pages in Google; you will not find those pages in Bing.
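To make that a little more concrete, here is a rough sketch of how a well-behaved spider might consult a robots.txt file before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the "ExampleBot" name, and the example.com URLs are all made up for illustration; I don't know what the site in question actually published. The in-page equivalent is typically a tag along the lines of <meta name="robots" content="noindex, noarchive">.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: every spider is asked to stay out of
# /members/. The path is an assumption made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URLs below are placeholders, not real services.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/members/blog/1"))
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/about"))
```

The first check comes back as not allowed and the second as allowed, which is all a cooperating search engine needs to know to leave the members area alone.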
The problem is that techniques like robots.txt, and the information that is placed in the HTML file, are purely voluntary. They're not a technological solution. They're basically telling a search engine, "Hey, please don't index me. Please don't cache me."
Good search engines will respect that. Others may not. There's actually no requirement that they do so. It is simply a gentleman's agreement.
Because your website, or your content, is available without any special hoops to jump through, I'm convinced that there are search engines, caching utilities, and spiders out there that have spidered your site and cached its content, and will continue to do so.
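One way to see why it is only a gentleman's agreement: the check lives in the spider's own code, not on the website. The sketch below is a simplified simulation, not how any particular search engine works. The function name, the "ExampleBot" user agent, and the example.com URLs are hypothetical, and nothing is actually downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all spiders to stay out of /members/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""


def pages_a_spider_would_fetch(robots_txt, urls, respects_robots,
                               user_agent="ExampleBot"):
    """Simulate which URLs a spider would fetch; nothing is downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    fetched = []
    for url in urls:
        # Only a polite spider bothers to ask. Nothing on the server
        # side forces this check to happen.
        if respects_robots and not parser.can_fetch(user_agent, url):
            continue
        fetched.append(url)
    return fetched


if __name__ == "__main__":
    urls = [
        "https://example.com/about",
        "https://example.com/members/blog/1",
    ]
    print(pages_a_spider_would_fetch(ROBOTS_TXT, urls, respects_robots=True))
    print(pages_a_spider_would_fetch(ROBOTS_TXT, urls, respects_robots=False))
```

The polite spider skips the members page; the impolite one takes everything, and the only difference between them is a single flag in the spider's own code.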
There's really not a lot we can do about that.
We normally think of there only being a handful of search engines; you know, things like Google and Bing and maybe Yahoo back in the day. In reality, there are thousands and thousands of search engines and spiders crawling the web almost constantly.
So, in reality, the fundamental rule of the internet still applies.
Once you post something that is publicly accessible, for all practical purposes, you have lost control over it. You can remove it, but you cannot know whether somebody has already cached it, made a copy of it, or reposted it somewhere else. That's just the nature of how the public internet works; dark or otherwise, that's just how it works.