Kuro5hin.org: technology and culture, from the trenches

Google and the Mysterious Case of the 1969 Pagejackers

By kpaul in MLP
Sun Feb 06, 2005 at 07:44:29 PM EST
Tags: Internet

Lost a lot of traffic from Google recently? Slipping in the SERPs? You've heard of the Google Bomb, Google Whacking, the Google Dance and Googlisms, but there's a new Google-word you might be interested in if you're losing your Google Juice. What is this new term? I call it 302 Googlejacking. The problem seems to have been around since at least August 2003, and is commonly known as the Google 302 Pagejacking issue. To be fair, it has affected other search engines as well.

If you're curious, there are some things you can check that will show whether or not you've been afflicted by what some people call a bug at Google. (Who knows, maybe it's just a 'feature' we don't understand. ;) Also included are some side notes on the 1969 cache bug/feature that might be related, Google Update Allegra, a poll on the big three and more. If you're into SEO, read on.


Googlejack Test

Type the following into the nearest Google searchbox:

allinurl:yourdomain.com

Look through the results. If you see a Title and Description that are identical to your site's, with the URL for another site underneath, you may have been pagejacked, or as I say it, Googlejacked.
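The check above can be roughed out in code. This is only a sketch: it assumes you have copied the (title, URL) pairs out of the allinurl: results by hand, and the function name, domains and titles are all made up for illustration. It flags any result that carries your title but sits on somebody else's host.

```python
from urllib.parse import urlparse


def find_suspect_results(results, my_domain, my_title):
    """Return (title, url) pairs that show my title under a foreign host."""
    suspects = []
    wanted = my_title.strip().lower()
    for title, url in results:
        host = (urlparse(url).hostname or "").lower()
        # Treat the bare domain and any subdomain of it as "my site".
        same_site = host == my_domain or host.endswith("." + my_domain)
        if title.strip().lower() == wanted and not same_site:
            suspects.append((title, url))
    return suspects


# Hypothetical results, copied by hand from an allinurl: search.
results = [
    ("Example Widgets", "http://www.example.com/"),
    ("Example Widgets", "http://directory.example.net/go.php?url=example.com"),
    ("Some Other Page", "http://example.org/example.com-review"),
]
suspects = find_suspect_results(results, "example.com", "Example Widgets")
for title, url in suspects:
    print("possible Googlejack:", title, "->", url)
```

A hit here is only a hint, not proof. You would still want to look at how the other site links to you.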

Technical Details:

Two primary types of redirects are used on the web: 301s and 302s. A 301 redirect means 'moved permanently.' This is the type of redirect you should use most of the time if you care about search engines. The other type, a 302 redirect, basically means 'moved temporarily.' When someone redirects to your site using the 302 method, Google seems to associate their website (i.e. their URL) with your page (i.e. your CONTENT), presumably because a temporary redirect tells a crawler to keep the original URL on file.
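If you want to see which kind of redirect a link to your site uses, something like the following should do it. It is a minimal sketch using Python's standard http.client, which does not follow redirects on its own. The helper names are mine, and some servers answer HEAD requests differently from GET, so treat the result as a hint.

```python
import http.client
from urllib.parse import urlparse

REDIRECT_MEANINGS = {
    301: "moved permanently",
    302: "moved temporarily (found)",
}


def describe_status(status):
    """Map an HTTP status code to the two redirect types discussed above."""
    return REDIRECT_MEANINGS.get(status, "not a 301/302 redirect")


def fetch_redirect(url, timeout=10):
    """Request url without following redirects; return (status, Location)."""
    parts = urlparse(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For example, `status, target = fetch_redirect("http://directory.example.net/go.php?id=1")` (a made-up URL) followed by `print(describe_status(status), target)` would show whether that directory link is a 301 or a 302 and where it points.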

Why Googlejack?

Before you fire off a cease & desist letter, keep in mind that a lot of people are doing this unknowingly by using link directory software that utilizes the 302 redirect for some reason. They may not know they're potentially harming your site by linking to it.

However, it's entirely possible (imho) for a nefarious, blackhat SEO to use a 302 redirect from a throwaway domain to negatively influence your website rankings. Yes Virginia, SEO SPAM scum exist and they'll try all they can to make money, no matter who they hurt. They live to try to game the search engines.

What to Do?

I haven't seen a lot of specifics on how to solve this problem yet, but it seems some people have had their site recover from being 302 Googlejacked. Personally, I'd recommend emailing the kind folks over at Google to let them know about the potential problem. Second, keep building new, relevant and interesting content. It's what the web's about. Finally, don't put all your eggs in one basket - try optimizing for the other two big engines (Yahoo and MSN) as well.

1969 Cache Hippies

During my quest to find out more about the 302 Googlejacking problem, I ran across another bug/feature that might be related to the 302 redirect problem. If you check the cache dates of some of the pages in Google, you'll notice a 1969 last indexed date. I can hear you now. "What!? Google wasn't invented until at least the 80s, man!" Before you fire off a memo to CBS, though, notice that this is most likely a default Unix date put there as a placeholder. While I've noticed that my Googlejacked pages also have the screwy 1969 cache date, others say the two things are not related. The theory I've read that I agree with most is that the two problems are somehow interacting with each other and making things worse for webmasters everywhere.

Update Allegra

Once again named by Webmaster World, Google Update Allegra started on Feb. 2. The thread there already has over 30 pages. A lot to wade through, no? Well, there wasn't a lot of new info that I saw, just more rehashing of the old: wait for MSN, Google is broke, I notice no changes, I hate Google, I love Google, hey where's GoogleGuy, this is all because they're a public company now, this is all because of Adwords, this is all because of Adsense, etc. ad nauseam. There are some good observations, though. Learn to skim faster, grasshopper. What I pulled from the first 20 or so pages is that this update is attempting to fix some things from last December's update.

SE Wars Episode V: The (MS) Empire Strikes Back

While not a big player yet (pre-Longhorn), Microsoft's new MSN search is out on the web. While I still get nowhere near the amount of traffic I receive from Google, the MSN piece of the pie is growing, along with Yahoo. It's going to be an interesting couple of years in the search engine world.

Exhaustive List of Webmaster World Links (Thanks to claus at Webmaster World for most of these links):


Poll
favorite search engine
o google 97%
o yahoo 0%
o msn 1%
o other (see below) 1%

Votes: 68

Related Links
o Incorrect URLs and Mirror URLs
o MSN Search PageJacking Critical Flaw
o Dupe content checker - 302's - Page Jacking - Meta Refreshes
o Is there a new filter?
o Big problem with Yahoo
o PR 7 - 0 and Address Nightmare
o Problem with Googlebot and robots.txt?
o Meta Refresh leads to ...
o weird link showing up for my site in Web results
o Google indexing redirect pages
o free hosting sites banned from google?
o Is using a redirect to track outward bound links bad?
o Our company Lisiting is being redirected.
o 302 Redirects showing ultimate domain
o Strange results in Allinurl
o Domain name mixup
o Using Redirects
o redesigns, redirects, & google -- oh my!
o Google Partial Indexing?
o Not sure but I think it is Page Jacking
o Unindexed URL Google Ranking Trick
o http://click.fastsearch.com....
o Duplicate content - a google bug?
o Banner ad redirect-page indexed as mirror site by Google
o Indexed AlltheWeb pages causing Google duplicates


Google and the Mysterious Case of the 1969 Pagejackers | 56 comments (29 topical, 27 editorial, 0 hidden)
What's the difference (3.00 / 4) (#1)
by Psychopath on Sat Feb 05, 2005 at 06:42:52 PM EST

between inurl and allinurl? And where can I find a complete reference of such keywords Google accepts?
Thanks!
--
The only antidote to mental suffering is physical pain. -- Karl Marx
so kpaul was at a party enjoying himself (1.28 / 14) (#23)
by circletimessquare on Sun Feb 06, 2005 at 12:23:28 AM EST

and circletimessquare came up to him and annoyed the fuck out of him with a self-serving threadjack...

dude: what's the one most important thing someone can do to increase their pageranking on google?

does adwords work or is it a waste of money?

i know the answer is 20 pages long, i'm just looking for the 2 sentence answer

and not the whole "fresh content, make sure people link to you" answer either

help me obiwan-kpaulbee, you're our only hope

I'm making a Low Budget HDV Filipino Horror Movie in NYC

+1 FP kpaul!!! (nt) (1.00 / 13) (#29)
by Danzig on Sun Feb 06, 2005 at 02:44:12 AM EST



You are not a fucking Fight Club quotation.
rmg for editor!
If you disagree, moderate, don't post.
Kill whitey.
So google has been around since 1969? (1.14 / 7) (#31)
by trezor on Sun Feb 06, 2005 at 07:10:00 AM EST

That's what you're saying?

/too long to read without contact lenses on


--
Richard Dean Anderson porn? - Now spread the news

I like the idea of suing them. (2.50 / 2) (#35)
by ubernostrum on Mon Feb 07, 2005 at 12:45:41 AM EST

Yes, we're a lawsuit-happy society, but people have apparently been bugging Google about this for several years with no response. Thus, we have two facts:

  1. Google's practices aid and abet malicious attacks intended to cause loss of business revenue.
  2. Google has been made aware that its practices aid and abet said attacks, and has taken no steps to stop this.

IANAL but I think that's sufficient to go into court and make Larry and Sergey sit up and pay attention...




--
You cooin' with my bird?
This Happened To Me (3.00 / 6) (#37)
by gusnz on Mon Feb 07, 2005 at 05:27:42 AM EST

Well, in a manner of speaking. Good article BTW!

Way back around the year 2000 I made a website listing some scripts I wrote. Being just under 18 (therefore possessing no credit card) and not really treating it like anything serious, I hosted it on a free provider (Tripod, IIRC) and signed up with the URI redirection service CJB.NET. The idea was that mysitename.cjb.net would do a frameset redirect (a.k.a. "URI Cloaking") to the chosen hosting provider. If you're unfamiliar with the concept, the redirector writes out a FRAMESET tag with a full-screen frame pointing at your actual host, so the redirect URL remains in the address bar and visitors bookmark it by default.

Fast forward 18 months: I wised up and got a real domain for myself. The old mysitename.cjb.net was set to redirect to the new domain, and I put in some frameset-busting JavaScript to put the real domain in the browser location bar. The question is: what does a search engine do? How does it know the primary URI for your site?

It's clear that in the first case I intended the external redirector to be my site's primary URI. Google dutifully reported it as so in the search results. However, shortly after my switch in late 2001, Google figured out that my new domain was the intended URI of my site and switched to listing it.

However, the interesting part: several times in the previous few years, Google has randomly switched back to listing the frameset redirect as the primary URI for my site. This isn't such a major problem as I still control it; however, the behaviour is by no means limited to honest uses, and I can see a situation like the 302 redirects in the article evolving, with a malicious user hijacking a site by serving a frameset to Googlebot and cheap viagra to all other visitors.

So I tried to teach Google. Emailing customer support produced a "sorry we don't manually alter the index" response. Next up, a BASE HREF tag in the HEAD of my frontpage document containing my real domain. No luck.

Lastly, I put a referral detector into my frontpage file. If the referrer was the site containing the frameset, issue another redirect pointing at my actual frontpage. This was actually done by pointing the frameset at my domain and appending a ?referrer=foobar query string to its frame URI; the script detected this and redirected to the frontpage sans query string with a "301 Moved Permanently" redirect. I therefore can't vouch for whether this would work by simply sniffing the HTTP_REFERER value for the offending site (in theory it should with framesets, but not HTTP 30x referrals as they maintain the original referrer in the HTTP headers). This seems to have worked for me though; Google now lists my real domain in its search results and has done so consistently since implementation. If anyone's in a similar situation, feel free to try this.
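For anyone wanting to try it, here is a rough sketch of that query-string detector as a small Python WSGI app. It is not the script I actually ran: the canonical URL, the app name and the page body are placeholders, and only the ?referrer=foobar marker and the 301 response come from the description above.

```python
from urllib.parse import parse_qs

MARKER_KEY = "referrer"    # query parameter the frameset appends to its frame URI
MARKER_VALUE = "foobar"    # placeholder value, as in the description above
CANONICAL_URL = "http://www.example.com/"  # hypothetical real front page


def frontpage_app(environ, start_response):
    """Serve the front page, or 301 to the clean URL if the marker is present."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    if MARKER_VALUE in query.get(MARKER_KEY, []):
        # The request came through the frameset: send a permanent redirect
        # to the front page without the query string.
        start_response(
            "301 Moved Permanently",
            [("Location", CANONICAL_URL), ("Content-Length", "0")],
        )
        return [b""]
    body = b"<html><body>Real front page</body></html>"
    start_response(
        "200 OK",
        [("Content-Type", "text/html"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

To poke at it locally, `wsgiref.simple_server.make_server("", 8000, frontpage_app).serve_forever()` from the standard library will serve it on port 8000.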

So, in conclusion, I can see the logic behind Google's (and presumably other search engines') behaviour. Sometimes listing a redirect as a primary URI is a desired result. However, search engines should provide some means of specifying that a site returning an HTTP/200 code and document content is a primary URI of its own. A possible solution would be a META tag indicating that redirects to the page (whether by HTTP 30x, framesets, refreshes or any other technique) are to be ignored and treated as separate sites. Another solution would be for Google to give you a random number to embed into your page as an HTML comment; Google would then re-spider the page, verifying you are the owner, and then allow you to specify a "no redirects" option through its site.
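The random-number idea could look something like this on the search engine's side. This is purely a sketch of the proposal: nothing here describes anything Google actually offers, and the "site-verification" comment format and function names are invented for illustration.

```python
import re
import secrets


def issue_token():
    """Generate a random token for the site owner to embed in their page."""
    return secrets.token_hex(16)


def page_proves_ownership(html, token):
    """True if html contains an HTML comment of the (hypothetical) form
    <!-- site-verification: TOKEN -->."""
    pattern = r"<!--\s*site-verification:\s*" + re.escape(token) + r"\s*-->"
    return re.search(pattern, html) is not None


# The owner pastes the token into the page; the spider re-fetches and checks it.
token = issue_token()
page = (
    "<html><head><!-- site-verification: %s --></head>"
    "<body>front page</body></html>" % token
)
print(page_proves_ownership(page, token))
```

A site that merely redirects or frames the page could not add the comment, which is the point of the scheme.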

(Here's hoping some Google engineers read K5 :).


[ JavaScript / DHTML menu, popup tooltip, scrollbar scripts... ]

I've been dumped (3.00 / 3) (#42)
by danny on Tue Feb 08, 2005 at 04:25:32 AM EST

Google referrals to both dannyreviews.com and danny.oz.au are down maybe 15% of normal since the last big update.

I thought it could never happen to me. Mine are old sites (five years at their current locations, ten years in total) with thousands of (unsolicited) incoming links, from places like the National Library of Australia and the GNU Foundation, to mention just a few. And I've never done anything clever - it's almost entirely hand-written static HTML, built up a few pages a week over years.

As an example of just how screwed up things are, consider a search "gamelan in australia" - my Australian gamelan directory is now ranked behind four pages whose only relevance to gamelan in Australia is that they link to my site!

Danny.
[900 book reviews and other stuff]

1969 (3.00 / 4) (#46)
by munro on Wed Feb 09, 2005 at 05:11:57 PM EST

1969 was the year of Woodstock, the Beatles' last public performance, the creation of ARPANET, the Soyuz and Apollo moon missions, and the first flight of the Concorde. And I'm guessing that midnight on 1 January 1970 GMT (which is what a Unix timestamp of 0 means) was 4pm on 31 December 1969 PST in Silicon Valley, home of Google.
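The arithmetic is easy to check. A small sketch, assuming a fixed UTC-8 offset for PST rather than a full timezone database:

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for Pacific Standard Time (UTC-8).
PST = timezone(timedelta(hours=-8), "PST")

# A Unix timestamp of 0 is midnight, 1 January 1970, UTC.
epoch_utc = datetime.fromtimestamp(0, tz=timezone.utc)
epoch_pst = epoch_utc.astimezone(PST)

print(epoch_utc.isoformat())  # 1970-01-01T00:00:00+00:00
print(epoch_pst.isoformat())  # 1969-12-31T16:00:00-08:00
```

So a zeroed-out timestamp displayed in Pacific time lands on the afternoon of 31 December 1969.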

Strange google results from just one website (none / 1) (#48)
by British1500 on Fri Feb 11, 2005 at 11:11:19 AM EST

I remember several months ago, on every Google search I did (obscure ones, notably 80s bands), I noticed an odd trend. No matter what I searched for, the same website ended up in the first 5 results, possibly with a page title matching my search. Of course, I'd go to the URL in question, and there wasn't diddly squat. Just some page with links to click on for searching for what I was searching for, and a ton of ads. Anyone know what I'm talking about? My memory is really hazy, and it may have been Yahoo, but apparently there's one website with just about damn near every search engine string submitted in history, and it's at the top of every search engine result. Go you!

howto: fix the problem? (none / 0) (#52)
by kpaul on Tue Mar 22, 2005 at 05:31:56 PM EST

details here...


kpaul media
