Web site spam: what can I do about it?

Question:

I have a personal web-site to help computer users that’s been running for
about 6 years. I have a guest book and people have been signing it for years.
Within the past year, though, I’ve been swamped with spammers signing my book.
I get about 6 to 10 spams each day. Each morning I delete them, but it is
getting worse by the day.

I had tried to “hide” my guest book from the public and sacrifice the
ability to have people sign and my enjoyment in reading these. But even this
“hidden” page keeps getting spam.

How can I prevent spammers from signing my guest book? I’d appreciate your
comments and hopefully a solution to this annoying problem.

Oh, I have plenty of comments and opinions on this topic – it’s a problem I
face right here with Ask Leo!

But unfortunately, like spam in general, there’s no single answer – no magic
bullet.

Depending on your server and other specifics, there are several approaches
you can take.


Web spam, also known as “blog spam” or “comment spam”, is definitely on the
rise. Spurred by the popularity of weblogs, or blogs, which allow people to post
comments, spammers are using these forms to post links back to their own sites.
The links aren’t really intended to serve as advertising, per se, but rather
to trick the search engines into thinking that the target site is more
important than it is, because of all the incoming links.

Regardless of why, it’s a mess.

There are two types of comment spam generation techniques: manual and
automated. Automated tools scour the web looking for things that look like
comment or guest book forms, and automatically post their bogus content to
these forms. Manual techniques involve hiring cheap labor overseas to do exactly
the same thing by hand.

While it started as comment spam on blogs, it’s most definitely no longer
limited to that. Almost any form that accepts input on the web is getting
hit.

As I said, there are various tools and techniques to combat comment or web
spam. Which technique might help you depends on how your form is set up, and on
what type of server, or publishing platform you might be running.

A very common technique is to use what’s called a “CAPTCHA” (“Completely
Automated Public Turing test to tell Computers and Humans Apart”). You’ve
probably seen them – they’re the often distorted characters that you’re asked
to re-type into the form before it will be accepted. As the name implies, it’s
a way to prevent automated tools from posting to your form. Unfortunately it
does nothing to stop actual humans.

If you’re running on a content management system like MovableType, WordPress
or others, then CAPTCHA may already be an option – either as a built-in
feature, or as a plugin for your platform. Unfortunately creating and using a
CAPTCHA test in the general sense is not all that trivial.


However, if you’re using a standard HTML <form> to get your input, I
developed a technique that relies on JavaScript to throttle spam. In fact, it’s
a technique I use here on Ask Leo! with great success. It was developed and
described for the MovableType publishing platform, but the technique is in fact
valid for any <form>-based input. You can read more about it on my
MovableType Tips site: Dealing with Comment Spam.

The drawback of this technique is that it requires that JavaScript be
enabled in order for people to post to your form. While most people do have it
turned on, there’s a percentage that do not, and you’ll have to decide if that
is important enough to you.
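To give a feel for how a JavaScript-dependent form can work, here is a minimal sketch in Python of the server side of one such scheme. It is not necessarily the technique described on the MovableType Tips site; it is just one illustrative approach. The assumption is that the page's script copies a server-issued token into a hidden form field, so automated posters that never run the script submit no valid token. The names SECRET_KEY, MAX_AGE_SECONDS, issue_token and is_valid_token are all hypothetical.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"   # hypothetical server-side secret; keep it private
MAX_AGE_SECONDS = 3600      # hypothetical: how long a served form stays valid


def _sign(timestamp):
    """Return a hex HMAC-SHA256 signature for the timestamp string."""
    return hmac.new(SECRET_KEY, timestamp.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Build a 'timestamp:signature' token to embed in the page's script.

    The script (not the static HTML) is assumed to write this value into
    a hidden form field just before the form is submitted.
    """
    timestamp = str(int(time.time() if now is None else now))
    return f"{timestamp}:{_sign(timestamp)}"


def is_valid_token(token, now=None):
    """Accept only a correctly signed token that has not expired."""
    current = time.time() if now is None else now
    timestamp, separator, signature = token.partition(":")
    if not separator or not timestamp.isdigit():
        return False  # field missing or malformed: likely no script was run
    if not hmac.compare_digest(signature.encode(), _sign(timestamp).encode()):
        return False  # signature does not match: token was forged or altered
    return 0 <= current - int(timestamp) <= MAX_AGE_SECONDS
```

The form-processing script would call is_valid_token on the hidden field and reject the post when it returns False. A tool that actually executes the page's JavaScript would still get through, so this only throttles the simpler automated posters.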

If you’re running an Apache-based web server and you have access to its
configuration, the mod_security module might be an
option. This module can be configured to monitor for terms and take action when
those terms are posted to your form. It’s something else I run on Ask Leo!’s
server, and as a result attempts to post a comment with certain
four-letter-words or certain spam-related phrases will simply be rejected.

Another technique I find myself using applies to forms where I control the
script that processes the form input. Most notably, my ask a question page has
been getting hammered of late with various attempts at web spam. What I’ve done
is simply make note of common strings (typically the websites that are being
linked to) and update the code to disallow posts containing those strings.
(Apparently, being PHP based, it bypasses mod_security.)
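As a rough illustration of that kind of check, here is a minimal sketch in Python (the script on Ask Leo! is PHP, and its actual code isn't shown here). The blocked entries and the name find_blocked_string are hypothetical placeholders.

```python
# Hypothetical list of strings seen in spam posts; in practice this is
# the list you add to as new spam arrives.
BLOCKED_STRINGS = {
    "cheap-pills.example",
    "casino-bonus.example",
}


def find_blocked_string(text, blocked=BLOCKED_STRINGS):
    """Return the first blocked string found in text, or None if clean.

    Matching is case-insensitive so that simple capitalization tricks
    do not slip past the list.
    """
    lowered = text.lower()
    for entry in sorted(blocked):
        if entry.lower() in lowered:
            return entry
    return None
```

The form handler would reject the post whenever find_blocked_string returns something other than None. Returning the matched entry, rather than just True, makes it easier to log which string triggered the rejection.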

Both techniques that scan for strings require a certain amount of
maintenance. As spammers arrive attempting to promote new things, those things
need to get added to the disallowed list. However, if you’re willing
to completely disallow links in the content posted from valid users, then
disallowing the string “http:” would stop 99% of this type of spam.
Unfortunately that’s not something I can do, as many of the questions I get do
need to refer to specific web pages.
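If you are willing to ban links outright, the check is tiny. The sketch below, in Python, assumes you also want to catch "https:" and bare "www." addresses, which goes slightly beyond the plain “http:” string mentioned above; contains_link is a hypothetical name.

```python
import re

# "http:" comes from the text above; "https:" and "www." are extra
# assumptions added for this sketch.
LINK_PATTERN = re.compile(r"https?:|www\.", re.IGNORECASE)


def contains_link(text):
    """Return True if the text appears to contain a web link."""
    return LINK_PATTERN.search(text) is not None
```

A post that merely mentions the word “http” without the colon, such as a question about an HTTP error, is not flagged.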

If you don’t have access to the levels of scripting or server configuration
that I’ve described here, then your next best bet is to investigate the
specific publishing platform you’re using. The spam problem is widespread, and
many of the popular platforms are implementing solutions of various types.

9 comments on “Web site spam: what can I do about it?”

Jon

September 4, 2006 at 7:49 pm

We had a similar problem with the Guest Book page my wife has on her web site – spammers got in with advertising material. We solved this problem (so far) by signing up for an e-Guestbook (Google them) account. Annual subscription is very low. Posters have to type in a “magic” number which beats automated posters. You are advised when a new post has been sent, and can vet it before allowing the post. As I said, it works so far.

We had a similar situation with the site’s discussion forum. We fixed this briefly by setting up a forum with phpBB, but recently, we started getting dozens of signups by “members” who are obviously pushing spam. More maintenance work to be done…
Mary

September 5, 2006 at 1:38 pm

Leo –
A few months ago I joined an online computer help forum sponsored by a major computer manufacturer. Within just a few days I began receiving (on average) 10 spam emails a day. Now it’s up to 20 a day. I’ve got my firewall, antivirus, and antispyware programs current and running. The spam has been directed to my bulk folder so apparently the filters are working and not sending the spam to my inbox.

But how did the spammers get *MY* email address in the first place? In order to access the forum I have to first go to http://www.computer company.com and then sign in from there. If members want to communicate privately, they can send messages via a separate link provided on the forum site. We never see each other’s actual email address. (Similar to how eBay allows people to communicate.) It’s not like my address is being posted by the computer manufacturer or the discussion forum… or is it?
Mary
Greg Bulmash

September 6, 2006 at 11:07 am

phpBB’s built-in CAPTCHA has either been cracked by spammers or the human-created phpBB spam is on the rise.

The problem with phpBB is that even if you require answering an e-mail to activate the account, or even if you go so far as to require manual administrator approval to activate an account, the moment someone signs up (BEFORE they’re activated), they end up in the member directory.

And if they’ve specified a homepage link in the form when they signed up, it’s linked from two places in the member directory. So they can give a fake e-mail address and never activate their account, but get linky goodness from just barraging your phpBB board with fake accounts.

This is why I have removed all my phpBB installations.
Mike

September 25, 2006 at 10:59 am

Maybe we’re going about this wrong? How does the spammers’ automated form search spider determine a page is a form they want to spam? Maybe there is a way to make the form page NOT look like a form they want to submit to. Are they looking for one which posts to a .cgi? In that case why not make the cgi extension .xyz and change your server .htaccess to execute .xyz like a .cgi?
Holly Wild

February 12, 2007 at 8:01 am

I get close to 100 spam e-mails from our comment forms on our site. How can I stop this? We use FrontPage. The site is http://www.sjlounyinjurylaw.com and it’s making us crazy!
Holly Wild

February 12, 2007 at 8:02 am

sorry the site is http://www.ajlounyinjurylaw.com…help please.
Leo Notenboom

February 12, 2007 at 9:49 am

The article you commented on has my suggestions.

Good luck!

Leo
David A

October 17, 2008 at 11:28 am

my very popular website (over 1 million viewers) is now getting sick sex postings (my site is a family site) how can I prevent it, my site is http://www.pennypincher.ca

The article you just commented on has my basic suggestions – they all involve modifying the website or website software to put up barriers to this type of thing. Unfortunately there’s no simple answer that just works – it depends heavily on the type of software that’s running the site.

– Leo
26-May-2008

audy

March 17, 2009 at 10:06 am

try spameat.com
it’s got a good concept.
everyone help each other to filter the web spam
