Filter Google Analytics Crawler Spam

Multiple data filters

Crawler spam is what happens when a bot crawls through your site and leaves fake data. It might leave a fake referral just to get you to check out the referring website.

Social Network Traffic Spikes

Wowzors I must be improving my writing. Look at these spikes on social media!

In fact, I've had this exact thing happen. I was looking at my analytics and thought someone had mentioned one of my articles on Reddit, because over a few days I got several hundred visits. I wasted my time digging through analytics trying to find the exact post.

And in the end I came to this page:

Reddit Spam Article

A spammy link on Reddit that was listed as the referring page in GA

Reddit has already cleaned this up. They reported this as spam and it's pretty obvious now. But if you got hundreds of hits from this page wouldn't you be curious? Wouldn't you click the link to see how this relates to your site and the surge in traffic?

The whole goal is to leave fake referral data so you click on one of the spammy links. Crawler spam like this dirties your data and will influence any decisions you make based on that data. We want to prevent any of this from being recorded, so you don't click on spam links and so you can make smart decisions with your data.

Create a Filter

This process is going to be very similar to creating a filter for language spam. So I'll abbreviate the steps here:

  1. Log into Google Analytics
  2. Pull up a report for one of your views
  3. Click on Admin
  4. Click on Filters
  5. Click + Add Filter
  6. Fill in the following fields (see below for the Filter Pattern)
  7. Click Save

Now the Filter Pattern is so long that we actually have to break it into two filters. So copy the first pattern in and save. Then repeat the steps for a second filter using the second pattern.

(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)

datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way

I have to give kudos to Carlos Escalera for compiling this list.
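If you want to see how the two patterns behave before pasting them into Google Analytics, you can try them against a few referrer hostnames locally. This is only a rough sketch: Python's `re` module is not the same regex engine Google Analytics uses, `is_crawler_spam` is a made-up helper name, and the sample hostnames are illustrative rather than taken from real reports.

```python
import re

# The two filter patterns from above, copied verbatim.
PATTERN_1 = (
    r"(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler"
    r"|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic"
    r"|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton"
    r"|uptime(bot|check|\.com)"
)
PATTERN_2 = (
    r"datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video"
    r"|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank"
    r"|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter"
    r"|top10\-way"
)

SPAM_PATTERNS = [re.compile(PATTERN_1), re.compile(PATTERN_2)]


def is_crawler_spam(referrer: str) -> bool:
    """Return True if either pattern matches anywhere in the referrer."""
    host = referrer.strip().lower()
    return any(pattern.search(host) for pattern in SPAM_PATTERNS)


# Hypothetical sample referrers, for illustration only.
samples = [
    "sub.semalt.com",
    "best-seo-offer.com",
    "share-buttons.xyz",
    "www.reddit.com",
]
for referrer in samples:
    label = "spam" if is_crawler_spam(referrer) else "ok"
    print(f"{referrer}: {label}")
```

Note that the patterns match partial strings, so a term like `semalt` catches any hostname that contains it.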

The Challenge With Crawler Spam

Crawler spam is really hard to detect. A crawler can look identical to a browser requesting information, and it can send identical data to Google Analytics.

The only way to filter out this data is to use a list of known spam website referrers, which is what we're doing above.

The downside of using known spam websites is that spammers can keep making new ones and your filter won't catch them. It can feel a bit like whack-a-mole.
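One way to make the whack-a-mole a little more manageable is to keep your spam terms as a plain list and regenerate the filter patterns whenever you add a new one. The sketch below packs terms into as few `|`-joined patterns as will fit under a length limit. The 255-character default is an assumption about the Filter Pattern field, so check the limit in your own account. `pack_terms` and the term list are hypothetical names used for illustration.

```python
def pack_terms(terms, max_len=255):
    """Greedily join regex terms with '|' into patterns of at most max_len chars.

    max_len=255 is an assumed limit for the Filter Pattern field.
    """
    patterns = []
    current = ""
    for term in terms:
        if len(term) > max_len:
            raise ValueError(f"term is longer than max_len: {term!r}")
        candidate = term if not current else current + "|" + term
        if len(candidate) <= max_len:
            current = candidate
        else:
            # The current pattern is full, so start a new one with this term.
            patterns.append(current)
            current = term
    if current:
        patterns.append(current)
    return patterns


# Hypothetical term list: a few terms from the filters above.
terms = [r"semalt", r"ranksonic", r"share\-buttons", r"pr\-cy\.ru"]
for number, pattern in enumerate(pack_terms(terms, max_len=30), start=1):
    print(f"Filter {number}: {pattern}")
```

Each term has to be a complete alternative on its own (for example `uptime(bot|check|\.com)` stays one term), because splitting inside parentheses would break the regex.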

The good news is that while crawler spam is hard to prevent it's less common than you think. It requires a lot more resources than ghost spam, where a program sends information directly to Google Analytics without actually crawling your website.

Don't worry about preventing 100% of crawler spam. It's impossible. But at least by filtering out the most common known sources you're going to drastically reduce it.

Verify Your Filter

It's always a good idea to verify your filter. Make sure you don't have a typo that will eliminate legitimate data.

You can do this before you press the save button. You should either see a list of spam being filtered out, or since the verify button uses a small subset of data you might see the following error message:

This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.

As long as you don't see legitimate data you're good to go.
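One typo worth checking for is a stray `|` at the start or end of a pattern. In most regex engines that creates an empty alternative, which matches every string. The sketch below shows the effect with Python's `re` module. Whether Google Analytics treats the same pattern identically is an assumption, and `would_exclude` and the hostnames are illustrative only.

```python
import re


def would_exclude(pattern, hostnames):
    """Return the hostnames that an exclude filter with this pattern would drop."""
    regex = re.compile(pattern)
    return [host for host in hostnames if regex.search(host)]


# Hypothetical list of referrers you know are legitimate.
legit = ["www.reddit.com", "www.google.com", "news.ycombinator.com"]

good = r"semalt|ranksonic"
typo = r"semalt|ranksonic|"  # trailing pipe adds an empty alternative

print(would_exclude(good, legit))  # []
print(would_exclude(typo, legit))  # every hostname in the list
```

If a list of referrers you trust comes back non-empty, fix the pattern before saving the filter.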

Happy filtering!

4 thoughts on “Filter Google Analytics Crawler Spam”

  1. have you ever seen GA spam that appends a prefix or suffix to a valid hostname?

    On my All Pages report, I’m seeing stuff like:

    cdc10-www.valid-hostname.com
    http://www.valid-hostname.com.pk

    Another view I have of the same site only shows the sub-directory of the URLs (starting with / ) and it isn’t showing any of these.

    Any help would be appreciated!

    Vic

  2. Hi Patrick, these posts you’re doing on analytics spam are very useful. With regards to crawler spam I’ve used a few 3rd party services in the past: referrer-spam.help, paveiq.com/referrer-spam-remover/ (both free), and also analytics-toolkit.com.

    You have to allow access to your account but then they add new filters as and when.

    Have you had any experience with these? Of course one is relying on them keeping their eye on the ball.

    • When I was helping WooCommerce with this we used a 3rd party to setup & configure our analytics. I like the idea of a service.

      I’m using these filters on a small number of accounts but if I wanted to update anything across all accounts that would be super useful. Honestly for $35 a month you only need 1 consistent client for something like analytics-toolkit to be worth it.
