One of the most effective ways to block spam is also one of the most complex. You can set up a filter to only record data that uses a valid hostname. A hostname is the name of the website you're visiting (aka your website domain), so you should be able to count yours on your fingers.
I took a look at my hostnames over the last month and only had 5.
And actually all of these look pretty legit. But if I go back and look at my traffic over the last year I do see some spam.
That's 125 visits using an obviously spammy hostname. We should try to remove those to get better data.
The Problem With Inclusive Filters
Some analytics experts recommend an inclusive filter. You only record traffic from valid hostnames. This is a very strong spam filter because you don't have to have a long list of spam links. You just have to have a handful of valid hostnames.
I get the desire to do this, but it comes at the risk of accidentally filtering out good data. Google Analytics is a tool that I use monthly. I don't open it every day, so I wouldn't notice missing data right away. If we embed an e-commerce solution like Gumroad or Shopify into our site, it would have a different hostname and wouldn't be included in our analytics. And for e-commerce this is disastrous because we wouldn't record any sales.
Knowing myself, I have a 99% chance of forgetting to add a hostname to Google Analytics.
And when you look at the cost-benefit of an inclusive filter, it doesn't make sense. We'd only be excluding 125 visits. That's 0.06% of my traffic.
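If you want to run the same sanity check on your own numbers, here's a minimal sketch. The total of 208,333 visits is not a figure from my reports. It's back-calculated from the 125 visits and the 0.06% share, so treat it as an assumption and swap in your own totals.

```python
def spam_share(spam_visits: int, total_visits: int) -> float:
    """Return the fraction of all visits that came from spam hostnames."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return spam_visits / total_visits


# 125 is the spam count from the hostname report.
# 208_333 is an assumed total, back-calculated from the 0.06% share.
share = spam_share(125, 208_333)
print(f"{share:.2%} of traffic is hostname spam")
```

If your share comes out much higher than mine, the trade-off may look different for you.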
So I'm going to go with an exclusive filter. It's not bulletproof and a teensy amount of spam can get through, but you also won't shoot yourself in the foot. If I do notice more hostname spam, I can always add more spam hostnames.
I'm much more comfortable with this.
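To make the trade-off concrete, here's a rough sketch of how the two filter styles treat the same traffic. Every hostname and pattern below is made up for illustration (none of them come from my reports), and Python's `re.search` only approximates how Google Analytics applies a filter pattern.

```python
import re

# Hypothetical patterns, for illustration only.
VALID_HOSTNAMES = re.compile(r"^(www\.)?example\.com$")
SPAM_HOSTNAMES = re.compile(r"spam-host\.example|free-traffic\.example")

hits = [
    "www.example.com",         # the site itself
    "shop.example-store.com",  # a checkout host we forgot to add to the valid list
    "free-traffic.example",    # known spam
    "brand-new-spam.example",  # spam that isn't on the exclude list yet
]


def include_filter(hostnames):
    """Inclusive filter: keep only hits whose hostname is on the valid list."""
    return [h for h in hostnames if VALID_HOSTNAMES.search(h)]


def exclude_filter(hostnames):
    """Exclusive filter: drop only hits whose hostname matches known spam."""
    return [h for h in hostnames if not SPAM_HOSTNAMES.search(h)]


print(include_filter(hits))  # loses the forgotten shop hostname
print(exclude_filter(hits))  # keeps the shop, but lets new spam through
```

The inclusive filter silently drops the shop traffic, while the exclusive filter's worst case is a little leftover spam.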
Creating a Regular Expression for Hostname Spam
We're going to create a filter. But before we do that, we need to create a regular expression that matches our spam hostnames. A regular expression, if you've never used one, is basically pattern matching. They can be hard to read but they're very powerful.
I use a tool called RegExr. I put in some valid hostnames and some spammy ones, and then make sure that RegExr only highlights the spam hostnames.
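If you'd rather script that check than eyeball it, here's a minimal sketch. The pattern and every hostname in it are hypothetical placeholders, not my real list, and I'm assuming the filter pattern behaves like a partial-match regex, which `re.search` roughly mimics.

```python
import re

# Hypothetical pattern: spam hostnames joined with "|", with the dots escaped
# so that "." only matches a literal dot.
SPAM_PATTERN = re.compile(r"spammy-seo\.example|traffic-bot\.example")

# Placeholder hostnames; replace these with the ones from your own report.
valid_hostnames = ["www.example.com", "example.com"]
spam_hostnames = ["spammy-seo.example", "www.traffic-bot.example"]


def matched(hostnames):
    """Return the hostnames that the spam pattern would catch."""
    return [h for h in hostnames if SPAM_PATTERN.search(h)]


# No valid hostname should match, and every spam hostname should.
print("valid hostnames caught:", matched(valid_hostnames))
print("spam hostnames caught:", matched(spam_hostnames))
```

If the first list isn't empty, the pattern is too broad and would filter real traffic.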
You can also use my regular expression as a starting point and test with your own hostnames.
Here's what I ended up using:
Creating an Exclusive Hostname Filter
Before creating any filter, I always recommend creating a backup view in your Google Analytics. That way, even if you completely mess something up, you'll have an untouched backup view. It takes 5 minutes and doesn't cost you anything, so it's totally worth it.
Similar to the language filter, let's set up a filter. Here are the steps:
- Log into Google Analytics
- Pull up a report for one of your views
- Click on Admin
- Click on Filters
- Click + Add Filter
- Give your filter a name
- Set the type to Custom and then Exclude
- Filter Field should be set to Hostname
- Copy and paste the regular expression you created earlier into the Filter Pattern field
- Click Verify this filter to make sure no actual traffic is accidentally filtered. You should only see spam or a message saying there's no difference (which is fine since it only uses a subset of data)
- Click Save
Now, just in case you need to look back later, it's always smart to leave an annotation in your Analytics so you know when you implemented this.
And you're done. Now Google Analytics will filter your hostname spam and you'll have cleaner and more useful data.
Happy filtering. 🙂