P1WS Internet Marketing Blog

Remove Google Analytics Spam Once and For All


Accurate Google Analytics metrics are crucial to any online organization, yet filtering out Google Analytics spam is often done ineffectively, only partially, or, even worse, not at all. This leads to inaccurate traffic data, bounce rates, page views, etc. On top of that, there are a lot of myths and partial solutions floating around that often waste time and cause more headaches down the road. I've created this straightforward guide with tips on successfully removing several different types of spam with a relatively set-it-and-forget-it approach.

As a starting point, it's important to undo any incorrect filters you may have put in place previously. These include simple exclude filters for each individual spammer, server-side solutions, and referral exclusion lists for spam. These methods are extremely ineffective and can complicate things even further! Today I'll show you an efficient way of dealing with the spam and, even more importantly, a way of doing it that won't harm your data. As a best practice, it's a good idea to have at least two views for your property: an unfiltered view (aka a "Do Not Touch" view) and a Master view. If you want to be extra cautious, you can also create a Test view to safely confirm that you've set these filters up correctly.

The three main spam issues we will address are fake language spam, crawler spam, and the infamous ghost spam. Before we get started, note that the filters may take up to about 24 hours to take full effect. Let's start with the biggest issue, ghost spam.

GHOST SPAM

This filter is the most fun to implement, as it will permanently remove all spam from fake cookie sites, sharebutton, spammers impersonating real sites like Google and USA Today, and most of the language spam. There are two important things to note about this type of spam: first, it never actually visits your site; second, it almost always hides under a fake hostname or none at all, which shows up in the reports as hostname "(not set)". What makes this filter unusual is that rather than an Exclude rule, you set up an Include rule based on your valid hostname(s). The goal is to allow only traffic from valid hostnames, so that all ghost traffic is excluded automatically. This is much more efficient than setting up a separate filter for each spam hostname, and it removes the need to keep updating your filters as new ghost spam hostnames appear. To do this, follow these three easy steps, and make sure you're selecting hostnames, not sources.

  • Find Your Hostname(s): Typically your hostname will be your domain and any associated subdomains. To see a list of them, go to the Network report under Technology and select Hostname (in blue, in the top left corner of the report). Set the date range back a few months as well. If you have more than one valid hostname, you'll want to create a list. Don't worry about www versus non-www versions; they're one and the same here, because an expression for your domain matches both.
  • Set Your Hostname Expression: Using the list of hostnames you've compiled, create a regular expression (REGEX) that contains all of them. Make sure to capture all of your hostnames; otherwise you may lose valuable data. There are a few guidelines for building your expression:
      • Separate each hostname with a pipe symbol (|), but don't put one at the beginning or end of the expression. For example, with two hypothetical hostnames: example\.com|my\-shop\.net
      • Dots are special characters in REGEX (an unescaped dot matches any character), so add a backslash right before each one. Hyphens only have a special meaning inside square brackets, but escaping them the same way is harmless.
      • Don't leave any spaces
      • Filter patterns have a 255-character limit
  • Create Hostname Filter: Once you're sure you've set up the REGEX correctly, it's time to set up the filter and rid yourself of annoying ghost spam once and for all!
    • Go to Admin - Select the view you want (Master or Test view)
    • Select Filters under the View column and click 'Add Filter'
    • Create the Filter Name “Valid Hostnames”
    • Filter Type - Custom
    • Choose "Include" and select Hostname as the filter field
    • Copy and paste the hostname expression that you built
    • Verify filter (you may get a warning during this process saying there wasn't enough data; double-check that your expression and filter settings are correct, then save)
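Before pasting anything into the filter, the expression-building guidelines above can be checked mechanically. The Python sketch below is illustrative only: `build_hostname_regex`, `passes_include_filter`, and the hostnames `example.com` and `my-shop.net` are hypothetical. It escapes dots and hyphens, joins the hostnames with pipes, enforces the no-space and 255-character rules, and then imitates an include filter on a few made-up hits, assuming the filter's default match-anywhere, case-insensitive behavior. It is not Google Analytics code.

```python
import re
from typing import List

MAX_PATTERN_LENGTH = 255  # filter pattern limit from the guidelines above


def build_hostname_regex(hostnames: List[str]) -> str:
    """Hypothetical helper: turn a list of valid hostnames into one include-filter pattern."""
    parts: List[str] = []
    for name in hostnames:
        name = name.strip().lower()
        if name.startswith("www."):
            # An unanchored pattern for example.com also matches www.example.com.
            name = name[len("www."):]
        # Escape dots (and hyphens, per the guidelines) with a backslash.
        escaped = name.replace(".", r"\.").replace("-", r"\-")
        if escaped and escaped not in parts:
            parts.append(escaped)
    pattern = "|".join(parts)  # pipes only between entries, never at either end
    if " " in pattern:
        raise ValueError("the expression must not contain spaces")
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError(f"expression is {len(pattern)} characters; the limit is {MAX_PATTERN_LENGTH}")
    re.compile(pattern)  # raises re.error if the result is not a valid regular expression
    return pattern


def passes_include_filter(pattern: str, hostname: str) -> bool:
    """Imitate an include filter: keep the hit only if the pattern matches somewhere in the hostname."""
    return re.search(pattern, hostname, re.IGNORECASE) is not None


valid = build_hostname_regex(["www.example.com", "example.com", "my-shop.net"])
print(valid)

# Made-up hits: real visits, plus ghost spam with no hostname or an impersonated one.
hits = ["www.example.com", "example.com", "my-shop.net", "(not set)", "www.usatoday.com"]
print([h for h in hits if passes_include_filter(valid, h)])
```

Because the match is unanchored, one entry per domain is enough to cover both its www and non-www versions, which is why the helper drops the www prefix and removes duplicates.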

CRAWLER SPAM

The main difference between ghost spam and crawler spam is that crawler spam actually visits your site and uses a valid hostname, which makes it harder to detect and means the hostname filter won't catch it. Instead, you'll need to set up a different filter with an expression that matches all known crawler spam. After some extensive research, I've been able to identify the most current and common crawler spam from the last few years and put together an optimized REGEX. That said, it's important to revisit this list every now and again to be sure you're also filtering out newly developed crawler spam. This time we are creating an Exclude filter, as outlined below.

  • Create Exclude Filter
      • Under Admin tab - Select View - Filters - Click Add Filter
      • Create the Filter Name “Crawler Spam 1”
      • Filter Type - Custom
      • Filter Field - Campaign Source
  • Set Filter Pattern: Copy and paste the following. Because of the REGEX character limit, you'll need to create two separate filters, one for each expression below (for example "Crawler Spam 1" and "Crawler Spam 2").
      • (best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)
      • Datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way
      • Verify filter (you may get the same warning during this process saying there wasn't enough data; don't worry, if you set it up correctly it will work.)
  • Save
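Before saving, you can sanity-check the two crawler-spam expressions offline. This Python sketch copies both expressions verbatim, confirms that each fits within the 255-character limit (and that the two combined would not, which is why two filters are needed), and then imitates an exclude filter on a few made-up Campaign Source values. It assumes the filter's default match-anywhere, case-insensitive behavior, and the helper name `is_crawler_spam` is hypothetical; it is an illustration, not Google Analytics code.

```python
import re

MAX_PATTERN_LENGTH = 255

# The two crawler-spam expressions from the steps above, copied verbatim.
CRAWLER_SPAM_1 = r"(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)"
CRAWLER_SPAM_2 = r"Datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way"

FILTERS = [re.compile(p, re.IGNORECASE) for p in (CRAWLER_SPAM_1, CRAWLER_SPAM_2)]


def is_crawler_spam(source: str) -> bool:
    """Imitate the exclude filters: a hit is dropped if either pattern matches its source."""
    return any(f.search(source) for f in FILTERS)


for name, pattern in (("Crawler Spam 1", CRAWLER_SPAM_1), ("Crawler Spam 2", CRAWLER_SPAM_2)):
    print(name, len(pattern), "characters")

# Made-up Campaign Source values: four spam sources and two legitimate ones.
sources = ["semalt.semalt.com", "best-seo-offer.com", "free-share-buttons.com",
           "datract.com", "bing", "newsletter"]
kept = [s for s in sources if not is_crawler_spam(s)]
print(kept)

# ^scripted\. is anchored, so it only matches sources that start with "scripted."
print(is_crawler_spam("scripted.com"), is_crawler_spam("notscripted.com"))
```

Testing a list of your own legitimate sources this way is a quick guard against an over-broad pattern excluding real traffic.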

FAKE LANGUAGE SPAM

This is a fairly new form of Google Analytics spam, and the hostname filter you set up earlier should catch most of it; however, a few might get through. A simple secondary measure is to create another expression that filters out any language value that doesn't have a proper format. The process is almost the same as for crawler spam, except that you'll use a different filter name, so you can tell the filters apart, and a different filter field. Now you're ready to enter the following expression:

  • Create Exclude Filter
      1. Under Admin tab - Select View - Filters - Click Add Filter
      2. Create the Filter Name “Language Setting”
      3. Filter Type - Custom
      4. Filter Field - Language Settings
  • Set Filter Pattern: Copy and paste the following:
    • \s[^\s]*\s|.{15,}|\.|,
    • Verify filter. This time you should be able to confirm the result immediately, since matching data should already be there. If it's set up correctly, you will see only fake languages on the left side of the preview table.
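The language expression flags a value if it contains two or more whitespace characters (in practice, several words), if it is 15 or more characters long, or if it contains a dot or a comma; typical language codes such as en-us meet none of those conditions. The Python sketch below applies the pattern verbatim to a few hypothetical Language Settings values to show which would be excluded. The helper name `is_fake_language` and the sample values are made up, and the code only imitates the filter's matching.

```python
import re

# The language-spam expression from the steps above, copied verbatim.
LANGUAGE_SPAM = re.compile(r"\s[^\s]*\s|.{15,}|\.|,")


def is_fake_language(value: str) -> bool:
    """Imitate the exclude filter on Language Settings: True means the hit would be dropped."""
    return LANGUAGE_SPAM.search(value) is not None


# Hypothetical values: real-looking language codes, "(not set)", and made-up spam.
values = ["en-us", "en-gb", "zh-cn", "(not set)", "Vitaly rules google", "spam.example"]
for value in values:
    print(f"{value!r}: {'exclude' if is_fake_language(value) else 'keep'}")
```

Note that "(not set)" survives because it contains only one space, is shorter than 15 characters, and has no dot or comma.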

CONCLUSION

Cleaning up your analytics is a very important element in obtaining reliable data and key metrics, and it should be a top priority for anyone responsible for online marketing initiatives. Note that you may see an apparent drop in traffic once these filters are applied, but remember that much of this so-called traffic never even hit your site; it was just skewing your Analytics metrics behind the scenes. By applying these filters and other Google Analytics best practices, you will be on the road to clean and trustworthy data!

  Koren   Posted in: Google Analytics