TABLEAU THURSDAYS – The Google Analytics Sampling Googly!

Facing weird / random numbers while connecting to a Google Analytics data source from Tableau and don’t have a clue why? Or have you gone one step further and know that it’s a sampling issue when you try to pull >500K records from GA? At your wits’ end on what to do and looking for a workaround? Read on!

As part of continuing enhancements to a web analytics dashboard we built for a global hardware communications technology client, I recently tried connecting directly to GA using the Tableau Google Analytics data connector. We were populating a 6-month view of Sessions across multiple Channels. Try as I might, I could not get the numbers I was pulling through the Tableau connector to match the numbers exported from the GA site each month and stored in an Excel sheet. I triple-checked the dimensions and measures, but no go. Oh, Tableau! My heart skipped a few beats, wondering how a direct connector introduced in Tableau 8 (http://www.tableausoftware.com/solutions/google-analytics) could be failing to return the correct values!

Interestingly, the numbers were only correct when I pulled exactly 1 month’s data. A half-hour of research later, it was clear that the problem wasn’t at the Tableau end. Unless you are using a Google Analytics Premium Account, GA does not allow you to pull >500K records of unsampled data. Beyond that limit, GA returns sampled results. Sampling in Google Analytics (or in any analytics software) refers to the practice of selecting a subset of data from your traffic and reporting on the trends detected in that sample set.
(Source: https://support.google.com/analytics/answer/1042498?hl=en)
To learn more about how sampling in Google Analytics works, read: https://support.google.com/analytics/answer/2637192.
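
To make the idea concrete, here is a minimal Python sketch of sampling in general. It is not GA’s actual algorithm: the channel names, the traffic volumes and the 1% sample rate are all made-up assumptions, and the function name `sampled_estimate` is purely illustrative.

```python
import random
from collections import Counter


def sampled_estimate(sessions, sample_rate, seed=0):
    """Estimate per-channel session counts from a random sample.

    Each sampled session is scaled up by (population size / sample size),
    which is the general idea behind sampled reporting.
    """
    rng = random.Random(seed)
    sample_size = int(len(sessions) * sample_rate)
    sample = rng.sample(sessions, sample_size)
    scale = len(sessions) / sample_size
    counts = Counter(sample)
    return {channel: round(n * scale) for channel, n in counts.items()}


# Hypothetical traffic: one entry per session, labelled by channel.
sessions = (
    ["Organic"] * 60000
    + ["Direct"] * 30000
    + ["Referral"] * 9700
    + ["Email"] * 300
)

true_counts = Counter(sessions)
estimate = sampled_estimate(sessions, sample_rate=0.01)

for channel, actual in true_counts.items():
    print(channel, "actual:", actual, "estimated:", estimate.get(channel, 0))
```

With a 1% sample, every sampled session stands in for 100 real ones, so a small channel such as the hypothetical Email one can only be reported in steps of 100 and could be missed entirely.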

The business implications of this cannot be taken lightly: while the numbers might be off by only tens or hundreds in some cases, there are many occasions where the results returned from GA after sampling are completely incorrect.
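
One way to spot such deviations is to compare the monthly figures exported from the GA site with what the connector returns. The sketch below assumes both sets of numbers have already been typed or loaded into plain dictionaries; the month keys and session counts are invented for illustration, and `find_mismatches` is a hypothetical helper, not part of Tableau or GA.

```python
def find_mismatches(exported, pulled, tolerance=0.0):
    """Return {month: difference} for months where the pulled figure
    deviates from the exported one by more than `tolerance` (relative)."""
    mismatches = {}
    for month, expected in exported.items():
        actual = pulled.get(month, 0)
        diff = actual - expected
        if expected:
            relative = abs(diff) / expected
        else:
            relative = 0.0 if diff == 0 else float("inf")
        if relative > tolerance:
            mismatches[month] = diff
    return mismatches


# Hypothetical Sessions totals per month (not real client data).
exported = {"2014-01": 120000, "2014-02": 98000, "2014-03": 143000}
pulled = {"2014-01": 120000, "2014-02": 97400, "2014-03": 151900}

print(find_mismatches(exported, pulled))
print(find_mismatches(exported, pulled, tolerance=0.01))
```

The optional `tolerance` lets you ignore tiny differences (here, anything under 1%) and focus on the months that are badly off.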

So what can you possibly do to ensure that your dashboard reflects the correct values while not using a Premium account? We simply pulled the data for each month and created an extract for each month’s data – these extracts were then appended to a “Master Extract” that was used by our dashboard. The steps we followed are detailed below:

1. Connect to GA using the Tableau connector and pull 1 month’s data, making sure to include all Dimensions and Measures required for your visualization(s).

[Image: Tableau Google Analytics - diagram 1]

2. An extract is automatically created. This is a temporary extract that lives only as long as your workbook is open, and it can be found either at

  • C:\Users\<user>\Documents\My Tableau Repository\Datasources
    or in
  • Temp folders that can be found by searching for .tde files in C:\Users\<user>\

 

3. Copy this extract to a separate folder and rename it as “Source_Month1.tde”

4. Repeat Steps 1-3 for the 2nd month, saving the extract as “Source_Month2.tde”

 

5. Connect to the “Source_Month1.tde” and “Source_Month2.tde” Data Extracts.

[Image: Tableau Thursdays - diagram 2]

6. Navigate to the “Add Data from File” option for the “Source_Month1” data source (to which all worksheets point), and add the “Source_Month2.tde” extract file.

[Image: diagram 3]

7. You should receive a “xxxxxx data rows successfully added” message, and the “Source_Month1” data source now has data for both Months 1 and 2. Rename this data source as the “Master Source.”

8. Repeat Steps 1-3 again and save the extract as a “Master Extract” to which all subsequent months’ data can be added (by repeating Steps 5 and 6).

9. The “Master Source” (and consequently the “Master Extract”) now has data for all months for analysis, and more importantly, is UNSAMPLED!
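
Since the steps above are repeated once per month, it can help to list the exact start and end dates to enter for each pull. Here is a small sketch using only the Python standard library; the starting month (January 2014) is an arbitrary example rather than a detail from our project, and `month_ranges` is just an illustrative helper name.

```python
import calendar
from datetime import date


def month_ranges(year, month, count):
    """Return (first_day, last_day) pairs for `count` consecutive months,
    starting at the given year and month."""
    ranges = []
    for _ in range(count):
        last_day = calendar.monthrange(year, month)[1]
        ranges.append((date(year, month, 1), date(year, month, last_day)))
        # Move to the next month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return ranges


# One date range per monthly pull for a 6-month view.
for start, end in month_ranges(2014, 1, 6):
    print(start.isoformat(), "to", end.isoformat())
```

Each printed range corresponds to one pass through Steps 1-3, which keeps every individual pull to a single month.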

The caveat with this method, of course, is that you have to repeat the process for every month of your analysis / reporting. One of the things I love about Tableau is that there are a number of unique ways you can achieve the same results. Feel free to leave a comment below if this technique helped, if the steps were difficult to understand, or if you have employed your zen master skills to use a better technique 🙂

This blog is written by Farid Jalal, Analytics Project Manager at BRIDGEi2i

About BRIDGEi2i: BRIDGEi2i provides Business Analytics Solutions to enterprises globally, enabling them to achieve accelerated business impact harnessing the power of data. Our analytics services and technology solutions enable business managers to consume more meaningful information from big data, generate actionable insights from complex business problems and make data driven decisions across pan-enterprise processes to create sustainable business impact. To know more visit www.bridgei2i.com


 

The views and opinions expressed in this article are those of the author and do not necessarily reflect the official position or viewpoint of BRIDGEi2i.

 

 
