Advanced segments and YoY comparison = wonky data output in GA.

I recently came across an oddity in Google Analytics which, although only a minor frustration, took me a moment to figure out because the cause is not immediately obvious.

I was comparing year on year data using four advanced segments and looking at the headline change in share of visits for each segment.

 

 

When applying segments in Google Analytics it clearly shows what the corresponding splits are as a percentage of all site visits, something one would naturally want to see when segmenting data… you’re bound to want to know what the weight is of each segment and if it’s something worth worrying about.

Sample data above demonstrates how clearly this info. is displayed. So far so good.

Then came the moment when I wanted to see if the weight of each segment had changed from one year to the next.

 

Looking at the second sample data set above, all seems fine: for example, UK visits as a percentage of all site traffic appear to have decreased by 10.72% and Europe excluding UK to have increased by 9.38%. Not so.

At the time I was checking this data I happened to have already done these calculations in a dashboard for a client, and my YoY variation data looked quite different. After a bit of brow rubbing followed by two minutes in Excel the obvious finally occurred to me: Google Analytics is not showing the percentage change as it normally does, by calculating the difference as a percentage of the earlier figure. It is simply subtracting one share from the other; in other words, it’s giving the change in percentage points, which is something very different.
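To make the distinction concrete, here is a minimal sketch in Python. The shares used (a segment falling from 50% to 40% of all visits) are hypothetical and are not the figures from the report above, and the function name is just for illustration.

```python
def share_changes(previous_share, current_share):
    """Compare two shares of traffic, both given in percent.

    Returns (percentage-point change, relative percentage change).
    """
    # Simple subtraction: the figure described in this post
    point_change = current_share - previous_share
    # Variance expressed as a percentage of the earlier share
    relative_change = (current_share - previous_share) / previous_share * 100
    return point_change, relative_change


# Hypothetical example: a segment's share falls from 50% to 40% of visits
points, relative = share_changes(50.0, 40.0)
print(f"{points:+.2f} percentage points, {relative:+.2f}% relative change")
```

The same movement reads as a 10 point drop one way and a 20% drop the other, which is why the two should never be mixed in one report.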

Looking at the percentage point change, the difference is comparatively slight. If you were using this method (i.e. advanced segments) to look at year on year change in something like organic traffic landing on a particular page that you had spent time optimising, you might think your effort (and money) had been wasted and give up or, worse still, change what you are doing.

The lesson to me… it always pays to take a second look at your GA data when you’re using advanced segments.

Posted in Analytics

Why I’m still a little sceptical about attribution modelling in Google Analytics (and elsewhere)

Recently I’ve been experimenting with the attribution modelling feature which is steadily being rolled out across the free version of Google Analytics. I’ve been doing this with a particular task in mind and, as I’ve been working with it, there’s been a nagging feeling in the back of my mind that all is not as (I think) it should be.

Attribution is of course all about understanding the true contribution of each referring source of traffic to a website in the context of a specific goal or objective.

 

A little background
GA’s attribution modelling tool provides ‘out of the box’ models based on first click, last click, linear, time decay, last non-direct click, last AdWords click and so on. It also allows users to create their own custom models based on any of the above. These models can be used to compare sources of traffic to see which deliver the most conversions based on the type and strength of credit value assigned to them (I’ll assume a basic level of knowledge on the part of the reader around the structure of those standard models).

 

A word on credit and value
GA’s various attribution models assign value to each channel according to the amount of credit due based on the model in use. So if you have a multi channel conversion funnel like this…

Paid search > organic search > email > affiliate (conversion step)

…and you were comparing the last click model with the first click and linear models then last click would assign 100% of all credit for a conversion to the affiliate channel, first click would assign 100% of all credit for a conversion to paid search and linear would assign an even 25% of credit to each of the four channels.
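The credit split for that four-step funnel can be sketched in a few lines of Python. This is only an illustration of the three standard rules as described here, not GA’s own implementation; the function name and model labels are made up for the example.

```python
def assign_credit(path, model):
    """Split the credit for one conversion across the channels in a path."""
    steps = len(path)
    if model == "first_click":
        weights = [1.0] + [0.0] * (steps - 1)
    elif model == "last_click":
        weights = [0.0] * (steps - 1) + [1.0]
    elif model == "linear":
        weights = [1.0 / steps] * steps
    else:
        raise ValueError(f"unknown model: {model}")

    credit = {}
    for channel, weight in zip(path, weights):
        # A channel can appear more than once in a path, so accumulate
        credit[channel] = credit.get(channel, 0.0) + weight
    return credit


funnel = ["paid search", "organic search", "email", "affiliate"]
for model in ("last_click", "first_click", "linear"):
    print(model, assign_credit(funnel, model))
```

Whichever model is chosen, the credit for a single conversion always adds up to one; the models only differ in who receives it.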

The role of credit issuance (for want of a better way to put it) is important because it can also be applied to custom models in two different ways:

  1. Using the position based model and assigning a percentage of credit across the first, last and middle positions in the conversion journey
  2. Assigning a credit level to a custom rule.

 

A sample problem
Say your marketing mix comprises both paid search and affiliate channels (and no doubt much more); you want to understand the true value of your affiliate marketing but at the same time you’re concerned about double paying for sales leads that include both sources in the acquisition funnel. In theory attribution modelling should help shed light where there was once darkness. The trouble is I’m not sure it does, at least not when using Google Analytics’ AM feature.

By comparing data from up to three different models against your preferred dimension, let’s say medium, you can see how each source performs against the base model. The base model can be changed (as in the example below, where it is first click) but by default it’s last click in GA.

By looking at the two comparative models it becomes possible to understand which model delivers best in terms of conversion volume compared to the base. The data below shows three models being compared side by side; from left to right the base is first click and the two comparative models are the standard linear model and a custom linear model where affiliate traffic has been given a value of zero i.e. it has been completely devalued.

The data above shows that paid search (cpc) is a gainer with the custom linear model showing a smaller decline in the volume of conversions against the base compared to the standard linear model i.e. -8.86% compared to -12.03% respectively. In a sense it’s saying that without affiliate traffic in the marketing mix we can expect paid search to do a better job based on a standard linear model of attribution.

 

The fishy part
The trouble is this assumes that, while completely devaluing the contribution from the affiliate source of traffic, we will still harvest exactly the same volume of conversions, and that all that’s happening is the credit which was once ascribed to affiliate traffic is now being reallocated to other sources. In reality, however, if we completely devalue the contribution from affiliate we might expect the actual volume of conversions to reduce, on the basis that affiliate must be responsible for at least some incremental conversions, at the very least in situations where affiliate was the ONLY referring channel – see below….

…unfortunately Google Analytics doesn’t seem to account for this. A quick tally of the total volume of conversions against the two comparison models shows the numbers are exactly the same, and this logic doesn’t make sense. Where the affiliate channel was the only channel responsible for the referral (and arguably even in situations where it was the first touch channel) the associated conversions should be excluded from the dataset. How else could they have occurred?
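The difference between reallocating credit and actually losing conversions can be shown with a small Python sketch. The conversion paths are hypothetical, and the handling of affiliate-only paths in the first function is an assumption made to mirror the unchanged totals observed above; it is not a documented description of how GA treats them.

```python
def reallocated_credit(path, devalued):
    """Linear credit for one conversion with one channel weighted at zero."""
    remaining = [channel for channel in path if channel != devalued]
    if not remaining:
        # Assumption: the conversion is still counted even though its only
        # channel was devalued, which is what identical totals would imply
        remaining = list(path)
    share = 1.0 / len(remaining)
    credit = {}
    for channel in remaining:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit


def total_after_reallocation(paths, devalued):
    """Total conversions when credit is merely moved between channels."""
    return sum(sum(reallocated_credit(p, devalued).values()) for p in paths)


def total_without_channel(paths, removed):
    """Counterfactual: drop conversions where the removed channel acted alone."""
    return sum(1 for p in paths if any(channel != removed for channel in p))


# Hypothetical conversion paths, one list per conversion
paths = [
    ["cpc", "organic", "email", "affiliate"],
    ["cpc", "affiliate"],
    ["affiliate"],
    ["organic"],
]
print(total_after_reallocation(paths, "affiliate"))
print(total_without_channel(paths, "affiliate"))
```

In this toy data set the reallocation total stays at four conversions, while the counterfactual drops to three because one conversion had no other channel to fall back on.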

Additionally, what we cannot tell from attribution modelling is sentiment, i.e. which channel, source, medium or combination thereof was responsible for tipping the consumer into making a purchasing decision. It’s the age-old problem of analytics data: we have the ‘what’ (sort of, in this case) but not the ‘why’. Was it the initial tag line in the paid search ad, was it the meta description copy in the organic listing or was it the discount voucher offer from the affiliate link?

Currently there doesn’t seem to be a clear answer but as things stand I would caution against using insight from GA’s attribution modelling tool to completely overhaul the marketing mix.

Posted in Analytics

Google’s multi armed bandit.

Google’s switch from Website Optimiser (GWO) to Content Experiments (GCE) has made the testing process significantly easier and, in so doing, the effort of site optimisation more fun and, importantly, more productive.

Until now the level of effort required to set up an experiment using Website Optimiser was a barrier to action for many. By contrast, Content Experiments makes use of the standard Google Analytics tag that is already in place and as a result cuts out much of the configuration hassle associated with GWO.

However, there are differences between the two systems. One which has drawn some attention is the way in which GCE manages tests as they progress. In the example data below the split of visits between the control page and the variation page is roughly 74/26 in favour of the control page; when the test started out it was a straight 50/50 split. There has been some anguish from other GCE users who have experienced the same issue.

Why is this happening?

Google have moved from a straight A/B test, in which each of the two variants is given an equal share of the sample traffic throughout the test, to a methodology referred to as a Multi Armed Bandit approach. Without going into the origins of the name (read about it here), suffice it to say this methodology looks to optimise for performance as the experiment progresses, based on accumulated results.

GCE will take a snapshot of test performance every day and adjust the sample traffic based on performance against the designated goal. When one or other of the pages starts to show itself as the better performer GCE will tweak the split of sample visits to favour the better performing page.
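One common way to implement this kind of daily reallocation is Thompson sampling, sketched below in Python. Whether GCE uses exactly this technique is an assumption on my part; the function name, the number of draws and the example figures are all hypothetical, and the sketch only illustrates the general idea of shifting traffic towards the variant that looks better so far.

```python
import random


def daily_traffic_split(results, draws=2000, seed=0):
    """Estimate the share of traffic each variant should get tomorrow.

    results: one (conversions, visits) pair per variant, accumulated so far.
    Returns a list of shares that sums to 1.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    wins = [0] * len(results)
    for _ in range(draws):
        # Draw a plausible conversion rate for each variant from a Beta
        # distribution based on its conversions and non-conversions so far
        samples = [
            rng.betavariate(1 + conversions, 1 + visits - conversions)
            for conversions, visits in results
        ]
        wins[samples.index(max(samples))] += 1
    # Each variant's share is how often it came out as the best performer
    return [count / draws for count in wins]


# Hypothetical accumulated results: control first, variation second
print(daily_traffic_split([(50, 1000), (20, 1000)]))
```

With no data the split sits close to even, and it drifts towards one page only as evidence accumulates, which matches the pattern of a 50/50 start followed by a later shift.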

 

The trended sample data above shows how this appears to be working. On the more extreme days, which have been highlighted with black arrows, conversion falls against one variant and rises against the other, and it is easier to see how the following day shows a corresponding shift in apportioned visits. In this example there is still the obvious shift which occurred 10 days in, where the sample split changed radically. It would seem that there are two likely reasons for this.

  1. GCE needs to accumulate a statistically significant set of test data before making any changes to sample split.
  2. In this particular case the sample size was only 5% at the start of the test but it was increased to 10% on the second day and then 25% on day seven. Changing the sample size mid-test will affect statistical validity.

Accepting (for now) that the approach has been correctly implemented by Google, its benefit is that downside potential is reduced as sample traffic is diverted away from the underperforming variant and over to the better performing variant.

 

 

Posted in Analytics, Testing

Designing for the iPad

In most cases a “full version” website looks basically the same on an iPad as it does on a desktop, notebook or netbook PC, Flash notwithstanding. Evidence that the iPad screen format is not really an issue comes from the fact that conversion rates for iPad tend to be the same as, or sometimes even a little better than, the site average.

Navigation is one of the most important aspects of site design and conversion rate optimisation and it is here that a site can let down iPad users. Take www.forbes.com: it works well on a PC and also defaults nicely to a mobile site for browsers using a phone, but the iPad sits somewhere in between the two and, as with many other sites, the Forbes.com default for browsers using an iPad is the main site, not the mobile site.

Forbes also use drop down menus in the global navigation bar that are revealed when a cursor is hovered over the top; visitors then click on the link they want. iPad users do not have a cursor and they will be waiting a long time if they go for the finger hovering option. They must by default tap on an area of the screen where they want to take action. Because, in the case of the Forbes site, the main menu is only revealed when a cursor is hovered over it, and because a tap on the relevant section of the menu bar only serves to refresh the page (see below), users with an iPad will be left frustrated.

This is not to say that the hover option is generally bad, just that, for the benefit of visitors using an iPad, tapping on a navigation option that leads to an expanded menu should do just that, i.e. expand the menu and not refresh the page.

Posted in Design

Read this if you run an international site and you use Google Translate on it.

If you use Google Analytics (GA) and you use Google’s Translate tool on your site to make the user experience easier for visitors from non-English-speaking countries then it’s just possible that the translator is skewing some of the output data in your GA account.

When a visitor selects the Google Translate tool on your site it will refresh the page and in the process it’ll cause the original referrer data to be stripped out of the GA cookie and replaced with the ‘Direct’ referrer source. This is what GA records when it can’t identify a referring source, which is effectively what it thinks is happening here.

To be sure, trend your data back to a month or so prior to the date when Google Translate was installed on the site, look at the Direct source of visits and see if it increased on the same day. Equally, check conversion: if there’s an increase in Direct visits then there’ll probably be a decrease in conversion. It’s also conceivable that the increase in Direct visits is artificially adding more visits to your overall visit count, i.e. every visitor that uses the translate tool will register two visits in GA instead of one.
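If the daily figures are exported from GA, the before and after check can be scripted. The Python sketch below is only an illustration: the row layout, the install date and the visit counts are all hypothetical.

```python
from datetime import date


def direct_share_before_after(daily_rows, install_date):
    """Compare Direct visits as a percentage of all visits around a date.

    daily_rows: (day, direct_visits, total_visits) tuples, one per day.
    Returns (share before install_date, share from install_date onwards).
    """
    def share(rows):
        total = sum(total_visits for _, _, total_visits in rows)
        direct = sum(direct_visits for _, direct_visits, _ in rows)
        return direct / total * 100 if total else 0.0

    before = [row for row in daily_rows if row[0] < install_date]
    after = [row for row in daily_rows if row[0] >= install_date]
    return share(before), share(after)


# Hypothetical export: the translate tool goes live on the third day
rows = [
    (date(2013, 1, 1), 20, 100),
    (date(2013, 1, 2), 20, 100),
    (date(2013, 1, 3), 35, 100),
]
before, after = direct_share_before_after(rows, date(2013, 1, 3))
print(f"Direct share before: {before:.1f}%, after: {after:.1f}%")
```

A jump in the Direct share that lines up with the install date, with no matching change in marketing activity, would support the theory.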

All this means that the conversion for your non-English-speaking traffic may be understated.

This analysis was carried out in collaboration with Digital Nation.

Posted in Analytics

Don’t hide behind data

I recently read a quote as follows: “We are making fact-based decisions in less time with more accuracy and less emotion….” I think it’s all good apart from that last bit which makes me shiver …and less emotion… It’s as if in order to be good business people emotion has to be scrubbed clean from the system in a way that turns us into automatons that hide behind data.

Google (not the source of the quote I should add) must be one of the leading “poster child” companies that rely very heavily on data to make decisions and it’s clear that Google is a hugely successful organisation, but at the same time it’s had some spectacular flops, er… Wave.

One would assume that in making the business decisions about which products to pursue Google relies heavily on the data provided in the business case and yet here they are having retrenched on many of their products (Google Places being the latest casualty) in order to focus on a few.

Apple, by contrast, pared back to only a few products long ago because Steve Jobs wanted to, on the basis that he thought it was the right thing to do, and as is now fairly well known, Mr Jobs was no fan of focus groups. No doubt he also relied on data but in addition he used his years of experience and his instinct to deliver [almost universally desirable] products that turned Apple into one of the world’s most valuable companies. As Henry Ford is often said to have commented, if he had asked his customers what they wanted they would have answered “a faster horse”.

For a company that sells web analytics services this may seem an odd thing to write about, but web analysts should do more than just gaze at data; it’s their responsibility to do as much as possible to understand the businesses they are working in or contracted to, by which I mean the operational, economic and environmental issues surrounding them. They should imagine themselves as the customer and actually use the product in as far as it is practical to do so. Then they should take this experience and filter it back through all the data channels they use and in doing so combine the subtlety of human understanding with the harder facts revealed in the data to hopefully come out with a more rounded response.

Posted in Business culture

Measuring social and why it should still come back to something tangible.

I recently heard an advertisement on the radio entreating listeners to “Like” the advertiser’s Facebook page. In exchange for doing so the listener stood the chance to win something, or got a reward of some form or other. How odd!!

Facebook’s Like button is a coveted prize amongst marketeers within some companies. Presumably the theory is that the more “Likes”, the greater the exposure across the Facebook network and the greater the brand advocacy. This seems like anathema; to advertise a promotion to promote a promotional opportunity through advocacy… it doesn’t even sound right when I write it down. The issue is people will do things if they result in a desirable reward for very little effort. The point about brand advocacy and social networks is that friends trust the judgement of each other precisely because they share the same values. If those values can be bought by a canny marketeer then, over time, the value of that person’s judgement will decline.

“Liking” or “+1ing” is still relevant but perhaps more so when applied universally around the web and not just on Facebook pages. Don’t just view a “Like” or a “+1” as an end goal in itself and don’t ask somebody to “Like” your thing in exchange for something else because in the end the whole system of advocacy will become compromised.

I’ve been asked by non-ecommerce businesses how they can measure the various blog posts and articles that they carry on their sites; this is where these buttons come into their own. Track them using analytics and correlate the data with other business data, i.e. does an increase in the number of Likes, +1s and Tweets on articles and blog posts correlate with an increase in new customer contacts?
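As a rough sketch of that correlation check, the Python below computes a Pearson correlation coefficient between two weekly series. The weekly figures are invented for illustration, and Pearson is simply one reasonable choice of measure; a high value shows the two series move together, not that one causes the other.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length series of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


# Hypothetical weekly totals: social button clicks and new customer contacts
weekly_social_actions = [12, 18, 25, 30, 41]
weekly_new_contacts = [3, 4, 6, 7, 9]
print(round(pearson_correlation(weekly_social_actions, weekly_new_contacts), 3))
```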

Posted in Analytics

Getting more out of Google Analytics’ ecommerce module

The ecommerce module in Google Analytics has a specific set of parameters which offer the analyst a greater level of insight if they are configured. They include product, product category and SKU; there are others for delivery address etc. but they aren’t relevant here.

Not all businesses sell products that have a SKU code. GA does not actually require a SKU code to be added to this field; in fact SKU is really just a label applied to this variable, and almost anything can be put in it.

For a couple of our clients, Imagine Ireland and Luton Airport, SKU codes are not relevant but there are other things we’d like to know. Imagine Ireland rent holiday cottages in Ireland and London Luton derive revenue from car parking; both of these products are purchased on a date range basis, i.e. a start and finish date for parking and rental. We used the SKU code variable to bring in the start date of the cottage rental or parking period. By doing this we are now able to see what kind of lead times customers are working to when making their bookings, which in turn helps with our email re-marketing strategy.
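Once the start date is in the SKU field, the lead time is just the gap between the transaction date and that date. The Python sketch below assumes the start date was written into the field in YYYY-MM-DD form, which is my assumption for the example rather than a detail given above; the function name and dates are also hypothetical.

```python
from datetime import date


def lead_time_days(transaction_date, sku_value):
    """Days between the booking being made and the rental or parking start.

    Assumes the SKU field holds the start date as a YYYY-MM-DD string.
    """
    start_date = date.fromisoformat(sku_value)
    return (start_date - transaction_date).days


# Hypothetical booking made on 1 June for a stay starting on 15 June
print(lead_time_days(date(2012, 6, 1), "2012-06-15"))  # prints 14
```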

This is just one example of how we’ve used variables in the ecommerce module differently.

Posted in Analytics

Analytics data makes its way into the c-suite

Burberry recently announced that its digital commerce sales had continued to “out perform” in the last financial year with overall retail sales, which include online, up by 31%. It credited much of this performance to innovation in its marketing.

At a recent event in London the CEO of Burberry, Angela Ahrendts, started quoting some basic web metrics such as average time on the Burberry site; she even knew that it went up for visitors that viewed a specific area of content. It doesn’t matter that the data she was quoting was the most simplistic of metrics; what matters is that she was aware of them at all.

Web analytics and digital performance optimisation is 50% data and analysis and 50% taking action. In companies of any size where CEOs have even the most basic web analytics data at their fingertips or in their heads, there is clearly a top down culture of using this data to take action; the results at Burberry speak for themselves and hopefully demonstrate to others in the same position as Ms Ahrendts how relevant this stuff is.

Posted in Business culture

Google Analytics’ Performance View – Beware.

For anybody using the Performance view charting function in Google Analytics, here is something that you should know: if you are using segments or filtering with your data there is a very high chance the Performance charting percentages will be inaccurate and misleading.

Take this example. The data below shows New and Returning visits expressed as percentages of all site visits, so far so good, nothing wrong with that. The percentages all look fine.

Now see what happens when this same data is segmented based on ‘non-paid search traffic’:

You don’t need a calculator to see that 260 is not 40.94% of 332 or that 72 isn’t 11.34% of 332. In this case the difference is easy to spot, but when the difference is slightly more marginal it might take a second glance and therein lies the issue.
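For anyone who does want the calculator, here is the arithmetic as a short Python sketch using the four figures quoted above. The suggestion that the displayed percentages come from dividing by a total of about 635 is my own inference from those figures, not something confirmed by Google.

```python
def share(count, total):
    """Percentage of total represented by count, rounded to two decimals."""
    return round(count / total * 100, 2)


segment_total = 332

# What the chart should show for the two rows of the segmented report
print(share(260, segment_total))  # 78.31
print(share(72, segment_total))   # 21.69

# The displayed 40.94% and 11.34% are both consistent with one larger total
implied_total = 635  # inferred from the figures, not confirmed
print(share(260, implied_total))  # 40.94
print(share(72, implied_total))   # 11.34
```

Both displayed percentages line up with the same larger denominator, which would fit the chart still dividing by the unsegmented site total rather than the segment total.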

Google are generally good at visualising data and they continue to develop Google Analytics as a tool that is easy to use and easy to extract insight from; they have done a lot to democratise web analytics and, as a result, it has become accessible to both analysts and non-analysts alike.

The performance chart is very helpful in attributing importance based on weighting, so it’s useful to be able to glance at it and get a quick indication of, for example, which landing pages or which referring sources of traffic are most significant. But if the relevant data visualisation feature stops working in certain circumstances then the benefit is lost and at worst the wrong insight is taken.

Fair enough, it’s a free tool and one shouldn’t carp on too much, but excusing this kind of bug on the grounds that it’s free is a bit like lending somebody your car and forgetting to tell them that the headlights don’t work properly and then excusing it by telling the borrower that at least s/he managed to save money.

The good news is that Google know about this; the not-so-good news is that it appears to be quite a long way down their list of things to fix on GA. So for those who like this feature, just beware.

Posted in Analytics