Working with Digital Data Part 2 – Observational data

One of the most important changes brought about by the digital age is the availability of observational data.  By this I mean data that relates to an observation of actual online consumer behaviour.  A good example would be in tracing the journey a customer takes when buying a product.

Of course, we can also find a lot of online data relating to attitudes and opinions but that is less revolutionary.  Market Research has been able to provide a wealth of that kind of data, more reliably, for decades.

Observational data is different – it tells us about what people actually do, not what they think (or what they think they do).  This kind of behavioural information was historically very difficult to get at any kind of scale without spending a fortune.  Not so now.

In my earlier piece I had a look at attitudinal and sentiment related digital data.  In this piece I want to focus on observational behavioural data, exploring its power and its limitations.

Memory vs reality

I remember, back in the 90s and early 2000s, it was not uncommon to be asked to design market research surveys aimed at measuring actual behaviour (as opposed to attitudes and opinions). 

Such surveys might aim to establish things like how much people were spending on clothes in a week, or how many times they visited a particular type of retail outlet in a month, etc.  This kind of research was problematic.  The problem lay with people’s memories.  Some people can recall their past behaviour with exceptional accuracy.  However, others literally can’t remember what they did yesterday, let alone recall their shopping habits over the past week.

The resulting data only ever gave an approximate view of what was happening BUT it was certainly better than nothing.  And, for a long time, ‘nothing’ was usually the only alternative.

But now observational data, collected in our brave new digital world, goes some way to solving this old problem (at least in relation to the online world).  We can now be far more confident that the data we’re looking at reflects actual consumer behaviour, uncorrupted by poor memory.

Silver Bullets

Alas, we humans are indeed a predictable lot.  New technology often comes to be regarded as a silver bullet.  Having access to a wealth of digital data is great – but we still should not automatically expect it to provide us with all the answers.

Observational data represents real behaviour, so that’s a good starting point.  However, even this can be misinterpreted.  It can also be flawed, incomplete or even misleading.

There are several pitfalls we ought to be mindful of when using observational data.  If we keep these in mind, we can avoid jumping to incorrect conclusions.  And, of course, if we avoid drawing incorrect conclusions, we avoid making poor decisions.

Correlation in data is not causation

It may be an old adage in statistics, but it is more relevant today than ever before.  For my money, Nate Silver hit the nail on the head:

“Ice cream sales and forest fires are correlated because both occur more often in the summer heat. But there is no causation; you don’t light a patch of the Montana brush on fire when you buy a pint of Häagen-Dazs.”

[Nate Silver]

Finding a relationship in data is exciting.  It promises insight.  But, before jumping to conclusions, it is worth taking a step back and asking if the relationship we found could be explained by other factors.  Perhaps something we have not measured may turn out to be the key driver.

Seasonality is a good example.  Did our sales of Christmas decorations go up because of our seasonal ad-campaign or because of the time of year?  If our products are impacted by seasonality, then our sales will go up at peak season but so will those of our competitors.  So perhaps we need to look at how market share has changed, rather than basic sales numbers, to see the real impact of our ad campaign.
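As a rough illustration of the point above, the sketch below compares raw sales growth with the change in market share.  All the figures are invented for the example and the function name is hypothetical; it is a sketch of the arithmetic, not a full method for measuring campaign effectiveness.

```python
def market_share(our_sales: float, total_market_sales: float) -> float:
    """Return our sales as a fraction of total market sales."""
    return our_sales / total_market_sales


# Hypothetical figures (invented for illustration only).
baseline = {"ours": 100_000, "market": 1_000_000}  # an ordinary month
peak = {"ours": 180_000, "market": 2_000_000}      # peak season, campaign running

# Raw sales look impressive: up 80% on the baseline month.
sales_growth = peak["ours"] / baseline["ours"] - 1

# But the whole market doubled, so our share actually slipped.
share_change = (
    market_share(peak["ours"], peak["market"])
    - market_share(baseline["ours"], baseline["market"])
)

print(f"Sales growth: {sales_growth:.0%}")
print(f"Change in market share: {share_change * 100:+.1f} percentage points")
```

In this made-up case sales rose sharply while share fell, which suggests the season, rather than the campaign, did most of the work.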

Unrepresentative Data

Early work with hormone replacement therapy (HRT) seemed to suggest that women on HRT were less susceptible to heart disease than other women.  This was based on a large amount of observed data.  Some theorised that HRT treatments might help prevent heart disease.

The data was real enough.  Women who were on HRT did experience less heart disease than other women.

But the conclusion was utterly wrong.

The problem was that, in the early years of HRT, women who accessed the treatment were not representative of all women. 

As it turned out, they were significantly wealthier than average.  Wealthier women tend to have access to better healthcare, to eat healthier diets and to be less likely to be obese.  Factors such as these explained their reduced levels of heart disease, not the fact that they were on HRT.
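A small sketch can show how this kind of effect arises.  The counts below are entirely invented (they are not taken from any HRT study) and the names are hypothetical.  Within each wealth band the heart disease rate is identical for both groups, yet the pooled figures make HRT look protective simply because the HRT group contains more wealthy women.

```python
# Hypothetical counts, invented for illustration only:
# (treatment group, wealth band) -> (number of women, heart disease cases)
counts = {
    ("hrt", "wealthy"): (800, 40),
    ("hrt", "less_wealthy"): (200, 20),
    ("no_hrt", "wealthy"): (200, 10),
    ("no_hrt", "less_wealthy"): (800, 80),
}


def rate(group, wealth=None):
    """Heart disease rate for a group, optionally within one wealth band."""
    rows = [
        value
        for (g, w), value in counts.items()
        if g == group and (wealth is None or w == wealth)
    ]
    women = sum(n for n, _ in rows)
    cases = sum(c for _, c in rows)
    return cases / women


# Pooled comparison: HRT appears to reduce heart disease.
print(f"All women: HRT {rate('hrt'):.1%} vs no HRT {rate('no_hrt'):.1%}")

# Comparison within each wealth band: the apparent benefit disappears.
for band in ("wealthy", "less_wealthy"):
    print(f"{band}: HRT {rate('hrt', band):.1%} vs no HRT {rate('no_hrt', band):.1%}")
```

Splitting the data by a suspected hidden factor, where that factor has been measured at all, is one simple way to check whether a pattern survives.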

Whilst the completeness of digital data sets is improving all the time, we still often find ourselves working with incomplete data.  Then it is always prudent to ask – is there anything we’re missing that might explain the patterns we are seeing?

Online vs Offline

Naturally, digital data is a measure of life in the online world.  For some brands this will give full visibility of their market since all, or nearly all, of their customers primarily engage with them online.

However, some brands have a complex mix of online and offline interactions with customers.  As such it is often the case that far more data exists in relation to online behaviour than to offline.  The danger is that offline behaviour is ignored or misunderstood because too much is being inferred from data collected online.

This carries a real risk of data myopia, leading to us becoming dangerously over-reliant on insights gleaned from an essentially unrepresentative data set. 

Inferring influence from association

Put simply – do our peers influence our behaviour?  Or do we select our peers because their behaviour matches ours?

Anna goes to the gym regularly and so do most of her friends.  Let’s assume both statements are based on valid observation of their behaviour.

Given such a pattern of behaviour it might be tempting to conclude that Anna is being influenced by ‘herd mentality’. 

But is she? 

Perhaps she chose her friends because they shared similar interests in the first place, such as going to the gym? 

Perhaps they are her friends because she met them at the gym?

To identify the actual influence, we need to understand the full context.  Just because we can observe a certain pattern of behaviour does not necessarily tell us why that pattern exists.  And if we don’t understand why a certain pattern of behaviour exists, we cannot accurately predict how it might change.

Learning from past experiences

Observational data measures past behaviour.  This includes very recent past behaviour of course (which is part of what makes it so useful).  Whilst past behaviour is a useful predictor of future behaviour, especially in the short term, it is not a guaranteed one.  Indeed, in some situations, it might be next to useless.

But why?

The fact is that people (and therefore markets) learn from their past behaviour.  If past behaviour leads to an undesirable outcome they will likely behave differently when confronted with a similar situation in future.  They will only repeat past behaviour if the outcome was perceived to be beneficial.

It is therefore useful to consider the outcomes of past behaviour in this light.  If you can be reasonably sure that you are delivering high customer satisfaction, then it is less likely that behaviour will change in future.  However, if satisfaction is poor, then there is every reason to expect that past behaviour will not be repeated.

If I know I’m being watched…

How data is collected can be an important consideration.  People are increasingly aware their data is being collected and used for marketing purposes.  The awareness of ‘being watched’ in this way can influence future behaviour.  Some people will respond differently and take more steps than others to hide their data.

Whose data is being hidden?  Who is modifying their behaviour to mitigate privacy concerns?  Who is using proxy servers?  These questions will become increasingly pressing as the use of data collected digitally continues to evolve.  Will a technically savvy group of consumers emerge who increasingly mask their online behaviour?  And how significant will this group become?  And how different will their behaviour be to that of the wider online community?

This could create issues with representativeness in the data sets we are collecting.  It may even lead to groups of consumers avoiding engagement with brands that they feel are too intrusive.  Could our thirst for data, in and of itself, put some customers off?  In certain circumstances, yes.  This is already happening.  I, for one, avoid interacting with websites that have too many ads popping up all over the place.  If a large ad pops up at the top of the screen, obscuring nearly half the page, I click away from the site immediately.  Life is way too short to put up with that annoying nonsense.

Understanding why

By observing behaviour, we can see, often very precisely, what is happening.  However, we can only seek to deduce why it is happening from what we can see. 

We might know that person X saw digital advert Y on site Z and clicked through to our website and bought our product.  Those are facts. 

But why did that happen?

Perhaps the advert was directly responsible for the sale.  Or perhaps person B recommended our product to person X in the bar, the night before.  Person X then saw our ad the next day and clicked on it.  In that case, the ad only played a secondary role in selling the product and the recommendation was key.  Unfortunately, that key interaction occurred offline, so it remained unobserved.

Sometimes the only way to find out why someone behaved in a certain way is to ask them.

Predicting the future

Forecasting the future for existing products using observational data is a sound approach, especially when looking at the short-term future.

Where it can become more problematic is when looking at the longer term.  Market conditions may change, competitors can launch new offerings, fashions shift etc.  And, if we are looking to launch a new product or introduce a new service, we won’t have any data (in the initial instance) that we can use to make any solid predictions.

The question we are effectively asking is about how people will behave and has little to do with how they are behaving today.  If we are looking at a truly ground-breaking new concept then information on past behaviour, however complete and accurate, might well be of little use.

So, in some circumstances, the most accurate way to discover likely future behaviour is to ask people.  What we are trying to do is to understand attitudes, opinions, and preferences as they pertain to an (as yet) hypothetical future scenario.

False starts in data

One problematic area for digital marketing (or indeed all marketing) campaigns is false starts.  AI tools are improving in their sophistication all the time.  However, most work in a broadly similar way:

  • The AI is provided with details of the target audience.
  • The AI starts with an initial experiment.
  • It observes the results.
  • It then modifies the approach based on what it learns.
  • The learning process is iterative, so the longer a campaign runs, the more the AI learns and the more effective it becomes.
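The loop described above can be sketched in a few lines.  This is a deliberately simplified illustration (an explore-and-exploit loop of the kind often called ‘epsilon-greedy’), not a description of how any particular marketing tool works.  The audience names, conversion rates and the function run_campaign are all hypothetical.

```python
import random


def run_campaign(true_rates, initial_audience, rounds=5000, explore=0.1, seed=0):
    """Simulate an experiment, observe, adjust loop over candidate audiences.

    true_rates maps each audience to its real (unknown to the AI) conversion rate.
    initial_audience is the audience named in the brief.
    """
    rng = random.Random(seed)
    shown = {a: 0 for a in true_rates}
    converted = {a: 0 for a in true_rates}

    def observed_rate(audience):
        return converted[audience] / shown[audience] if shown[audience] else 0.0

    current = initial_audience
    for _ in range(rounds):
        # Experiment: mostly target the current audience, occasionally try another.
        if rng.random() < explore:
            audience = rng.choice(list(true_rates))
        else:
            audience = current
        # Observe the result of showing one ad.
        shown[audience] += 1
        if rng.random() < true_rates[audience]:
            converted[audience] += 1
        # Modify the approach if another audience is clearly doing better.
        best = max(true_rates, key=observed_rate)
        if observed_rate(best) > observed_rate(current):
            current = best
    return current, shown


# Hypothetical audiences and conversion rates, invented for illustration.
rates = {"ideal_world_segment": 0.01, "actual_buyers": 0.10}
final_audience, impressions = run_campaign(rates, initial_audience="ideal_world_segment")
print(final_audience, impressions)
```

Even in this toy version, the loop only discovers a better audience through its occasional exploratory impressions, so the audience named at the outset shapes where most of the early budget goes.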

However, how does the AI know what target audience it should aim for in the initial instance?  In many cases the digital marketing agency determines that based on the client brief.  That brief, usually written by a human, should (ideally) provide a clear answer to the question “what is my target market?”

That tells the Agency and, ultimately, the AI, who it should aim for.

However, many people, unfortunately, confuse the question “what is my target market?” with “what would I like my target market to be in an ideal world?”  This is clearly a problem and can lead to a false start.

A false start is where, at the start of a marketing campaign, the agency is effectively told to target the wrong people.  Therefore, the AI starts by targeting the wrong people and has a lot of learning to do!

A solid understanding of the target market in the first instance can make all the difference between success and failure.

Balancing data inputs

The future will, no doubt, provide us with access to a greater volume and variety of better-quality digital data.  New tools, such as AI, will help make better sense of this data and put it to work more effectively.  The digital revolution is far from over.

But how, when, and why should we rely on such data to guide our decisions?  And what role should market research (based on asking people questions rather than observing behaviour) play?

Horses for courses

The truth is that observed data acquired digitally is clearly better than market research for certain things. 

Most obviously, it is better at measuring actual behaviour and using it for short-term targeting and forecasting. 

It is also, under the right circumstances, possible to acquire it in much greater (and hence more statistically reliable) quantities.  Crucially (as a rule) it is possible to acquire a large amount of data relatively inexpensively, compared to a market research study.
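To put a rough number on the effect of quantity, the sketch below uses the standard margin-of-error formula for a proportion at a 95% confidence level.  It assumes a simple random sample, which observed digital data often is not, and the sample sizes are chosen purely for illustration.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p observed in n cases.

    Assumes a simple random sample; z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Illustrative sample sizes: a typical survey versus a large observed data set.
for n in (1_000, 100_000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:>7,}: +/- {moe * 100:.2f} percentage points")
```

More data narrows the margin of error, but it does nothing to correct an unrepresentative data set: a very large biased sample simply gives a very precise wrong answer.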

However, when we are talking about observed historic data it is better at telling us ‘what’, ‘when’ and ‘how’ than it is at telling us ‘why’ or ‘what next’.  We can only look to deduce the ‘whys’ and the ‘what next’ from the data.  In essence it measures behaviour very well, but determines opinion, as well as potential shifts in future intention, poorly. 

The role of market research

Question-based market research surveys are (or at least should be) based on structured, representative samples.  They can be used to fill in the gaps left by digital data.  In particular, they measure opinion very well and are often better equipped to answer the ‘why’ and ‘what next’ questions than observed data (or attitudinal digital data).

Where market research surveys will struggle is in measuring detailed past behaviour accurately (due to the limitations of human memory), even if they can measure it approximately.

The only reason for using market research to measure behaviour now is to provide an approximate measure that can be linked to opinion-related questions measured on the same survey, so that we can tie the ‘why’ to the ‘what’.

Thus, market research can tell us how the opinions of people who regularly buy products in a particular category are different from those of less frequent buyers.  Digital data can usually tell us, more accurately, who has bought what and when, but that data is often not linked to attitudinal data that explains why.

Getting the best of both data worlds

Obviously, it does not need to be an either/or question.  The best insight comes from using digital data in combination with a market research survey.

With a good understanding of the strengths and weaknesses of both approaches it is possible to obtain invaluable insight to support business decisions.

About Us

Synchronix Research offers a full range of market research services and market research training.  We can also provide technical content writing services.

You can read more about us on our website.  

You can catch up with our past blog articles here.

If you would like to get in touch, please email us.

Sources, references & further reading:

Observational Data Has Problems. Are Researchers Aware of Them? GreenBook Blog, Ray Poynter, October 2020
