Data

Working with Digital Data Part 2 – Observational data

One of the most important changes brought about by the digital age is the availability of observational data.  By this I mean data that relates to an observation of actual online consumer behaviour.  A good example would be in tracing the journey a customer takes when buying a product.

Of course, we can also find a lot of online data relating to attitudes and opinions but that is less revolutionary.  Market Research has been able to provide a wealth of that kind of data, more reliably, for decades.

Observational data is different – it tells us about what people actually do, not what they think (or what they think they do).  This kind of behavioural information was historically very difficult to get at any kind of scale without spending a fortune.  Not so now.

In my earlier piece I had a look at attitudinal and sentiment related digital data.  In this piece I want to focus on observational behavioural data, exploring its power and its limitations.

Memory vs reality

I remember, back in the 90s and early 2000s, it was not uncommon to be asked to design market research surveys aimed at measuring actual behaviour (as opposed to attitudes and opinions). 

Such surveys might aim to establish things like how much people were spending on clothes in a week, or how many times they visited a particular type of retail outlet in a month, etc.  This kind of research was problematic.  The problem lay with people’s memories.  Some people can recall their past behaviour with exceptional accuracy.  However, others literally can’t remember what they did yesterday, let alone recall their shopping habits over the past week.

The resulting data only ever gave an approximate view of what was happening BUT it was certainly better than nothing.  And, for a long time, ‘nothing’ was usually the only alternative.

But now observational data, collected in our brave new digital world, goes some way to solving this old problem (at least in relation to the online world).  We can now know for sure the data we’re looking at reflects actual real-world consumer behaviour, uncorrupted by poor memory.

Silver Bullets

Alas, we humans are indeed a predictable lot.  New technology often comes to be regarded as a silver bullet.  Having access to a wealth of digital data is great – but we still should not automatically expect it to provide us with all the answers.

Observational data represents real behaviour, so that’s a good starting point.  However, even this can be misinterpreted.  It can also be flawed, incomplete or even misleading.

There are several pitfalls we ought to be mindful of when using observational data.  If we keep these in mind, we can avoid jumping to incorrect conclusions.  And, of course, if we avoid drawing incorrect conclusions, we avoid making poor decisions.

Correlation in data is not causation

It may be an old adage in statistics, but it is more relevant today than ever before.  For my money, Nate Silver hit the nail on the head:

“Ice cream sales and forest fires are correlated because both occur more often in the summer heat. But there is no causation; you don’t light a patch of the Montana brush on fire when you buy a pint of Häagen-Dazs.”

[Nate Silver]

Finding a relationship in data is exciting.  It promises insight.  But, before jumping to conclusions, it is worth taking a step back and asking if the relationship we found could be explained by other factors.  Perhaps something we have not measured may turn out to be the key driver.

Seasonality is a good example.  Did our sales of Christmas decorations go up because of our seasonal ad-campaign or because of the time of year?  If our products are impacted by seasonality, then our sales will go up at peak season but so will those of our competitors.  So perhaps we need to look at how market share has changed, rather than basic sales numbers, to see the real impact of our ad campaign.
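To make the point concrete, here is a minimal sketch of the sales versus market share comparison.  All the figures are invented purely for illustration, and the `market_share` helper is a hypothetical name rather than anything from a real tool.

```python
def market_share(own_sales, competitor_sales):
    """Return our sales as a fraction of total market sales."""
    return own_sales / (own_sales + competitor_sales)


# Hypothetical unit sales, invented for illustration only.
baseline = {"own": 100, "competitors": 900}    # a typical month
december = {"own": 300, "competitors": 2700}   # peak season, ad campaign running

# Raw sales look like a triumph for the campaign...
sales_uplift = december["own"] / baseline["own"] - 1

# ...but market share shows whether we gained on competitors.
share_before = market_share(baseline["own"], baseline["competitors"])
share_after = market_share(december["own"], december["competitors"])

print(f"Sales uplift: {sales_uplift:.0%}")
print(f"Market share before: {share_before:.1%}, after: {share_after:.1%}")
```

In this made-up case sales triple, yet market share does not move at all, which would suggest the season did the work rather than the campaign.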

Unrepresentative Data

Early work with hormone replacement therapy (HRT) seemed to suggest that women on HRT were less susceptible to heart disease than other women.  This was based on a large amount of observed data.  Some theorised that HRT treatments might help prevent heart disease. 

The data was real enough.  Women who were on HRT did experience less heart disease than other women.

But the conclusion was utterly wrong.

The problem was that, in the early years of HRT, women who accessed the treatment were not representative of all women. 

As it turned out, they were significantly wealthier than average.  Wealthier women tend to have access to better healthcare, eat healthier diets and be less likely to be obese.  Factors such as these, not the fact that they were on HRT, explained their reduced levels of heart disease.
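The mechanism is easier to see with numbers.  The sketch below uses entirely hypothetical counts (they are not taken from any HRT study) in which HRT has no effect at all within either wealth group, yet still looks protective once the groups are pooled, simply because most HRT users are wealthier.

```python
# Hypothetical counts, invented for illustration only:
# (wealth group, on HRT?) -> (number of women, heart disease cases)
counts = {
    ("wealthier", True): (800, 40),
    ("wealthier", False): (200, 10),
    ("less wealthy", True): (200, 30),
    ("less wealthy", False): (800, 120),
}


def disease_rate(on_hrt, group=None):
    """Heart disease rate for HRT users or non-users, optionally within one wealth group."""
    rows = [
        value
        for (wealth, hrt), value in counts.items()
        if hrt == on_hrt and (group is None or wealth == group)
    ]
    women = sum(n for n, _ in rows)
    cases = sum(c for _, c in rows)
    return cases / women


# Pooled comparison: HRT appears to protect against heart disease.
print(f"All women: HRT {disease_rate(True):.0%} vs no HRT {disease_rate(False):.0%}")

# Like-for-like comparison within each wealth group: no difference at all.
for group in ("wealthier", "less wealthy"):
    print(
        f"{group}: HRT {disease_rate(True, group):.0%} "
        f"vs no HRT {disease_rate(False, group):.0%}"
    )
```

Comparing like with like within each group is one simple way to check whether an unmeasured, or unrepresentative, factor is driving an apparent effect.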

Whilst the completeness of digital data sets is improving all the time, we still often find ourselves working with incomplete data.  Then it is always prudent to ask – is there anything we’re missing that might explain the patterns we are seeing?

Online vs Offline

Naturally, digital data is a measure of life in the online world.  For some brands this will give full visibility of their market since all, or almost all, of their customers primarily engage with them online.

However, some brands have a complex mix of online and offline interactions with customers.  As such it is often the case that far more data exists in relation to online behaviour than to offline.  The danger is that offline behaviour is ignored or misunderstood because too much is being inferred from data collected online.

This carries a real risk of data myopia, leading to us becoming dangerously over-reliant on insights gleaned from an essentially unrepresentative data set. 

Inferring influence from association

Put simply – do our peers influence our behaviour?  Or do we select our peers because their behaviour matches ours?

Anna goes to the gym regularly and so do most of her friends.  Let’s assume both statements are based on valid observation of their behaviour.

Given such a pattern of behaviour it might be tempting to conclude that Anna is being influenced by ‘herd mentality’. 

But is she? 

Perhaps she chose her friends because they shared similar interests in the first place, such as going to the gym? 

Perhaps they are her friends because she met them at the gym?

To identify the actual influence, we need to understand the full context.  Just because we can observe a certain pattern of behaviour does not necessarily tell us why that pattern exists.  And if we don’t understand why a certain pattern of behaviour exists, we cannot accurately predict how it might change.

Learning from past experiences

Observational data measures past behaviour.  This includes very recent past behaviour of course (which is part of what makes it so useful).  Whilst this is a useful predictor of future behaviour, especially in the short term, it is not guaranteed.  Indeed, in some situations, it might be next to useless. 

But why?

The fact is that people (and therefore markets) learn from their past behaviour.  If past behaviour leads to an undesirable outcome they will likely behave differently when confronted with a similar situation in future.  They will only repeat past behaviour if the outcome was perceived to be beneficial.

It is therefore useful to consider the outcomes of past behaviour in this light.  If you can be reasonably sure that you are delivering high customer satisfaction, then it is less likely that behaviour will change in future.  However, if satisfaction is poor, then there is every reason to expect that past behaviour is unlikely to be repeated. 

If I know I’m being watched…

How data is collected can be an important consideration.  People are increasingly aware their data is being collected and used for marketing purposes.  The awareness of ‘being watched’ in this way can influence future behaviour.  Some people will respond differently and take more steps than others to hide their data.

Whose data is being hidden?  Who is modifying their behaviour to mitigate privacy concerns?  Who is using proxy servers?  These questions will become increasingly pressing as the use of data collected digitally continues to evolve.  Will a technically savvy group of consumers emerge who increasingly mask their online behaviour?  And how significant will this group become?  And how different will their behaviour be to that of the wider online community?

This could create issues with representativeness in the data sets we are collecting.  It may even lead to groups of consumers avoiding engagement with brands that they feel are too intrusive.  Could our thirst for data, in and of itself, put some customers off?  In certain circumstances – certainly yes.  This is already happening.  I certainly avoid interacting with websites with too many ads popping up all over the place.  If a large ad pops up at the top of the screen, obscuring nearly half the page, I click away from the site immediately.  Life is way too short to put up with that annoying nonsense.

Understanding why

By observing behaviour, we can see, often very precisely, what is happening.  However, we can only seek to deduce why it is happening from what we can see. 

We might know that person X saw digital advert Y on site Z and clicked through to our website and bought our product.  Those are facts. 

But why did that happen?

Perhaps the advert was directly responsible for the sale.  Or perhaps person B recommended our product to person X in the bar the night before, and person X then saw our ad the next day and clicked on it.  In that case the ad only played a secondary role in selling the product – an offline recommendation was key.  Unfortunately, the key interaction occurred offline, so it remained unobserved.

Sometimes the only way to find out why someone behaved in a certain way is to ask them.

Predicting the future

Forecasting the future for existing products using observational data is a sound approach, especially when looking at the short-term future.

Where it can become more problematic is when looking at the longer term.  Market conditions may change, competitors can launch new offerings, fashions shift etc.  And, if we are looking to launch a new product or introduce a new service, we won’t have any data (in the initial instance) that we can use to make any solid predictions.

The question we are effectively asking is about how people will behave and has little to do with how they are behaving today.  If we are looking at a truly ground-breaking new concept then information on past behaviour, however complete and accurate, might well be of little use.

So, in some circumstances, the most accurate way to discover likely future behaviour is to ask people.  What we are trying to do is to understand attitudes, opinions, and preferences as they pertain to an (as yet) hypothetical future scenario.

False starts in data

One problematic area for digital marketing (or indeed all marketing) campaigns is false starts.  AI tools are improving in their sophistication all the time.  However, they all work in a similar way:

  • The AI is provided with details of the target audience.
  • It starts with an initial experiment.
  • It observes the results.
  • It then modifies the approach based on what it learns.
  • The learning process is iterative, so the longer a campaign runs, the more the AI learns and the more effective it becomes.

However, how does the AI know what target audience it should aim for in the initial instance?  In many cases the digital marketing agency determines that based on the client brief.  That brief, usually written by a human, should (ideally) provide a clear answer to the question “what is my target market?”

That tells the Agency and, ultimately, the AI, who it should aim for.

However, many people, unfortunately, confuse the question “what is my target market?” with “what would I like my target market to be in an ideal world?”  This is clearly a problem and can lead to a false start.

A false start is where, at the start of a marketing campaign, the agency is effectively told to target the wrong people.  Therefore, the AI starts by targeting the wrong people and has a lot of learning to do!
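The cost of a false start can be illustrated with a toy version of the experiment, observe and adjust loop described earlier.  This is a deliberately simplified, deterministic sketch and not how any particular advertising platform works: the segment names, conversion rates, budget and re-weighting rule are all assumptions made up for the example.

```python
def run_campaign(true_rates, start_weights, rounds=10, budget=1000):
    """Toy test-and-learn loop (hypothetical, deterministic).

    Each round the budget is split across audience segments according to
    the current weights, expected conversions are observed, and the
    weights are then shifted towards the segments that converted best.
    """
    weights = dict(start_weights)
    conversions_per_round = []
    for _ in range(rounds):
        # Run the experiment: spend in proportion to the current weights.
        spend = {seg: budget * w for seg, w in weights.items()}
        # Observe the results (expected conversions, no randomness).
        conversions = {seg: spend[seg] * true_rates[seg] for seg in spend}
        total = sum(conversions.values())
        conversions_per_round.append(total)
        # Modify the approach: re-weight by share of conversions.
        weights = {seg: conversions[seg] / total for seg in conversions}
    return weights, conversions_per_round


# Assumed conversion rates: the people who really buy vs the audience
# the client would like to have in an ideal world.
true_rates = {"actual buyers": 0.04, "aspirational audience": 0.01}

good_brief = {"actual buyers": 0.8, "aspirational audience": 0.2}
false_start = {"actual buyers": 0.1, "aspirational audience": 0.9}

good_weights, good_history = run_campaign(true_rates, good_brief)
false_weights, false_history = run_campaign(true_rates, false_start)

print("Round 1 conversions:", round(good_history[0]), "vs", round(false_history[0]))
print("Total conversions:", round(sum(good_history)), "vs", round(sum(false_history)))
```

In this sketch both campaigns end up targeting the actual buyers, but the one that began with the wrong brief wastes its early rounds getting there and never recovers the conversions it lost.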

A solid understanding of the target market in the first instance can make all the difference between success and failure.

Balancing data inputs

The future will, no doubt, provide us with access to a greater volume and variety of better-quality digital data.  New tools, such as AI, will help make better sense of this data and put it to work more effectively.  The digital revolution is far from over.

But how, when, and why should we rely on such data to guide our decisions?  And what role should market research (based on asking people questions rather than observing behaviour) play?

Horses for courses

The truth is that observed data acquired digitally is clearly better than market research for certain things. 

Most obviously, it is better at measuring actual behaviour and using it for short-term targeting and forecasting. 

It is also, under the right circumstances, possible to acquire it in much greater (and hence more statistically reliable) quantities.  Crucially (as a rule) it is possible to acquire a large amount of data relatively inexpensively, compared to a market research study.

However, when we are talking about observed historic data it is better at telling us ‘what’, ‘when’ and ‘how’ than it is at telling us ‘why’ or ‘what next’.  We can only look to deduce the ‘whys’ and the ‘what next’ from the data.  In essence it measures behaviour very well, but determines opinion, as well as potential shifts in future intention, poorly. 

The role of market research

Question-based market research surveys are (or at least should be) based on structured, representative samples.  They can be used to fill in the gaps we can’t get from digital data – in particular they measure opinion very well and are often better equipped to answer the ‘why’ and ‘what next’ questions than observed data (or attitudinal digital data). 

Where market research surveys will struggle is in measuring detailed past behaviour accurately (due to the limitations of human memory), even if they can measure it approximately. 

The only reason for using market research to measure behaviour now is to provide an approximate measure that can be linked to opinion-related questions measured on the same survey, so that we can tie the ‘why’ in with the ‘what’.

Thus, market research can tell us how the opinions of people who regularly buy products in a particular category are different from those of less frequent buyers.  Digital data can usually tell us, more accurately, who has bought what and when – but that data is often not linked to attitudinal data that explains why.

Getting the best of both data worlds

Obviously, it does not need to be an either/or question.  The best insight comes from using digital data in combination with a market research survey.

With a good understanding of the strengths and weaknesses of both approaches it is possible to obtain invaluable insight to support business decisions.

About Us

Synchronix Research offers a full range of market research services and market research training.  We can also provide technical content writing services.

You can read more about us on our website.  

You can catch up with our past blog articles here.

If you would like to get in touch, please email us.


Covid in Numbers – why have some countries suffered more than others?

As vaccinations roll out, we are beginning to see some light at the end of the covid pandemic tunnel.  It will take a few months yet, but it seems almost unreal to think that by the end of 2021 we may finally be back to some kind of post-pandemic normality.

Now seems like an appropriate time to take stock.  What might we learn from the traumatic events of the past year?  We might ask ourselves the question – why is it that some countries appear to have fared so much worse with Covid than others?  How have some countries experienced relatively low death rates, whereas others have experienced such tragically high numbers?

The Worst Hit

If we take a look at the numbers, the worst hit of the larger countries include many European nations (eight from the top ten worst affected) as well as the USA and Mexico.  All ten have experienced more than 150 deaths per 100,000 population.  The worst affected at the time of writing is the Czech Republic, with over 230 deaths per 100,000.

Other countries have escaped relatively lightly.  Amongst the other European nations, Germany has suffered significantly less, experiencing a death rate less than half that of countries like the UK, Belgium and Hungary.

Healthcare Quality

One thing we might look at is the quality of healthcare.  More developed countries generally have more established, advanced and comprehensive healthcare.  That being the case, such nations should be better placed to deal with a pandemic such as covid.  Unfortunately, it is plain to see that there must be a lot more to it than this, with countries like the USA, UK and Italy all suffering badly despite their relatively advanced healthcare systems.

India has a comparatively low death rate (under 12 per 100,000 on official figures).  Despite this, India’s healthcare system is ranked only the 112th most efficient in the world according to the WHO.  The USA is ranked 37th, the UK 18th and Italy 2nd.  Clearly there must be other factors at play.

One factor is potentially under-reporting.  One source estimated that this could mean that the true level of covid deaths is as much as five times larger than the official numbers in India.  However, even taking that into account, India’s death rates have still been significantly lower than those of the ten hardest hit nations.
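The arithmetic behind that claim is simple enough to check.  The sketch below uses only the figures quoted in this article, treating 12 per 100,000 as an upper bound on India’s official rate and five times as the worst-case under-reporting factor; the `deaths_per_100k` helper is a hypothetical name included just to show how such a rate is normally calculated.

```python
def deaths_per_100k(deaths, population):
    """Express a death count as a rate per 100,000 population."""
    return deaths * 100_000 / population


# Figures quoted in the article.
official_rate_india = 12      # deaths per 100,000 (official figure is under this)
under_reporting_factor = 5    # worst-case estimate from one source
top_ten_threshold = 150       # all ten worst-hit countries exceed this rate

# Worst-case adjusted rate for India.
adjusted_rate_india = official_rate_india * under_reporting_factor

print(f"Adjusted rate: {adjusted_rate_india} per 100,000")
print(f"Share of top-ten threshold: {adjusted_rate_india / top_ten_threshold:.0%}")
```

Even on the most pessimistic assumption, India’s adjusted rate comes out at no more than 60 per 100,000, well under half the 150 per 100,000 exceeded by every one of the ten hardest hit countries.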

Whilst the standard of healthcare has no doubt played some role here, there are clearly other aspects involved.

Population Demographics

One factor is population demographics.  Older patients are much more likely to become seriously ill and die from covid than younger ones.  Here India’s age demographics count in its favour. 

Only 6% of India’s population is aged over 65.

Compare this to most European countries and the difference is striking – with around 20% of the population in the hardest hit European countries being aged over 65.  Italy was the most vulnerable in this sense, with 23% aged over 65 before the pandemic hit.

Of the 10 hardest hit countries, 8 were nations where 19% or more of their populations were aged over 65.  The USA has a slightly younger demographic (16% over 65s) which would help to limit its vulnerability a little but is still clearly more exposed than somewhere like India.

Mexico represents the odd one out here.  Only 7% of Mexicans are aged over 65, giving the country a youthful demographic that is closer to what we see in countries like India.  We must therefore look for other explanations as to why Mexico has suffered so badly.

Urbanisation

Covid spreads best in environments where people live in close proximity to each other and, in general, people living in towns and cities are more likely to live in closer proximity with others.  Indeed, although India in general has seen lower death rates, it has nevertheless suffered more in major urban centres like Mumbai.

Many of the countries that have been worst hit have high levels of urbanisation, which has likely contributed to higher death rates.  Belgium has a particularly high level of urbanisation (with 98% of its population living in urban environments), making it especially vulnerable in this sense.  Several other countries on the list have high urbanisation levels (80%+), namely the UK, USA, Mexico and Spain.  A country like India has a much lower level of urbanisation overall (36%), which means its population is more widely dispersed and people in more rural environments are therefore less likely to come into frequent contact with others who might be affected.

Lockdowns

The lockdown measures taken by different countries at different times would also have an impact.  However, as these measures are often taken in response to the pandemic getting out of control in the first place, it is no surprise to find that many of the countries with the worst rates have had to impose longer and stricter lockdowns.

According to the Oxford COVID-19 Government Response Tracker, those countries on our list that have taken the strictest measures for the longest periods of time over the course of the pandemic would include the UK and Italy.  This has not prevented either country from registering high rates, although it has no doubt helped to prevent the problem getting even worse.

Based on these measures, those countries which have been laxer from this list would include Bulgaria (most notably), the Czech Republic, Hungary and Belgium.  So, it is possible that in these cases a more relaxed approach has contributed to a higher death rate.

Test and Trace

Another factor would be the efficiency of a country’s testing and tracing regime.  On this measure Mexico does especially badly, having only managed to test 41 out of 100,000 people in its population to date – far fewer than any other country listed.

By contrast, the UK has now tested 1,585 people out of 100,000 – more than any other country on the table.  Despite this, the UK still has the third worst death rate overall.  But here the devil lies in the detail.  The UK has massively improved its testing regime over the course of the pandemic but, initially, it lagged behind somewhat.  During the first 60 days after the first five UK deaths the British managed to test just 23 people in 100,000.  This compares poorly to a number of other affected countries. 

Germany’s lower death rate overall is partly down to its test and trace efficiencies, especially during the early phase of the pandemic.  The Germans managed to test 37 people in every 100,000 during the first 60 days after their fifth death.

Of all the countries on the list, Mexico stands out as the most behind on test and trace at every stage of the pandemic.  No doubt this is a major reason as to why the country now ranks so highly in terms of death rates.

International Travel

Another factor is the level of international travel.  Countries that experience a large volume of people travelling through their airports and transport hubs are more likely to import covid from overseas. 

Of course, travel restrictions now apply across many nations but this was not always the case.  The UK and the USA would, under normal circumstances, see significantly more international traffic than most other countries.  And so they, along with Hungary, would have been most exposed to importing infection in the absence of strict border controls and quarantine measures.

The Czech situation

It is worth taking time to consider the Czech situation, since this country has experienced the most serious problems to-date. 

In terms of many of the risk factors nothing immediately stands out that would explain why it tops the list.  The level of urbanisation is high but not unduly so at 73%.  Likewise, its population demographic is not notably different from many other European countries (20% aged 65 plus).  It also receives limited international traffic compared to many other countries.

However, over the course of the pandemic its lockdown measures have been the second laxest of the ten worst affected countries.  It is also the case that its figures for test and trace do not appear as comprehensive as many others (although it appears to be testing a reasonable number of people now).

According to Dr. Rastislav Maďar, the dean of the University of Ostrava’s medical school, the Czech situation can be attributed to three key mistakes.  The first of these was a failure to make mask wearing mandatory, the second a decision to open shops in the run up to Christmas and the third a failure to react quickly enough to the presence of new strains in the new year.

Key lessons learnt

Hopefully, it is clear to see that no single factor or measure can in and of itself entirely explain why any particular country experiences a high death rate.  There are many factors working together in combination. 

However, the nature of the pandemic is such that just a few missteps at any stage can very quickly lead to the situation deteriorating.  Hopefully, we can all learn from that and avoid making any silly mistakes in the final stages of the pandemic.

About Synchronix

Synchronix is a full-service market research agency.  We believe in using market research to help our clients understand how best to prepare for the future.  That means understanding change – whether that be changes in technology, culture, attitudes or behaviour. 

We provide a wide range of market research and data services.  You can learn more about our services on our website.  Also, please check out our collection of free research guides for more information on specific service offerings.

Sources

Johns Hopkins University: https://coronavirus.jhu.edu/map.html

United Nations: https://population.un.org/wup/Download/

Our World in Data: https://ourworldindata.org/grapher/covid-stringency-index

World Bank: https://data.worldbank.org/indicator/SP.POP.65UP.TO.ZS?name_desc=false&view=chart

ITV: https://www.itv.com/news/2020-12-09/is-indias-covid-19-death-rate-five-times-higher-than-official-figures-suggest

CNN: https://edition.cnn.com/2021/02/28/europe/czech-republic-coronavirus-disaster-intl/index.html
