Working with Digital Data Part 1 – Sentiment and Opinion


Our world is changing faster than ever before.  The digital data revolution is transforming the way we communicate, the way we shop and the way we live. 

One consequence of this is an explosion of data.  We’re collecting more and more of it with each passing month.  Data about our online behaviour, our interests, our opinions and just about every imaginable aspect of our daily lives. When cleverly used it provides a powerful and timely understanding of customer needs, leading to more effective decision making.

However, all this is not a silver bullet.  Understanding the challenges and limitations of digital data as well as its benefits is critical for getting the most out of it.

Differentiating between data types

Whether we are talking about digital data or data collected offline, we should start by drawing a distinction between observational data and attitudinal data.

  • Observational data:  This comes from the observation of behaviour.  In that sense it is an actual record of genuine consumer behaviour in a real-world setting.
  • Attitudinal data:  This encompasses any form of stated opinion or sentiment.  It represents the expressed preferences, likes and dislikes of customers.  It may or may not reflect actual behaviour, but it does represent a customer’s articulation of their view of the world.

What vs Why

In the past (especially pre-digital) measuring actual consumer behaviour through observation was very challenging.  Most brands only possessed partial data of this sort, perhaps only on an unrepresentative sub-set of their customers.  Such gaps could only usually be plugged (if at all) with expensive market research surveys.  Even then, the foibles of human memory limited the reliability of such information.

But now observational data collected digitally provides far greater granularity in terms of what consumers are doing online, as well as when, and how.

However, to fully understand such behaviour, we usually need to additionally explore why people are behaving in a certain way.  Sometimes observational data can help shed light on this.  We can deduce some of the reasons for specific decisions based on the relationships we find in the data.  However, sometimes this is not possible.  And sometimes it might be difficult to tell whether we are seeing causation or just correlation. 

This is where attitudinal data becomes important – where customers overtly tell us the ‘why’ behind their decisions.  This kind of data also has its limitations.  However, when used in combination with observational data, it can provide us with powerful insight.

Observational and attitudinal data are clearly quite different.  Each offers us different forms of insight.  Indeed, there is a lot to be said about both.  So, in this blog piece, I’ll be focusing mainly on attitudinal data (and leaving a discussion of observational data to my next piece).

Opinion and sentiment

In a digital context, attitudinal data will consist of consumer opinions expressed on social media, product reviews and recommendations and so forth.

Social media listening tools of various kinds can trawl through the web looking for this kind of data.  This builds up a picture of sentiment and draws attention to key opinions about brand image and performance. 

It can be very useful, but it clearly has potential drawbacks:

  • It may not be a representative view of the market.  Most social media platforms will have their skews and biases in terms of who engages with them.
  • There is a danger we’re listening to the opinions of a vocal minority.  But what about the silent majority?
  • As a significant amount of information is not volunteered anonymously (as it would be in a survey), many people are only willing to say what they are happy to put their names to.  They will often avoid volunteering opinions they feel might be unpopular or controversial.

Despite this, it nevertheless offers clear advantages in terms of ease of access, data volume, and the relatively low cost of data acquisition. 

Using digital data to measure sentiment

But just how accurate is sentiment analysis using digital data of this sort?  How can we gauge its reliability?

For those on a tight budget who want some answers fast it might be the only option.  For this reason, it becomes more important to understand just how reliable it is.  And critically, how can we use it wisely?

The answer here is a complex one.  It rather depends on what digital data we are talking about and how we want to use it.  It would be as naïve to dismiss it entirely as it would be to accept it unquestioningly.

Let’s take one simple question as an illustration: “Can sentiment data on social media be used to predict an election result?”  And, if we attempt to do so, is it any more (or less) accurate than an opinion survey?

Can Twitter predict elections?

One German study, published in Social Science Computer Review in 2013, concluded that Twitter was a poor predictor of overall population sentiment.  The Twitter population was found to contain significant biases that made it unrepresentative.  That, coupled with the crude nature of sentiment analysis tools at that time, rendered it a poor predictor for an election.

A more recent and comprehensive study (2020) has shown that things have moved on since then.  With improved sentiment analysis tools, used in combination with socio-economic modelling, it was possible to predict the 2016 US election results in a single state with 81% accuracy.

That’s an improvement but, as the study observed, there’s still a way to go.  Accuracy only became reasonable with the application of more advanced tools and modelling techniques.  Standard solutions available at the time of the study would not cut it.

The population biases in the Twitter community are still not fully understood and the data that can be harvested from it is incomplete.  It was also noted that sentiment analytics still struggles to cope with more nuanced comment.  So digital analytics and data harvesting are getting better but still have some way to go.
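To see why crude sentiment tools struggle with nuanced comment, here is a minimal sketch of a word-counting sentiment scorer.  It is an illustration only, not the method used in either study: the word lists, the example posts and the function name crude_sentiment are all made up for this purpose.

```python
# Hypothetical mini-lexicon for illustration only; real tools rely on
# far larger vocabularies or on trained language models.
POSITIVE = {"great", "love", "good", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}


def crude_sentiment(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


# Made-up example posts.
posts = [
    "I love this phone, the camera is excellent",      # plainly positive
    "Terrible battery, I hate it",                     # plainly negative
    "Oh great, another update that broke everything",  # sarcastic
    "Not good at all",                                 # negated
]

for post in posts:
    print(crude_sentiment(post), "|", post)
```

The first two posts are scored as you would expect.  The sarcastic and negated posts both come out as positive, even though a human reader would judge them negative.  Counting words ignores context, which is one reason simple tools can misread the mood of an online conversation.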

The value of attitudinal digital data

Although we might have a way to go before we can realistically use digital data to predict elections, that does not mean it has no merit (or that it won’t eventually get there).

It can still tell us a lot about what people are thinking and what they like/dislike about a particular product or service.  It can still help us form hypotheses about our brand image or the reasons behind customer satisfaction / dissatisfaction. 

Its value is even greater if we bear in mind that we are dealing with inherently anecdotal data.

But how can such a large volume of information be inherently anecdotal, you may well ask?

The reality is that opinions about products and services expressed online are representative only of the opinion of a vocal minority.  That does not mean it is not useful.  It can provide us with insight into the spectrum of different views and opinions about a particular brand or service.  In that sense it can do for us a similar job to a focus group (although it might potentially be less representative). 

So we can still make use of such data, if we bear in mind that we are dealing with a self-selected sample and treat it accordingly.

Self-selection

One of the problems identified by studies of social media platforms is that active users on these platforms represent a sub-set of the population.  Potentially (but not necessarily always) this is a highly unrepresentative sub-set. 

To use information from such an audience, we need to understand the composition of the population and mitigate any built-in biases (a potentially complex task).  If we cannot, then it may well be misleading to attempt to use the data quantitatively.  But we can still use it qualitatively. 

By using it qualitatively, I mean it can give us a good idea of the spectrum of different opinions that exist out there in the market.  It cannot tell us how commonly held these opinions are in relation to each other in the wider population.  But it can give us a sense of the kind of things people are thinking and saying.

This, in fact, may well give us enough information to design a market research survey to quantify the prevalence of these opinions with a truly representative sample.
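For readers curious about what mitigating built-in biases might look like in practice, here is a minimal sketch of one common approach, post-stratification weighting.  Every figure in it is invented for illustration.  It also assumes we know the age group of each person posting, which is often not the case with social media data.

```python
# All figures below are made up for illustration only.

# Share of each age group in the wider population
# (assumed to be known, for example from census data).
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Hypothetical social media sample: (age group, 1 if favourable else 0).
# Younger people are heavily over-represented, as on many platforms.
sample = (
    [("18-34", 1)] * 45 + [("18-34", 0)] * 15
    + [("35-54", 1)] * 15 + [("35-54", 0)] * 15
    + [("55+", 1)] * 2 + [("55+", 0)] * 8
)


def raw_share(sample):
    """Favourable share with every post counted equally."""
    return sum(fav for _, fav in sample) / len(sample)


def weighted_share(sample, population_share):
    """Favourable share after re-weighting each group to its population share."""
    total = 0.0
    for group, share in population_share.items():
        values = [fav for g, fav in sample if g == group]
        if not values:
            raise ValueError(f"no sample members in group {group!r}")
        total += share * sum(values) / len(values)
    return total


print("Raw favourable share:     ", round(raw_share(sample), 2))
print("Weighted favourable share:", round(weighted_share(sample, population_share), 2))
```

With these invented numbers the raw figure is 62% favourable, but it drops to 47% once each age group is weighted to its share of the population.  Weighting of this kind only corrects for the characteristics we can measure.  It does nothing about the fact that the people who post may hold different views from those of the same age who stay silent.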

The digital data elephant in the room

A growing challenge in the digital world (and often the elephant in the room) is the impact of bots and false actors.  This might affect everything from measures of hits to websites, through to opinion spam, fake recommendations, and bogus ratings. 

As consumers become more aware of this phenomenon their behaviour will change accordingly.  Already some are now more wary of glowing product reviews.  Someone recently told me they now ignore five-star reviews as they are “probably written by employees of the company”.

Elon Musk’s recent controversial exit from his Twitter acquisition was ostensibly driven by his belief that up to 20% of Twitter accounts are fake (Twitter would argue it’s under 5%).  The true number is extremely difficult to measure and, of course, this problem does not only affect Twitter but all social media.

Bots and false actors

However, the absolute number of bots on a social media platform is not the key point here.  It is what the bots are doing that really matters.  For example, one analysis showed that the proportion of bots actively disseminating information about cryptocurrencies was far higher than the numbers engaged in discussions about cats.

We are not just talking about bots here.  There are also false actors to consider.  Such accounts are run by real people, but the identities are false (or misrepresented).  A simple example would be a seller writing a review under a false identity while pretending to be a customer.  When it comes to matters of political opinion, false actors of this sort already represent a significant problem.

Distinguishing genuine public opinion from false actors (bots or human) is a challenge that needs continual vigilance.  It can muddy the waters of any attitudinal data appearing online.  My guess is that tackling this challenge will be one of the most important tasks facing the digital world over the next decade.

Missing information

One of the limitations of attitudinal data available online is simply that we are restricted to using what is available.  Here there are four specific constraints that brands may encounter:

  • People create the content they feel is important – not necessarily what brands need to know.  The fact that this content is a spontaneous offering of customer opinion is a real boon.  However, sometimes brands want to know things about their markets that few people are openly discussing.
  • Large, well-known, brands are likely to attract significant online comment.  They will therefore have access to a large pool of data from which they can draw insight.  However, smaller or more niche brands are less likely to be discussed online and therefore have access to less data.
  • Most online discussions concern the here and now and the immediately foreseeable future.  People aren’t going to comment on future products and services they are not yet aware of. 
  • Finally, people like to discuss things that interest and engage them online.  Hence there is more discussion of cats than there is of insurance policies.

Depending on the market, the brand and the category, the amount of valuable content available to harvest will clearly vary.  Sometimes there will be gaps.  Sometimes those gaps will be significant.

When we need to turn to market research

Despite its limitations, attitudinal digital data certainly has its place.  It may not be perfect but sometimes we don’t need perfect.  Sometimes getting a rough idea cheaply and quickly will give us 80% of what we need.

However, sometimes attitudinal digital data can’t give us what we need.  Sometimes it is simply too unrepresentative.  Sometimes it is too incomplete.  And sometimes the issues we need to know about are simply not being discussed.  That’s when we need to turn to market research.

When it comes to measuring sentiment and opinion, a market research survey provides the best way to fill in such gaps. 

Both market research and digital analytics have their place, and it is not an either/or choice.  Sometimes using both in combination will deliver the best results.

The power of ‘observed’ digital data

However, the real strength of the digital information available to us today lies in data that is purely observational rather than attitudinal. 

Measurement of actual consumer behaviour online, that leads directly to an online purchase, represents an irrefutable record of genuine buying decisions.  Such direct observation of buyer behaviour at scale provides us with a wealth of insight that was simply unavailable prior to the digital age.

Next time, I’ll be taking a closer look at observational data.  I’ll be considering what it can tell us, its strengths and the potential pitfalls we should look out for when working with it.

About Us

Synchronix Research offers a full range of market research services and market research training.  We can also provide technical content writing services.

You can read more about us on our website.  

You can catch up with our past blog articles here.

If you would like to get in touch, please email us.

Sources, references & further reading:

Can We Forecast Presidential Election Using Twitter Data? An Integrative Modelling Approach – Ruowei Liu

Trying to predict the election? Forget about Twitter, study concludes – The Guardian, 2016

How many bots are on Twitter? The question is difficult to answer and misses the point – The Conversation, May 2022
