Dirty Social Media Data Is Misguiding Brands Tracking Consumer Behaviour [REPORT]

Brands today inevitably have to make social media data part of their marketing strategy. Every brand is aware that social listening is key to its success and profitability. Hence, brands are always listening to social media and processing the huge amount of data they get from their social channels to analyze sentiment, traction, loyalty and the many other factors that have come to define brands. While listening to social channels is key, pre-processing the big data that those channels feed in is even more critical. Not all of this data is useful or valid, and if it is used without filtering it can pollute the sentiment analysis and lead to drastically misguided branding decisions. Let us look at the “dirt” that pollutes social data and at methods for cleansing it.

So where does the dirt come from? According to a recent analysis of social media data by Networked Insights, nearly 10% of the social media posts that brands analyze to understand their consumers’ behavior do not actually come from real consumers. They come from non-consumers: social bots, celebrities, brand handles and inactive accounts. Spam is a particularly major concern on forums, where up to 28% of all posts are reported to come from non-consumers.

[Figure: percentage of posts by non-consumer type]

Bots are scripts or programs that behave like people posting on social media, but a closer look at their posting frequency and at their repetitive, link-dominated messages reveals the truth about them. Celebrities are sometimes brand ambassadors who get paid to talk positively about brands on social media. Their accounts have massive followings and significant influence, but because they are paid to post, we cannot count their posts as valid brand data. Similarly, brand handles that belong to the company post for the brand, and competitors post against it. These posts are also considered spam.
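The bot signals described above (high posting frequency, repetitive messages, content dominated by links) can be turned into a rough screening rule. The following is only a minimal sketch: the `Post` type, the `looks_like_bot` function and all three thresholds are assumptions made for illustration, not values from the Networked Insights analysis.

```python
import re
from dataclasses import dataclass

URL_PATTERN = re.compile(r"https?://\S+")


@dataclass
class Post:
    author: str
    text: str


def looks_like_bot(posts, max_posts=50, min_link_share=0.8, max_unique_share=0.3):
    """Flag one account's posts (collected over a fixed time window) as bot-like.

    All thresholds are illustrative assumptions and would need tuning on real data.
    """
    if not posts:
        return False

    # Signal 1: unusually high posting frequency within the window.
    high_volume = len(posts) > max_posts

    # Signal 2: most posts contain a link.
    link_share = sum(1 for p in posts if URL_PATTERN.search(p.text)) / len(posts)

    # Signal 3: messages repeat once the links are stripped out.
    normalized = {URL_PATTERN.sub("", p.text).strip().lower() for p in posts}
    unique_share = len(normalized) / len(posts)

    signals = [
        high_volume,
        link_share >= min_link_share,
        unique_share <= max_unique_share,
    ]
    # Require two of the three signals so a single quirk does not flag a real consumer.
    return sum(signals) >= 2
```

Requiring two signals rather than one is a design choice that trades some recall for fewer false positives, since a genuine consumer may occasionally post many links or repeat a message.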


Social spam is a huge and considerably complicated problem when listening in on brand conversations. Social media spamming grew by 658% in the last year, and some brands have reported that more than 90% of their recorded social media posts can be classified as spam. This is a very high percentage, given the sheer frequency and size of conversations on social media. Brands today employ sophisticated methods and tools to analyze social media, discover consumer insights and turn them into actionable marketing and branding decisions. But if social data contains a large amount of spam, the brands’ analyses will be neither accurate nor actionable.
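A small worked example shows how spam can distort an analysis. The numbers below are invented purely for illustration: sentiment scores are assumed to run from -1 (negative) to +1 (positive), and the `is_spam` flag is assumed to come from some earlier filtering step.

```python
def mean_sentiment(posts):
    """Average sentiment over (score, is_spam) pairs."""
    scores = [score for score, _ in posts]
    return sum(scores) / len(scores)


def consumer_only(posts):
    """Keep only the posts that were not flagged as spam."""
    return [(score, is_spam) for score, is_spam in posts if not is_spam]


# Made-up data: six mildly negative consumer posts and four glowing promotional posts.
posts = [(-0.5, False)] * 6 + [(1.0, True)] * 4

raw = mean_sentiment(posts)                   # spam included
clean = mean_sentiment(consumer_only(posts))  # spam removed
```

In this invented case the unfiltered average comes out slightly positive (about 0.1) while the consumer-only average is -0.5, so the brand would read unhappy customers as mildly satisfied ones.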

According to a recent New York Times article, 50% to 80% of a data scientist’s time now goes into cleaning data. Complex tools using Artificial Intelligence and Natural Language Processing are at the forefront of the technologies brands employ to clean data, and machine learning algorithms are used to identify spam. Networked Insights’ models with NLP capability can identify social spam with greater than 80% accuracy and can process millions of data points quickly.

[Figure: percentage of total spam on social media]

We also need to remember that social spam includes posts, reviews or blog comments containing:

  1. Coupons – coupons, product listings, contests and giveaways
  2. Adult Content – adult or pornographic content
  3. General Spam – posts which contain gibberish or nonsense
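The three categories above could be approximated with simple rules before any machine learning is applied. This is a minimal sketch under stated assumptions: the keyword lists, the vowel-share test for gibberish and the function names are all hypothetical, and a production system would rely on trained models rather than hand-written rules.

```python
import re

# Hypothetical keyword lists; real lists would be much longer and curated.
COUPON_TERMS = {"coupon", "promo", "giveaway", "contest", "discount"}
ADULT_TERMS = {"xxx", "porn", "nsfw"}


def tokens(text):
    """Lowercase alphabetic words in the text."""
    return re.findall(r"[a-z]+", text.lower())


def looks_like_gibberish(words, min_vowel_share=0.2):
    """Crude gibberish test: very few vowels among the letters (assumed threshold)."""
    letters = "".join(words)
    if not letters:
        return False  # nothing to judge, so do not flag
    vowels = sum(1 for ch in letters if ch in "aeiou")
    return vowels / len(letters) < min_vowel_share


def spam_category(text):
    """Return 'coupons', 'adult content', 'general spam', or None if no rule fires."""
    words = tokens(text)
    word_set = set(words)
    if word_set & COUPON_TERMS:
        return "coupons"
    if word_set & ADULT_TERMS:
        return "adult content"
    if looks_like_gibberish(words):
        return "general spam"
    return None
```

Rules like these are easy to audit but easy to evade, which is one reason the article points to machine learning and NLP models for the heavy lifting.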

Shopping, Finance and Technology have been identified as the categories with the most spam, which makes up 10% to 13% of all conversations in them, while Sports, Science and Religion each contain less than 1% spam. Although overall spam percentages are below 10% across social media platforms, conversations about some brands are dominated by non-consumer data, and these brands have to employ more complicated methods to filter out spam.

The conclusion is that spam and non-consumer-generated posts are problems brands cannot ignore. Ignoring them will skew the data and give erroneous results in sentiment analysis. Important brand-to-brand comparisons can become unreliable because the amount of spam differs from brand to brand. Hence the right combination of Machine Learning, Natural Language Processing and Networked Learning algorithms needs to be employed to clean the dirt out of the data. Data granularity and actionable trends can improve greatly after cleaning. If you are a brand listening on social media, get a laundromat with the right algorithms before analyzing data.



