Dirty Social Media Data Is Misguiding Brands Tracking Consumer Behaviour [REPORT]

Brands today must make social media data a part of their marketing strategy. Every brand is aware that social listening is key to its success and profitability. Hence, brands are always listening to social media and processing the enormous amount of data they get from their social channels to analyze sentiment, traction, loyalty and the many other factors that now define a brand. While listening to social channels is key, pre-processing the big data that social media feeds into those channels is even more critical. Not all data is useful or valid, and unfiltered data can pollute sentiment analysis and lead to drastically misguided branding decisions. Let us look at the “dirt” that pollutes social data and at methods for cleansing it.

So where does the dirt come from? According to a recent analysis of social media data by Networked Insights, nearly 10% of the social media posts that brands analyze to understand consumer behavior do not actually come from real consumers. They come from non-consumers: social bots, celebrities, brand handles and inactive accounts. Spam is a particular concern on forums, where up to 28% of all posts come from non-consumers.

[Figure: percent of non-consumer type]

Bots are scripts or programs that post on social media as if they were people, but a closer look at their posting frequency and at their repetitive, link-dominated messages reveals what they are. Celebrities are sometimes brand ambassadors who are paid to talk positively about brands on social media. Their accounts have massive followings and significant influence, but because they are paid to post, their posts cannot be counted as valid brand data. Similarly, brand handles that belong to the company post for the brand, and competitors post against it. These posts are also considered spam.
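The bot signals mentioned above (posting frequency, link-dominated posts and repetitive content) can be turned into a simple heuristic. The sketch below is only an illustration: the function name `looks_like_bot`, the three thresholds and the "two of three signals" rule are assumptions made for this example, not values from the Networked Insights analysis.

```python
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://\S+")

def looks_like_bot(posts, hours_observed, max_posts_per_hour=5.0,
                   max_link_ratio=0.8, max_repeat_ratio=0.5):
    """Flag an account when at least two of three signals fire.

    posts: list of post texts from a single account.
    hours_observed: length of the observation window in hours.
    All thresholds are illustrative assumptions, not published values.
    """
    if not posts or hours_observed <= 0:
        return False

    # Signal 1: posting frequency.
    posts_per_hour = len(posts) / hours_observed

    # Signal 2: share of posts that contain at least one link.
    link_ratio = sum(1 for p in posts if URL_PATTERN.search(p)) / len(posts)

    # Signal 3: share of posts repeating the most common message
    # (links removed and case folded so near-identical posts match).
    normalized = [URL_PATTERN.sub("", p).strip().lower() for p in posts]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    repeat_ratio = most_common_count / len(posts)

    signals = [
        posts_per_hour > max_posts_per_hour,
        link_ratio > max_link_ratio,
        repeat_ratio > max_repeat_ratio,
    ]
    return sum(signals) >= 2

# Hypothetical accounts used only to exercise the heuristic.
bot_posts = [f"Win a free phone! http://example.com/{i}" for i in range(30)]
human_posts = [
    "Loving my new phone",
    "Battery could be better",
    "Anyone tried the camera? http://example.com/review",
]

print(looks_like_bot(bot_posts, hours_observed=2))     # True
print(looks_like_bot(human_posts, hours_observed=48))  # False
```

A heuristic like this will not catch paid celebrity posts or brand handles, which look like normal accounts; those are usually handled with a maintained list of known handles.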


Social spam is a huge and complicated problem when listening in on brand conversations. Social media spam grew by 658% in the last year, and some brands have reported that more than 90% of their recorded social media posts can be classified as spam. That is a very high percentage given the sheer frequency and size of conversations on social media. Brands today employ sophisticated methods and tools to analyze social media, discover consumer insights and turn them into actionable marketing and branding decisions. But if social data contains a large amount of spam, the brands’ analyses will be neither accurate nor actionable.
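A small worked example shows how spam can distort a sentiment reading. The numbers below are made up for illustration (100 posts, 30 of them spam that reads as positive), and `net_sentiment` is a hypothetical helper defined here as positive minus negative posts divided by total posts.

```python
def net_sentiment(posts):
    """Net sentiment = (positive - negative) / total, in the range [-1, 1]."""
    if not posts:
        return 0.0
    positive = sum(1 for p in posts if p["sentiment"] == "positive")
    negative = sum(1 for p in posts if p["sentiment"] == "negative")
    return (positive - negative) / len(posts)

# Hypothetical sample: 100 posts, 30 of them spam that reads as positive.
posts = (
    [{"sentiment": "positive", "is_spam": True}] * 30
    + [{"sentiment": "positive", "is_spam": False}] * 20
    + [{"sentiment": "negative", "is_spam": False}] * 35
    + [{"sentiment": "neutral", "is_spam": False}] * 15
)

raw_score = net_sentiment(posts)
clean_posts = [p for p in posts if not p["is_spam"]]
clean_score = net_sentiment(clean_posts)

print(round(raw_score, 2))    # 0.15
print(round(clean_score, 2))  # -0.21
```

In this sample the unfiltered data suggests mildly positive sentiment, while the real consumers are net negative. The spam changes the direction of the conclusion as well as its size.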

According to a recent New York Times article, 50% to 80% of a data scientist’s time now goes into cleaning data. Complex tools built on Artificial Intelligence and Natural Language Processing are at the forefront of the technologies brands employ to clean data, and machine learning algorithms are used to identify spam. Networked Insights’ models with NLP capability can identify social spam with an accuracy of greater than 80% and can process millions of data points quickly.
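To give a feel for how a machine learning spam filter works, here is a minimal sketch of a multinomial naive Bayes text classifier with Laplace smoothing, written in plain Python. It is not Networked Insights’ model, whose details the report does not describe. The class name `NaiveBayesSpamFilter`, the labels `spam` and `ham`, and the six training posts are all assumptions for illustration, and a production system would need far more training data.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocabulary = set()

    def train(self, labeled_posts):
        # labeled_posts: iterable of (text, label) pairs; the training set
        # must contain at least one example of each label.
        for text, label in labeled_posts:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)

    def _log_score(self, tokens, label):
        # Log prior: share of training posts carrying this label.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocabulary)
        for token in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = self.word_counts[label][token] + 1
            score += math.log(count / (total_words + vocab_size))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(("spam", "ham"),
                   key=lambda label: self._log_score(tokens, label))

# Tiny hypothetical training set, for illustration only.
training_posts = [
    ("Win a free coupon now", "spam"),
    ("Free giveaway click here", "spam"),
    ("Cheap deals click now", "spam"),
    ("The battery life on this phone is great", "ham"),
    ("I had a bad experience with customer service", "ham"),
    ("Love the new camera update", "ham"),
]

spam_filter = NaiveBayesSpamFilter()
spam_filter.train(training_posts)

print(spam_filter.predict("free coupon click here"))             # spam
print(spam_filter.predict("the camera on this phone is great"))  # ham
```

The classifier learns which words are more common in spam than in genuine posts and scores new posts accordingly. That is the same basic idea, at toy scale, behind the larger models brands use.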

[Figure: percentage of total spam on social media]

Social spam also includes posts, reviews or blog comments containing:

  1. Coupons – coupons, product listings, contests and giveaways
  2. Adult Content – adult or pornographic content
  3. General Spam – posts which contain gibberish or nonsense

Shopping, Finance and Technology have been identified as the categories with the most spam, ranging from 10% to 13% of all conversations, while Sports, Science and Religion contain less than 1% spam. Although overall spam percentages are below 10% across social media platforms, conversations about some brands are dominated by non-consumer data, and these brands have to employ more complicated methods to filter out spam.

In conclusion, spam and non-consumer generated posts are problems that brands cannot ignore. Ignoring them skews data and gives erroneous results in sentiment analysis. Important brand-to-brand comparisons can be unreliable because the amount of spam differs from brand to brand, so the right combination of Machine Learning, Natural Language Processing and Networked Learning algorithms needs to be employed to clean the dirt out of the data. Data granularity and actionable trends can improve greatly after cleaning. If you are a brand listening on social media, get a laundromat with the right algorithms before analyzing data.



