Dirty Social Media Data Is Misguiding Brands Tracking Consumer Behaviour [REPORT]

Brands today have to make social media data a part of their marketing strategy. Every brand is aware that social listening is key to its success and profitability. Hence, brands are always listening to social media and processing the huge amount of data they get from their social channels to analyze sentiment, traction, loyalty and the many other factors that have come to define brands. While listening to social channels is key, pre-processing the big data that social media feeds into those channels is even more critical. Not all data is useful or valid, and if it is used without filtering it can pollute sentiment analysis and lead to drastically misguided branding decisions. Let us look at the “dirt” that pollutes social data and at methods to cleanse it.

So where does the dirt come from? According to a recent analysis of social media data by Networked Insights, nearly 10% of the social media posts that brands analyze to understand consumer behavior do not actually come from real consumers. They come from non-consumers, which include social bots, celebrities, brand handles and inactive accounts. Spam is a particular concern on forums, where up to 28% of all posts come from non-consumers.

Figure: percentage of posts by non-consumer type

Bots are scripts or programs that pose as people posting on social media, but a closer look at their posting frequency and at their repetitive, link-dominated messages reveals what they are. Celebrities are sometimes brand ambassadors who get paid to talk positively about brands on social media. Their accounts have massive followings and significant influence, but because they are paid to post, their posts cannot be counted as valid consumer data. Similarly, handles that belong to the company post for the brand, and competitors post against it. These posts are also considered spam.
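
The bot signals described above (posting frequency, repetitive content, link-heavy messages) can be sketched as a simple heuristic. This is only an illustration, not the method used by Networked Insights: the function names, the input shape and all three thresholds are assumptions chosen for the example.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

URL_PATTERN = re.compile(r"https?://\S+")


def bot_signals(posts):
    """Compute three bot signals for one account.

    posts: list of (datetime, text) tuples (a hypothetical input shape).
    """
    if not posts:
        return {"posts_per_day": 0.0, "link_share": 0.0, "duplicate_share": 0.0}
    times = [when for when, _ in posts]
    # Treat any activity window shorter than one day as a full day.
    span_days = max((max(times) - min(times)).total_seconds() / 86400, 1.0)
    with_links = sum(1 for _, text in posts if URL_PATTERN.search(text))
    # Strip URLs and case so posts that differ only by their link count as repeats.
    normalised = [URL_PATTERN.sub("", text).strip().lower() for _, text in posts]
    most_repeated = Counter(normalised).most_common(1)[0][1]
    return {
        "posts_per_day": len(posts) / span_days,
        "link_share": with_links / len(posts),
        "duplicate_share": most_repeated / len(posts),
    }


def looks_like_bot(posts, max_posts_per_day=50.0, max_link_share=0.8,
                   max_duplicate_share=0.5):
    """Flag an account when at least two signals exceed their (assumed) thresholds."""
    signals = bot_signals(posts)
    fired = [
        signals["posts_per_day"] > max_posts_per_day,
        signals["link_share"] > max_link_share,
        signals["duplicate_share"] > max_duplicate_share,
    ]
    return sum(fired) >= 2


# Made-up account: 100 near-identical link posts within two hours.
start = datetime(2020, 1, 1)
suspect = [(start + timedelta(minutes=i), f"Buy now http://shop.example/{i}")
           for i in range(100)]
print(looks_like_bot(suspect))
```

Requiring two signals rather than one keeps a real consumer who has posted only once, or who shares a lot of links, from being flagged on a single criterion.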


Social spam is a huge and complicated problem when listening in on brand conversations. Social media spam grew by 658% in the last year, and some brands have reported that more than 90% of their recorded social media posts can be classified as spam. That is a very high percentage given the sheer frequency and size of conversations on social media. Brands today employ sophisticated methods and tools to analyze social media, discover consumer insights and turn them into actionable marketing and branding decisions. But if social data contains a large amount of spam, the brands’ analyses will be neither accurate nor actionable.

According to a recent New York Times article, 50% to 80% of a data scientist’s time now goes into cleaning data. Complex tools using Artificial Intelligence and Natural Language Processing are at the forefront of the technologies brands employ to clean data, and machine learning algorithms are used to identify spam. Networked Insights’ models with NLP capability can identify social spam with an accuracy of greater than 80% and can process millions of data points quickly.
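
To give a feel for how a machine learning model can separate spam from consumer posts, here is a minimal sketch of a naive Bayes text classifier. It is a stand-in chosen for brevity and is not a description of Networked Insights’ models; the training posts, the labels and the function names are all made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def train(labelled_posts):
    """labelled_posts: list of (text, label) pairs, label 'spam' or 'consumer'.

    Assumes every label appears at least once in the training data.
    """
    word_counts = {"spam": Counter(), "consumer": Counter()}
    doc_counts = Counter()
    for text, label in labelled_posts:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["consumer"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Log prior: how common this label was in the training data.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            # Laplace (add-one) smoothing so unseen words do not zero out a label.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical hand-labelled examples.
training = [
    ("win a free coupon click here", "spam"),
    ("free giveaway click this link", "spam"),
    ("discount coupon free shipping click", "spam"),
    ("i love my new phone", "consumer"),
    ("the battery on this phone is great", "consumer"),
    ("customer service was slow today", "consumer"),
]
model = train(training)
print(classify("free coupon click", model))
print(classify("love my phone battery", model))
```

A production system would train on far more labelled data and use richer features, but the principle is the same: the model learns which words are typical of spam and scores new posts accordingly.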

Figure: percentage of total spam on social media

Social spam also includes posts, reviews or blog comments containing:

  1. Coupons – coupons, product listings, contests and giveaways
  2. Adult Content – adult or pornographic content
  3. General Spam – posts which contain gibberish or nonsense

Shopping, Finance and Technology have been identified as the categories containing the most spam, at 10% to 13% of all conversations, while Sports, Science and Religion contain less than 1% spam. Although overall spam percentages are below 10% across social media platforms, conversations about some brands are dominated by non-consumer data, and these brands have to employ more sophisticated methods to filter out spam.

The conclusion is that spam and non-consumer-generated posts are problems brands cannot ignore. Ignoring them will skew the data and give erroneous results in sentiment analysis. Important brand-to-brand comparisons can be distorted in unknown ways because the amount of spam differs from brand to brand, so the right combination of Machine Learning, Natural Language Processing and Networked Learning algorithms needs to be employed to clean the dirt out of the data. Data granularity and actionable trends can improve greatly after cleaning. If you are a brand listening on social media, get a laundromat with the right algorithms before analyzing data.
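
To see how differing amounts of spam can distort a brand-to-brand comparison, consider the small sketch below. Every number in it is hypothetical and chosen only to make the effect visible; the field names `sentiment` and `is_spam` are likewise assumptions.

```python
def mean_sentiment(posts, include_spam=True):
    """Average sentiment in [-1, 1]; optionally drop posts flagged as spam."""
    kept = [p["sentiment"] for p in posts if include_spam or not p["is_spam"]]
    return sum(kept) / len(kept) if kept else None


def make_posts(n_consumer, consumer_score, n_spam, spam_score):
    # Build a made-up sample with uniform scores per group.
    consumer = [{"sentiment": consumer_score, "is_spam": False}] * n_consumer
    spam = [{"sentiment": spam_score, "is_spam": True}] * n_spam
    return consumer + spam


# Hypothetical brands: A has 10% promotional spam, B has 50%.
brand_a = make_posts(90, 0.5, 10, 0.9)
brand_b = make_posts(50, 0.2, 50, 0.9)

for name, posts in (("A", brand_a), ("B", brand_b)):
    raw = mean_sentiment(posts)
    clean = mean_sentiment(posts, include_spam=False)
    print(f"Brand {name}: raw {raw:.2f}, cleaned {clean:.2f}")
```

In this made-up sample, brand B looks slightly better than brand A on the raw data, yet brand A is clearly ahead once the spam is filtered out, because B's score was propped up by its larger share of promotional posts.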



