Cut Out the Noise – How Clean Data Delivers Better Than More Data

25 May 2018

The nature of the internet is such that every single activity performed on it is trackable. In an age when everything can be quantified, big data has emerged as a goldmine for marketers. But there’s a catch. However vast and comprehensive this data might be, in its rawest form it is an unintelligible aggregation that cannot be used to anyone’s advantage unless it is organised and made sense of.

More often than not, big data is a scrambled mass of inconsistencies, inaccuracies, inadequacies and incoherence. Even in something as basic as form filling, one person might have entered the address incorrectly, another might have left out the email address, and a third might have multiple unverified digital identities. At the end of the day, the bigger the data set, the larger its deviation from uniformity. It remains unchecked and hard to trust.

But this is where finding the proverbial ‘needle in the haystack’ ultimately pays dividends for brands and marketing platforms alike. After all, a digital insight is only as good as the data set it is derived from.

Brands may have bought into the notion that ‘more data equals better results’, which means that they are diverting the majority of their digital marketing spend towards acquiring even more data sets from various sources such as e-mail, WhatsApp groups, Point of Sale data, mobile databases, credit and debit card data and the like.

But this generalisation doesn’t take into account the quality of data. Shifting investments towards deriving insights from clean data leads to much more precise customer targeting and better E-Commerce brand campaigns, all in less time, at a lower cost, and with significantly reduced margins of error.

The Repercussions of Having Dirty/Unclean Data

While more data is not necessarily a bad thing, since the sample size increases, it also means having more unclean data on hand. When the data is ‘dirty’, so to speak, it can derail a brand’s online advertising efforts quite easily in a myriad of ways. The risks range from inconsistent and irrelevant insights, false targeting, and misinterpretation to loss-incurring marketing campaigns, poor customer retention, analytical redundancy, and inventory mismanagement. Analysis built on misinterpreted data cannot differentiate between relevant and irrelevant parameters. Unclean data, especially when used in E-Commerce campaigns, can lead to marketing a product to the wrong demographic, promoting it at an inopportune time, or reaching out to a user whose journey has been incorrectly mapped. Another example is an e-commerce platform failing to predict the inventory required for a big sale day because of unclean data. In essence, more unclean data equals more noise and, therefore, a failed marketing effort on E-Commerce platforms.

How Important is Clean Data in E-Commerce?

When it comes to consumer shopping behaviour online, the data points need to be highly accurate to allow E-Commerce platforms to create concrete personas and a comprehensive marketing strategy. This gives rise to the need for clean data: a data set that is uniform, consistent and complete. Through data cleansing and analytics, data scientists funnel massive volumes of user information down into clean, actionable, and highly accurate data. Brands should therefore concentrate their marketing on E-Commerce platforms that guarantee clean data sources at the analytics level, rather than on those that merely claim access to larger, inflated data pools.

Take, for instance, the data set of Indian addresses needed to ensure smooth last-mile delivery of products. The country’s postal addresses are highly complex and inconsistent. There are various names and spellings for the same place, and PIN codes are often disputed, especially in rural areas. Consequently, incorrect addresses result in failed or delayed deliveries, which have a massive effect on customer satisfaction levels and revenue. At Flipkart, we solved this problem by nipping it in the bud with a robust machine learning setup that uses probabilistic separation of compound words, data-dependent dictionary models, and methods of detecting and eliminating fraudulent addresses. By converting unclean data into clean data, we were able to record 98% accuracy in address identification and classification, and put an end to address fraud, re-selling and return scams.
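To make the idea of probabilistic compound-word separation concrete, here is a minimal sketch in Python. It is not Flipkart's actual system: the token counts, the unknown-word penalty, and the function names are all assumptions chosen for illustration. It picks the split of a run-together address string whose tokens are jointly most probable under a simple frequency dictionary.

```python
import math

# Hypothetical token counts, standing in for a dictionary learned from address data.
COUNTS = {"main": 50, "road": 120, "near": 80, "temple": 30,
          "cross": 40, "nagar": 90, "gandhi": 60}
TOTAL = sum(COUNTS.values())


def log_prob(word):
    """Log-probability of a token; unknown tokens are penalised by their length."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Assumed penalty: unknown tokens become 10x less likely per character.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text, max_len=12):
    """Split a run-together string into its most probable token sequence."""
    text = text.lower()
    # best[i] holds (score, tokens) for the best split of text[:i].
    best = [(0.0, [])]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - max_len), i):
            score, tokens = best[j]
            word = text[j:i]
            candidates.append((score + log_prob(word), tokens + [word]))
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1][1]


print(segment("gandhinagarmainroad"))
```

With this toy dictionary, a string such as "gandhinagarmainroad" is split into its four known tokens. A production system would learn the dictionary from real address data and combine the split with spelling normalisation and PIN code checks.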

In E-Commerce marketing, data is available to marketers in diverse formats such as databases, unstructured text documents and emailers, which makes it easy to misinterpret the available data and generate incorrect insights. This becomes even more critical when one realises that the accumulated marketing data contains duplicate, incorrect or simply redundant information.
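As a rough illustration of how duplicate and redundant contact records can be filtered out, the sketch below normalises each record before comparing it with the ones already seen. The field names (`email`, `phone`) and the rule of keeping only the last ten digits of a phone number are assumptions made for this example, not a description of any particular platform.

```python
def normalise(record):
    """Build a comparison key from a contact record (hypothetical field names)."""
    email = (record.get("email") or "").strip().lower()
    digits = "".join(ch for ch in (record.get("phone") or "") if ch.isdigit())
    # Assumption: ten-digit mobile numbers, so country-code prefixes are dropped.
    return email, digits[-10:]


def deduplicate(records):
    """Keep the first record for each normalised key; drop empty and repeated ones."""
    seen = set()
    unique = []
    for record in records:
        key = normalise(record)
        if key == ("", ""):
            continue  # no usable contact detail at all
        if key in seen:
            continue  # same person already captured in another format
        seen.add(key)
        unique.append(record)
    return unique


contacts = [
    {"email": " A@Example.com ", "phone": "+91 98765 43210"},
    {"email": "a@example.com", "phone": "9876543210"},
    {"email": "", "phone": ""},
    {"email": "b@example.com", "phone": None},
]
print(len(deduplicate(contacts)))
```

The first two records differ only in formatting, so they collapse into one, and the empty record is discarded.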

Take the case of consumer contact data. Experts suggest that 25% of this data expires annually. Moreover, leads generated online could already be up to 15 months old at the time of generation or by the time the customer is ready to make a purchase. E-Commerce marketing campaigns based on such faulty data are bound to be unsuccessful. This dynamic nature of the E-Commerce industry is what makes clean data absolutely necessary in the segment. Thus, brands should make a robust machine learning, AI and analytics program that sifts through the mess at the earliest stage of data acquisition their number one requirement from an E-Commerce advertising platform.
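The two figures above can be combined into a quick back-of-the-envelope estimate. The sketch below assumes, purely for illustration, that the 25% annual expiry compounds at a constant rate, which the source does not state.

```python
def fraction_still_valid(age_months, annual_expiry_rate=0.25):
    """Share of contact records still valid after age_months.

    Assumes a constant compounding expiry rate (an assumption, not a sourced model).
    """
    return (1.0 - annual_expiry_rate) ** (age_months / 12.0)


for months in (12, 15, 24):
    print(months, round(fraction_still_valid(months), 3))
```

Under that assumption, a 15-month-old lead list retains roughly 70% of its valid contacts, so close to a third of the campaign would be aimed at dead ends before any other data quality issue is considered.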

Another critical aspect of E-Commerce marketing lies in effective OTT advertising. With global audiences increasingly consuming streamable, subscription-based content, marketers have set their sights on achieving addressability in their consumer targeting. This means that instead of creating ads based on the audience most likely to watch a particular content genre, marketers want to serve different ads to different audiences watching the same program. This is where services like Flipkart Shopper Audience (FSA) rely on clean, triangulated data based on viewership, audience type, content, and IP addresses to create unique audience segments and personas for delivering tailored OTT ad campaigns. The more accurate and precise the database, the more refined the audience segments, and the better the chances of reaching consumers with advertisements on an individual level, all in real time.

Further, E-Commerce portals have established the right online advertising strategy for brands by first filtering out large volumes of unclean consumer data. This is done with a range of tools and techniques: data profiling, data wrangling, batch processing, AI, and machine learning.
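Of the techniques listed above, data profiling is the simplest to illustrate. The sketch below is a minimal, assumed example (the field names and sample rows are hypothetical) that reports how complete each field is and how many distinct values it holds, which is often the first signal of inconsistent spellings or missing entries.

```python
from collections import Counter


def profile(records, fields):
    """Return per-field completeness, distinct-value count and most common value."""
    report = {}
    total = len(records)
    for field in fields:
        # Treat missing keys, None and whitespace-only strings as empty.
        values = [str(r.get(field) or "").strip() for r in records]
        filled = [v for v in values if v]
        report[field] = {
            "completeness": len(filled) / total if total else 0.0,
            "distinct": len(set(filled)),
            "most_common": Counter(filled).most_common(1),
        }
    return report


rows = [
    {"city": "Bengaluru", "pin": "560001"},
    {"city": "Bangalore", "pin": ""},
    {"city": "Bengaluru"},
    {"city": None, "pin": "560001"},
]
print(profile(rows, ["city", "pin"]))
```

Here the `city` field shows two spellings for the same place and the `pin` field is only half filled, both of which would be flagged for cleansing before the data is used for targeting.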

At Flipkart Ads, the priority is always to rely on the cleanest of data instead of hoarding it endlessly. Our Flipkart Shopper Audience (FSA) tool relies on a repository of over a decade of customer shopping data that has been sorted, parsed and validated to generate meaningful insights for customer targeting. After all, mapping and targeting customers, especially those with a higher intent to purchase, is what ultimately yields higher sales.

Flipkart Ads Manager helps marketers track their brand’s campaign performance, analyse data in real time, take insight-backed decisions, and keep a check on their media spends. This clean, data-driven system is completed by the dashboard. As part of its high-value service offering, it enables marketers to measure campaign metrics and gauge performance.


In the world of e-commerce marketing, data is critical. But data handled and analysed without an effective approach turns into noise. Promoting your brand on insights drawn from dirty data is akin to shooting in the dark. Platforms such as Flipkart Ads have been built precisely to fix this problem.

With advancements in big data analytics and AI, machine learning is gradually evolving to intercept unclean data at the very initial stages of capture. The Indian e-commerce advertising industry is poised to reach spends of Rs. 2,500 crore by 2020 at an estimated 30% CAGR. Thus, with so much at stake, brands and businesses that equip themselves with clean data will take the lion’s share of the returns.