In an ideal world, you’d bolt an AI to your enterprise data, switch it on and let it “do its thing”. Your new ERP platform would just suck in your old database and pop the information into all the right places. Or, if you were switching over to a hybrid cloud, you’d have everything hooked up in, like, half an hour or so.
But this is the real world, and here data doesn’t play well with other systems. So any data-driven project is like having several billion-piece jigsaw puzzles mixed together and being told to build just one of them from all the bits. You’d need to sort through the pieces, remove the wrong ones, identify all the edges and group the rest by colour before you could even think about starting. As in this analogy, your data also needs to be fixed and enhanced to fit your intended purpose.
In short, it needs to be identified, categorised and cleaned. And here’s the bad news: if your digital journey is not fuelled by a constant supply of clean, refined data, you’re not going anywhere.
Strategy and discovery
Just to be clear: “cleaning” is a subjective term. It can refer to fixing data that has real defects, e.g. a record dated 30th February, or data that is in the wrong format for its purpose, e.g. two decimal places instead of the required three. It can also mean accounting for missing records or removing outliers that significantly skew trends.
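To make those four kinds of cleaning concrete, here is a minimal Python sketch using only the standard library. The function names, the sample values and the outlier cut-off of 3.5 are illustrative assumptions, not a prescribed method; a real project would pick rules that match its own business purpose.

```python
from datetime import date
from decimal import Decimal
from statistics import median


def parse_date(year, month, day):
    """Return a date, or None when the calendar date does not exist."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def to_three_places(value):
    """Reformat a numeric string to exactly three decimal places."""
    return str(Decimal(value).quantize(Decimal("0.001")))


def missing_ids(expected_ids, present_ids):
    """List the record IDs we expected to see but did not receive."""
    return sorted(set(expected_ids) - set(present_ids))


def drop_outliers(values, threshold=3.5):
    """Drop values whose modified z-score (based on the median absolute
    deviation) exceeds the threshold. The threshold is an assumed cut-off."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to measure against, so keep everything.
        return list(values)
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]


if __name__ == "__main__":
    print(parse_date(2023, 2, 30))              # a defect: this date does not exist
    print(to_three_places("12.50"))             # a format fix: two places to three
    print(missing_ids(range(1, 6), [1, 2, 4]))  # gaps in the record sequence
    print(drop_outliers([10, 11, 9, 10, 12, 500]))
```

Whether 500 in the last call is a genuine outlier or a valid record is exactly the kind of subjective decision described above; the code only applies a rule once someone has decided what the rule should be.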
However, before you can decide whether your data is clean or not, you have to determine whether you have enough of the right data to begin with. That means you must have a business purpose, know why you need the data, and build a strategy for getting the data fit for that purpose.
Although data scientists spend only a short time on discovery, it’s a critical step in any data-driven project. Here, they determine what data (of the correct quality) you have, whether it exists in sufficient quantities to meet your needs, how it could be enriched by external sources, and where additional data will come from. Only then can data cleaning begin.
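As a rough illustration of the discovery questions above, the following Python sketch profiles a batch of records to show how many are usable and whether that is enough. The field names, the sample records and the `min_rows` threshold are hypothetical; what counts as "sufficient" depends entirely on your business purpose.

```python
def profile(records, required_fields, min_rows):
    """Summarise completeness of the records and whether the number of
    complete records meets an assumed minimum."""

    def is_missing(value):
        return value is None or value == ""

    missing_by_field = {
        field: sum(1 for r in records if is_missing(r.get(field)))
        for field in required_fields
    }
    complete = sum(
        1 for r in records
        if not any(is_missing(r.get(field)) for field in required_fields)
    )
    return {
        "total": len(records),
        "complete": complete,
        "missing_by_field": missing_by_field,
        "enough": complete >= min_rows,
    }


if __name__ == "__main__":
    # Hypothetical customer records, two of them incomplete.
    sample = [
        {"id": 1, "email": "a@example.com", "postcode": "AB1 2CD"},
        {"id": 2, "email": "", "postcode": "EF3 4GH"},
        {"id": 3, "email": "c@example.com", "postcode": None},
    ]
    print(profile(sample, ["email", "postcode"], min_rows=2))
```

A report like this does not clean anything. It only shows where the gaps are, so you can decide which ones to fill from external sources before cleaning starts.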
Clean data
Data is the new oil, they say. It has to be extracted and refined before it can be used to fuel your digital transformation. It’s no surprise, then, that data scientists spend a majority of their time cleaning data.
In today’s digital world, data doesn’t just come from your enterprise database. It’s also extracted from a myriad of online systems such as email servers, social media accounts, third-party web services like Google Maps, public industry databases and a host of other sources. Such massive amounts of information (collectively known as Big Data) give your data-driven project enormous potential. Yet the sheer volume and granularity of that data mean that any errors are magnified as they spread through downstream systems, and could lead to vast deviations from the truth.
So those defects have to be removed first to create reliable input for the AI, ERP, CRM, analytics and other systems that will use the data down the line. You can’t skip this step and expect to succeed. Yet so many digital transformation projects do just that, and end up failing miserably.
Developing a data quality culture
What’s missing is a corporate culture that puts data quality first. No amount of technology will create that culture; it has to start at the top and work its way down through every level of the organisation. It comes from acknowledging that data is the driving force of the digital enterprise, and that there’s really only good data and bad data, with nothing in between. Once you accept this and opt for clean data, your digital transformation will take off and you’ll start seeing the returns you envisioned.