The dangers of dirty data — How to do a quick and easy spot-check

“Data is not my responsibility or a priority as a CPO” … Yes it is!

Data should be everyone’s responsibility, from the bottom of an organisation right to the very top.  Currently, across many organisations, data is the responsibility of a person or department, and everyone trusts them to make sure the data is accurate.

But they are specialists in data, analytics and coding, not procurement. They don’t have the experience to know when DHL should be classified as a courier or as warehousing/logistics services, or what direct, indirect or tail spend is and how important or high-priority it is.

They can apply the business rules you request, automate the process or follow your guidance but they don't know if the rules are working properly or not. That’s when the procurement team need to take up the baton and verify and QA the data.  It’s also important that any errors are fed back to the data team so that rules/scripts/codes can be amended in order to avoid repeatedly correcting the same mistakes.

But what is dirty data?

It can be defined very differently depending on who you speak to. At its most basic level, dirty data is anything incorrect. In detail, it could be misspelt vendors, incorrect invoice descriptions, missing product codes, lack of standard units of measure (e.g. liter, l, litres), currency issues, duplicate invoices or incorrect/partially classified data. All very familiar to most who work in procurement.
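As a rough illustration, two of the issues listed above (inconsistent units of measure and duplicate invoices) can be checked with a few lines of Python. This is only a minimal sketch: the field names (`supplier`, `invoice_number`, `unit`), the alias table and the sample invoices are all hypothetical and would need adapting to your own data.

```python
from collections import Counter

# Hypothetical alias table mapping unit spellings to one standard unit.
UNIT_ALIASES = {
    "l": "litre",
    "liter": "litre",
    "liters": "litre",
    "litre": "litre",
    "litres": "litre",
}

def normalise_unit(raw):
    """Return the standard unit, or None when the spelling is not recognised."""
    return UNIT_ALIASES.get(raw.strip().lower())

def duplicate_invoices(invoices):
    """Return (supplier, invoice_number) keys that occur more than once."""
    counts = Counter(
        (inv["supplier"].strip().lower(), inv["invoice_number"].strip())
        for inv in invoices
    )
    return sorted(key for key, n in counts.items() if n > 1)

# Made-up sample invoices, for illustration only.
invoices = [
    {"supplier": "DHL", "invoice_number": "INV-001", "unit": "l"},
    {"supplier": "dhl ", "invoice_number": "INV-001", "unit": "Litres"},
    {"supplier": "IBM", "invoice_number": "INV-002", "unit": "later"},
]

# Units that need a human to look at them because no rule covers them.
unrecognised = [inv["unit"] for inv in invoices if normalise_unit(inv["unit"]) is None]

print("Unrecognised units:", unrecognised)
print("Possible duplicates:", duplicate_invoices(invoices))
```

A check like this only flags candidates. Someone who knows the spend still has to decide whether a flagged line is really wrong, and feed that back so the rules can be amended.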

What are the consequences of dirty data?

There are a number of areas that could be affected by this, the most significant being reporting and decision making. You get regular dashboards from your team, and these are used to make decisions on cost savings, supplier negotiations, supplier rationalisation or forecasting.

I often refer to the real-life IBM example, where there’s £25k of IBM spend classified as cleaning. When the value is lower it can slip through the net, but decisions are still being made on this misplaced spend.

A more subtle example is DHL, which provides a range of services from postal and courier up to warehousing and distribution. If you’re a manufacturer and some of your warehousing spend is incorrectly classified as courier services, this could have a huge impact when looking at cost savings, supplier negotiations, monitoring contract compliance or forecasts for the year.

The larger the dataset you have, the easier it is for these misclassifications to hide in plain sight unless someone accidentally discovers them, and this could have a knock-on effect on things like demand planning, budgets, sales, marketing and financial decisions.

Technology implementation can also be affected. Data preparation or cleansing before the implementation of any new software or system is an area that’s often neglected, and by the time it’s discovered there are errors in the data, staff have lost faith in using the software, are disengaged, claim it doesn’t work, or they don’t trust it because “it’s wrong.”

At this point, it either costs a lot of money to fix and you have to hope staff will adopt the software again, or the project is abandoned. In either case, this can take months and cost tens of thousands in abandoned software or reparation work.

You might also be considering AI, some form of automation, or a third-party supplier that offers this service. As with technology implementation, dirty data can cause lots of problems here. The data must be cleansed and prepared before being used for any type of AI or automation.

Think about the IBM example. Each quarter the data is refreshed automatically with the cleaning classification, so that £25k becomes £50k, then £75k the following quarter. It’s only when the value becomes significant that someone notices the issue. By this stage, how many decisions have been based on this incorrect information?

How can I fix this and ensure data accuracy?

There is no quick fix, magic button, or software that can resolve these issues. To improve data accuracy, the initial piece of work has to be done by a human; the automation or system/software implementation will inevitably fail without it. Get everyone at every level to engage and take responsibility for your organisation's data, and to communicate and share when things are wrong and need to be amended.

This is not an easy task, but it makes all the difference if your team understands the impact the data they work on has within the organisation, and that it’s not just the responsibility of “Bob in the corner” or “The IT department”.

Consistency is also extremely important. Define rules and processes: classification is very subjective, and quite often there’s more than one right answer. As long as everyone’s working to the same standards, it’s much easier to change later on if it’s wrong.

Maintain your data. If it’s not maintained, it will slowly become unusable over time. A monthly or quarterly review, depending on the volume, is recommended to keep on top of any issues; otherwise you’ll have to pay a large sum to fix the same issues all over again.

Spot check your data regularly, regardless of who you are. Using the guide below you can easily and quickly spot-check your organisation's data without any experience.

How to spot check your data

  1. Select the data and create a pivot table. Choose ‘Supplier Name’ or ‘Normalised’ if available, and the levels of classification.
  2. Change the report layout to show in tabular form. This will list the data by supplier, then by classification. From this you’ll be able to pick out any lines that stand out.
  3. If you have a supplier with a large number of rows, you can view it separately by copying the data into a new tab and creating a new pivot table from that.
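
For anyone who prefers a script to a spreadsheet, the same spot check can be sketched in Python. This is an illustrative sketch, not a definitive method: the field names (`supplier`, `classification`, `spend`) and the sample rows are made up, and the rule of flagging suppliers that sit under more than one classification is just one assumed way of picking out lines that stand out.

```python
from collections import defaultdict

# Made-up sample rows; field names and values are illustrative only.
rows = [
    {"supplier": "DHL", "classification": "Courier Services", "spend": 1200.0},
    {"supplier": "DHL", "classification": "Warehousing", "spend": 45000.0},
    {"supplier": "DHL", "classification": "Courier Services", "spend": 800.0},
    {"supplier": "IBM", "classification": "IT Services", "spend": 90000.0},
    {"supplier": "IBM", "classification": "Cleaning", "spend": 25000.0},
    {"supplier": "Acme Ltd", "classification": "Stationery", "spend": 300.0},
]

def pivot_by_supplier(rows):
    """Total spend per supplier and classification, like a tabular pivot table."""
    pivot = defaultdict(lambda: defaultdict(float))
    for row in rows:
        pivot[row["supplier"]][row["classification"]] += row["spend"]
    return pivot

def suppliers_to_review(pivot):
    """Suppliers that appear under more than one classification."""
    return sorted(s for s, classes in pivot.items() if len(classes) > 1)

pivot = pivot_by_supplier(rows)

# Print one line per supplier and classification, as in the tabular layout.
for supplier in sorted(pivot):
    for classification, total in sorted(pivot[supplier].items()):
        print(f"{supplier:<10} {classification:<18} {total:>10,.2f}")

print("Review:", suppliers_to_review(pivot))
```

A supplier appearing under several classifications is not automatically an error (DHL can legitimately be both courier and warehousing), so the output is a list of lines to check by eye rather than a list of mistakes.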

Try it for yourself.

Data accuracy is an investment, not a cost. Address the issues at the beginning - while it might seem like a costly exercise, you will undoubtedly spend less than if you have to resolve an issue further down the line with a time-consuming and costly data clean-up operation.

This post comes courtesy of Susan Walsh, aka The Classification Guru.

Disclaimer: the opinions expressed in this article are those of the author.
