This three-part series covers three spend analysis lessons. I start today with Lesson 1, a personal experience that taught me the importance of having a robust process to guarantee the integrity of the data you use.
Lesson 1: An Introduction to Data Handling — Cleansing, Enrichment and Classification
My first exposure to anything that we might remotely describe as spend analytics — attempting to enhance manual Excel- and Access-based efforts to manipulate and analyze spending data — was approaching two decades ago. I was a manager at FreeMarkets working on product strategy initiatives as part of a large sourcing engagement involving dozens of potential categories, tens of thousands of line items and hundreds of millions of dollars in overall spend.
Previously, like other service providers and internal sourcing teams working on similarly complex datasets, we had focused efforts such as these on one-time initiatives designed to showcase the best opportunities for savings or cost recovery (e.g., invoice auditing).
But after listening to the needs of various clients, we saw an opportunity to build a solution that would include an initial effort to develop a wave-driven sourcing strategy, broken into immediate, mid-term and longer-term priorities, as well as an ongoing program that would update and refresh the same datasets to show spending execution results and compliance as well as new opportunities.
The execution of this would not be easy. Our initial solutions would amount to more of a patchwork of capabilities than a truly integrated solution. And not long thereafter, one competing vendor shot out of the gate with a full “auto-classification” capability built on a rules-based foundation, far superior to what we had at the time.
But alas, the experience taught me a valuable lesson: Data matters above all else. Yet getting to accurate data is not as easy as it might seem (at first).
Why Data is at the Core of the Challenge
The initial solution my team built was essentially a largely manual ETL (extract, transform, load) data management process that loaded information into a relational database, on top of which sat a Cognos business intelligence platform with a number of canned reports. Our approach, like many others at the time, was not elegant, but it worked: it helped answer a number of basic questions that we could use to put a cost reduction strategy in place and execute on it.
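The pipeline described above can be sketched in miniature. This snippet is illustrative only (it is not the actual FreeMarkets tooling): it uses SQLite as a stand-in relational database, and the table, field names and sample rows are invented. It shows the basic ETL shape: extract flat-file accounts payable data, apply a light transform, load it into a table, then run a canned report on top.

```python
import csv
import io
import sqlite3

# Hypothetical sample of a raw AP extract; note the inconsistent supplier naming.
RAW_AP_CSV = """invoice_id,supplier,amount,gl_code
1001,Acme Corp,2500.00,6100
1002,ACME CORPORATION,1200.50,6100
1003,Globex,980.00,7200
"""

def load_ap_extract(conn, csv_text):
    """Extract rows from a flat AP file, apply a light transform, load into SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ap_spend ("
        "invoice_id TEXT, supplier TEXT, amount REAL, gl_code TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["invoice_id"],
         r["supplier"].strip().upper(),  # transform: normalize supplier case
         float(r["amount"]),
         r["gl_code"])
        for r in reader
    ]
    conn.executemany("INSERT INTO ap_spend VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
load_ap_extract(conn, RAW_AP_CSV)

# A "canned report" in the spirit of the BI layer: total spend by GL code.
report = conn.execute(
    "SELECT gl_code, ROUND(SUM(amount), 2) FROM ap_spend GROUP BY gl_code "
    "ORDER BY gl_code"
).fetchall()
```

Even this toy version surfaces the core problem the rest of the article deals with: the same supplier appears twice under different names, so any "spend by supplier" report would be wrong until the data is cleansed.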
Getting at data is only the first step to spend analysis success. But don’t assume it’s an easy one. As I found, basic accounts payable system data (typically coming out of ERP tools), along with general ledger information from our own systems, is just the start.
The Real World is Not the Procurement Data World — At Least Not Yet
Most of the data available to us comes from internal systems and, especially in the case of indirect and services spend, lacks line-level detail that only comes from specialized procurement tools and supplier-provided invoice data.
Of course there are other sources of data as well. Purchasing card files, archived EDI information, supplier networks, banking data, group purchasing organization information, trade financing archives and other third parties can often provide data too, although it’s usually in pieces, not complete.
As I found, piecing all of this data together to create a holistic spend picture is not a walk in the park. But even before that, we need to “clean” and enrich it, making sure all the header and line-level names and details are accurate and conform to a specific naming standard. Data is often missing specific fields, and misspellings and abbreviations are common, as are incorrectly coded fields.
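A minimal cleansing sketch of what this looks like in practice, assuming a canonical supplier master list and an abbreviation map (both invented here for illustration; real lists run to thousands of entries): expand abbreviations, normalize case, flag missing values, and fuzzy-match the result against the master list.

```python
import difflib

# Illustrative canonical supplier names; a real master list would be far larger.
CANONICAL_SUPPLIERS = [
    "INTERNATIONAL BUSINESS MACHINES",
    "GENERAL ELECTRIC",
    "ACME CORP",
]

# Common abbreviation expansions seen in raw AP data (examples are hypothetical).
ABBREVIATIONS = {"INTL": "INTERNATIONAL", "CORP.": "CORP", "GE": "GENERAL ELECTRIC"}

def normalize_supplier(raw_name, cutoff=0.8):
    """Cleanse one supplier name: trim, uppercase, expand abbreviations,
    then fuzzy-match against the canonical list."""
    if not raw_name or not raw_name.strip():
        return None  # missing field: flag for human review
    tokens = [ABBREVIATIONS.get(t, t) for t in raw_name.strip().upper().split()]
    candidate = " ".join(tokens)
    matches = difflib.get_close_matches(
        candidate, CANONICAL_SUPPLIERS, n=1, cutoff=cutoff
    )
    # Unmatched names pass through unchanged so a human can resolve them.
    return matches[0] if matches else candidate
```

The `cutoff` threshold is the crux of the design: set it too low and distinct suppliers get merged; too high and obvious misspellings stay unmatched, which is exactly why human expertise stays in the loop.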
In summary, the most common theme that comes out of initial spend analysis exercises is that data quality is poor — and our suppliers have better data about us than our own systems can provide. Ironic, eh? But this is where the right technologies can come into play to help us piece together scraps from both our own systems and supplier data, too.
Best practices (of which there are many, beyond this list) here include:
- Creating efficiencies of scale in data loading and normalization (especially for repeat/refresh situations)
- Pursuing a permanent (sustainable) solution versus one-time efforts — with at least quarterly data refreshes, but ideally monthly or even weekly
- Having an automated approach to cleansing and classification (machine learning/AI, rules, etc.) with human expertise layered on top of it (if the technology is not sufficient alone to get to the level of accuracy required, which is the case in over 90% of spend analysis efforts). This includes being able to address naming inconsistencies and conventions; missing data; incorrectly entered data (e.g., “Yin” for “Pin”); and applying the right taxonomic structure to underlying datasets (e.g., “mouse” as a computer peripheral vs. a laboratory consumable)
- Getting to line-level detail
- Offering predictive coding/analysis — even (potentially) classifying data in real time at the point of requisition
- Providing standard data enrichment options (for company level fields — think “Hoovers” type information)
- Providing secondary/advanced data enrichment (for supply risk, corporate social responsibility data, supplier diversity, etc.)
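To make the automated-classification practice above concrete, here is a deliberately tiny rules-based classifier sketch. The taxonomy, keywords and supplier hints are all invented for illustration; production engines layer ML scoring, far richer rules, and human review on top. It shows the two behaviors the bullet list calls for: context-sensitive taxonomy assignment (the “mouse” ambiguity) and a human-in-the-loop fallback for unmatched lines.

```python
# Rules fire in priority order: (keyword, supplier hint or None, category).
# All values here are illustrative, not a real spend taxonomy.
RULES = [
    ("TONER", None, "Office Supplies > Printer Consumables"),
    ("MOUSE", "LAB", "Laboratory > Consumables"),  # supplier context disambiguates
    ("MOUSE", None, "IT Hardware > Peripherals"),
]

def classify_line(description, supplier=""):
    """Return (category, needs_review) for one spend line item.

    Unmatched lines are routed to a human expert rather than guessed at,
    reflecting the hybrid automated-plus-expert approach described above.
    """
    desc, supp = description.upper(), supplier.upper()
    for keyword, supplier_hint, category in RULES:
        if keyword in desc and (supplier_hint is None or supplier_hint in supp):
            return category, False
    return "Unclassified", True  # human-in-the-loop fallback
```

The same skeleton extends naturally to the predictive-coding idea in the list: because classification is just a function of the line description and supplier, it can be called at the point of requisition as easily as in a batch refresh.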
If we carry out these activities at the proper level, we’ll be able to generate a timely, core spend master data set that is continuously updated and available. But then we must begin to do something with it, and encourage our peers to use and interrogate the data as well to identify opportunities and take action. That brings me to the next post in this series: the visual display of spend information.