Making Sense of Dirty Data and Spend Classification Components: With Oracle and Others

Next week, I'm participating in a webinar, Oracle and Procurement Analytics: A Deep Dive Into Oracle’s Spend Analysis and Data-Driven Procurement Solutions, with the Data Intensity team. For those who do not know Data Intensity, the firm acquired Enrich in 2016 and has since built on Enrich's Oracle consulting, systems integration and application enhancement “surround” strategy.

In various discussions with the Data Intensity team in preparation for the webinar — as well as in looking over the shoulder of some of my Spend Matters colleagues reviewing Oracle’s capabilities for an upcoming PRO Vendor Snapshot series — I realized that Oracle has largely flown under the radar in spend analytics compared with many other procurement technology providers, especially independent specialist vendors. Still, Oracle has quietly built a solid set of capabilities in the spend classification area (and, more broadly, spend analytics).

Granted, we’ll need to hope for Oracle’s participation in an upcoming Spend Matters Strategic Procurement Technologies SolutionMap (whether this year or next) to understand how its capabilities specifically compare to others in the spend classification (and broader spend analytics) markets from solution and customer perspectives. But at the very least, for organizations with an Oracle-led IT environment that want to take greater control of their spend data and overall spend, as well as supplier and supply master data, the technology giant is more than worth shortlisting as a potential option to manage procurement data.

Managing Spend Data: How It Fits as a Component of Spend Analytics

If we look at the broader category of spend analytics as comprising a seven-step process, the core enabling elements of spend classification capabilities (and Oracle’s own solution in the area) fall into the first four buckets below: load and integrate data, cleanse, classify and enrich.

  1. Load and integrate data — Get data in the system
  2. Cleanse — Fix the errors
  3. Classify — Map to a categorization scheme or multiple schemes
  4. Enrich — Add additional data from other data sources
  5. Cube — Build data cubes for OLAP, ROLAP and real-time cube construction
  6. Visualization, analysis and reporting — Include both “2-D” and “3-D” analytics and reports to help procurement organizations (and broader stakeholders) make sense of the data
  7. Integration — “Push” spend visualization and reporting data into other modules within a suite or into third-party solutions

Oracle’s approach to spend classification (areas 1–4 above in our general process flow/framework) is somewhat atypical in the market, as it is one of a small set of providers that sell only technology to support these areas (versus delivering a managed service in support of them).

This is where Oracle partners such as Data Intensity come in.

What Are the Steps of Core Classification?

My colleague and spend analytics expert Michael Lamoureux summarizes the process of cleansing and classifying/mapping data, the foundation for analytics and actionable intelligence, as comprising three key steps. These are:

  • Duplicate detection
  • Auto-fill
  • Multi-criteria classification

Let’s briefly explore each of these components below in as succinct a way as possible (with a continued hat tip to Michael).

The premise behind duplicate detection is that data comes from multiple sources, and records/transactions will be duplicated. Moreover, different encodings, supplier IDs and the like make duplicates hard to detect, and missed duplicates lead to incorrect totals, trends and prescriptions. Fields such as supplier, invoice ID and payment should be auto-detected as part of the spend classification process. As Michael and I have observed before, duplicate detection is actually a harder, and more involved, problem than many people think. But it’s critical to solve, as you can’t even get proper control totals until this is done correctly.
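To make the idea concrete, here is a minimal Python sketch of duplicate detection, not a description of how Oracle or any other vendor implements it. It assumes records arrive as dictionaries with hypothetical `supplier`, `invoice_id` and `amount` fields, and it treats two records as duplicates when those fields match after normalization. The list of legal suffixes and the normalization rules are illustrative assumptions only.

```python
import re
from collections import defaultdict

# Illustrative (not exhaustive) legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co"}


def normalize_supplier(name):
    """Lowercase, drop punctuation and remove common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def normalize_invoice_id(invoice_id):
    """Drop separators and zero padding so 'INV-00123' and 'inv123' match."""
    cleaned = re.sub(r"[^a-z0-9]", "", invoice_id.lower())
    return re.sub(r"(?<![0-9])0+(?=[0-9])", "", cleaned)


def find_duplicates(records):
    """Group records that share a normalized supplier, invoice ID and amount."""
    groups = defaultdict(list)
    for rec in records:
        key = (
            normalize_supplier(rec["supplier"]),
            normalize_invoice_id(rec["invoice_id"]),
            round(rec["amount"], 2),
        )
        groups[key].append(rec)
    return [group for group in groups.values() if len(group) > 1]


records = [
    {"supplier": "Acme, Inc.", "invoice_id": "INV-00123", "amount": 100.0},
    {"supplier": "ACME INC", "invoice_id": "inv123", "amount": 100.00},
    {"supplier": "Globex", "invoice_id": "77", "amount": 50.0},
]
duplicates = find_duplicates(records)
```

Real data usually needs fuzzier matching than this (misspellings, differing supplier IDs, partial payments), which is part of why the problem is harder than it looks.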

Auto-fill is necessary because the majority of data records will be incomplete. It is also practical, because many missing fields can be completed based on other fields (e.g., supplier ID, invoice ID, etc.). Simple rules can often “auto-complete” the majority of missing data, so the solution needs to support such rules. Basic classification can complete the majority of the remaining fields (these rules should also be supported). We believe that auto-fill is an often overlooked feature of spend cleansing/classification and more important than one may think.
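A simple auto-fill rule can be sketched in Python as follows. This is an assumption-laden illustration rather than any product's actual logic: the field names (`supplier_id`, `supplier_name`, `currency`) are hypothetical, and each rule simply says "if the value field is missing, copy it from another record with the same key field."

```python
def build_lookup(records, key_field, value_field):
    """Map each known key to the first non-empty value seen for it."""
    lookup = {}
    for rec in records:
        key, value = rec.get(key_field), rec.get(value_field)
        if key and value:
            lookup.setdefault(key, value)
    return lookup


def auto_fill(records, rules):
    """Apply (key_field, value_field) rules; returns filled copies of the records."""
    filled = [dict(rec) for rec in records]  # leave the input untouched
    for key_field, value_field in rules:
        lookup = build_lookup(filled, key_field, value_field)
        for rec in filled:
            if not rec.get(value_field) and rec.get(key_field) in lookup:
                rec[value_field] = lookup[rec[key_field]]
    return filled


records = [
    {"supplier_id": "S1", "supplier_name": "Acme", "currency": "USD"},
    {"supplier_id": "S1", "supplier_name": None, "currency": None},
    {"supplier_id": "S2", "supplier_name": None, "currency": None},
]
rules = [("supplier_id", "supplier_name"), ("supplier_id", "currency")]
completed = auto_fill(records, rules)
```

In this sketch the second record picks up its name and currency from the first, while the third stays incomplete because nothing is known about supplier `S2`. Those leftovers are the fields that basic classification would have to complete.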

Multi-criteria classification includes three distinct efforts: classification on fields, classification on derived fields and classification on enriched fields.
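One way to read those three efforts is sketched below in Python. This is my own interpretation for illustration, and every table and field name in it is hypothetical: a raw field (a GL code) is tried first, then a derived field (keywords extracted from the description), then an enriched field (a supplier industry taken from a stand-in for an external data source).

```python
import re

# Hypothetical mapping tables; a real deployment would maintain far larger ones.
GL_CODE_TO_CATEGORY = {"6100": "Travel", "6200": "IT Hardware"}
KEYWORD_TO_CATEGORY = {"laptop": "IT Hardware", "flight": "Travel"}
INDUSTRY_TO_CATEGORY = {"office supplies": "Office Supplies"}

# Stand-in for an external enrichment source keyed by supplier name.
SUPPLIER_INDUSTRY = {"acme": "office supplies"}


def classify(record):
    """Return (category, criterion) using field, derived, then enriched data."""
    # 1. Classification on a field that is present in the record as loaded.
    category = GL_CODE_TO_CATEGORY.get(record.get("gl_code"))
    if category:
        return category, "field"

    # 2. Classification on a derived field: tokens extracted from the description.
    tokens = set(re.findall(r"[a-z]+", (record.get("description") or "").lower()))
    for keyword, keyword_category in KEYWORD_TO_CATEGORY.items():
        if keyword in tokens:
            return keyword_category, "derived"

    # 3. Classification on an enriched field: industry looked up from outside data.
    industry = SUPPLIER_INDUSTRY.get((record.get("supplier") or "").lower())
    category = INDUSTRY_TO_CATEGORY.get(industry)
    if category:
        return category, "enriched"

    return "Unclassified", "none"
```

Returning the criterion alongside the category makes it easy to see which kind of data drove each mapping, which helps when auditing results.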

While these are the common steps, there are actually many approaches to classification (e.g., rules-based, artificial intelligence (AI)/machine learning, hybrid machine learning, etc.). It’s difficult to say which is best, as each has its strengths and weaknesses — and where one works poorly on a given data set, another will work well. In general, the best solutions support multiple approaches and use them as appropriate.
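As a rough illustration of supporting multiple approaches, the Python sketch below tries explicit rules first and falls back to the closest labeled example when no rule fires. The fallback uses plain token overlap (Jaccard similarity), which is only a simple stand-in for a trained machine learning model. The rules, labeled examples and the `min_score` threshold are all hypothetical.

```python
import re

# Hypothetical keyword rules and labeled examples for illustration only.
RULES = [("airfare", "Travel"), ("laptop", "IT Hardware")]
LABELED = [
    ("hotel stay two nights", "Travel"),
    ("printer toner cartridge", "Office Supplies"),
]


def tokens(text):
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify_by_rules(description):
    """Return the category of the first rule whose keyword appears, else None."""
    words = tokens(description)
    for keyword, category in RULES:
        if keyword in words:
            return category
    return None


def classify_by_similarity(description, min_score=0.2):
    """Return the category of the most similar labeled example, else None."""
    words = tokens(description)
    best_category, best_score = None, 0.0
    for example, category in LABELED:
        example_words = tokens(example)
        union = words | example_words
        score = len(words & example_words) / len(union) if union else 0.0
        if score > best_score:
            best_category, best_score = category, score
    return best_category if best_score >= min_score else None


def classify_hybrid(description):
    """Rules first, similarity fallback second, 'Unclassified' otherwise."""
    return (
        classify_by_rules(description)
        or classify_by_similarity(description)
        or "Unclassified"
    )
```

The ordering is a design choice: rules are predictable and easy to audit, so they go first here, while the similarity fallback covers descriptions that no rule anticipates.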

Summing Up the Challenge (and Opportunity)

Making spend analysis simple is one of the most complex challenges in all of enterprise software today. And for indirect spend (and often broader data sets), spend classification is at the very core of the efforts.

Not only does spend analysis require (and involve) multiple technologies, but the process requires “power users” and often technical resources (on the back end) with the goal of an eventual hand-off to business users — and this must be repeated again and again. It also, of course, requires the right underlying solutions.

Deeper analysis often relies on broader datasets that bridge spend/supply and make/sell, as well as finance and operational inputs. This is a key for broader stakeholder involvement and often the largest data-driven opportunities — whether you use Oracle (or another underlying technology) at the core of your efforts or not.

Curious to learn more? Join me (along with the Data Intensity team) on next week’s webinar.

Special thanks to Michael Lamoureux for providing some of his excellent summations on the underlying components of spend classification and spend analytics overall.
