Artificial Intelligence in Contract Management (Part 4: Natural Language Processing and Machine Learning)


In this final installment of our series on artificial intelligence in contract management, we turn our attention to natural language processing (NLP).

In earlier posts in this series, I mentioned chatbots and the Turing test, both of which require NLP. I also mentioned machine learning and the process of classifying text against the domain-specific ontologies that model commercial knowledge. At this point, you may be wondering, “Yeah, but how the heck do I actually model and extract all of this knowledge from our existing contract documents in the first place?”

This is a great question, and the answer is that it’s not easy: you need to solve the multilingual syntax and semantics problem before you can get to higher-level knowledge ontologies. This is where NLP comes in. It’s needed both for data mining external “big data” sources and for addressing the legacy contract encoding problem.

The latter is the bigger short-term problem, especially if your firm has grown through acquisition or has not been rigorous in establishing a standardized clause library. There are contract analytics providers (tools and/or services) that can help, depending on your situation. It’s a thorny issue because the dense “legalese,” written in various languages (each open to multiple semantic and legal interpretations), must be classified and mined to extract the atomic-level insights about obligations, rights and risks of interest. This NLP of “natural” commercial language can’t be scalably addressed by human-coded, rules-based approaches.

And that’s where machine learning comes in.

Machine Learning

The term machine learning refers to computers that “learn” from the data they process rather than relying on humans to write rules-based procedural programs to act upon that data. Machine learning not only discovers patterns in data but also helps correlate various data inputs with key data outputs, which enables predictive analytics.

In a “supervised learning” approach, human experts determine the outputs and the system “learns” how to mimic the human experts, as well as uncover latent variables and interactions that humans wouldn’t have spotted on their own. “Unsupervised learning” doesn’t rely on humans for direct training and stretches into the realm of deep learning, which is beyond the scope of this series.
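The supervised path described above can be sketched with a toy clause classifier. The example below is pure Python with invented training snippets and labels: it “trains” on expert-labeled clauses by building one bag-of-words centroid per clause type, then assigns new text to the most similar centroid by cosine similarity. Production systems use far richer features and models (e.g., support vector machines or neural networks), so treat this only as a minimal illustration of learning from human-labeled outputs:

```python
from collections import Counter
import math

# Toy training set: clause snippets labeled by human experts
# (the "supervised" part). All examples are invented for illustration.
TRAIN = [
    ("either party may terminate this agreement upon written notice", "termination"),
    ("this agreement terminates automatically upon material breach", "termination"),
    ("invoices are payable within thirty days of receipt", "payment"),
    ("the buyer shall pay all undisputed fees net thirty", "payment"),
]

def vectorize(text):
    """Bag-of-words term counts for a clause."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Training": accumulate one centroid vector per expert-assigned label
centroids = {}
for text, label in TRAIN:
    centroids.setdefault(label, Counter()).update(vectorize(text))

def classify(text):
    """Assign the label whose centroid is most similar to the input."""
    v = vectorize(text)
    return max(centroids, key=lambda lbl: cosine(v, centroids[lbl]))

print(classify("supplier may terminate for convenience with notice"))  # termination
print(classify("payment is due net thirty days from invoice date"))    # payment
```

The key point is that the mapping from clause text to clause type is never hand-coded; it is induced entirely from the labeled examples.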

Note that rules-based logic and more sophisticated algorithms can also be layered on top to improve the effectiveness of the overall predictive analytics. This approach is proven in the area of spend analytics, and it applies similarly to legacy contract conversion, where the input data is raw contract text and the output data is classified, abstracted and harmonized contract clauses. The trained system can also be used to help classify user intent in a CLM workflow (e.g., the “guided contracting” scenario discussed earlier in the series).
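One way to picture “rules layered on top” of a learned model is a hybrid pipeline in which high-precision rules fire first and low-confidence statistical predictions are routed to human review. The sketch below is entirely hypothetical: the `ml_classify` stub stands in for a trained model, and the rule patterns, labels and confidence threshold are invented for illustration:

```python
import re

# Stand-in for a trained clause classifier returning (label, confidence).
# In practice this prediction would come from the machine learning model.
def ml_classify(text):
    return ("general", 0.55)  # stubbed prediction for illustration

# Deterministic, high-precision rules layered on top of the statistical
# model. These patterns are invented for illustration.
RULES = [
    (re.compile(r"\bautomatic(ally)?\s+renew", re.I), "auto_renewal"),
    (re.compile(r"\blimitation\s+of\s+liability\b", re.I), "liability_cap"),
]

def classify_clause(text, min_confidence=0.7):
    # Rules win outright when they match
    for pattern, label in RULES:
        if pattern.search(text):
            return label, 1.0
    label, conf = ml_classify(text)
    # Low-confidence predictions get routed to a human reviewer
    if conf < min_confidence:
        return "needs_review", conf
    return label, conf

print(classify_clause("This agreement shall automatically renew for one year."))
```

The design choice here is precision over coverage: the rules catch unambiguous patterns deterministically, while the model handles the long tail and defers to humans when unsure.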


Raw human intelligence is meaningless without knowledge. The same applies to AI-enabled systems. Organizations must begin to look at knowledge management not as an intranet for document management but rather as a capability at the intersection of human knowledge and knowledge emulated by AI-enabled computer systems.

Within CLM, this activity can begin now. Building such a commercial knowledge base, however, simply can’t be supported by older-generation CLM approaches and systems (e.g., relational databases that store contract document metadata and attachments). Building an enterprise knowledge base that can form the foundation of AI-enabled commercial information requires high-quality contracts and high-quality processes that manage those contracts through their lifecycle.

As such, leading CLM apps are a great place to start building the groundwork for AI-based CLM. I started this series with three simple steps in CLM:

  1. Build a high-level knowledge base about all your contracts in the form of a contract repository.
  2. Derive key intelligence from within your contract clause data/metadata to identify risks and rewards.
  3. Begin to leverage your augmented commercial intelligence within upstream CLM processes and other commercially informed enterprise processes.

The application of AI to contract management is evolutionary. And evolution spans multiple generations, including technology generations. As next-generation cloud CLM providers run their apps for multiple companies, their systems (“the machine,” if you will) will also increasingly learn from the end users. This is not just within a single company but across hundreds or even thousands of companies using a true SaaS-based solution. The system will learn “at scale.”

The takeaway? Get smart and procure proper CLM capabilities before you start using “smart CLM.” Before you supervise any learning and start training the machine in any area, it’s good to get smart on the topic of AI to help you along. Hopefully this series has been useful to that end. I encourage all procurement practitioners to keep learning, experimenting and sharing your results. We’d love to hear your thoughts, experiences and opinions in this area.


Voices (5)

  1. Jayant Mukherjee:

    Hi Pierre, Excellent set of articles around how AI can be applied in contract management. I liked the point you highlighted about how AI systems can learn “at scale” in a SaaS-based implementation. We are currently working on a prototype to develop a SaaS-based CLM solution to address the challenges of complex IT services/outsourcing arrangements. Would really appreciate it if you can review and give us feedback when the prototype is ready. Would love to stay connected. Best Regards, Jayant

  2. Nikhil:

    Nice one. At CloudMoyo, we also use ML & NLP to extract insights from contracts & other legal documents.

  3. Pierre Mitchell:

    As a follow-up, I received an answer to Yogesh’s question below from my colleague Dr. Michael Lamoureux….

    “Ontology development relies on a complex, descriptive model that defines the domain, the relationships, and a methodology for interpreting the data, while machine learning relies on numerical representations of data. Machine learning can be built on statistical algorithms, pattern recognition, and other knowledge discovery techniques.

    While machine learning can use statistical techniques on fingerprints and feature encodings to identify likely instances of relevant semantic concepts, only a true semantic algorithm can extract semantic knowledge using the domain model.

    The tool that is working best in CLM implementations for extracting semantic concepts is that of Seal Software, which has recently created the ability for expert users to expand the domain model to increase detection accuracy and relevance.

    It’s hard to say which parser works best, as all need to be defined on a relevant ontology and trained on the data, just as a neural network needs to be trained on data to reach high levels of accuracy. In other words, there’s no reason that the open-source Stanford parser can’t work as well as any patented/proprietary parser.

    As for which machine learning algorithms would suit word analysis, you would be best served by hybrid fingerprint/feature extraction techniques that encapsulate word distance and use advanced statistical techniques (clustering, kernel machines, etc.) to identify relevant data points.”

  4. Pierre Mitchell:

    Hi Yogesh,
    From what I can tell, it’s a combination of machine learning (e.g., support vector machines to help classify clause language into specific clause types) AND ontology-specific knowledge modeling regarding specific contract domains (e.g., options contracts) and more general ones. Different startups (and a few established players) are working as much on these ontological data frameworks as the analytic tools.
    We’ve not run comparative benchmarks, but we see Seal Software out there the most in corporate deployments; there are also tools from other firms like Kira, Ravn, Recommind (OpenText), eBrevia, LegalRobot, Counselytics, and others (who will likely comment on this blog post!). If you want to check out a cool vendor (and site) and see a good summary dump of all things legal tech, go to
    Also check out the blog ArtificialLawyer – it’s excellent.
    Thanks for writing in.

  5. Yogesh:

    Ontology development relies on description logic, while machine learning relies on statistical representations of data. Can you please explain how the semantic aspects of a contract will be extracted and which tools work well? Which parser has the best accuracy, which disambiguation techniques will work well, and which machine learning algorithms will suit contract word analysis?
