Supplier and tender evaluation (part 3) – how to score price

In part 3 of our series on tender evaluation (parts 1 and 2 here), we’ll look at one particularly knotty issue – scoring “price” (or whole-life cost) as part of the evaluation process.

For many in the private sector, the answer is simple – don’t. Assess and score everything else, then just consider those marks against the prices. Make a judgement – for instance, is it worth paying supplier X £1 million more than supplier Y because they score a few points more on the non-price factors?

But this doesn’t work in the public sector, where we must have transparency and rigour. Even in private firms it doesn’t provide a very satisfactory audit trail, and it gets difficult when we have multiple bids to compare. A one-against-one comparison as above is manageable – but six or eight bids with different costs and different non-cost scores? Tricky to manage with judgement alone.

So that takes us into formally scoring and weighting cost as an evaluation factor, and we’ve seen and used many different ways of doing that. It is often assumed that getting the weighting right is the key point, but actually the scoring mechanism is more important. Take a trivial example – two bids, one of £10 and another of £12. We might decide on a mechanism that says the lowest price scores 100 points, and the highest, zero points. In that case, our two bids score 100 and 0 respectively.
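To make that concrete, here’s a minimal sketch of that mechanism in Python (the function name is my own, purely for illustration):

```python
def score_min_max(prices):
    """Lowest price scores 100, highest scores 0, others scale
    linearly in between. Assumes at least two distinct prices."""
    lo, hi = min(prices), max(prices)
    return [100 * (hi - p) / (hi - lo) for p in prices]

print(score_min_max([10, 12]))  # [100.0, 0.0]
```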

Another mechanism might be to score the cheapest bid at 100 points and other bids a percentage below that, related to the price differential. The £12 bid is 20% more expensive, so we score it 20% lower than the cheapest – it gets 80 points.
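Again as an illustrative sketch, with the same caveats as above:

```python
def score_pct_from_lowest(prices):
    """Cheapest scores 100; others lose one point per percentage
    point they sit above the cheapest bid."""
    lo = min(prices)
    return [100 - 100 * (p - lo) / lo for p in prices]

print(score_pct_from_lowest([10, 12]))  # [100.0, 80.0]
```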

So the £12 bid might score 0 or 80 depending on the scoring mechanism used. That far outweighs any debate about whether price should be weighted at 40% or 60%!

It gets worse. I wrote a paper, published in a couple of journals, that set out a situation with three bidders and showed that three different price scoring mechanisms could lead to three different decisions – with the same tenders and prices, any of the bidders, A, B or C, could win depending on how price was scored.
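The numbers below are invented for illustration (they are not the ones from the paper), but they show how this can happen. Assume a 50/50 price/quality weighting, and add a third common mechanism – scoring price proportionally (lowest price ÷ bid price × 100):

```python
# Invented prices and quality scores, 50/50 price/quality weighting.
prices  = {"A": 100, "B": 120, "C": 140}
quality = {"A": 60, "B": 83, "C": 97}  # non-price scores out of 100

lo, hi = min(prices.values()), max(prices.values())
mechanisms = {
    "min-max (lowest=100, highest=0)": lambda p: 100 * (hi - p) / (hi - lo),
    "% below lowest": lambda p: 100 - 100 * (p - lo) / lo,
    "proportional (lowest/price)": lambda p: 100 * lo / p,
}

for name, price_score in mechanisms.items():
    totals = {b: 0.5 * price_score(p) + 0.5 * quality[b]
              for b, p in prices.items()}
    print(name, "-> winner:", max(totals, key=totals.get))
# Same bids, same prices: A, B and C each win under one of the mechanisms.
```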

My observation is also that the mechanism most commonly used in the UK public sector at the moment is seriously flawed. It takes the lowest price as the 100-point score and then looks at percentage differences from that (as in our little example above). So a bid 10% more expensive than the cheapest scores 90 points, one 50% more expensive scores 50, and so on.

So here’s a quiz – can you see any logical flaws in that process? How might you challenge it as an unhappy bidder?

I can think of at least two issues – we’ll come back to this next week!

One option, not often used but one that seems to avoid most of the negative issues, is to set in advance a scoring mechanism related to actual prices. So we might agree that some theoretical, unfeasibly low price scores 100, and a theoretical, ridiculously high price scores zero. Then all bids are slotted into that framework in proportion. While it requires some thought beforehand about likely pricing, it has some real advantages in terms of fairness.
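As a rough sketch (the anchor prices here are invented purely for illustration):

```python
def score_anchored(price, floor=50.0, ceiling=200.0):
    """Pre-published anchors: `floor` scores 100, `ceiling` scores 0.
    Bids are placed proportionally in between, clamped at the ends."""
    score = 100 * (ceiling - price) / (ceiling - floor)
    return max(0.0, min(100.0, score))

for bid in [80, 120, 150]:
    print(bid, round(score_anchored(bid), 1))  # 80.0, 53.3, 33.3
```

Because each score depends only on the pre-set anchors, no bidder’s score can be shifted by what the other bidders happen to submit – which is where the fairness advantage comes from.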

Hopefully the complexities are coming through, even in our quick skate over the issue. When we get into more complex evaluations and methodologies, and issues such as multiple evaluators or a desire to run sensitivity analyses, then technology really comes into its own.

While the best-in-class eSourcing platforms can do a lot to help, for large, complex and sensitive projects it may be worth looking at one of the more esoteric but interesting providers in the eSourcing space. QinetiQ Commerce Decisions provides advanced support tools for evaluation decisions, including the tricky stuff around issues such as scoring price, multiple stakeholders and so on (their platform is also used for marking the Supply Management Awards, which is obviously a hugely complex and sensitive task!).

We’ll look at their offering in a little more detail in the final part of our series, where we’ll also pull a few themes together and sum up our views on this complex but important topic.

Voices (10)

  1. bitter and twisted:

    All seems to me the wrong way round.

    Why not put a $ value on the non-price criteria and add it to the price?

    Effectively you are doing that anyway. If non-price ‘issue X’ can win a point, and every million pounds above the lowest bid also loses a point, then you are valuing X at a million quid, right?

    And if you can’t put a $ value on something, surely it is either worthless (shouldn’t be in at all) or priceless (should be a pass/fail qualification issue, not a score)?

    1. RJ:

      I agree that this can work, but effectively all you are doing is reversing the point-scoring process ($1m = 1m “points”).

      It’s much easier to follow this logic in a simple commodity-based purchase where, for example, you could monetarise a “score” such as speed of delivery, product quality failures or even the quality/content of reporting.

      I am also adamant that any tender process in which I’m involved contains the pass/fail qualifications that any proposal has to meet before they even reach the scoring phase.

      However, for more complex purchases the scoring of both cost/price and non-cost issues will inevitably move into harder-to-define criteria. Most of what I buy these days could be classified as “professional services”. It’s hard to see how you could monetarise an issue such as a consultancy provider’s level of understanding of an organisation’s strategic direction so that they provide appropriate advice, or how to put a precise financial value on a legal firm’s experience to protect you against a £10m litigation action.

      As one of Peter’s earlier posts points out, we don’t generally assess very precisely beyond a 5 or 7 point scale and so weightings of both cost and non-cost factors against a scale do work best in most complex circumstances. It’s the definitions of these scales and the weightings that are the big challenge.

      Overall I’m very thankful that I work mainly in the private sector, so any scoring ends up as being “advisory” rather than absolute, but this series of articles is prompting some very interesting and useful debate about appropriate ways of tackling the problems.

      1. bitter and twisted:

        Good point, but surely the logical conclusion is that tendering is not the right tool for complex, quality-driven purchases?

  2. eSourcingSensei:

    Hello Peter

    One issue with the simplified scoring measures above (and I realise they are used just to highlight a point) is the focus on pricing (bid submissions) as almost the only point at which scoring is conducted.

    And even before we get into Paul’s (quite correct) call to look at quality as part of our scoring assessment, there are other factors that need to be reviewed.

    Let me suggest some other areas that I have used in scoring RFQs/RFPs/RFIs:

    – Partial Quantity Bidding
    – Supplier location (distance from delivery address)
    – RFT (where a previous/existing relationship exists)
    – MOQ levels
    – Business Stock Holding Requirements
    – Payment terms and the resulting cost of cash

    There are so many others.

    Of course a good in-built evaluation tool is more than just useful. However, absolute clarity for both the business and the supplier on measurements/scoring, and how that will be conducted, is also key.

    With all scoring there is a caveat which is so often missed.

    If you are going to score any event you cannot ask for responses that are text based (unless the response is Yes/No). Text-based responses – whilst often very necessary in an RFI – are still viewed through the emotion or opinion of the buyer/scorer (“was that the response I wanted? Does that meet the level I was looking for?”), and only after you answer yourself yes or no do you apply a score.

    Wherever you are going to use a scoring mechanism, you have to ensure that the respondent can answer in such a way that a score can be applied – so you cannot auto-score any text response.

    Why? (In case people haven’t run many auto-scored events.) Because a 50-word response to your question with all the detail you require can score 100 – but with auto-scoring of a text response, so can the supplier that responds with an “x” and nothing else.

    With any event you want to run, but particularly with events where you plan to auto-score, it is imperative that time is taken up front to plan out not just what you want to score, but how you will ask for the response to be made.

    1. RJ:

      “If you are going to score any event you cannot ask for responses that are text based (unless the response is Yes/No)” – I think I must be misunderstanding this point. Are you saying that you cannot score any text-based response at all? I don’t believe you can be, as the rest of your comment is very articulate, and I am sure that you don’t want to reduce the scoring process of all tenders to a mechanical tick-box exercise.

      Firstly, it is absolutely necessary to evaluate text-based responses where there are several potential variants of a solution with different implications. Some form of sliding scale is needed here as some solutions may offer significant improvements in cost or quality that have real value in the final deliverable.

      Secondly, it is very achievable (although quite hard work) to make text-based scoring objective: you simply have to articulate how you define each scoring point. I tend to use simple definitions for “does not comply”, “partially complies”, “complies” and “exceeds requirements” that state what I would expect to see in an appropriate response to each question or group of questions. I’ve not got a lot of experience of public sector tenders but even where I have been involved in these, this approach has been deemed acceptable (the most recent one was for legal services so I think/hope the public sector lawyer working with me might have raised the issue if it wasn’t).

      1. eSourcingSensei:

        Hello RJ

        May I reassure you that I was not trying to state that you “cannot” score text responses – I tried (although a little unsuccessfully, I now think) to focus on an automated scoring approach, which cannot work on text responses.

        I agree that you can use certain criteria, as you lay out, to apply a scoring logic to a response. However, where I have a number of stakeholders involved who may all participate in scoring the responses – so I gather a business-wide evaluation – it may be much harder. For example, the team in logistics and the team for quality or R&D may see an identical response as having two completely different values.

        If I am the only one scoring, then I can apply my/your logic to all responses. If I involve a wider team (which in our business we tend to do), text evaluation becomes far more “interesting”, although I agree not impossible.

  3. bitter and twisted:

    A very interesting series. But:

    – I’m not sure you should split subjects into more than two posts; the discussion ends up all over the place, especially when we anticipate later points.
    – when a series like this is finished, it would be nice to have all the parts in one place.

    1. Peter Smith:

      B & T, you’re right, but the problem with putting it all in one post is that it’s just too long, I think, for the blog format. But putting it together into a single Briefing Paper is an excellent idea and I will do that. Can’t promise it will be immediate, but fairly soon…

      Paul, I hadn’t planned to look at that, but perhaps I will – either as a blog post or as part of the overall Paper.

  4. Paul Wright:

    Peter, are you going to touch on the scoring of quality? In particular, the method of scoring that has 5 out of 5 meaning “has significantly exceeded the requirement”, 4/5 “exceeded requirements”, 3/5 “met requirements”, and so on.
