Analysis how-to · 7 min read

Evaluating supplier proposals: a scoring approach that survives scrutiny

Gut feel and price-only comparisons collapse the moment someone challenges them. Here is a scoring method that holds up, plus where an AI analyst grounds it in the actual proposals.

Published 24 June 2026

Key takeaways

  • Set your criteria before you read a single proposal, splitting must-haves from weighted nice-to-haves, so the supplier's pitch never moves the goalposts.
  • Score every proposal against the same criteria on the same scale, then compare on total cost of ownership, not headline price, so the proposals are genuinely comparable.
  • Mark a criterion not addressed rather than assuming the supplier can do it, and avoid relative heat-maps that colour each cell against its row or column and make a consistently strong supplier look average.
  • Run a sensitivity check by changing a weight to see if your winner holds, and document the reasoning behind every score.
  • Nexlyr AI's Proposal Analyser builds the side-by-side from the proposals you upload, with an explicit status grid, strengths and risks per supplier and a suggested pick you edit and own.

You have three proposals on your desk, a deadline and a recommendation to make. The easy path is to skim each one, glance at the price and go with the supplier who felt most polished on the call. That decision will not survive the first hard question. Someone asks why you picked the more expensive vendor, or why you dropped the incumbent, and you have nothing but an impression to point at.

A good evaluation is not about being clever. It is about being comparable. The whole job is turning three different sales documents into one apples-to-apples view, scored the same way, so the answer is the same whoever runs the numbers. Here is how to do that properly, and where it tends to go wrong.

Why gut feel and price-only comparisons fall apart

Gut feel fails because suppliers write proposals to feel good. The strongest writer is not the strongest vendor. A confident tone, a slick deck and a friendly account manager all bias you toward a supplier without telling you a single thing about delivery, fit or risk. When you compare impressions, you compare marketing.

Price-only comparisons fail for the opposite reason. The headline number is the one figure every supplier knows you will look at, so it is the one figure they shape hardest. Two quotes for the same work can look 30% apart on the cover and be identical once you add implementation, support, training, the year-two licence step-up and the cost of switching later. You end up choosing the supplier who hid the most cost in the small print.

Both failures share a root cause. There is no fixed yardstick. Each proposal sets its own terms, and you judge each one on the terms it gave you. Fix the yardstick first and the rest of the method follows.

Set your criteria before you read a single proposal

Decide what matters while you are still neutral. The moment you start reading proposals, the suppliers begin defining what counts as important, and they will define it in their own favour. Write your criteria down before any document is open.

Split them into two groups:

  • Must-haves. Pass or fail. If a supplier cannot meet a must-have, they are out, regardless of how strong they are elsewhere. Security certification, data residency, a hard go-live date, a regulatory requirement. Keep this list short and real. A long must-have list usually means you have not decided what is genuinely non-negotiable.
  • Weighted nice-to-haves. Everything that is better-or-worse rather than yes-or-no. Each gets a weight that reflects how much it actually matters to the outcome, not how easy it is to measure. Implementation speed, support quality, roadmap fit, references, commercial flexibility.

Set the weights before scoring too. If you assign weights after you have seen the proposals, you will unconsciously tune them to favour the supplier you already like. Lock the weights, then score.

The single most useful discipline in supplier evaluation is sequencing. Criteria, then weights, then proposals, then scores. Reverse that order and you are not evaluating, you are rationalising a choice you already made.

Score every proposal on the same scale

Use one scale for every criterion across every supplier. A simple 0 to 4 works well: 0 not addressed, 1 weak, 2 meets, 3 strong, 4 exceeds. The exact numbers matter less than using them identically for all three vendors.

The point of a shared scale is that a score becomes a claim you can check. A 3 on support has to mean the same thing for supplier A as for supplier C. Write a one-line reason next to each score as you give it. Not a paragraph, just enough to remind you why. Those one-liners are what you will reach for when the decision is questioned later.

  1. List your criteria down the left, suppliers across the top.
  2. Go criterion by criterion, not supplier by supplier. Scoring all three on one criterion before moving on keeps the scale consistent and stops one impressive proposal inflating its scores everywhere.
  3. Record the score and a short reason in each cell.
  4. Multiply each nice-to-have score by its weight and total each supplier's column.
  5. Apply the must-haves as a gate. A failed must-have removes the supplier from the ranking entirely, no matter how high the weighted total.
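The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the supplier names, criteria, weights and scores are all hypothetical, and the only rules taken from the method are the 0 to 4 scale, the weighted total and the must-have gate.

```python
# Hypothetical criteria and weights, locked before any proposal is read.
WEIGHTS = {"implementation speed": 3, "support quality": 2, "roadmap fit": 1}

# Scores on the shared scale: 0 not addressed, 1 weak, 2 meets, 3 strong, 4 exceeds.
SCORES = {
    "Supplier A": {"implementation speed": 3, "support quality": 3, "roadmap fit": 3},
    "Supplier B": {"implementation speed": 4, "support quality": 1, "roadmap fit": 2},
    "Supplier C": {"implementation speed": 4, "support quality": 4, "roadmap fit": 4},
}

# Must-haves are pass or fail and are kept apart from the weighted scores.
MUST_HAVES = {
    "Supplier A": {"security certification": True, "data residency": True},
    "Supplier B": {"security certification": True, "data residency": True},
    "Supplier C": {"security certification": True, "data residency": False},
}


def weighted_total(scores: dict[str, int], weights: dict[str, int]) -> int:
    """Multiply each nice-to-have score by its weight and sum the column."""
    return sum(scores[criterion] * weight for criterion, weight in weights.items())


def rank(scores, weights, must_haves):
    """Return (supplier, total) pairs, best first, after applying the gate."""
    passed = [s for s in scores if all(must_haves[s].values())]
    totals = [(s, weighted_total(scores[s], weights)) for s in passed]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


ranking = rank(SCORES, WEIGHTS, MUST_HAVES)
print(ranking)  # [('Supplier A', 18), ('Supplier B', 16)]
```

Supplier C has the highest raw total in this made-up data, yet it never reaches the ranking because it fails one must-have. Applying the gate before the sort means a high total cannot cancel out a failed must-have.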

Use total cost of ownership, not the headline price

Treat price as a criterion, not the criterion, and treat it as total cost of ownership over the life of the contract. The headline figure is the start of the cost, not the whole of it.

  • One-off costs: licences, setup, implementation, data migration, integration work.
  • Recurring costs: subscription or support fees, and any year-two or year-three step-up the quote mentions in a footnote.
  • Internal costs: the time your own people spend, training, change management.
  • Exit costs: what it takes to leave. A supplier that locks your data in carries a switching cost you will pay later.

Put every supplier's total cost on the same basis and the same horizon. The cheapest cover price is often not the cheapest deal once these land, and a clear total cost figure is far harder to argue with than a vague sense that one quote felt high.
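As a rough sketch of that same-basis comparison, the snippet below adds up the four cost categories over a fixed horizon. The `Quote` structure, its field names and every figure are assumptions made for illustration, not numbers from any real proposal.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical cost breakdown for one supplier, all in one currency."""
    one_off: float          # licences, setup, implementation, migration, integration
    recurring: list[float]  # fee for each contract year, including any step-up
    internal: float         # your own people's time, training, change management
    exit_cost: float        # estimated cost of switching away later


def total_cost(quote: Quote, years: int) -> float:
    """Sum every cost category over the same horizon for every supplier."""
    if years > len(quote.recurring):
        raise ValueError("quote does not cover the full horizon")
    return quote.one_off + sum(quote.recurring[:years]) + quote.internal + quote.exit_cost


# Illustrative figures: B looks cheaper up front but carries a year-two step-up
# and a higher exit cost.
quote_a = Quote(one_off=50_000, recurring=[20_000, 20_000, 20_000], internal=10_000, exit_cost=5_000)
quote_b = Quote(one_off=30_000, recurring=[20_000, 30_000, 30_000], internal=15_000, exit_cost=20_000)

tco_a = total_cost(quote_a, years=3)
tco_b = total_cost(quote_b, years=3)
print(tco_a, tco_b)  # 125000 145000
```

In this invented case the supplier with the lower one-off cost ends up the more expensive over three years. The function refuses to compare a quote that does not cover the full horizon, which keeps every supplier on the same basis.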

Mark not addressed, never assume

If a proposal is silent on a criterion, that is a finding, not a gap for you to fill in. The generous instinct is to assume a capable-looking supplier can probably do the thing they did not mention. Resist it. A silence becomes a score of zero, marked not addressed, until the supplier confirms otherwise.

This matters for two reasons. It keeps the comparison grounded in what the proposals actually say rather than what you hope they mean. And it gives you a precise list of follow-up questions: every not addressed cell is something to put to the supplier before you decide. Assuming capability is how a vendor wins on a requirement they never committed to.
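One way to make that rule mechanical is to record a silence explicitly and convert it in one place. The sketch below assumes a hypothetical grid where `None` means the proposal said nothing about the criterion. The names and scores are invented.

```python
from typing import Optional

# Hypothetical raw grid: None means the proposal is silent on the criterion.
RAW_SCORES: dict[str, dict[str, Optional[int]]] = {
    "Supplier A": {"support quality": 3, "data migration": None},
    "Supplier B": {"support quality": None, "data migration": 2},
}


def normalise(raw):
    """Turn silences into zeros and collect them as follow-up questions."""
    scores, follow_ups = {}, []
    for supplier, cells in raw.items():
        scores[supplier] = {}
        for criterion, value in cells.items():
            if value is None:
                scores[supplier][criterion] = 0  # not addressed, never assumed
                follow_ups.append((supplier, criterion))
            else:
                scores[supplier][criterion] = value
    return scores, follow_ups


scores, follow_ups = normalise(RAW_SCORES)
print(follow_ups)
# [('Supplier A', 'data migration'), ('Supplier B', 'support quality')]
```

The `follow_ups` list is the set of questions to put to each supplier before you decide.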

The relative heat-map trap

Here is a subtle one that catches careful people. You build your scored grid and colour it to make it readable. Many tools, and many spreadsheets, colour relatively, shading each cell against the others in its row or column. Green for the best in that line, red for the worst.

That visual lies about a consistently strong supplier. Imagine a vendor who scores a solid 3 on almost everything while the others spike high on a few criteria and crater on others. A relative colour scale paints your steady 3s as mediocre orange, because they are rarely the single best cell in any row, while a rival's lucky 4 glows green. The eye reads the colours, not the numbers, and the most reliable supplier looks average.

Colour the absolute status of each cell, not its rank within a row or column. Exceeds, meets, partially meets, does not meet, not addressed. A 3 should look like a 3 wherever it sits, so a supplier who scores well on every line reads as strong, not beige.
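The difference is easy to see in a small example. The sketch below uses hypothetical scores and a deliberately crude relative rule (best in the row green, worst red, everything else orange). Real tools use smoother colour scales, but the effect on a steady supplier is the same.

```python
# Labels for the 0 to 4 scale used earlier in this article.
LABELS = {0: "not addressed", 1: "weak", 2: "meets", 3: "strong", 4: "exceeds"}

# Hypothetical scores per criterion, in supplier order A, B, C.
ROWS = {
    "implementation speed": [3, 4, 1],
    "support quality": [3, 1, 4],
    "roadmap fit": [3, 4, 0],
}


def relative_colour(score: int, row: list[int]) -> str:
    """Colour by rank within the row: best green, worst red, the rest orange."""
    if score == max(row):
        return "green"
    if score == min(row):
        return "red"
    return "orange"


def absolute_label(score: int) -> str:
    """Label by what the score is, regardless of the other cells."""
    return LABELS[score]


# Supplier A is the first entry in every row.
relative_a = [relative_colour(row[0], row) for row in ROWS.values()]
absolute_a = [absolute_label(row[0]) for row in ROWS.values()]
print(relative_a)  # ['orange', 'orange', 'orange']
print(absolute_a)  # ['strong', 'strong', 'strong']
```

Supplier A scores a 3 on every line. The relative rule colours all three cells orange, while the absolute labels read strong throughout.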

Pressure-test the result before you commit

A weighted total gives you a winner. Now find out how fragile that winner is. The number that decided it was built on weights you chose, and those weights are judgements, not facts.

Run a sensitivity check. Take the weight that most influenced the outcome, change it within a reasonable range and see whether the ranking flips. If a small, plausible shift in one weight changes who wins, your result is balanced on a knife edge and you should say so. If the same supplier wins across a range of weightings, your recommendation is robust, and being able to state that is worth far more than the raw total.
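A sensitivity check of this kind is a short loop: hold everything fixed, vary one weight and record who wins each time. The sketch below is one possible way to do it. The suppliers, scores, weights and the range tested are all hypothetical.

```python
# Hypothetical locked weights and scores on the 0 to 4 scale.
WEIGHTS = {"implementation speed": 3, "support quality": 2, "roadmap fit": 1}
SCORES = {
    "Supplier A": {"implementation speed": 2, "support quality": 4, "roadmap fit": 3},
    "Supplier B": {"implementation speed": 4, "support quality": 1, "roadmap fit": 2},
}


def winner(scores, weights):
    """Return the supplier with the highest weighted total."""
    totals = {
        supplier: sum(cells[c] * w for c, w in weights.items())
        for supplier, cells in scores.items()
    }
    return max(totals, key=totals.get)


def sensitivity(scores, weights, criterion, candidates):
    """Return the winner for each candidate value of one weight."""
    results = {}
    for value in candidates:
        trial = {**weights, criterion: value}  # change one weight, keep the rest
        results[value] = winner(scores, trial)
    return results


results = sensitivity(SCORES, WEIGHTS, "implementation speed", [2, 3, 4, 5])
robust = len(set(results.values())) == 1
print(results)
# {2: 'Supplier A', 3: 'Supplier A', 4: 'Supplier B', 5: 'Supplier B'}
print(robust)  # False
```

With the original weight of 3, Supplier A wins by a single point (17 to 16). Moving that weight to 4 flips the ranking, so this invented result is exactly the knife-edge case worth flagging in the write-up.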

Then write the why down. Not just the scores, the reasoning behind the weights and the close calls. A documented decision is one that holds up when a stakeholder revisits it in six months, when the losing supplier asks for feedback, or when the project hits trouble and people want to relitigate the choice. The work you do here is what turns a ranking into a decision people can stand behind.

Where Nexlyr AI fits

The method above is the right way to do it. The slow part is the mechanical middle: pulling three or four long proposals into one grid, on one scale, with a reason in every cell. Nexlyr AI's Proposal Analyser does that part from the actual documents.

You upload the supplier proposals and your criteria, or let it lift the criteria from an attached requirements document. It reads each proposal and builds the side-by-side for you: a status grid scoring every supplier against every criterion as exceeds, meets, partially meets, does not meet or not addressed, plus strengths and risks called out per supplier and a suggested preferred pick.

Three things make the output trustworthy rather than just fast:

  • It is grounded per supplier. Every cell traces back to what that specific proposal actually says. Nexlyr AI will not invent a capability a vendor did not claim. If a criterion is not addressed in the document, it is marked not addressed, never guessed in the supplier's favour. That is what lets you trust the comparison: it reflects the proposals, not an impression of them.
  • It uses the absolute status grid, not the relative heat-map trap above. A supplier who scores well on every line reads as strong, because each cell is coloured by what it is, not by how it ranks against the others in its row or column.
  • The verdicts and the suggested pick are suggestions you own. The tool organises the grounded evidence and proposes a read. You edit the scores, the verdicts and the recommendation. The decision stays yours, and the editing is where your judgement goes in.

You get the whole thing as a fully editable, branded PowerPoint deck, so the comparison goes straight into the meeting where the choice gets made. And once it is built, a Think further pass reviews the finished analysis like an analyst would, raising the questions, risks and what-ifs you would want surfaced before you commit, including the kind of weight-sensitivity check described above.

It does not replace your judgement. It does the comparable-grid work that makes your judgement hold up, grounds it in the source documents and hands you a result you can edit, present and stand behind.

Questions, answered.

How should you score supplier proposals fairly?

Set your criteria and their weights before you read any proposal, split them into pass-or-fail must-haves and weighted nice-to-haves, then score every supplier against the same criteria on the same scale with a short reason in each cell. Scoring criterion by criterion rather than supplier by supplier keeps the scale consistent.

What is total cost of ownership in supplier evaluation?

Total cost of ownership is the full cost of a supplier over the life of the contract, not just the headline price. It includes one-off costs like setup and migration, recurring fees and any year-two step-ups, internal time and training, and the exit costs of switching away later. Comparing on TCO stops a cheap cover price hiding an expensive deal.

Why are relative heat-maps misleading for comparing suppliers?

A relative heat-map colours each cell against the others in its row or column, so it shows rank rather than actual quality. A supplier who scores a steady, strong result across every criterion gets painted as average, because it is rarely the single best cell in any line. Colour by absolute status instead, so a strong score looks strong wherever it sits.

What does a sensitivity check tell you about a supplier decision?

A sensitivity check shows how robust your winner is. You change the most influential weight within a reasonable range and see whether the ranking flips. If a small, plausible change in one weight changes the winner, the result is fragile. If the same supplier wins across a range of weightings, the recommendation is robust and you can say so with confidence.

Should you assume a supplier can do something they did not mention?

No. If a proposal is silent on a criterion, mark it not addressed and score it zero until the supplier confirms otherwise. Assuming an unstated capability lets a vendor win on a requirement they never committed to, and it makes your comparison reflect your hopes rather than what the proposals actually say.

Upload your supplier proposals and criteria to Nexlyr AI's Proposal Analyser and get a grounded, editable side-by-side you can take straight into the decision.

Give it your files and a short brief. Get back a fully editable deck, grounded in your data.