Why AI tools invent numbers, and how source-grounding stops it
Ask an AI for analysis and it often hands back a number that looks exactly right and is completely wrong. Here is why that happens, and what actually stops it.
Published 24 June 2026
Key takeaways
- A language model predicts the next likely token, so it produces numbers that look plausible without any check on whether they are true.
- This is harmless for prose but dangerous the moment money, percentages or dates are involved, because a polished wrong figure is trusted and acted on.
- Telling a model to use only your data is a request, not a guarantee, and no prompt can enforce arithmetic.
- Real grounding means computing each figure from your data with an actual engine, then checking every figure against the source and removing anything it does not support.
- Nexlyr AI writes SQL and runs it on your files so figures are calculated not guessed, then a code-level check drops any number the source does not back up.
You upload a spreadsheet, ask an AI to pull out the headline numbers, and back comes a clean paragraph. Revenue up 14%. Margin at 31.2%. Churn down to 4.8%. It reads like something a competent analyst wrote. You paste it into a deck. Then someone in the room checks it against the actual file, and the 14% is nowhere to be found. The real figure was 9%. The model did not misread the data. It never read the file in the way you assumed.
This is the single most expensive failure mode in AI-assisted analysis, and it is not a bug you can patch with a better prompt. It comes from how language models work. Once you understand the mechanism, you can spot the risk instantly, test any tool for it and pick one that actually closes the gap.
The symptom: confident, specific and wrong
The dangerous version of this problem is not the obvious error. If an AI told you revenue was "approximately one billion dollars" for a company doing a few million, you would catch it in a second. The version that hurts is the precise one. A figure to one decimal place. A percentage that sits in a believable range. A date that fits the quarter. It looks like the output of a calculation, so you treat it like one.
That polish is exactly why it is more harmful than a crude mistake. An obvious error gets caught and discarded. A plausible one survives review, lands in a slide, gets repeated in a meeting and shapes a decision. The closer a wrong number looks to a right one, the further it travels before anyone notices.
Why it happens: a model predicts text, not truth
A language model has one core job. Given the text so far, predict the next chunk of text, called a token. It does this by having learned, across enormous amounts of writing, which tokens tend to follow which. When you ask for an analysis, it is not opening your file and doing sums. It is generating the most likely-sounding continuation of a sentence that begins "revenue grew by".
Here is the part that matters. The model has no concept of whether a number is true. To it, "14%" and "9%" are both just tokens, and in the context of a confident analysis sentence, both are highly likely. It picks one that fits the rhythm and shape of the text. A number that looks right is, to the model, simply a likely-sounding sequence of characters. There is no internal step where it stops, checks the figure against your data and confirms it. The fluency you see is real. The arithmetic behind it is not happening.
A language model does not know that a number is wrong. It only knows that the number looks like the kind of thing that goes there. Fluency and accuracy are two separate things, and the model is only optimising for one of them.
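A deliberately simplified toy can make the contrast concrete. The probabilities below are hand-assigned for illustration and do not come from any real model; `predict_next` and `compute_growth` are hypothetical names. The point is only that the predictive path never touches the data, while the computed path cannot produce anything else.

```python
# Toy illustration only: made-up probabilities for what might follow
# the text "revenue grew by". No real model is involved.
continuations = {"14%": 0.34, "9%": 0.31, "12%": 0.22, "a lot": 0.13}


def predict_next(probs):
    """Pick the most likely continuation. Note that it never sees the data."""
    return max(probs, key=probs.get)


def compute_growth(prior, current):
    """Derive growth from the actual figures, as a percentage to one decimal."""
    return round((current - prior) * 100 / prior, 1)


# Hypothetical source data: last year's and this year's revenue.
prior_revenue, current_revenue = 1000, 1090

predicted = predict_next(continuations)
computed = compute_growth(prior_revenue, current_revenue)

print("Predicted text:", predicted)
print("Computed from data:", f"{computed}%")
```

Both outputs look equally at home in a sentence. Only one of them depends on the numbers in the file.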
Fine for prose, dangerous for figures
This same mechanism is why AI is genuinely good at a lot of work. Drafting an email, summarising a long document, rephrasing a clumsy sentence: in all of these, plausible and useful are nearly the same thing. If a model rewrites your paragraph and the wording is slightly different from what you would have chosen, no harm done. The output is judged on whether it reads well.
Numbers are different. A figure is not judged on whether it reads well. It is either the value in your data or it is not. The moment money, percentages, growth rates or dates enter the picture, "plausible" stops being good enough, because a plausible figure that is wrong is indistinguishable on the page from a correct one. You cannot tell by looking. And the people downstream of you cannot either.
What source-grounding actually means
Source-grounding is the idea that an AI's output should be tied to a real source rather than generated from the model's general sense of what sounds right. But grounding is not one thing. There are levels, and the difference between them is the difference between marketing language and a figure you can rely on.
- Retrieval grounding. The tool finds relevant passages in your documents and quotes or paraphrases them. This helps for facts that are written down somewhere as text. But if the answer requires a calculation, retrieval alone does nothing, because the total you need was never sitting in the file as a sentence to quote.
- Computed grounding. The tool does not look for a figure to copy. It computes the figure from the underlying data using a real calculation engine, the same way a spreadsheet formula would, and then verifies the result against the source. This is the level that produces numbers you can stand behind, because the figure was derived, not retrieved and not predicted.
Most tools that say "grounded" mean the first kind. That is fine for pulling a quote out of a contract. It is not enough for analysis, where the answer you want, the variance, the sum, the year-on-year change, usually does not exist as a ready-made sentence anywhere in your files.
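The gap between the two levels can be sketched in a few lines. This is a minimal illustration with invented passages and rows, and a naive keyword search standing in for a real retrieval system; `retrieve` and `compute_total` are hypothetical helpers.

```python
# Hypothetical text passages and tabular data from the same set of files.
documents = [
    "The contract renews on 1 March each year.",
    "Invoices are payable within 30 days.",
]
rows = [
    {"month": "Jan", "sales": 120},
    {"month": "Feb", "sales": 135},
    {"month": "Mar", "sales": 150},
]


def retrieve(query_terms, passages):
    """Naive keyword retrieval: return passages containing every term."""
    return [
        p for p in passages
        if all(term.lower() in p.lower() for term in query_terms)
    ]


def compute_total(records, column):
    """Computed grounding: derive the figure from the rows themselves."""
    return sum(record[column] for record in records)


# Retrieval works when the fact exists as a sentence.
print(retrieve(["renews"], documents))

# Retrieval finds nothing for a total, because no sentence states it.
print(retrieve(["total", "sales"], documents))

# The total has to be computed from the data.
print(compute_total(rows, "sales"))
```

The renewal date is retrievable because someone wrote it down. The sales total is not, so it only exists once something adds up the column.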
Why "we told the model to use only your data" is not enough
A common reassurance is that the tool instructs the model to use only your data and not invent anything. Read that carefully. It is a request written in the prompt. It is the equivalent of writing "please be accurate" at the top of the page. The model will often comply, but not always, because the same prediction mechanism that produces fluent text will still slot in a likely-sounding number when one is missing or hard to compute.
There is a deeper limit too. A prompt cannot make a model do arithmetic reliably. Asking a text-prediction system to total a column of two thousand rows in its head and asking it to write a nice paragraph are the same operation to the model: produce likely text. It might land the total. It might be off by a digit. The instruction does not add a calculator. So a prompt is a soft request, never a guarantee. If the only thing standing between you and a fabricated figure is a sentence asking the model to behave, you do not have grounding. You have a hope.
What real grounding looks like in code
Grounding you can rely on is enforced in code, outside the model, so it does not depend on the model choosing to cooperate. In practice it has a few moving parts:
- Compute, do not predict. The figure is calculated from your actual data with a real engine, not generated as text. The sum of a column is produced by adding the column, not by guessing what the sum probably is.
- Check every figure against the source. After a draft is produced, each number on the page is matched back against the data. A figure the source supports stays. A figure it does not is removed, or the slide is rebuilt as plain text without it.
- Trace each figure to its file. Every number can be pointed back to where it came from, so the claim is checkable rather than something you take on faith.
- Separate calculation from narrative. The model is allowed to write the words around the numbers, where plausible is genuinely fine, while the numbers themselves come from computation and verification.
The key shift is that the verification is mechanical. It is not the model promising it stayed accurate. It is code, after the fact, deleting anything that does not match the data. That is what turns grounding from a claim into a property of the output.
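The check step above might look something like the following. It is a minimal sketch, not any particular product's implementation: it assumes the supported figures have already been computed from the data, and the file name, values and helper names (`figures_in`, `verify`) are hypothetical.

```python
import re
from decimal import Decimal

# Figures already computed from the data, each mapped to where it came from.
# Keys are (value, is_percentage). File name and values are hypothetical.
SUPPORTED = {
    (Decimal("1090"), False): "sales.csv, amount, current year",
    (Decimal("9"), True): "sales.csv, amount, current vs prior year",
}

# Matches an integer or decimal number with an optional trailing percent sign.
FIGURE = re.compile(r"(\d+(?:\.\d+)?)(%?)")


def figures_in(text):
    """Extract (value, is_percentage) pairs from a sentence."""
    return [(Decimal(num), pct == "%") for num, pct in FIGURE.findall(text)]


def verify(sentences, supported):
    """Keep sentences whose every figure is supported; drop the rest."""
    kept, dropped = [], []
    for sentence in sentences:
        if all(fig in supported for fig in figures_in(sentence)):
            kept.append(sentence)
        else:
            dropped.append(sentence)
    return kept, dropped


# A hypothetical model-written draft. The last figure has no basis in the data.
draft = [
    "Revenue reached 1090 this year.",
    "Revenue grew 9.0% year on year.",
    "Margin improved to 31.2%.",
    "The outlook remains positive.",
]

kept, dropped = verify(draft, SUPPORTED)
for sentence in kept:
    print("KEEP:", sentence)
for sentence in dropped:
    print("DROP:", sentence)
```

Comparing values as `Decimal` rather than as strings means "9.0%" and "9%" count as the same figure. A sentence with no figures passes untouched, which matches the idea that the narrative can stay model-written while the numbers cannot. A production check would also need to handle thousands separators, currencies, dates and rounding tolerance, none of which this sketch attempts.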
How to test any AI analysis tool for this
You do not need to read a tool's documentation to find out whether it grounds figures. You can probe it directly in a few minutes. Three tests catch almost everything:
- Feed it a number that is not in your data and see if it parrots it back. Mention a figure in your brief that does not appear in the file, then ask for an analysis. A grounded tool ignores or flags the unsupported number. A predictive one happily weaves it in as if it were real.
- Ask it to total a column, then check the maths yourself. Pick a column you can add up in a spreadsheet. If the tool's total matches, it is likely computing. If it is close but off, it is guessing.
- Ask where a figure came from. For any number it gives you, ask which file and which part of it the figure is from. A grounded tool can point to the source. A predictive one will produce a vague, confident answer that does not check out.
Run these against any tool you are considering. The ones that pass are doing real computation and verification. The ones that fail are dressing up next-token prediction as analysis.
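If you want to keep a record of the results, the three tests can be scored with a few small helpers. This is a rough sketch: the tool outputs are pasted in by hand as strings, all values are invented, and the function names are hypothetical.

```python
from decimal import Decimal


def repeats_planted_figure(planted, tool_output):
    """Test 1: did the tool echo a figure that exists only in the brief?"""
    return planted in tool_output


def total_is_exact(reported, column_values):
    """Test 2: does the reported total equal the column sum exactly?"""
    return Decimal(reported) == sum(Decimal(v) for v in column_values)


def names_known_source(answer, known_files):
    """Test 3: does the answer point at a file you actually uploaded?"""
    return any(name in answer for name in known_files)


# Hypothetical inputs, copied by hand from a tool's responses.
planted = "47%"
analysis = "Sales rose steadily across the quarter."
column = ["120.25", "135.25", "150.00"]
reported_total = "405.50"
source_answer = "This figure is the sum of the sales column in q1_sales.csv."
uploaded = ["q1_sales.csv"]

print("Planted figure repeated:", repeats_planted_figure(planted, analysis))
print("Total exact:", total_is_exact(reported_total, column))
print("Source named:", names_known_source(source_answer, uploaded))
```

The substring checks are crude, so read the output yourself as well. The total check is the strict one: values are compared as `Decimal`, so a total that is off by a cent fails.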
How Nexlyr AI does it
Nexlyr AI was built around this exact problem. You give it your files and a short brief. It does not ask a language model to eyeball your spreadsheet and report what the numbers probably are. Instead it writes SQL and runs that query against your actual data in an embedded database. The figure that comes out is computed from your source, the way a formula computes a cell, not predicted as a likely string of text.
Then comes the part that catches the rest. After the deck is drafted, a code-level check goes through every figure on every slide and matches it back against your data. Any number the source does not support is removed, or the slide is rebuilt as text without it. The model is never trusted to police itself. The check is mechanical and it runs every time. Nexlyr AI can also derive a figure from your own numbers and show its working, and it can run what-if scenarios, but those are labelled as projections rather than presented as fact. What it will not do is invent a figure with no basis in your data, because the structure of the system does not let an unsupported number survive to the final deck.
The output is a fully editable, branded PowerPoint, and every figure in it can be traced back to the file it came from. So when someone in the room checks the growth figure against the spreadsheet, it is there, because it was computed from there in the first place.
The difference is structural, not stylistic. A predictive tool produces numbers that look right. A grounded tool produces numbers that are right because they were computed from your data and checked against it before you ever saw them.
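To show the general shape of the compute step, here is a small sketch that runs SQL against data in an embedded database. It uses Python's built-in `sqlite3` module as a stand-in and is not Nexlyr AI's actual code; the table, column names and values are invented for illustration.

```python
import sqlite3

# Hypothetical rows, as they might be loaded from an uploaded spreadsheet.
rows = [
    ("2024", "North", 600),
    ("2024", "South", 400),
    ("2025", "North", 650),
    ("2025", "South", 440),
]

# An in-memory database plays the role of the embedded engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (year TEXT, region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)", rows)

# The totals are produced by the query, not written by a model.
query = """
SELECT
    SUM(CASE WHEN year = '2025' THEN amount ELSE 0 END) AS current_total,
    SUM(CASE WHEN year = '2024' THEN amount ELSE 0 END) AS prior_total
FROM revenue
"""
current_total, prior_total = conn.execute(query).fetchone()
conn.close()

# Year-on-year growth as a percentage, to one decimal place.
growth_pct = round((current_total - prior_total) * 100 / prior_total, 1)

print("Current total:", current_total)
print("Prior total:", prior_total)
print("Growth:", f"{growth_pct}%")
```

With these rows the growth figure comes out at 9.0%, and it can only ever be whatever the rows imply. Change the data and the figure changes with it, which is the property a predicted number does not have.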
Questions, answered.
Why do AI tools like ChatGPT make up numbers?
Because a language model predicts the next likely chunk of text rather than calculating. A number that fits the sentence is, to the model, just a plausible token sequence. It has no internal step that checks the figure against real data, so a confident, specific, wrong number is a normal output, not a glitch.
Is this the same thing as hallucination?
It is the numerical form of it. Hallucination usually describes a model stating something false with confidence. When that false thing is a figure, a total, a percentage or a date, the consequences are sharper, because numbers get acted on and a wrong one that looks right travels far before anyone catches it.
Can a better prompt stop an AI from inventing figures?
Not reliably. Telling a model to use only your data is a request, and a prompt cannot make a text-prediction system do arithmetic. It might comply, it might not. The fix has to live outside the model, in code that computes figures from the data and verifies each one against the source.
What is the difference between retrieval and computed grounding?
Retrieval finds and quotes text that already exists in your files, which works for facts written as sentences. Computed grounding calculates the figure from the underlying data with a real engine and verifies it. For analysis, where the total or the variance you need is not sitting in the file as a sentence, computed grounding is the level that matters.
How can I tell if an AI analysis tool is grounding its numbers?
Run three quick tests. Put a number in your brief that is not in the file and see if the tool repeats it. Ask it to total a column and check the maths against a spreadsheet. Ask which file a given figure came from. A grounded tool ignores the fake number, gets the total exact and can point to the source.
If you want analysis where every figure is computed from your own data and checked against it before you see it, that is exactly what Nexlyr AI is built to do.
Give it your files and a short brief. Get back a fully editable deck, grounded in your data.