The obvious fix for a hallucinating SQL agent is the wrong one
A few months into building a Text2SQL agent — natural language in, SQL out, for non-technical restaurant operators — I noticed the model kept inventing branch names.
It would write WHERE branch = 'Alsanck'. Close to a real branch, but not it: a dropped letter. The query didn’t error — it’s valid SQL, valid column, just no such value. It returned zero rows. And the operator on the other end, who couldn’t read SQL and had no reason to distrust the answer, saw zero sales and concluded one of their locations had flatlined.
That’s the kind of bug that scares me — the one that doesn’t crash but quietly hands a confident, wrong answer to someone who can’t tell it’s wrong.
The fix looks obvious. The model wrote 'Alsanck', the real value is 'Alsancak', the edit distance is one. Just snap it to the nearest real value before running the query. I wrote exactly that, felt clever about it, and it was a while before I realised that “snap to the nearest real value” was about the most dangerous thing I could have done.
Two ways to be wrong, and they don’t cost the same
Here’s what I missed at first. A bad filter value can fail in two directions, and the directions are not symmetric.
Under-correct. The typo slips through, the query returns zero rows, the user sees an empty result. Annoying — but visible. The user knows something’s off and rephrases. Recoverable.
Over-correct. The agent rewrites 'Alsanck' not into the branch the user meant, but into a different real branch that happens to be the closest string match. Now the user gets a complete, correct-looking report for the wrong location. They trust it. They order stock against it. This failure is invisible, and no rephrase recovers it, because nothing ever looked broken.
Framed that way the asymmetry is obvious: a visible miss is cheap, a silent swap is catastrophic. “Always correct to the nearest value” optimizes for the cheap failure and walks straight into the expensive one. (If you’re already nodding, this was probably obvious to you. It wasn’t to me — and I’d already shipped the eager version.)
So the rule I actually wanted wasn’t “fix typos.” It was: fix a typo only when you’re sure, and when you’re not, do nothing and let the miss stay visible.
Ground first, correct second
Two mechanisms — one before generation, one after.
Before: ground the model in real values. The schema I hand each agent isn’t just column names and types. For categorical columns it carries the actual distinct values from the table. The model isn’t asked to recall that a branch is called 'Alsancak' — it’s shown the set and told to pick from it. That turns a recall problem, which LLMs hallucinate their way through, into a selection problem, which they’re far better at. Most of the invented values died right here.
After: canonicalize, but conservatively. Grounding reduces bad values; it doesn’t eliminate them. So before a query runs, each equality filter is checked against the column’s real values, and a clear near-miss gets rewritten — with a lot riding on that word clear:
def canonicalize_filters(sql, distinct_values):
# distinct_values: {column -> the real values in that column}
for column, literal in equality_filters(sql):
if literal in distinct_values[column]:
continue # already exact — leave it
match, score = closest(literal, distinct_values[column]) # difflib ratio
if score >= THRESHOLD: # one unambiguous near-miss
sql = rewrite(sql, column, literal, match)
# else: do nothing. a visible "no results" beats
# a silent swap into the wrong real value.
return sqlThe entire design lives in THRESHOLD and the else branch I didn’t write. I tuned the threshold deliberately high — conservative — against real query logs. I’d rather let ten genuine typos through and show “no results” than auto-correct one value into the wrong neighbour. Being too shy here costs a mild annoyance; being too eager costs a wrong business decision. When the two mistakes are that lopsided, you tune for the expensive one and accept looking dumb on the cheap one.
Two choices in there I’d defend if pushed. It’s post-generation and deterministic on purpose. I could have tried to push all of this into the prompt — “only use values from this list, fix typos carefully” — but prompt behaviour drifts across model versions, and you can’t unit-test a vibe. String similarity is boring, cheap, stable, and testable with a table of inputs and expected outputs. Boring is a feature in the one component whose entire job is to not corrupt data.1
What the benchmark misses
None of this shows up on a Text2SQL benchmark. Spider grades you on whether the SQL is correct, not on whether you avoided silently handing someone a confident lie. But in production, in front of a user who cannot check your work, “never wrong in a way they can’t see” is much closer to the metric that matters than exact-match accuracy.
I won’t pretend it’s solved. The threshold is hand-tuned, which is a generous way of saying I picked it by staring at logs until it felt right, and the honest next step is a small evaluation set so “did this change help?” has a number behind it instead of my gut. I knew that the whole time and shipped without it. Maybe that’s the next post.
From a consulting project building natural-language analytics over restaurant operations data. Customer details, schema, and the actual threshold are abstracted; the reasoning is as built.
Footnotes
The near-miss scoring is just
difflib.SequenceMatcherratio — Python standard library, nothing exotic. The interesting part was never the matching algorithm; it was deciding when not to trust it.↩︎