The router that’s allowed to say ‘I don’t know’

agentic AI

LLM

Text2SQL

My first Text2SQL agent was one big prompt holding the whole schema. It confidently joined tables that had no business being joined. The fix wasn’t a better prompt — it was decomposition, and giving the routing step the right to refuse.

Author

Umut Altun

Published

September 22, 2025

My first version of the agent was one prompt to rule them all. The entire schema went into the context, the user’s question went at the bottom, and the model wrote SQL against all of it. It demoed fine. Then someone asked a question that touched both sales and staffing, and the model wrote a join between two tables that shared a column name and absolutely nothing else. The result was a confident, beautifully formatted table of nonsense.

The instinct is to fix the prompt. Add instructions, add warnings, add examples of good joins. I did some of that, and it helped a little, and then I realised I was treating a structural problem as a wording problem.

The structural problem is schema linking — mapping a vague natural-language question onto the specific tables and columns that answer it. It’s the genuinely hard part of Text2SQL, and it gets worse as you add tables, not better. My data had several distinct domains — sales, inventory, waste, staffing, and so on — and stuffing all of them into one context turned every question into a needle-in-haystack search across tables that often used the same words to mean different things. More schema in the prompt meant more ways to be confidently wrong. No amount of prompt-polish fixes that; you’re asking one call to both figure out what the question is about and write correct SQL for it in a single shot.

So I split it. Before any SQL gets written, the question goes to a small classifier whose only job is to decide which domain the question belongs to, and dispatch it to a specialist that carries only that domain’s schema. Sales questions go to the sales specialist, which has never heard of the staffing tables and therefore cannot join to them. Decomposition shrinks each specialist’s schema-linking problem from “all tables” to “the handful that matter,” which is the difference between a search and a lookup.
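A minimal sketch of the idea, assuming each specialist is little more than a domain name plus that domain's DDL. The table names, the `Specialist` class, and `prompt_for` are invented for illustration and are not the project's real schema or code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specialist:
    """A domain agent that only ever sees its own slice of the schema."""
    domain: str
    schema_ddl: str  # DDL for this domain's tables, and nothing else

    def build_prompt(self, question: str) -> str:
        # The prompt contains one domain's schema, so tables from other
        # domains are simply not available to join against.
        return (
            f"You write SQL for the {self.domain} domain.\n"
            f"Schema:\n{self.schema_ddl}\n"
            f"Question: {question}\n"
        )


# Hypothetical tables: both have a site_id column, which is exactly the
# kind of shared name that tempts a single-prompt agent into a bad join.
SPECIALISTS = {
    "sales": Specialist(
        "sales",
        "CREATE TABLE sales_orders (order_id INT, site_id INT, sold_at DATE, total NUMERIC);",
    ),
    "staffing": Specialist(
        "staffing",
        "CREATE TABLE staff_shifts (shift_id INT, site_id INT, shift_date DATE, hours NUMERIC);",
    ),
}


def prompt_for(domain: str, question: str) -> str:
    """Build the generation prompt for whichever specialist the router picked."""
    return SPECIALISTS[domain].build_prompt(question)
```

The isolation here is structural rather than instructional: the sales prompt does not tell the model to avoid the staffing tables, it just never mentions them.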

The router is deliberately a different kind of component from the specialists. It runs at temperature zero — routing should be reproducible, not creative — and it returns a structured decision, not prose:

from pydantic import BaseModel  # assuming BaseModel here is pydantic's

class RoutingDecision(BaseModel):
    specialist: str          # which domain agent should handle this
    confidence: float        # 0.0–1.0
    alternative: str | None  # the runner-up, if it was close

decision = router.classify(question)   # temp=0, JSON enforced at the model layer

Pulling routing out into its own deterministic step bought me two things I didn’t fully appreciate until later. The first is that it’s testable in isolation: I can run a fixed list of real questions through the router and assert where each one lands, without executing a single query. The creative step (writing SQL) and the step I need to be boringly predictable (deciding what the question is about) are now separable, and I can hold each to its own standard.
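A routing test of that kind could look roughly like the sketch below. The questions, the `routing_failures` helper, and the keyword-based `keyword_stub` are all hypothetical; the stub only stands in for the real LLM-backed router so the example runs without a model or a database:

```python
from typing import Callable

# (question, expected specialist) pairs. In practice these would be real
# questions collected from users; the ones here are made up.
ROUTING_CASES = [
    ("What was total revenue last Saturday?", "sales"),
    ("How many hours did the kitchen team work in March?", "staffing"),
    ("Which ingredients did we throw away most last week?", "waste"),
]


def routing_failures(
    classify: Callable[[str], str],
    cases: list[tuple[str, str]],
) -> list[tuple[str, str, str]]:
    """Return (question, expected, got) for every case routed to the wrong place."""
    failures = []
    for question, expected in cases:
        got = classify(question)
        if got != expected:
            failures.append((question, expected, got))
    return failures


def keyword_stub(question: str) -> str:
    """Stand-in classifier so the sketch is runnable; NOT how the real router works."""
    q = question.lower()
    if "revenue" in q:
        return "sales"
    if "hours" in q:
        return "staffing"
    if "throw away" in q or "waste" in q:
        return "waste"
    return "unknown"
```

Swapping `keyword_stub` for the real router's classify function turns this into a regression suite: `assert routing_failures(classify, ROUTING_CASES) == []` fails with the exact questions that moved, and no SQL is ever executed.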

The second is the part I’d actually put on a slide: confidence as a first-class output, and the right to refuse. A classifier that always returns its top guess is a classifier that’s confidently wrong on every ambiguous question. “How did the weekend go?” could be revenue or footfall or labour cost. The honest answer is “I’m not sure which you mean,” and the only way to give that answer is to look at how sure the router actually is:

def handle(question):
    decision = router.classify(question)
    if decision.confidence < CUTOFF:       # hand-tuned, deliberately cautious
        # "Did you mean sales or staffing?" One cheap question beats a wrong report.
        return ask_user(decision.specialist, decision.alternative)
    return dispatch(decision.specialist, question)

That CUTOFF is the whole philosophy in one constant. Below it, the system stops and asks a one-line clarifying question instead of charging ahead. It costs the user a round-trip; it saves them a confident answer to a question they didn’t ask. For non-technical users who can’t read the SQL to catch the mistake, that trade is worth it every time.
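To make the gate concrete, here is a self-contained sketch of the decision as a pure function. The `0.75` value is a placeholder (the real constant is abstracted), `next_step` is a hypothetical name, and a plain dataclass stands in for the structured `RoutingDecision` so the example needs nothing beyond the standard library:

```python
from dataclasses import dataclass
from typing import Optional

CUTOFF = 0.75  # placeholder value; the real cutoff was hand-tuned and is not given


@dataclass
class RoutingDecision:
    specialist: str                    # the router's top pick
    confidence: float                  # 0.0 to 1.0
    alternative: Optional[str] = None  # the runner-up, if it was close


def next_step(decision: RoutingDecision, cutoff: float = CUTOFF) -> tuple[str, str]:
    """Return ("ask", clarifying question) or ("dispatch", specialist name)."""
    if decision.confidence < cutoff:
        # Not sure enough: offer the top pick and the runner-up, if there is one.
        options = [decision.specialist]
        if decision.alternative:
            options.append(decision.alternative)
        return ("ask", "Did you mean " + " or ".join(options) + "?")
    return ("dispatch", decision.specialist)
```

With these assumptions, `next_step(RoutingDecision("sales", 0.55, "staffing"))` yields the clarifying question, while `next_step(RoutingDecision("sales", 0.95))` goes straight to the sales specialist. Keeping the gate free of I/O means the cutoff itself can be tested the same way the router is.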

The honest cost of all this: it’s an extra LLM call on every single turn. More latency, more tokens, a second thing that can fail. I took that trade on purpose, because the alternative — folding routing back into generation to save the call — gives back the determinism, the testability, and the clean place to put the confidence check. I’d rather pay for a step I can reason about than save a call on a step I can’t.

If I were starting again I’d reach for the router on day one instead of discovering it the hard way. The single-prompt version isn’t a smaller version of the right design — it’s a different design that happens to look right until the schema grows past the size of a demo. The most useful thing I built into this system wasn’t a cleverer prompt. It was a component whose job includes knowing when it doesn’t know.¹

From a consulting project building natural-language analytics for restaurant businesses. Customer details, schema, and tuning constants are abstracted; the reasoning is as built. Code is illustrative.

Footnotes

1. Temperature zero on the router matters more than it sounds. You want the same question to route the same way every time; a router that occasionally changes its mind is a debugging nightmare, because now a bug reproduces only sometimes. Save the creativity for the step that writes SQL.