Teaching an analytics agent when not to ask

agentic AI

LLM

Text2SQL

Getting an agent to ask for clarification instead of guessing is the easy half. The hard half is teaching it to stop asking — because an agent that interrogates you on every turn is one nobody uses.

Author

Umut Altun

Published

January 12, 2026

“How am I doing this week?” is a real question that real users ask, and it’s missing almost everything you’d need to answer it. Which metric — revenue, footfall, margin? Which location? Compared to what? The naïve agent answers anyway: it silently picks a metric and a scope and hands back a confident chart of something the user didn’t quite ask for. A guess dressed up as an answer.

So the first fix is obvious and correct: before writing SQL, check whether the question is actually answerable, and if it isn’t, ask. Is there a time range? A scope? A metric? If something essential is missing, the agent says “over what period — this week, this month?” instead of inventing one. The router could already refuse when it wasn’t sure which domain a question belonged to; this is the same instinct one level down — refuse to proceed on missing parameters rather than fabricate them.

I shipped that, felt good about it, and watched it become annoying within a day.

Because the completeness check, applied naïvely, asks every time. The user types “show me last 30 days of sales,” gets their answer, and then types “what about by branch?” — and the agent, evaluating that second question in isolation, sees no time range and asks “over what period?” The user already said. Thirty seconds ago. They said it. Now they’re answering the same question again, and the magic of “just ask in plain language” has curdled into a form with a chatbot’s manners. An agent that re-interrogates you on every turn isn’t careful, it’s exhausting, and exhausting tools get abandoned no matter how correct they are.

The real requirement, it turned out, wasn’t “ask when something’s missing.” It was “ask when something’s missing and can’t be recovered from what was already said.” That’s a much narrower trigger, and it’s the difference between an assistant and a bureaucrat.

So the agent carries a short conversational memory — the parameters from recent turns — and a missing parameter is inherited from context before it’s ever treated as missing:

# session-scoped, bounded: LRU over active sessions, TTL eviction
state.recent = [...]          # parameters resolved on previous turns

def resolve(question):
    params = extract(question)
    for slot in REQUIRED:                 # time range, scope, metric, ...
        if slot not in params:
            params[slot] = state.recent_value(slot)   # inherit before asking
    missing = [s for s in REQUIRED if s not in params]
    return ask_user(missing) if missing else params

“What about by branch?” now inherits the 30-day window from the turn before and just answers. The agent only stops to ask when a slot is genuinely unrecoverable — when you’ve changed the subject, or when you never specified it in the first place and context offers no hint. Knowing when not to ask turned out to be as much of the design work as knowing when to.

The piece I’d flag for anyone building this: keep that state bounded, and treat that as a feature, not a limitation. The agents are otherwise stateless, which is exactly what lets them scale horizontally — any instance can handle any request. The moment you bolt on unbounded conversational memory, you’ve quietly introduced a coordination problem and a slow memory leak. So the session memory is an LRU with a TTL: it remembers enough recent turns to not be annoying, and forgets aggressively enough to stay cheap and stateless-ish. Bounded memory is the compromise between “useful in a conversation” and “doesn’t become a stateful service I have to operate.”¹

It’s not perfect, and the failure mode is instructive. Occasionally it inherits a parameter the user did mean to drop — they’ve moved on to a new question, but it’s phrased similarly enough that the old time range carries over silently. That’s the mirror image of the original bug: now it assumes by over-remembering instead of by guessing from nothing. The honest fix is to make the agent surface what it inherited — “for the same 30-day window:” as a quiet prefix on the answer — so a wrong inheritance is visible and correctable rather than silent. Visible-and-correctable beats silent — that keeps coming up across this whole project, and probably deserves its own post.

What I keep relearning: the impressive-sounding behaviour — an agent that asks smart clarifying questions — is the easy 80%. The last 20% — knowing when to not exercise the impressive behaviour — is what separates something people demo from something people use.

From a consulting project building natural-language analytics for restaurant businesses. Customer details, schema, and constants are abstracted; the reasoning is as built. Code is illustrative.

Footnotes

Why LRU + TTL specifically: active sessions stay warm, idle ones evict themselves, and there’s a hard ceiling on how much conversational state exists at once. If I ever needed sessions to survive across instances I’d reach for an external store, but that’s a real operational cost and I didn’t have the problem — most conversations are short and bursty.↩︎