Two tenants don’t need multi-tenancy

architecture
Text2SQL
production ML
When a second customer arrived, I reached for the textbook shared-schema multi-tenant architecture — and then noticed it would make data isolation depend on an LLM getting a WHERE clause right every single time. On choosing the architecture the problem needs, not the one that looks good on a whiteboard.
Author

Umut Altun

Published

November 9, 2025

The second customer changed the shape of the problem.

With one customer, a Text2SQL agent over their data is just an app. With two — and a third in the pipeline — it’s a multi-tenant system, and the one truly non-negotiable requirement is that no customer ever sees another’s numbers. A wrong chart is embarrassing. One restaurant seeing another’s revenue is the end of the contract.

So I did the responsible thing and sketched the proper multi-tenant architecture. One shared schema, a tenant_id on every table, row-level security, every query scoped by tenant. It’s the textbook answer, it’s efficient, and it scales to thousands of tenants. I had it on the whiteboard before I noticed I was about to make a serious mistake.

Here’s what I’d glossed over. In this system the queries aren’t written by me. They’re written by an LLM, at runtime, from a user’s plain-language question. The shared-schema design makes cross-tenant isolation a property of every generated query getting its WHERE tenant_id = … exactly right. And I’d just spent a week teaching that same model not to invent branch names. I was now proposing to make data isolation — the one thing I could not afford to get wrong — depend on the model’s discipline on every query, forever. One dropped predicate, one creative join, and tenant A is reading tenant B’s books.

You can defend against that, of course. You wrap generation in a layer that force-injects the tenant filter, you audit, you write tests. But now the most important guarantee in the whole system lives in a predicate I have to enforce correctly on every single LLM-written query for the life of the product. That’s an enormous surface area for something whose spec is “must never happen.”
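To make that surface area concrete, here is a minimal sketch of the kind of guard such a layer might start with. It is not the project's code: the `has_tenant_filter` function, the `orders` table, and the `revenue` column are all hypothetical, and a real guard would parse the SQL rather than pattern-match it. The point is only that a check can pass a query that still leaks.

```python
import re


def has_tenant_filter(sql: str, tenant_id: str) -> bool:
    """Naive guard: pass any query that contains tenant_id = '<tenant_id>'."""
    pattern = rf"\btenant_id\s*=\s*'{re.escape(tenant_id)}'"
    return re.search(pattern, sql, flags=re.IGNORECASE) is not None


# Hypothetical generated queries against a shared `orders` table.
scoped = "SELECT SUM(revenue) FROM orders WHERE tenant_id = 'a'"
unscoped = "SELECT SUM(revenue) FROM orders"
leaky = "SELECT SUM(revenue) FROM orders WHERE tenant_id = 'a' OR 1 = 1"

print(has_tenant_filter(scoped, "a"))    # True: correctly scoped
print(has_tenant_filter(unscoped, "a"))  # False: the dropped predicate is caught
print(has_tenant_filter(leaky, "a"))     # True: passes the guard, yet matches every tenant's rows
```

The third query is the uncomfortable one: the predicate is present, the guard is satisfied, and the `OR` still widens the result to every tenant. Closing that gap means reasoning about the full structure of every generated query, which is exactly the ongoing obligation described above.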

So I didn’t build it. For two tenants — for the realistic near future of a handful — I went the other way: physical isolation. Each tenant gets its own BigQuery dataset, its own auth, and its own deployment of the same codebase, with the configuration swapped per tenant.

import os

# One codebase, per-tenant config: isolation lives in the deployment,
# not in a WHERE clause the model has to remember every time.
# load_config and Warehouse stand in for project-specific helpers.
TENANT = load_config(os.environ["TENANT_ID"])
warehouse = Warehouse(project=TENANT.project, credentials=TENANT.creds)

# This instance's credentials can only reach this tenant's dataset.
# There is no cross-tenant query for the LLM to accidentally write:
# the data it shouldn't see isn't reachable from where it's running.

The difference is where the guarantee lives. In the shared-schema design, isolation is something the application has to do correctly on every request. In the per-tenant design, isolation is something the infrastructure makes impossible to violate. The model can write the worst query it likes; the credentials it runs under physically cannot reach another tenant’s data. I moved the guarantee from “we’ll get the filter right every time” to “there is nothing to get wrong.”
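As a rough illustration of what the per-tenant configuration behind that snippet could look like, here is a self-contained sketch. The `TenantConfig` fields, the registry contents, the project names, and the key paths are all assumptions made for the example, not the project's actual setup. Note that the code only selects a configuration; the isolation itself comes from what each deployment's credentials are allowed to reach.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    project: str  # project holding only this tenant's dataset (hypothetical name)
    creds: str    # path to this tenant's service-account key (hypothetical path)


# Hypothetical registry; in practice this might live in a config file or secret store.
_TENANTS = {
    "tenant_a": TenantConfig("tenant_a", "analytics-tenant-a", "/secrets/tenant_a.json"),
    "tenant_b": TenantConfig("tenant_b", "analytics-tenant-b", "/secrets/tenant_b.json"),
}


def load_config(tenant_id: str) -> TenantConfig:
    """Return the config for one tenant, refusing unknown tenant IDs."""
    try:
        return _TENANTS[tenant_id]
    except KeyError:
        raise ValueError(f"unknown tenant: {tenant_id!r}") from None


def assert_no_shared_resources(tenants) -> None:
    """Fail fast if two tenants point at the same project or credentials."""
    tenants = list(tenants)
    projects = {t.project for t in tenants}
    creds = {t.creds for t in tenants}
    if len(projects) != len(tenants) or len(creds) != len(tenants):
        raise ValueError("two tenants share a project or credentials")


assert_no_shared_resources(_TENANTS.values())
# Each deployment sets TENANT_ID; the default here only keeps the sketch runnable.
TENANT = load_config(os.environ.get("TENANT_ID", "tenant_a"))
print(TENANT.project)
```

A check like `assert_no_shared_resources` is cheap to run at deploy time and guards the one mistake this design still allows: two deployments accidentally pointed at the same credentials.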

The trade-off is real, and it cuts the other way at scale. Physical isolation does not scale to thousands of tenants — past some point, standing up a deployment and a dataset per customer is the bottleneck, and the shared-schema model with its single efficient footprint wins decisively. It also costs more per tenant up front: a couple of managed containers and a dataset, tens of dollars a month each, multiplied out. Shared-schema amortizes all of that away.
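To see how "tens of dollars a month each" multiplies out, here is a small worked sketch. The 40-dollar figure is a placeholder picked from that range purely for illustration; it is not a number from the project, and it ignores the operational effort of running many deployments.

```python
def per_tenant_monthly_cost(n_tenants: int, cost_per_tenant: float) -> float:
    """Total monthly infrastructure cost when every tenant has its own deployment."""
    return n_tenants * cost_per_tenant


COST_PER_TENANT = 40.0  # placeholder value, not a figure from the project

for n in (2, 50, 1000):
    print(n, per_tenant_monthly_cost(n, COST_PER_TENANT))
# 2 80.0
# 50 2000.0
# 1000 40000.0
```

At two tenants the bill is a rounding error; at a thousand it is a line item someone will ask about, which is the scale at which the shared footprint starts to pay for itself.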

But the failure modes are asymmetric, and that’s what settles it at my scale. The shared-schema model’s failure mode is cross-tenant leakage — the catastrophic one — and it sits one bug away at all times. The per-tenant model’s failure mode is “this got expensive and operationally annoying somewhere around N tenants” — a problem you watch approach for quarters and migrate against on your own schedule. One failure ends a contract; the other shows up on a cost dashboard. When the downsides are that lopsided, you don’t pick the architecture that’s elegant at a scale you don’t have. You pick the one whose worst case you can actually live with.

The mistake I almost made wasn’t technical. It was reaching for the architecture I’d be proud to have built — the one that handles thousands of tenants, the one that looks right on a whiteboard — instead of the one the problem in front of me actually needed. Two tenants don’t need multi-tenancy. They need to not see each other’s data, which is a different and much smaller requirement, and the smaller requirement has a much safer answer.

If this grows to fifty tenants I’ll revisit it, and shared-schema will probably win: by then the operational cost is real and the discipline to enforce tenant scoping properly is worth building. Recognizing which point on that curve you’re actually standing on, and resisting the pull toward the architecture that’s correct at a point much further along it, is most of the job. I don’t always get it right. I got it right this time mostly because putting an LLM in the loop made the elegant option visibly scary.[1]


From a consulting project building natural-language analytics for restaurant businesses. Customer details, infrastructure specifics, and numbers are abstracted; the reasoning is as built. Code is illustrative.

Footnotes

  1. This isn’t an argument against row-level security in general; it’s excellent when humans or trusted application code write the queries. The specific thing that spooked me was putting an LLM-generated WHERE clause on the critical path of data isolation. Different threat model, different answer.