MMM wants two years of data; I had two months
Every serious treatment of marketing mix modeling tells you the same thing: bring two or three years of data, ideally with real variation in spend, ideally with a few natural experiments where you turned a channel off and watched. It’s sound advice. It also describes a situation that I, and almost everyone actually doing this work, rarely find ourselves in. New games, new markets, a measurement need that’s urgent now: the request lands with a fraction of the history the textbook demands. The textbook answer is “then don’t run an MMM.” The job is to do something useful anyway, and the hard part is doing it without lying.
It helps to be precise about why MMM needs so much. You’re not fitting a line. You’re locating, per channel, an adstock decay, a saturation curve, and a coefficient, all at once, from channels that move together, so the data can barely tell them apart. That’s a lot of curve to pin down, and the information to do it comes from variation: one channel’s spend going up while another’s goes down, a channel pausing, a budget shock. Calendar time isn’t really what you’re short of. Variation is. Sixty quiet days where every channel drifts up together contain almost no information about any individual channel’s curve, no matter how many rows that adds up to.
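To make "an adstock decay, a saturation curve, and a coefficient" concrete, here is a minimal sketch of what those three unknowns look like for a single channel. It assumes geometric adstock and a Hill-type saturation curve, which are common choices but not necessarily the ones used in the system described here; the function and parameter names (`geometric_adstock`, `hill_saturation`, `half_sat`, `coef`) are hypothetical.

```python
from typing import List


def geometric_adstock(spend: List[float], decay: float) -> List[float]:
    """Carry a fraction `decay` of the previous day's effect into today."""
    carried = 0.0
    out = []
    for s in spend:
        carried = s + decay * carried
        out.append(carried)
    return out


def hill_saturation(x: float, half_sat: float, shape: float = 1.0) -> float:
    """Diminishing returns: 0 at x=0, 0.5 at x=half_sat, approaching 1."""
    if x <= 0:
        return 0.0
    return x**shape / (x**shape + half_sat**shape)


def channel_contribution(
    spend: List[float], decay: float, half_sat: float, coef: float
) -> List[float]:
    """One channel's modeled effect. decay, half_sat and coef are all
    unknown and must be estimated together (illustrative assumption)."""
    adstocked = geometric_adstock(spend, decay)
    return [coef * hill_saturation(a, half_sat) for a in adstocked]


# A single burst of spend keeps contributing on later days.
print(channel_contribution([100.0, 0.0, 0.0], decay=0.5, half_sat=50.0, coef=2.0))
```

Multiply those three parameters by the number of channels and it becomes clearer why spend that only drifts in one direction cannot separate them: many different combinations of decay, saturation point, and coefficient reproduce the same smooth series.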
The wrong response — the tempting one, because it always produces output — is to let the model run regardless and report whatever it returns. It will return something. A maximum-likelihood fit always hands you numbers; Bayesian sampling always hands you a posterior. The numbers will be precise-looking and the dashboard will render them and a UA lead will, reasonably, treat them as real and move budget. That’s the actual danger of data-hunger: not that the model errors out, but that it returns a confident answer indistinguishable on the surface from a good one, built on data that couldn’t possibly support it. A model that fails loudly is safe. A model that fails quietly, with a clean-looking number, is how thin data becomes a bad decision.
So the design principle I leaned on is that the model has to know when it knows nothing, and say so. Concretely, two things.
First, a graceful fallback to the null answer. When a market falls below the data floor (too few installs, too little spend movement), the system doesn’t fit a heroic model on noise. It returns the honest default: a k-factor of 1.0, “no measurable halo,” no claim of lift it can’t support.
    # Below the data floor, the model declines to invent a signal.
    # k = 1.0 means "no halo detected", not "we measured zero halo".
    if not enough_signal(market):
        return Result(k_factor=1.0, basis="insufficient_data")
    return fit_mmm(market)

The distinction in that comment is the entire ethic. “No halo detected” is an admission of ignorance; “we measured zero halo” is a measurement. The fallback makes the first claim, never the second, and tags the output with its basis, so nobody downstream mistakes a default for a finding.
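The `enough_signal` check in that snippet is left undefined. As a rough sketch only, it could be a pair of thresholds, one on install volume and one on how much spend actually moved. Everything here is an assumption made for illustration: the `Market` fields, the two threshold values, and the use of the coefficient of variation as the measure of spend movement. It is also a crude measure, since it looks at total spend and does not check whether channels moved independently of each other.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List

# Hypothetical thresholds, chosen only for illustration.
MIN_INSTALLS = 5_000
MIN_SPEND_CV = 0.15


@dataclass
class Market:
    """Hypothetical container for the two inputs the gate needs."""
    installs: int
    daily_spend: List[float]


def spend_variation(daily_spend: List[float]) -> float:
    """Coefficient of variation (std / mean); 0.0 when there is
    too little history or no spend to measure it on."""
    if len(daily_spend) < 2:
        return 0.0
    avg = mean(daily_spend)
    if avg <= 0:
        return 0.0
    return pstdev(daily_spend) / avg


def enough_signal(market: Market) -> bool:
    """True only if both volume and spend movement clear the floor."""
    return (
        market.installs >= MIN_INSTALLS
        and spend_variation(market.daily_spend) >= MIN_SPEND_CV
    )


flat = Market(installs=20_000, daily_spend=[100.0] * 60)
varied = Market(installs=20_000, daily_spend=[100.0, 50.0, 150.0, 0.0, 200.0] * 12)
print(enough_signal(flat), enough_signal(varied))
```

The point of gating on both conditions is that a market with plenty of installs but perfectly flat spend is still a market the model cannot learn from.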
Second, and this is what makes the fallback honest rather than just convenient: the priors do double duty. In the Bayesian setup, when the data is too thin to say much, the posterior simply stays near the prior — which is the correct behaviour, as long as you read it correctly. A posterior sitting on top of its prior isn’t the data confirming your belief. It’s the data having nothing to add, and the model honestly reporting your starting assumption back to you. The failure isn’t the model returning the prior. The failure is you reading “posterior ≈ prior” as a result. So the discipline is to always check how far the posterior actually moved, and treat “it didn’t move” as a signal that this market can’t support a model yet — not as confirmation that you were right all along.
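One way to check "how far the posterior actually moved" is to compare draws from the prior and the posterior for the same parameter. The sketch below uses two simple diagnostics: contraction (how much the variance shrank) and shift (how far the mean moved, in prior standard deviations). The function names and the cutoffs `min_contraction` and `min_shift` are hypothetical, and this is an illustration of the idea rather than the check the system actually ran.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def prior_to_posterior_movement(
    prior_draws: Sequence[float], posterior_draws: Sequence[float]
) -> Tuple[float, float]:
    """Return (contraction, shift).

    contraction = 1 - var(posterior) / var(prior); near 0 means the
    data did not narrow the belief at all.
    shift = |mean(posterior) - mean(prior)| in prior standard deviations.
    """
    prior_var = variance(prior_draws)
    contraction = 1.0 - variance(posterior_draws) / prior_var
    shift = abs(mean(posterior_draws) - mean(prior_draws)) / prior_var**0.5
    return contraction, shift


def data_informed(
    prior_draws: Sequence[float],
    posterior_draws: Sequence[float],
    min_contraction: float = 0.3,  # hypothetical cutoff
    min_shift: float = 0.5,  # hypothetical cutoff
) -> bool:
    """False when the posterior is essentially the prior handed back."""
    contraction, shift = prior_to_posterior_movement(prior_draws, posterior_draws)
    return contraction >= min_contraction or shift >= min_shift


# Deterministic stand-ins for sampler output, for illustration only.
prior = [i / 10 for i in range(-20, 21)]
unmoved = list(prior)  # posterior sitting on top of the prior
learned = [1.0 + i / 100 for i in range(-20, 21)]  # narrower and shifted
print(data_informed(prior, unmoved), data_informed(prior, learned))
```

A `False` here is the cue to treat the market as unmodelable for now, not to report the prior as if it were a result.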
And the maturity of a modeling system isn’t in how well it performs when data is rich; it’s in how it behaves when data is poor. Anyone can fit a clean model to two years of varied history. The professional question is what your system does on the market with sixty flat days, because that’s where the quiet, confident, wrong numbers come from. A system that degrades to “I don’t have enough to say” is worth more than one that always produces a number, because the always-a-number system is indistinguishable from the honest one exactly when it’s lying. Building the model was the easy part. Building it to know the edge of its own competence, and stop at it, was the part that made it safe to put in front of someone spending real money.[1]
From work on a marketing-analytics system for a mobile-gaming portfolio. Thresholds, channels, and numbers are abstracted; the reasoning is as built. Code is illustrative.
Footnotes
[1] What counts as “enough signal” is itself a judgment I tuned rather than derived, and I won’t pretend otherwise. It’s a threshold on volume and on spend variation, set conservatively and checked against markets where I had enough data to know the right answer and could see where the thin-data version started diverging from it. A more principled version would lean on the posterior’s own width: if the credible interval on a channel’s effect spans everything from “useless” to “incredible”, that is the model telling you it doesn’t know, in a language more honest than any threshold I’d hard-code.