Marketing isn’t linear, and neither is my model

marketing analytics

MMM

production ML

Two assumptions quietly wreck a marketing mix model: that a dollar spent today only matters today, and that the millionth dollar works as hard as the first. Adstock and saturation are how you fix both — and why MMM is a curve-fitting problem, not a regression.

Author

Umut Altun

Published

September 17, 2024

A plain regression of installs on spend makes two assumptions so obviously false that it’s a small miracle it works at all. It assumes a dollar spent on Tuesday affects Tuesday’s installs and nothing else. And it assumes the ten-thousandth dollar of the day drives exactly as many installs as the first. Both are wrong, in opposite directions, and getting marketing mix modeling to be useful is mostly about replacing each one with something that matches how advertising actually behaves.

Take the timing one first. You run a burst of TikTok spend today. Some installs land today — but some land tomorrow, and the day after, as people who saw the ad get around to acting on it, as the creative circulates, as the algorithm keeps serving it. The effect of today’s spend is smeared forward over the following days, decaying as it goes. A model that credits today’s spend only with today’s installs misreads this completely: it sees spend, then a lagged bump in installs it can’t connect to the cause, and it either misses the effect or pins it on whatever else happened to move that day.

The fix is adstock (carryover): before the spend ever enters the model, you transform it so each day inherits a decaying echo of the days before it.

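import numpy as np

# geometric adstock: today carries a fading memory of prior spend.
# lam ~ 0  -> effect is almost entirely same-day
# lam ~ 0.8 -> a long tail; today's spend still matters a week later
#              (0.8**7 ≈ 0.21 of a burst is still in play on day 7)
def adstock(spend, lam):
    spend = np.asarray(spend, dtype=float)  # float, so integer spend doesn't truncate the carryover
    out = np.zeros_like(spend)
    out[0] = spend[0]
    for t in range(1, len(spend)):
        out[t] = spend[t] + lam * out[t - 1]
    return out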
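To see what the decay rate does, feed the function a single burst and watch the echo. This is a minimal sketch with made-up numbers: one $100 day followed by silence, run through the same recursion at a short and a long decay.

```python
import numpy as np

def adstock(spend, lam):
    # Same geometric recursion as above.
    spend = np.asarray(spend, dtype=float)
    out = np.zeros_like(spend)
    out[0] = spend[0]
    for t in range(1, len(spend)):
        out[t] = spend[t] + lam * out[t - 1]
    return out

# A single $100 burst on day 0, nothing afterwards (illustrative numbers).
impulse = np.array([100.0, 0, 0, 0, 0, 0, 0, 0])

short = adstock(impulse, lam=0.1)      # almost all same-day
long_tail = adstock(impulse, lam=0.8)  # slow fade

print(np.round(short, 2))
print(np.round(long_tail, 2))
```

The output is just a geometric series, 100 · lam^t. At lam = 0.8 about 21% of the burst is still in play on day 7, and over an unbounded horizon the total credited to the burst is 1 / (1 − lam) = 5 times the original spend.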

The decay rate isn’t something you set — it’s something you learn, and that’s the point. A channel with a long carryover (brand-ish, slow-burn) and one with an instant, all-same-day response get different decay rates, and the data tells you which is which. In the Bayesian setup this lam is just another parameter with a prior, inferred alongside everything else.
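The post learns lam inside a Bayesian model. As a lighter stand-in, here is a sketch that recovers it from synthetic data by grid search plus least squares. The function name `fit_decay`, the grid, and every number in the data are hypothetical, and a grid search gives a single point estimate with none of the uncertainty a posterior would carry.

```python
import numpy as np

def adstock(spend, lam):
    spend = np.asarray(spend, dtype=float)
    out = np.zeros_like(spend)
    out[0] = spend[0]
    for t in range(1, len(spend)):
        out[t] = spend[t] + lam * out[t - 1]
    return out

def fit_decay(spend, installs, grid=np.linspace(0.0, 0.95, 96)):
    """Hypothetical helper: for each candidate lam, fit
    installs ~ intercept + beta * adstock(spend, lam) by least squares,
    and keep the lam with the smallest squared error."""
    best = None
    for lam in grid:
        X = np.column_stack([np.ones(len(spend)), adstock(spend, lam)])
        coef, *_ = np.linalg.lstsq(X, installs, rcond=None)
        sse = float(np.sum((installs - X @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, lam, coef)
    return best[1], best[2]

# Synthetic channel (all numbers made up): true decay 0.6, 5 baseline
# installs/day, 0.02 installs per adstocked dollar, plus a little noise.
rng = np.random.default_rng(0)
spend = rng.uniform(0, 1000, size=200)
installs = 5 + 0.02 * adstock(spend, 0.6) + rng.normal(0, 0.5, size=200)

lam_hat, (intercept_hat, beta_hat) = fit_decay(spend, installs)
print(f"lam ~ {lam_hat:.2f}, beta ~ {beta_hat:.4f}")
```

In a Bayesian fit the same search happens implicitly: lam gets a prior on [0, 1) and the sampler explores it jointly with the coefficient instead of sweeping a fixed grid.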

Now the second assumption, which is worse, because it’s the one that makes the model’s advice dangerous. Advertising saturates. The first slice of budget on a channel hits the cheapest, most responsive users; as you pour in more, you’re reaching down into people who are progressively less interested, and each additional dollar buys fewer installs than the last. The response of installs to spend isn’t a line, it’s a curve that bends over — steep at first, flattening toward a ceiling. Diminishing returns, the most reliable empirical fact in all of performance marketing.

A linear model cannot represent this, and the failure isn’t academic. If installs-per-dollar is a constant slope, the model thinks the channel never saturates, so its honest recommendation is to put infinite budget there, because the marginal return never drops. Every linear MMM, asked where to spend more, will eventually tell you to bet everything on one channel (whichever has the steepest slope), because it has no concept of “full.” That’s not a minor glitch. It’s the model confidently recommending the one thing every marketer knows is wrong.
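The infinite-budget failure is easy to reproduce. This sketch uses two hypothetical channels with made-up numbers, and a greedy allocator that is an illustrative assumption rather than the optimizer behind the post: it hands out a budget $100 at a time to whichever channel gains the most from the next chunk, once with straight-line responses and once with curves that flatten toward a ceiling.

```python
import numpy as np

def allocate(budget, step, response_fns):
    """Greedy allocation: give each `step`-sized chunk of budget to the
    channel whose response curve gains the most from it."""
    spend = np.zeros(len(response_fns))
    for _ in range(int(budget // step)):
        gains = [f(spend[i] + step) - f(spend[i]) for i, f in enumerate(response_fns)]
        spend[int(np.argmax(gains))] += step
    return spend

# Straight lines: channel A is slightly better per dollar than B.
linear = [lambda x: 0.021 * x, lambda x: 0.020 * x]

# Same starting slopes (210/10000 and 200/10000), but each curve flattens
# toward a ceiling of 210 and 200 installs respectively.
saturating = [lambda x: 210 * x / (x + 10_000), lambda x: 200 * x / (x + 10_000)]

linear_split = allocate(20_000, 100, linear)
saturating_split = allocate(20_000, 100, saturating)
print("linear:    ", linear_split)
print("saturating:", saturating_split)
```

With straight lines, A’s 0.021 installs per dollar beats B’s 0.020 on every single chunk, so B gets nothing. With saturating curves, A’s marginal return falls as it fills up, B starts winning chunks after a few hundred dollars, and the budget ends up split roughly evenly, near the point where the two marginal returns are equal.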

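So spend goes through a saturation curve, a Hill curve, with one parameter for where the bend sits (the half-saturation point) and one for how sharp it is. With steepness above 1 it is S-shaped, slow to get going before it bends over; at 1 or below it is concave from the first dollar, the plain diminishing-returns shape. Only after that does spend hit the linear part of the model: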

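# Hill saturation: turns spend into effective spend, with a ceiling.
# half-saturation point K and steepness s are LEARNED per channel.
# Output lies in [0, 1) and equals 0.5 at spend == K; the channel's
# linear coefficient then scales it into installs.
def hill(spend, K, s):
    return spend**s / (spend**s + K**s)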
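A quick way to see what K and s do is to evaluate the curve on a spend grid and find where its slope peaks. This is a sketch with a hypothetical K of $1,000 and three made-up steepness values; the float conversion is an addition so integer spend works too.

```python
import numpy as np

def hill(spend, K, s):
    # Same transform as above, rising from 0 toward 1.
    spend = np.asarray(spend, dtype=float)
    return spend**s / (spend**s + K**s)

x = np.linspace(0, 5000, 501)   # daily spend grid in $10 steps (made up)
K = 1000.0                      # hypothetical half-saturation point

for s in (0.7, 1.0, 2.5):
    y = hill(x, K, s)
    slope = np.diff(y)                   # gain in effective spend per $10 step
    peak = x[int(np.argmax(slope))]      # where the curve is steepest
    print(f"s={s}: hill(K)={hill(K, K, s):.2f}, steepest near ${peak:.0f}")
```

Whatever s is, the curve passes through 0.5 at spend = K, which is why K reads as the half-saturation point. For s of 1 or below the slope is largest at the very first dollar; for s = 2.5 it peaks a bit below K (around $700 here), which is the S-shape: slow start, steep middle, flat top.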

The pipeline per channel is the composition of the two: raw spend → adstock (smear it forward in time) → saturation (bend it for diminishing returns) → then a linear coefficient. Which is the thing that finally made MMM click for me: it isn’t a regression with marketing-flavoured variables. It’s a structured curve-fitting problem where the shape of each curve is the marketing knowledge. The adstock decay encodes “how long does this channel’s effect linger,” the saturation curve encodes “how fast does this channel get tired,” and the linear coefficient on top is almost the least interesting parameter in the whole stack.
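Composed, the per-channel pipeline and the full model fit in a few lines. This is a sketch of the forward pass only (no fitting); the function names, the channel names, the baseline term, and all parameter values are hypothetical.

```python
import numpy as np

def adstock(spend, lam):
    spend = np.asarray(spend, dtype=float)
    out = np.zeros_like(spend)
    out[0] = spend[0]
    for t in range(1, len(spend)):
        out[t] = spend[t] + lam * out[t - 1]
    return out

def hill(x, K, s):
    x = np.asarray(x, dtype=float)
    return x**s / (x**s + K**s)

def channel_contribution(spend, lam, K, s, beta):
    # raw spend -> adstock (smear in time) -> Hill (bend) -> linear coefficient
    return beta * hill(adstock(spend, lam), K, s)

def predict_installs(spend_by_channel, params, baseline):
    # Baseline installs plus the sum of every channel's contribution.
    n_days = len(next(iter(spend_by_channel.values())))
    total = np.full(n_days, float(baseline))
    for name, spend in spend_by_channel.items():
        total += channel_contribution(spend, **params[name])
    return total

params = {
    "channel_a": dict(lam=0.3, K=800.0, s=1.0, beta=120.0),   # hypothetical
    "channel_b": dict(lam=0.7, K=2000.0, s=2.0, beta=60.0),   # hypothetical
}
days = 14
spend = {"channel_a": np.full(days, 500.0), "channel_b": np.full(days, 1500.0)}

pred = predict_installs(spend, params, baseline=40.0)
zero = predict_installs({k: np.zeros(days) for k in spend}, params, baseline=40.0)
print(np.round(pred, 1))
```

Because the Hill output stays below 1, each channel’s contribution is capped by its beta: beta sets the ceiling in installs, while lam, K and s decide how the spend gets there.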

The cost is that you’ve added two nonlinear parameters per channel, and they trade off against each other in ways that make the fit harder and the identifiability problem sharper — a strong-carryover-low-saturation curve and a weak-carryover-high-saturation one can mimic each other over a short window. This is exactly why the priors matter and why MMM is data-hungry; you’re asking the data to locate several curves at once. But there’s no shortcut worth taking. The linearity assumptions aren’t a simplification you can defend as “good enough” — one of them blinds the model to lagged effects and the other makes it recommend infinite spend. The curves aren’t sophistication for its own sake. They’re the minimum required to not be actively wrong.
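One way to look for this trade-off in your own data is to profile the fit: hold s fixed, sweep a grid of decay rates and half-saturation points, solve the linear part at each grid point, and list every combination that fits nearly as well as the best. This is a diagnostic sketch, not part of the post’s pipeline; the function name, the 1% tolerance, the 28-day window and the noise-free synthetic data are all assumptions.

```python
import numpy as np

def adstock(spend, lam):
    spend = np.asarray(spend, dtype=float)
    out = np.zeros_like(spend)
    out[0] = spend[0]
    for t in range(1, len(spend)):
        out[t] = spend[t] + lam * out[t - 1]
    return out

def hill(x, K, s):
    x = np.asarray(x, dtype=float)
    return x**s / (x**s + K**s)

def near_optimal_fits(spend, installs, lams, Ks, s=1.0, tol=0.01):
    """Profile the fit over a (lam, K) grid with s held fixed. At each point
    the intercept and beta are solved by least squares. Returns every
    (sse, lam, K) whose error is within `tol` (as a share of the total
    variance of installs) of the best fit, sorted best first."""
    total_var = float(np.sum((installs - installs.mean()) ** 2))
    results = []
    for lam in lams:
        for K in Ks:
            X = np.column_stack([np.ones(len(spend)), hill(adstock(spend, lam), K, s)])
            coef, *_ = np.linalg.lstsq(X, installs, rcond=None)
            sse = float(np.sum((installs - X @ coef) ** 2))
            results.append((sse, float(lam), float(K)))
    best = min(r[0] for r in results)
    return sorted(r for r in results if r[0] - best <= tol * total_var)

# Four weeks of made-up data generated with lam=0.5, K=2000, s=1.
rng = np.random.default_rng(1)
spend = rng.uniform(0, 3000, size=28)
installs = 10 + 100 * hill(adstock(spend, 0.5), 2000.0, 1.0)

fits = near_optimal_fits(spend, installs,
                         lams=np.linspace(0.0, 0.9, 19),
                         Ks=np.linspace(500, 5000, 10))
print(f"{len(fits)} (lam, K) pairs fit within 1% of the best")
```

A single tight cluster around the best fit means the window pins the curves down; a long ridge of (lam, K) pairs that all make the cut is the trade-off described above, and a sign the priors will be doing much of the work.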

I’ve kept one habit from this: when a model gives obviously broken advice, look at the shape it’s assuming before anything else. A linear model recommending infinite budget isn’t mis-tuned. It’s faithfully reporting that a straight line has no top — and the fix is to give it a curve that does.¹

From work on a marketing-analytics system for a mobile-gaming portfolio. Channels, parameters, and numbers are abstracted; the reasoning is as built. Code is illustrative.

Footnotes

¹ Adstock and saturation don’t commute: the order you apply them in is a genuine modeling choice, not a detail. Adstock-then-saturation (smear time first, then bend) says the carryover accumulates as raw attention and then saturates; saturation-then-adstock says each day saturates on its own and the saturated effects carry forward. They give different curves, the libraries pick a convention, and it’s worth knowing which one you’ve signed up for rather than discovering it in a posterior you can’t explain.
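The difference is easy to see on a single burst. This is a sketch with made-up numbers (decay 0.6, K = $1,000, s = 1): one $3,000 day pushed through both orders.

```python
import numpy as np

def adstock(spend, lam):
    spend = np.asarray(spend, dtype=float)
    out = np.zeros_like(spend)
    out[0] = spend[0]
    for t in range(1, len(spend)):
        out[t] = spend[t] + lam * out[t - 1]
    return out

def hill(x, K, s):
    x = np.asarray(x, dtype=float)
    return x**s / (x**s + K**s)

spend = np.array([0, 0, 3000, 0, 0, 0, 0], dtype=float)  # one burst on day 2
lam, K, s = 0.6, 1000.0, 1.0                             # illustrative values

adstock_then_hill = hill(adstock(spend, lam), K, s)   # smear, then bend
hill_then_adstock = adstock(hill(spend, K, s), lam)   # bend each day, then smear

print(np.round(adstock_then_hill, 3))
print(np.round(hill_then_adstock, 3))
```

Both orders agree on the burst day (0.75), but they disagree about the tail. Adstock-then-saturation still sits at about 0.64 the next day, because the decaying spend slides down into the steep part of the curve; saturation-then-adstock decays by exactly lam, to 0.45.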