The installs you can’t attribute

marketing analytics

MMM

mobile gaming

Attribution tools can only credit the install they directly touched. But paid spend also drives installs nobody clicked an ad for — word of mouth, chart rank, the friend who saw you playing. Measuring that halo, not paid ROI, is what I pointed a marketing mix model at.

Author

Umut Altun

Published

November 12, 2024

Here’s a number that should bother any UA team more than it does: a meaningful share of your “organic” installs aren’t organic. They were caused by your paid spend — just not in a way any attribution tool can see.

The mechanism is obvious once you say it out loud. You run a big Meta campaign. It drives paid installs, which attribution dutifully records. But it also pushes the game up the store charts, where new people discover it organically. Some of those paid users tell a friend, or just get seen playing. The campaign manufactured installs that nobody clicked an ad for — and your attribution tool, which can only credit an install it directly touched, files every one of them under “organic” and moves on. The spend gets none of the credit for the demand it actually created.

This is the halo effect, and it’s not a rounding error. If you optimize your UA purely on attributed paid ROI — which is what almost everyone does, because it’s what the dashboards show — you systematically underspend, because you’re crediting each channel only with the installs it directly touched and ignoring the wave of organic demand it set off behind them. You’re flying on an instrument that can’t see a chunk of the thing you’re trying to maximize.

So I stopped trying to measure paid ROI better and changed the question the model was answering. Instead of “how many paid installs did each channel get,” I pointed a marketing mix model (MMM) at a different target entirely: organic installs. Not paid. Organic, the very installs attribution calls free, as the response variable, with paid spend per channel as the inputs.

That inversion is the whole idea, and it took me a while to be comfortable with how strange it looks. You’re regressing the thing you supposedly can’t buy onto the things you did buy, and asking: when paid spend moves, how much does the organic baseline move with it? Whatever the model can explain — the portion of organic installs that systematically rises and falls with paid spend, after adstock and saturation — is the halo. It’s paid-driven demand hiding in the organic numbers, and MMM can see it precisely because it works at the aggregate level, on correlations over time, instead of trying to trace individual clicks the way attribution does. Attribution asks “did this person touch an ad.” MMM asks “does organic move when spend moves.” Only the second question can find an install that nobody clicked.
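To make “after adstock and saturation” concrete, here is a minimal sketch of the two transforms applied to a channel’s spend before it enters the regression. It assumes geometric adstock and a Hill-style saturation curve, which are common choices but not necessarily the ones used in the system described here; the function names, parameters, and spend figures are made up for illustration.

```python
from typing import List


def geometric_adstock(spend: List[float], decay: float) -> List[float]:
    """Carry a fraction `decay` of last period's adstocked spend into this one."""
    carried = 0.0
    out = []
    for amount in spend:
        carried = amount + decay * carried
        out.append(carried)
    return out


def hill_saturation(x: float, half_saturation: float, slope: float = 1.0) -> float:
    """Diminishing returns: 0 at zero spend, 0.5 at `half_saturation`."""
    if x <= 0:
        return 0.0
    return x**slope / (x**slope + half_saturation**slope)


# Hypothetical weekly spend: a burst, two dark weeks, then a smaller burst.
weekly_spend = [100.0, 0.0, 0.0, 50.0]

adstocked = geometric_adstock(weekly_spend, decay=0.5)
saturated = [hill_saturation(x, half_saturation=50.0) for x in adstocked]

print(adstocked)  # [100.0, 50.0, 25.0, 62.5]
```

The dark weeks still carry adstocked spend, which is how a model of this shape can credit a campaign for organic installs that arrive after the ads stop.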

The model decomposes organic installs into two parts. The baseline — the intercept, plus trend and seasonality — is the genuinely organic demand: what you’d get at zero paid spend, the brand, the back-catalogue, the season. The media contributions are the halo: the slice of organic installs that each channel’s spend is driving up on top of that baseline.

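# Assumption: the helper below is LightweightMMM's plotting utility of the
# same name; `mmm`, `target_scaler`, and `channels` come from the fitting
# step, which is not shown here.
from lightweight_mmm.plot import create_media_baseline_contribution_df

# target is ORGANIC installs. the decomposition splits them into the
# true baseline (what you'd get at zero spend) and the paid-driven halo.
contributions = create_media_baseline_contribution_df(
    media_mix_model=mmm,
    target_scaler=target_scaler,
    channel_names=channels,
)
# baseline  -> genuinely organic
# per-channel -> organic installs that paid spend manufactured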
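For intuition about what that decomposition is doing, here is a deliberately stripped-down version: one channel, plain least squares instead of a Bayesian MMM, and no trend or seasonality. It is a sketch under those assumptions, not the model described here, and the weekly numbers are invented. The intercept plays the role of the baseline, and the slope times transformed spend plays the role of the halo.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up weekly data: spend after adstock and saturation, and organic installs.
transformed_spend = [0.0, 10.0, 20.0, 30.0, 40.0]
organic_installs = [1000.0, 1030.0, 1050.0, 1100.0, 1120.0]

intercept, slope = fit_line(transformed_spend, organic_installs)

# Baseline: organic installs the fit expects at zero spend, summed over weeks.
baseline_total = intercept * len(organic_installs)
# Halo: organic installs that rise and fall with spend, summed over weeks.
halo_total = slope * sum(transformed_spend)

print(f"baseline per week: {intercept:.1f}")
print(f"halo installs over the period: {halo_total:.1f}")
```

With an intercept in the fit, baseline and halo add back up to total organic installs, so the split is a partition of the organic number rather than an extra quantity on top of it.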

And then the number that actually changes the conversation — collapse it into a k-factor per channel, a virality multiplier:

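# paid_installs: installs attribution credited to this channel.
# halo_installs: this channel's contribution to organic installs,
#                taken from the decomposition above.
# k = 1.15 means: every 100 paid installs from this channel come with
# ~15 organic installs in their wake that attribution credited to nobody.
k_factor = (paid_installs + halo_installs) / paid_installs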

A k-factor of 1.0 means a channel buys exactly the installs it’s credited with and nothing spills over. Above 1.0 means it manufactures organic demand on top — and now you can compare channels on their true pull, paid plus halo, instead of the attributed-paid number that flatters the channels with no halo and punishes the ones quietly driving your charts. Two channels with identical attributed ROI can have very different real value once you count what they set off, and the team that knows that allocates budget differently from the team that doesn’t.
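A worked example of that last point, with two hypothetical channels that look identical on attributed numbers. Everything here is invented for illustration, and cost per install (CPI) stands in for ROI only to keep the arithmetic simple.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    spend: float
    paid_installs: float   # what attribution credits to the channel
    halo_installs: float   # the channel's share of organic, per the model

    @property
    def k_factor(self) -> float:
        return (self.paid_installs + self.halo_installs) / self.paid_installs

    @property
    def attributed_cpi(self) -> float:
        return self.spend / self.paid_installs

    @property
    def effective_cpi(self) -> float:
        return self.spend / (self.paid_installs + self.halo_installs)


# Hypothetical channels: same spend, same attributed installs, different halo.
channel_a = Channel("channel_a", spend=10_000.0, paid_installs=5_000.0, halo_installs=0.0)
channel_b = Channel("channel_b", spend=10_000.0, paid_installs=5_000.0, halo_installs=1_250.0)

for ch in (channel_a, channel_b):
    print(
        f"{ch.name}: k={ch.k_factor:.2f} "
        f"attributed CPI={ch.attributed_cpi:.2f} "
        f"effective CPI={ch.effective_cpi:.2f}"
    )
```

Both channels show an attributed CPI of 2.00, but at k = 1.25 the second one is really paying 1.60 per install once the halo is counted.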

One caveat, and it’s a big one. This is a correlational decomposition, not a clean causal experiment. The model attributes the co-movement of organic installs and paid spend to the halo, but co-movement isn’t proof — a confounder that drives both (a seasonal surge, a press hit that coincided with a planned spend ramp) gets quietly absorbed into a channel’s contribution, and the model can’t tell that apart from genuine halo on its own. The gold standard for incrementality is a geo holdout or a proper lift test, where you actually withhold spend and watch what happens. MMM is the always-on, every-channel estimate you run when you can’t afford to stop spending everywhere to find out — and the right way to use it is to validate it against the occasional real lift test, lean hard on the priors, and read the k-factors as well-reasoned estimates rather than measured facts. I’d rather an honest estimate of the right quantity than a precise measurement of the wrong one, and attributed paid ROI is a precise measurement of the wrong one.
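The confounder problem is easy to reproduce on toy data. In the sketch below, which uses plain least squares and invented numbers rather than the real model, a seasonal surge lifts organic installs in the same weeks as a planned spend ramp. The true halo is zero by construction, yet the fit hands the whole surge to the channel.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up scenario: seasonality adds 100 organic installs in weeks 3 and 4,
# and the spend ramp lands in exactly those weeks. Spend causes nothing here.
seasonal_lift = [0.0, 0.0, 100.0, 100.0]
spend = [10.0, 10.0, 30.0, 30.0]
organic_installs = [1000.0 + lift for lift in seasonal_lift]

intercept, slope = fit_line(spend, organic_installs)
apparent_halo = slope * sum(spend)
true_halo = 0.0

print(f"apparent halo: {apparent_halo:.0f} installs, true halo: {true_halo:.0f}")
```

Because spend and season move in lockstep in this data, no amount of modeling on the same series can separate them, which is the case for withholding spend in a lift test.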

So your “organic” bucket is partly a measurement artifact — it’s where attribution files the demand it couldn’t trace, including the demand your own spend created. The most valuable thing a channel does might be the installs it doesn’t get credited for, and the only tools that can see those are the ones that stopped trying to follow the click.¹

From work on a marketing-analytics system for a mobile-gaming portfolio. Channels, k-factors, and numbers are abstracted; the reasoning is as built. Code is illustrative.

Footnotes

¹ Why not just run lift tests everywhere and skip the modeling? Because a clean geo holdout means deliberately turning off spend in real markets and eating the lost installs to measure the counterfactual: expensive, slow, and politically hard to do on every channel every quarter. The pragmatic stack is both: occasional lift tests as ground truth, MMM as the continuous estimate calibrated against them. Neither alone is enough; the lift test is right but rare, the model is always-on but assumption-dependent.