Every retention curve has a kink, and one power law can’t fit it

LTV
production ML
mobile gaming
My single power-law fit was good everywhere except the first week and the long tail — the two places that actually decide a twelve-month LTV. The fix was to stop fitting through the kink and start fitting around it.
Author

Umut Altun

Published

June 20, 2023

I’d settled on a two-parameter power law for retention, and for a while I was happy with it. Then I started overlaying the fitted curve on the actual data, cohort by cohort, and noticed it was good everywhere except two places: the first week, and the long tail. Which is an awkward way of saying it was good nowhere that mattered, because those two regions are exactly what a twelve-month LTV extrapolation hangs on.

The fit was splitting the difference. It ran a little high through the brutal early drop and a little low through the slow tail, landing a compromise curve that was wrong in both directions at once and right mainly in the middle, where I needed it least.

Once I stopped staring at the residuals and started thinking about the players, the reason was obvious. A retention curve isn’t one process, it’s two stuck together:

Those are two different decay regimes with two different exponents, and forcing one power law across both makes it average them — fitting through the kink between them instead of respecting it. The early steepness drags the tail estimate down; the tail’s flatness drags the early estimate up. You can’t win with one curve because there isn’t one curve.

So I fit piecewise — a separate power law in each regime — and made the number of pieces adapt to how much data the cohort actually has, because the failure mode at the other end is just as real: split a sparse cohort into three segments and you’re fitting noise three times instead of once.

n = len(observed_days)
if n < 21:
    segments = [(1, horizon)]                     # sparse: one fit, don't overfit
elif n < 45:
    segments = [(1, 8), (8, horizon)]             # split at the early/late knee
else:
    segments = [(1, 8), (8, 31), (31, horizon)]   # early drop, mid, long tail

# fit a power law within each segment; if a segment's fit fails to
# converge, fall back to a single full-span fit rather than emit garbage

A young cohort gets one honest fit. A mature one gets up to three, carving the early flush, the mid settling, and the long tail into their own curves. The kink at day 8 stops being an error the model fights and becomes a boundary the model respects.

Two things I’d flag as judgment calls rather than truths. The knee at day 8 I found by eye — overlaying fits across many games and noticing the early regime consistently gave out around there. It’s not sacred; a more principled system would detect the knee per cohort instead of hard-coding it, and a game with an unusual onboarding would want a different boundary. I used a fixed split because it was robust across the portfolio and I could reason about it, and a wrong-but-predictable boundary beats a clever knee-detector that occasionally finds a knee in noise. The other call is the fallback: a segment fit that doesn’t converge doesn’t get to emit nonsense, it collapses back to the single full-span fit. Degrade to the simpler model, never to a confident wrong one.

I’ve since seen the same thing well outside retention curves. When a model is wrong in a structured, repeatable way — high here, low there, every time — that’s not noise to tune away. It’s the data telling you your functional form is too simple for the process. The fix usually isn’t a fancier model. It’s looking at what’s actually generating the data, noticing it’s two things wearing a trenchcoat, and giving each one its own simple model.1


From work on a cohort-LTV system for a mobile-gaming portfolio. Specifics, parameters, and numbers are abstracted; the reasoning is as built. Code is illustrative.

Footnotes

  1. The adaptive piece count matters more than the exact thresholds. The principle — more data earns more flexibility, less data forces more humility — is what generalizes; the specific cutoffs are just where it landed for this portfolio after staring at a lot of fits. If you lifted this wholesale onto a different retention shape without re-checking the knee, you’d deserve what you got.↩︎