On statistics

A friend asked me today what the standard deviation of something means when it’s not normally distributed.

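I had to answer “not terribly much”: the average and the standard deviation are good summaries when the data follow a normal distribution, where values cluster symmetrically around a center point. When the data are skewed, a handful of large values pull both numbers away from where most of the data actually sit.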

So what, then, is the right tool for his case, a long-tailed distribution? Most of his users stay for a certain number of months of service, and each successively longer term has fewer and fewer users. I suggested percentiles or quartiles: show what that long tail looks like, and see where most of the users fall, where “most” is some interesting portion like 1/2 or 2/3.
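
To make the difference concrete, here is a minimal sketch in Python. The lifetimes are numbers I made up for illustration, not my friend's data; they are only shaped like a long tail, with lots of short stays and a few very long ones.

```python
import statistics

# Hypothetical customer lifetimes in months (invented for illustration):
# most customers leave early, a few stay for years.
lifetimes = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 8, 12, 18, 24, 36, 60]

mean = statistics.mean(lifetimes)
stdev = statistics.stdev(lifetimes)  # sample standard deviation

# quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(lifetimes, n=4, method="inclusive")

print(f"mean = {mean:.1f} months, stdev = {stdev:.1f} months")
print(f"Q1 = {q1}, median = {median}, Q3 = {q3}")
```

With these made-up numbers the mean lands well above the median, and "mean minus one standard deviation" comes out negative, which is meaningless for a lifetime. The quartiles, on the other hand, say directly where half of the customers fall.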

All of this comes down to estimating the average lifetime revenue per customer for a business that is neither all that old nor all that large. That means the margins of error are wider, thanks to the relatively small populations.
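
One way to see how much the population size matters is a bootstrap: resample the data with replacement many times and look at how far the average moves. The sketch below is only an illustration under assumptions of my own. The lifetimes are simulated from an exponential distribution with a mean of 10 months, and `bootstrap_mean_interval` is a helper written for this example, not a library function.

```python
import random
import statistics


def bootstrap_mean_interval(sample, resamples=1000, level=0.95, rng=None):
    """Percentile-bootstrap interval for the mean of `sample`."""
    if rng is None:
        rng = random.Random()
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(resamples)
    )
    tail = (1 - level) / 2
    lo = means[round(tail * resamples)]
    hi = means[round((1 - tail) * resamples) - 1]
    return lo, hi


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated lifetimes in months (an assumption, not real customer data).
small = [rng.expovariate(1 / 10) for _ in range(20)]
large = [rng.expovariate(1 / 10) for _ in range(1000)]

small_lo, small_hi = bootstrap_mean_interval(small, rng=rng)
large_lo, large_hi = bootstrap_mean_interval(large, rng=rng)

print(f"n=20:   interval width {small_hi - small_lo:.2f} months")
print(f"n=1000: interval width {large_hi - large_lo:.2f} months")
```

The interval for the 20-customer sample should come out several times wider than the one for the 1000-customer sample, which is the small-population problem in numbers.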

At some point, I’ll have to revisit this post and add some graphs.