Bias in Estimated Standard Deviations
- William C. Evans
- Standard Deviation
This graphic shows the Helmert distribution for standard deviations estimated at various sample sizes from a Normal distribution. There are two common estimators: one uses n (the sample size) in the denominator of the equation for s, and the other uses (n-1). It is well known that the "n-estimator" is biased, but it is less well known that the (n-1) estimator is also biased. Dividing by (n-1) makes the variance estimate unbiased, but the square root taken in going from a variance estimate to a standard deviation estimate is a nonlinear transformation, and it reintroduces a bias. For a good treatment of this issue, see Deming, W. E., Some Theory of Sampling, Wiley (1950), Chapter 15 and pp. 495-7; also see the table on p. 530. Dover reprinted this book in 1984 (ISBN 048664684X). The book covers far more statistics than sampling theory alone.

If there were no bias in the estimated standard deviation, the distributions would have their mean at 1.0; the horizontal axis in the graphic is the ratio of the estimated standard deviation to the known population standard deviation. The mean values are marked by the vertical lines on the distributions, and they change with the sample size. At larger sample sizes the bias becomes very small and the distributions become nearly Normal.

It can be argued that the bias corrections are small, but they are worth considering at the small sample sizes used in quality control or in school lab exercises. In QC, these correction factors are important enough to have their own symbols (c2 and c4). See, e.g., Wheeler, D. J., Advanced Topics in Statistical Process Control, SPC Press (1995), p. 58 (ISBN 0-945320-45-0). Also see Duncan, A. J., Quality Control and Industrial Statistics, 4th Ed., Irwin (1974), p. 139 and Appendix II, Table M (ISBN 0-256-01558-9).
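The bias described above can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original demonstration: the function name `mean_ratios`, the seed, and the number of trials are illustrative assumptions. It draws repeated samples of size n from a Normal distribution and averages the ratio s/sigma for both estimators.

```python
import math
import random


def mean_ratios(n, trials=20000, sigma=1.0, seed=12345):
    """Average s/sigma over many Normal samples of size n.

    Returns (mean ratio for the (n-1) estimator, mean ratio for the n estimator).
    The name, seed, and trial count are illustrative choices, not from the source.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = random.Random(seed)
    total_n_minus_1 = 0.0
    total_n = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Sum of squared deviations from the sample mean.
        ss = sum((x - mean) ** 2 for x in sample)
        total_n_minus_1 += math.sqrt(ss / (n - 1)) / sigma
        total_n += math.sqrt(ss / n) / sigma
    return total_n_minus_1 / trials, total_n / trials


if __name__ == "__main__":
    for n in (2, 5, 10, 30):
        r1, r0 = mean_ratios(n)
        print(f"n={n:2d}  mean s/sigma: (n-1) estimator {r1:.3f}, n estimator {r0:.3f}")
```

Both averages should fall below 1.0, with the n estimator further below than the (n-1) estimator, and both should approach 1.0 as n grows.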
The factors c2 and c4 are the analytically calculated mean values (expected values) of the respective distributions. The estimated s is divided by c2 (if n was used for s) or by c4 (if n-1 was used for s) to obtain an unbiased estimate of the population standard deviation.

Note that at small sample sizes the values of s vary a great deal. The variances of these distributions can also be calculated and series approximations obtained (see Deming). The point is that a single estimate of s for n = 5, say, is not a very reliable estimate of the population standard deviation, even after the bias correction.

It should also be noted that the Helmert distribution is closely related to the chi distribution; see, e.g., Johnson and Kotz, Distributions in Statistics: Continuous Univariate Distributions-I, Wiley (1970), p. 197 (ISBN 0-471-44626-2). For more information, see the Wikipedia article "Unbiased estimation of standard deviation."
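The correction factors can be evaluated directly rather than read from a table. The sketch below assumes the standard closed form for Normal samples, c4(n) = sqrt(2/(n-1)) Γ(n/2) / Γ((n-1)/2), together with c2(n) = sqrt((n-1)/n) c4(n); these formulas are not written out in the text above, and the function names are illustrative. It also uses the fact that the (n-1) variance estimate is unbiased, so the variance of s/sigma is 1 - c4(n)^2.

```python
import math


def c4(n):
    """Expected value of s/sigma when s uses (n-1) in the denominator."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # Work with log-gamma to avoid overflow at large n.
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def c2(n):
    """Expected value of s/sigma when s uses n in the denominator."""
    return math.sqrt((n - 1) / n) * c4(n)


def relative_sd_of_corrected_s(n):
    """Standard deviation of the bias-corrected estimate s/c4, in units of sigma.

    Uses Var(s/sigma) = 1 - c4(n)**2 for the (n-1) estimator.
    """
    k = c4(n)
    return math.sqrt(1.0 - k * k) / k


if __name__ == "__main__":
    for n in (2, 5, 10, 25):
        print(f"n={n:2d}  c2={c2(n):.4f}  c4={c4(n):.4f}  "
              f"rel. sd of s/c4={relative_sd_of_corrected_s(n):.3f}")
```

For n = 5 this gives c4 of about 0.940 and c2 of about 0.841, while the bias-corrected estimate still has a standard deviation of roughly 36% of sigma, which illustrates why a single small-sample estimate is unreliable.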