Dispersion
Categorical variables
A frequency distribution summarizes the values of a variable by the frequencies with which they occur.
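As an illustration, a frequency distribution can be tabulated with a few lines of Python. This is only a sketch: the eye colour data and the helper name `frequency_distribution` are hypothetical and not part of the original notes.

```python
from collections import Counter

# Hypothetical categorical data: eye colours of ten people.
eye_colours = ["blue", "brown", "brown", "green", "blue",
               "brown", "blue", "brown", "grey", "brown"]


def frequency_distribution(values):
    """Return {value: (absolute frequency, relative frequency)}."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}


distribution = frequency_distribution(eye_colours)

# Print one row per category: value, count, share of all observations.
for value, (count, share) in sorted(distribution.items()):
    print(f"{value:<6} {count:>2} {share:.0%}")
```

The relative frequencies add up to 1, which makes distributions from data sets of different sizes comparable.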
Numerical variables
Range is the difference between the maximum and the minimum value, $R = x_{\max} - x_{\min}$. It cannot become smaller when observations are added to the data. It is very sensitive to extreme values.
Interquartile range (IQR, qr) is the difference between the upper and the lower quartile, $Q_3 - Q_1$. At least 50 % of the values lie in this interval. Quartile deviation is half of the interquartile range, $(Q_3 - Q_1)/2$. Both of them can change in either direction when observations are added to the data.
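The behaviour of the range and the interquartile range when observations are added can be checked with a small sketch. The data are hypothetical, and the quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`; other quartile conventions may give slightly different values.

```python
import statistics

# Hypothetical data: eleven observations.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 21]


def data_range(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Upper quartile minus lower quartile (inclusive method, an assumption)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def quartile_deviation(values):
    """Half of the interquartile range."""
    return interquartile_range(values) / 2


print("range:", data_range(data))
print("IQR:", interquartile_range(data))
print("quartile deviation:", quartile_deviation(data))

# Adding a central value narrows the IQR, adding an extreme value widens it.
print("IQR with 8 added:", interquartile_range(data + [8]))
print("IQR with 100 added:", interquartile_range(data + [100]))

# The range, in contrast, never shrinks when an observation is added.
print("range with 8 added:", data_range(data + [8]))
print("range with 100 added:", data_range(data + [100]))
```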
Deviation of an observation i is the difference between its value and the mean:
$$d_i = x_i - \bar{x}$$
When the deviations are added up, positive and negative values cancel each other out (deviations from the mean always sum to zero), so dispersion would be underestimated. For that reason, absolute values of the deviations should be used:
$$|d_i| = |x_i - \bar{x}|$$
When the absolute deviations are added up, the number of observations matters. Even if the individual deviations were of about the same magnitude, the sum would be much larger for 1000 observations than for 50 observations. To be comparable, the sum should be divided by the number of observations, which gives the mean deviation:
$$MD = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$
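The steps above can be sketched in Python as follows. The data and the function names are hypothetical and chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def deviations(values):
    """Signed deviation of every observation from the mean."""
    m = mean(values)
    return [x - m for x in values]


def mean_deviation(values):
    """Mean of the absolute deviations from the mean."""
    return sum(abs(d) for d in deviations(values)) / len(values)


data = [2, 4, 6, 8, 10]  # hypothetical observations

# Signed deviations cancel out, so their sum says nothing about dispersion.
print("sum of deviations:", sum(deviations(data)))

# Absolute deviations do not cancel; dividing by n makes samples comparable.
print("mean deviation:", mean_deviation(data))
```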
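When squared deviations are used instead of absolute values, the result is called the variance. Variance does not depend on location: adding the same constant to every observation leaves it unchanged. If the variance is computed for the whole population, the divisor is the number of observations n and the variance is denoted by $\sigma^2$:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$$
where $\mu$ is the population mean.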
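If the variance is estimated from a sample, the divisor is n - 1 and the variance is denoted by $s^2$ (the choice of divisor matters most when the sample is small):
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$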
The divisor n - 1 makes the estimator unbiased, meaning that the expected value of the estimator equals the parameter being estimated.
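Unbiasedness can be illustrated on a toy case by listing every possible sample and averaging the estimates. The sketch below assumes a hypothetical population of three values and samples of size 2 drawn with replacement; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def variance(values, divisor_offset):
    """Sum of squared deviations from the mean, divided by n - divisor_offset."""
    n = len(values)
    m = Fraction(sum(values), n)
    return sum((x - m) ** 2 for x in values) / (n - divisor_offset)


def population_variance(values):
    """Variance with divisor n."""
    return variance(values, 0)


def sample_variance(values):
    """Variance with divisor n - 1."""
    return variance(values, 1)


population = [1, 2, 3]  # hypothetical toy population
sigma2 = population_variance(population)

# Every possible ordered sample of size 2 drawn with replacement.
samples = list(product(population, repeat=2))

# Average of the estimates over all samples, for both divisors.
mean_of_s2 = sum(sample_variance(s) for s in samples) / len(samples)
mean_of_biased = sum(population_variance(s) for s in samples) / len(samples)

print("population variance:", sigma2)
print("average estimate, divisor n - 1:", mean_of_s2)
print("average estimate, divisor n:", mean_of_biased)

# Shifting every value by a constant leaves the variance unchanged.
shifted = [x + 100 for x in population]
print("variance after shift:", population_variance(shifted))
```

With the divisor n - 1 the estimates average out to the population variance, whereas with the divisor n they are too small on average.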
Variance is based on squared deviations, so the unit of the result is also squared. The result is much easier to interpret once the square root is taken. The square root of the variance is called the standard deviation.
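A short sketch of the unit argument, using hypothetical heights in centimetres and the sample versions from Python's `statistics` module:

```python
import math
import statistics

heights_cm = [160, 170, 180]  # hypothetical heights in centimetres

# Sample variance (divisor n - 1): the unit is square centimetres.
variance_cm2 = statistics.variance(heights_cm)

# Taking the square root returns to the original unit, centimetres.
std_dev_cm = math.sqrt(variance_cm2)

print("variance (cm^2):", variance_cm2)
print("standard deviation (cm):", std_dev_cm)

# statistics.stdev computes the same quantity directly.
print("statistics.stdev (cm):", statistics.stdev(heights_cm))
```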
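If the dispersion of the same property measured in different units is to be compared, the coefficient of variation can be used. It is the standard deviation s divided by the mean $\bar{x}$:
$$CV = \frac{s}{\bar{x}}$$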
It can be computed only if the variable is measured on a ratio scale, that is, a scale with a true zero point.
Example: In one year, the mean annual income in the USA was $20000 and the standard deviation $10000. In the same year, the mean annual income in Great Britain was £6000 and the standard deviation £4000. The coefficient of variation for incomes is 10000/20000 = 0.5 in the USA and 4000/6000 ≈ 0.67 in Great Britain. Relative dispersion is therefore larger in Great Britain.
The same kind of comparison could be made for the weights of mice and elephants.
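The income example can be reproduced with a small sketch. The function name `coefficient_of_variation` is hypothetical; the figures are those given in the example above.

```python
def coefficient_of_variation(mean, std_dev):
    """Standard deviation divided by the mean; a unit-free number."""
    if mean <= 0:
        # The ratio is only meaningful for ratio-scale data with a positive mean.
        raise ValueError("mean must be positive")
    return std_dev / mean


cv_usa = coefficient_of_variation(mean=20000, std_dev=10000)  # dollars
cv_gb = coefficient_of_variation(mean=6000, std_dev=4000)     # pounds

print(f"USA: {cv_usa:.2f}")
print(f"Great Britain: {cv_gb:.2f}")

# The currency units cancel, so the two values can be compared directly.
print("larger relative dispersion:", "Great Britain" if cv_gb > cv_usa else "USA")
```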
