Dispersion
Dispersion indicators express the amount of variation in a data set. Dispersion can also be thought of as how tightly or loosely the values are spread within the distribution.
Categorical variables
Dispersion indicators are not usually used with categorical variables.
Numerical variables
Range is the difference between the maximum and the minimum values. It cannot decrease when observations are added to the data. The range is strongly affected by extreme values. The range in Data 1 is 172 cm - 138 cm = 34 cm.
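The behaviour of the range when observations are added can be sketched in a few lines of Python. The heights below are a hypothetical sample chosen for illustration (they are not Data 1), and `value_range` is a helper name invented for this sketch.

```python
def value_range(values):
    """Return the difference between the largest and the smallest value."""
    return max(values) - min(values)

heights_cm = [150, 160, 165, 170, 180]   # hypothetical sample, not Data 1

print(value_range(heights_cm))           # 30
# A new value inside the old limits leaves the range unchanged.
print(value_range(heights_cm + [166]))   # 30
# A new extreme value increases the range; it can never decrease.
print(value_range(heights_cm + [195]))   # 45
```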
Interquartile range (IQR, qr) is the difference between the upper quartile \(Q_3\) and the lower quartile \(Q_1\). At least 50% of the values lie within this interval. It is less sensitive to extreme values than the range. Quartile deviation is half of the interquartile range, \((Q_3 - Q_1)/2\). Both of them can either increase or decrease when observations are added to the data. In Data 1, interquartile range is and quartile deviation is
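As a rough illustration, the quartile-based measures can be computed with Python's standard `statistics` module. The data are hypothetical (not Data 1), `quartile_summary` is a helper name made up for this sketch, and the `"inclusive"` method is only one of several quartile conventions, so a spreadsheet or textbook may give slightly different quartiles.

```python
import statistics

def quartile_summary(values):
    """Return (Q1, Q3, interquartile range, quartile deviation)."""
    # Assumption: the "inclusive" convention, which interpolates between
    # sorted data points. Other conventions give slightly different values.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, iqr, iqr / 2

heights_cm = [150, 155, 160, 165, 170, 175, 180, 185, 190]  # hypothetical
print(quartile_summary(heights_cm))       # (160.0, 180.0, 20.0, 10.0)

# One extreme value changes the range a lot, the IQR only a little.
with_outlier = heights_cm + [250]
print(max(heights_cm) - min(heights_cm))      # 40
print(max(with_outlier) - min(with_outlier))  # 100
print(quartile_summary(with_outlier)[2])      # 22.5
```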
The deviation of observation i is the difference between its value and the mean:
\[ d_i = x_i - \bar{x} \]
When deviations are added up, positive and negative values cancel each other out (their sum is always zero), so dispersion would be underestimated. For that reason, absolute values of the deviations should be used (the notation \(\sum\) denotes summation):
\[ \sum_{i=1}^{n} |x_i - \bar{x}| \]
When adding up the deviations, the number of observations matters. Even if the deviations are of about the same magnitude, the sum is much larger for 1000 observations than for 50 observations. To make the results comparable, the sum is divided by the number of observations n, which gives the mean deviation:
\[ MD = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| \]
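The cancelling of signed deviations and the mean deviation can be checked with a small Python sketch. The data set is hypothetical, and `mean_deviation` is a name chosen here for illustration.

```python
def mean_deviation(values):
    """Return the mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [2, 4, 4, 6, 9]                 # hypothetical data, mean is 5
mean = sum(data) / len(data)

# Signed deviations cancel out: -3 - 1 - 1 + 1 + 4 = 0
print(sum(x - mean for x in data))     # 0.0
# Absolute deviations do not cancel: (3 + 1 + 1 + 1 + 4) / 5 = 2
print(mean_deviation(data))            # 2.0
```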
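When squared deviations are used instead of absolute values, the result is called the variance. Variance does not depend on location: adding the same constant to every value leaves it unchanged. The variance of a whole population or a large sample is denoted by \(\sigma^2\):
\[ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
where n is the number of observations.
If the sample is small (fewer than 30 observations), the divisor is n - 1 and the variance is denoted by \(s^2\):
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]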
The variance is based on squared deviations, so its unit is also squared (for example cm² for heights measured in cm). The result is much easier to interpret if the square root is taken. The square root of the variance is called the standard deviation.
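The two divisors and the square-root step can be sketched in Python as follows. The data are hypothetical (not Data 1), and the function name and its `sample` flag are choices made for this sketch only.

```python
import math

def variance(values, sample=True):
    """Variance with divisor n - 1 (sample=True) or n (sample=False)."""
    n = len(values)
    mean = sum(values) / n
    squared_sum = sum((x - mean) ** 2 for x in values)
    return squared_sum / (n - 1 if sample else n)

data = [2, 4, 4, 6, 9]   # hypothetical data: mean 5, squared deviations sum to 28

print(variance(data, sample=False))        # 5.6  (divisor n = 5)
print(variance(data))                      # 7.0  (divisor n - 1 = 4)
print(round(math.sqrt(variance(data)), 2)) # 2.65 (standard deviation)

# Shifting every value by a constant does not change the variance.
print(variance([x + 100 for x in data]))   # 7.0
```

The standard library offers the same quantities as `statistics.pvariance`, `statistics.variance` and `statistics.stdev`, which can serve as a cross-check.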
For Data 1, variance is
Thus, the standard deviation is Because the data set is small (fewer than 30 observations), the latter formula is used. Try this with a spreadsheet!
If the dispersion of the same property measured in different units is compared, the coefficient of variation can be used:
\[ \mathrm{CV} = \frac{s}{\bar{x}} \]
It can be computed only if the variable is measured on a ratio scale.
Example: In one year, the mean annual income in the USA was $20000 and the standard deviation $10000. In the same year, the mean annual income in Great Britain was £6000 and the standard deviation £4000. Comparing the standard deviations alone, the dispersion in the USA seems to be larger. The coefficient of variation for incomes in the USA is
\[ \frac{10000}{20000} = 0.50 \]
and in Great Britain it is \( 4000 / 6000 \approx 0.67 \). Relative to the mean, the dispersion is larger in Great Britain.
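The income comparison can be reproduced with a short Python sketch that uses only the means and standard deviations given in the example. The function name is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation relative to the mean (unit-free)."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean

cv_usa = coefficient_of_variation(10_000, 20_000)   # dollars cancel out
cv_gb = coefficient_of_variation(4_000, 6_000)      # pounds cancel out

print(cv_usa)            # 0.5
print(round(cv_gb, 2))   # 0.67
print(cv_gb > cv_usa)    # True: relative dispersion is larger in Great Britain
```

Because the unit cancels in the division, the two results can be compared directly even though the incomes are in different currencies.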
Comparisons can be made in the same way for variables measured in the same unit if their magnitudes differ greatly. For example, comparing the standard deviation of the weight of mice with the standard deviation of the weight of elephants does not make sense without the coefficient of variation.
