A.3.3 Lesson Summary
- Katie Akesson
An association between two variables means that the two variables are statistically related to each other. For example, we might expect that ice cream sales would be higher on sunny days than on snowy days. If sales were higher on sunny days than on snowy days, then we would say that there is a possible association between ice cream sales and whether or not it is sunny or snowing. When dealing with categorical variables, row or column relative frequency tables are often used to look for associations in the data. Here is a two-way table displaying ice cream sales and weather conditions for 41 days for a particular creamery. Noticing a pattern in the raw data can be difficult, especially when the row or column totals are not the same for different categories, so the data should be converted into a row or column relative frequency table to better compare the categories. For the creamery, notice that the number of days with low sales is about the same for the two weather types, which contradicts our intuition. In this case, it makes sense to look at the percentage of days that sold well under each weather condition separately. That is, consider the column relative frequencies. From the column relative frequency table, it is clear that most of the sunny days resulted in sales of at least 50 cones (73%), while most of the snowy days resulted in fewer than 50 cones sold (64%). Because these percentages are quite different, this suggests there is an association between the weather condition and the number of cone sales. A bakery might wonder if the weather conditions impact their muffin sales as well. For the bakery, it seems there is not an association between weather conditions and muffin sales, since the percentage of days with low sales are very similar under the different weather conditions, and the percentages are also close on days when they sold many muffins. Using row or column relative frequency tables helps organize data so that columns (or rows) can be easily compared between different categories for a variable. This comparison can be accomplished using a two-way table or a two-way relative frequency table, but it requires you to account for the differences in the number of data values in a given category.