Distinguishing Different Data?

We have been analyzing data for over half the semester now. We have looked at charts, graphs, tables, and lists of numbers in context. We have talked about means, medians, and modes and used fancy words like "trend" and "symmetric" to describe data. But how important is each of these concepts? Do you think it is possible to know what a data set looks like WITHOUT graphing it? Can the math alone tell us everything we need to know about our data? In this worksheet, we will explore a special collection of data sets created in 1973 by an English statistician to challenge the belief among statisticians that "numerical calculations are exact, but graphs are rough." Answer the questions below, and please ask Ms. Shoemaker if you're stuck.

Statistics vs. Graphing

In the box below the scatterplot, identify the statistical descriptors you would need to give someone so that they could recreate the trend. What would someone else need to know about these data points in order to understand what is happening WITHOUT looking at the graph?

Would measures like the mean, the linear regression equation, the standard deviation, or the correlation be needed for someone to fully understand the data without the graph?

Statistical Descriptors

What elements are included in a Five Number Summary?

Select all that apply

If I gave you a Five Number Summary about a particular data set, do you think that is enough information for you to determine what that set would look like when plotted? Explain why or why not.
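One way to test this is with a short Python sketch. The two data sets below are made up for illustration, and the quartile rule used here (the median of each half, leaving the overall median out when the count is odd) is an assumption, since calculators and software packages use several slightly different conventions. Both sets produce the same Five Number Summary even though one is spread evenly and the other is bunched into three clusters.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    and the overall median is left out of both halves when the count is odd.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 == 1 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


# Two hypothetical data sets chosen for illustration.
evenly_spread = [10, 20, 30, 40, 50, 60, 70, 80, 90]
clustered = [10, 24, 26, 49, 50, 51, 74, 76, 90]

print(five_number_summary(evenly_spread))  # (10, 25.0, 50, 75.0, 90)
print(five_number_summary(clustered))      # (10, 25.0, 50, 75.0, 90)
```

A box plot of either set would look identical, while a dot plot would not.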

Statistical Descriptors



We've talked a lot about linear regression lately, so you should be familiar with this formula:

y = a + b x

What do "a" and "b" represent in this equation?

Select all that apply

If I gave you a linear regression equation for a particular data set, do you think that is enough information for you to determine what that set would look like when plotted? Explain why or why not.
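Here is a minimal Python sketch of how a and b are usually found with least squares: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The function name `least_squares` and both data sets are hypothetical. Three points lying exactly on a line and five scattered points end up with the same equation, y = 1 + 2x.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b = sum of (x - x_bar)(y - y_bar) divided by sum of (x - x_bar)^2
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar  # the line always passes through (x_bar, y_bar)
    return a, b


# Hypothetical data: three points exactly on a line, and five scattered points.
on_the_line = ([1, 2, 3], [3, 5, 7])
scattered = ([1, 1, 2, 3, 3], [2, 4, 5, 6, 8])

print(least_squares(*on_the_line))  # (1.0, 2.0)
print(least_squares(*scattered))    # (1.0, 2.0)
```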

Statistical Descriptors

We have also discussed the idea of correlation. The correlation coefficient is denoted by "r" and ranges from -1 to 1. What would a correlation of -0.976 tell us about the trend of the data?

Select all that apply

If I gave you the correlation coefficient for a data set, do you think that is enough information for you to determine what that set would look like when plotted? Explain why or why not.
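The Python sketch below computes r from its definition, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²); the helper name `pearson_r` and the data are made up for illustration. A value like -0.976 signals points clustered tightly around a decreasing line, but r only measures the strength of a straight-line (linear) trend, so a perfect U-shaped pattern can still give r = 0.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, always between -1 and 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [-2, -1, 0, 1, 2]
line = [2 * x for x in xs]   # a perfect straight-line pattern
curve = [x * x for x in xs]  # a perfect U-shaped (parabola) pattern

print(pearson_r(xs, line))   # 1.0
print(pearson_r(xs, curve))  # 0.0
```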

The statistician mentioned at the beginning of this worksheet was Francis Anscombe, an English statistician who created 4 distinct data sets of 11 points each to show how important graphing and plotting data is. Even with all the basic statistical analysis methods that we have learned, the most vital tool for data analysis remains the basic graph. Click on the graph below* to see how the data sets share the same basic statistical descriptors yet look very different when plotted. Take note that they all have the same mean, standard deviation, and correlation, and even the same linear regression line. *This interactive graph was sourced from GeoGebraTube and modified slightly.

Make Your Own!

Now use the last slide to try to create your own data set with the same descriptive statistics. Move the red dots around until your data have the same mean, standard deviation, and correlation.

Each of the 4 data sets is unique, yet all of them share the basic statistical descriptors that we have been learning about all semester. Which data set is your favourite? Please explain.

Now that we have discussed all of our descriptive statistical methods, which one(s) do you think give us the best understanding of the data? In other words, what mathematical information do we need in order to analyze different data sets? Explain why knowing this information would best allow you to draw conclusions.