Two-Variable Data: Scatterplots, Correlation, & Linear Regression

Scatterplots represent bivariate data as points on a plane: the position of each point along the x- and y-axes gives the two values of one observation. A scatterplot helps reveal the strength, form, and direction of the relationship in the data it displays. Correlation describes the strength and direction of a linear association between two quantitative variables. It is summarized by a coefficient (r) with a value between -1 and +1. When r is exactly -1 or +1, all the points fall on a straight line and the relationship is perfectly linear. The closer r gets to 0, the weaker the linear association between the two variables.
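As a rough illustration of how r can be computed, here is a minimal Python sketch. The function name `pearson_r` and the sample values are hypothetical and chosen only for illustration; they are not measurements from this activity.

```python
import math

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired data xs, ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (assumption): hand spans in cm, lollipops grabbed
hand_spans = [17.0, 18.5, 19.0, 20.0, 21.5]
lollipops = [9, 11, 10, 13, 14]

r = pearson_r(hand_spans, lollipops)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 would indicate points lying close to a straight line, while a value near 0 would indicate little linear association.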

Hand Span & Number of Lollipops Grabbed

Activity Instructions:

- Measure and record the distance from your thumb to your pinky finger in centimeters while your palm is facing up.
- Without looking, grab as many lollipops as you can out of the bag with the same hand. Count the number of lollipops grabbed.
- Input the measure of the hand span in the X column, and the number of lollipops grabbed in the Y column.
- Observe the scatterplot display of hand spans (in cm) and the number of lollipops grabbed.

The objective of this applet is to understand the relationship between hand span and the number of lollipops grabbed by analyzing the data's scatterplot and interpreting its correlation coefficient. Furthermore, the least squares regression line, or "line of best fit," will be used to predict a y-value (number of lollipops grabbed) for a given x-value (measure of hand span). A residual is the difference between an observed y-value and the y-value the line predicts for the same x. The least squares regression line is the line that minimizes the sum of the squares of the residuals over all the data points.
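To make the idea of residuals concrete, the following Python sketch compares two candidate lines by their sum of squared residuals. The data values and both candidate lines are hypothetical, chosen only for illustration; neither line is claimed to be the least squares line.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of the squares of the residuals for the line y = slope*x + intercept."""
    return sum(e ** 2 for e in residuals(xs, ys, slope, intercept))

# Hypothetical data (assumption): hand spans in cm, lollipops grabbed
hand_spans = [17.0, 18.5, 19.0, 20.0, 21.5]
lollipops = [9, 11, 10, 13, 14]

# Two hypothetical candidate lines to compare
sse_a = sum_squared_residuals(hand_spans, lollipops, slope=1.0, intercept=-8.0)
sse_b = sum_squared_residuals(hand_spans, lollipops, slope=0.5, intercept=2.0)

print(f"Line A: {sse_a:.3f}, Line B: {sse_b:.3f}")
```

The candidate with the smaller sum fits these points better. The least squares regression line is the one line for which this sum is as small as possible.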
The equation of this line, known as the regression equation, is used to predict a y-value for a given x-value. It has the same structure as the slope-intercept form, $y = mx + b$. The slope m equals the product of the correlation coefficient (r) and the ratio of the standard deviation of y to the standard deviation of x: $m = r \cdot \frac{s_y}{s_x}$. The regression line always passes through the point of means (x̄, ȳ), so to solve for the y-intercept b, substitute (x̄, ȳ) into the equation: $b = \bar{y} - m\bar{x}$. Substituting any value of x into $\hat{y} = mx + b$ returns a predicted value of y.
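The slope and intercept described above can be computed as in the following Python sketch. It assumes sample standard deviations (dividing by n - 1). The function names and the data are hypothetical, for illustration only.

```python
import math

def fit_least_squares(xs, ys):
    """Return (slope, intercept, r) using slope = r * s_y / s_x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (assumption: n - 1 in the denominator)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("regression is undefined when one variable is constant")
    # Correlation coefficient from the standardized products
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    slope = r * s_y / s_x
    # The line passes through the point of means, which fixes the intercept
    intercept = mean_y - slope * mean_x
    return slope, intercept, r

def predict(x, slope, intercept):
    """Predicted y-value for a given x on the regression line."""
    return slope * x + intercept

# Hypothetical data (assumption): hand spans in cm, lollipops grabbed
hand_spans = [17.0, 18.5, 19.0, 20.0, 21.5]
lollipops = [9, 11, 10, 13, 14]

slope, intercept, r = fit_least_squares(hand_spans, lollipops)
for span in range(17, 22):
    print(span, round(predict(span, slope, intercept), 2))
```

Predictions are most trustworthy for x-values inside the range of the observed data; predicting far outside that range is extrapolation and should be treated with caution.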

- Interpret the resulting correlation coefficient and explain what it means for the direction and strength of the variables' linear relationship.

- Given the scattered spread of the data, would it be appropriate to use the correlation coefficient to describe the relationship of the two variables? Why or why not?

- What effect could extreme or unusual values have on the correlation of two variables?

- Obtain the predicted number of lollipops grabbed for hand span measures from 17 to 21 cm. Write your observation about what happens to the predicted number of lollipops grabbed as the measure of the hand span increases.

- Explain why there are cases when the predicted y-value does not match the observed y-value.