(Chapter 8, STAT 1040)
Correlation
Scatter Diagrams
A scatter diagram or scatter plot shows the relationship between two variables. One variable is on the X axis, the other on the Y axis.
If a scatter diagram is football shaped, it can be summarized using five summary statistics:
Variable | Description |
---|---|
\mu_x | The average of the set graphed along the X axis |
\sigma_x | The standard deviation of the set graphed along the X axis |
\mu_y | The average of the set graphed along the Y axis |
\sigma_y | The standard deviation of the set graphed along the Y axis |
r | The correlation coefficient, or how closely the data points cluster around a line |
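As a rough illustration, the five statistics in the table could be computed as in the sketch below. The function name `five_statistics` and the sample data are hypothetical, and the sketch assumes the population standard deviation (dividing by n) is the one meant by \sigma, with r taken as the average product of the two variables in standard units.

```python
from statistics import fmean, pstdev


def five_statistics(xs, ys):
    """Return (mean_x, sd_x, mean_y, sd_y, r) for paired data.

    Assumption: sd_x and sd_y are population standard deviations.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    # Convert each value to standard units, multiply the pairs, then average.
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ]
    return mean_x, sd_x, mean_y, sd_y, fmean(products)


# Made-up example data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mean_x, sd_x, mean_y, sd_y, r = five_statistics(xs, ys)
print(f"mean_x={mean_x}, sd_x={sd_x:.3f}, mean_y={mean_y}, sd_y={sd_y:.3f}, r={r:.3f}")
```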
The intersection of the averages of x and y will be the center of an oval-shaped scatter diagram. To sketch the general shape of a scatter plot, draw lines extending 2\sigma_x from the center in each direction along the X axis and 2\sigma_y along the Y axis (each range will contain ~95% of that variable's values).
You can appr
Association
- Positive association is demonstrated when the dots trend upward as x increases (r is positive).
- Negative association is demonstrated when the dots trend downward as x increases (r is negative).
- Strong association is demonstrated when dots are clustered tightly together along a line (|r| is closer to 1).
- Weak association is demonstrated when dots are not clustered tightly (|r| is closer to 0).
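The four cases above can be sketched as a small helper that reads direction from the sign of r and strength from |r|. The function name `describe_association` and the `strong_cutoff` value of 0.7 are assumptions made for illustration; the notes only say "closer to 1" and "closer to 0" and do not fix a threshold.

```python
def describe_association(r, strong_cutoff=0.7):
    """Return (direction, strength) for a correlation coefficient r.

    Assumption: strong_cutoff is an arbitrary illustrative threshold,
    not a value taken from the course material.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    # Direction comes from the sign of r.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    # Strength comes from how close |r| is to 1.
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return direction, strength


for r in (0.9, -0.2, -0.85):
    print(r, describe_association(r))
```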
Correlation
Correlation is always between -1 and 1. Correlation near 1 or -1 means tight clustering, and correlation near 0 means loose clustering. r is -1 if the points all lie on a line with negative slope, and r is 1 if the points all lie on a line with positive slope. As |r| gets closer to 1, the points cluster more tightly around a line.
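The extreme cases can be checked numerically. The sketch below uses a hypothetical `correlation` helper (covariance divided by the product of the population standard deviations) and made-up data: points exactly on a rising line, points exactly on a falling line, and points that only loosely follow an upward trend.

```python
from math import isclose, sqrt


def correlation(xs, ys):
    """Correlation coefficient r, using population standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return covariance / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]  # exactly on a line with negative slope
scattered = [3, 5, 4, 8, 7]         # upward trend, but not on a line

print(isclose(correlation(xs, rising), 1.0))    # r is 1 up to rounding error
print(isclose(correlation(xs, falling), -1.0))  # r is -1 up to rounding error
print(0 < correlation(xs, scattered) < 1)       # positive, but short of 1
```

Floating-point arithmetic may return values such as 0.9999999999999998 for a perfect line, which is why the comparison uses `isclose` rather than `==`.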
Terminology
Term | Definition |
---|---|
r | Correlation Coefficient |
Linear Correlation | Measures the strength of the linear association between two variables |