(Chapter 8, STAT 1040)

# Correlation

## Scatter Diagrams
A scatter diagram, or scatter plot, shows the relationship between two variables: one variable is plotted along the x-axis and the other along the y-axis.

If a scatter diagram is football-shaped, it can be summarized by five statistics:

| Statistic | Description |
| -- | -- |
| $\mu_x$ | The average of the variable plotted along the x-axis |
| $\sigma_x$ | The standard deviation (SD) of the variable plotted along the x-axis |
| $\mu_y$ | The average of the variable plotted along the y-axis |
| $\sigma_y$ | The standard deviation (SD) of the variable plotted along the y-axis |
| $r$ | The correlation coefficient, which measures how tightly the points cluster around a straight line |

The point of averages $(\mu_x, \mu_y)$ is the center of a football-shaped (oval) scatter diagram. To sketch the overall shape, mark off $2\sigma_x$ on each side of the center horizontally and $2\sigma_y$ on each side vertically. If each variable is roughly normally distributed, about 95% of its values lie within 2 SDs of its average, so most of the points fall inside the resulting box.

You can appr
### Association

- Positive association is demonstrated when the dots trend upward as $x$ increases ($r$ is positive).
- Negative association is demonstrated when the dots trend downward as $x$ increases ($r$ is negative).
- Strong association is demonstrated when the dots cluster tightly around a line ($|r|$ is close to 1).
- Weak association is demonstrated when the dots are loosely clustered ($|r|$ is close to 0).
## Correlation

The correlation coefficient $r$ always lies between $-1$ and $1$. The closer $|r|$ is to 1, the more tightly the points cluster around a line; values of $r$ near 0 mean loose clustering. If all the points lie exactly on a line with negative slope, $r = -1$; if they lie exactly on a line with positive slope, $r = 1$.
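A common textbook way to compute $r$ is to convert each variable to standard units and take the average of the products. The sketch below is a minimal version of that recipe, assuming population SDs (dividing by $n$) so that the average of the products is also taken over $n$, and assuming neither variable is constant (otherwise the SD is 0). The names `standard_units` and `corr` are hypothetical. Points lying exactly on a line give $r = 1$ or $r = -1$, up to floating-point rounding.

```python
from statistics import fmean, pstdev


def standard_units(values):
    """Convert each value to standard units: (value - average) / SD."""
    avg = fmean(values)
    sd = pstdev(values)  # population SD (divides by n); assumed nonzero
    return [(v - avg) / sd for v in values]


def corr(xs, ys):
    """Correlation coefficient: the average of the products of x and y in standard units."""
    zx = standard_units(xs)
    zy = standard_units(ys)
    return fmean([a * b for a, b in zip(zx, zy)])


x = [1, 2, 3, 4, 5]
up_line = [2 * v + 1 for v in x]     # exactly on a line with positive slope
down_line = [20 - 3 * v for v in x]  # exactly on a line with negative slope

print(round(corr(x, up_line), 6))    # 1.0
print(round(corr(x, down_line), 6))  # -1.0
```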
# Terminology

| Term | Definition |
| -- | -- |
| $r$ | Correlation coefficient |
| Linear correlation | Measures the strength of the linear association between two variables, i.e. how tightly the points cluster around a straight line |