notes/education/statistics/Correlation and Regression.md
2023-12-13 14:18:19 -07:00


(Chapter 8, STAT 1040)

Correlation

Scatter Diagrams

A scatter diagram or scatter plot shows the relationship between two variables. One variable is on the X axis, the other on the Y axis.

If a scatter diagram is football shaped, it can be summarized using five statistics (not to be confused with the five-number summary of a single variable):

| Variable | Description |
| --- | --- |
| \mu_x | The average of the set graphed along the X axis |
| \sigma_x | The standard deviation of the set graphed along the X axis |
| \mu_y | The average of the set graphed along the Y axis |
| \sigma_y | The standard deviation of the set graphed along the Y axis |
| r | The correlation coefficient, or how tightly the data points cluster around a line |
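The five statistics above can be computed directly from paired data. The following is a minimal sketch, not a definitive implementation: the function name `five_statistics` and the sample values are made up for illustration, and it assumes the population standard deviation (dividing by n), which matches the \mu and \sigma notation used in the table.

```python
from math import sqrt


def five_statistics(xs, ys):
    """Return (mean_x, sd_x, mean_y, sd_y, r) for paired data.

    Hypothetical helper for illustration; assumes population SDs (divide by n).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population standard deviation of each variable
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # r is the average product of the two variables in standard units
    r = sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ) / n
    return mean_x, sd_x, mean_y, sd_y, r


# Made-up sample data, purely for demonstration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mean_x, sd_x, mean_y, sd_y, r = five_statistics(xs, ys)
print(mean_x, sd_x, mean_y, sd_y, r)
```

For this sample, the averages are 3 and 4, and r comes out to roughly 0.77, a fairly strong positive association.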

The point (\mu_x, \mu_y), where the averages of x and y intersect, is the center of an oval shaped scatter diagram. To generalize the shape of a scatter plot, draw lines extending 2\sigma_x and 2\sigma_y from the center along the respective axes; about 95% of the values along each axis fall within 2\sigma of that axis's average.

You can appr

Association

  • Positive association is demonstrated when the dots trend upward as x increases (r is positive).
  • Negative association is demonstrated when the dots trend downward as x increases (r is negative).
  • Strong association is demonstrated when the dots are clustered tightly together along a line (|r| is closer to 1).
  • Weak association is demonstrated when the dots are not clustered tightly (|r| is closer to 0).
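The four cases above can be read off the sign and magnitude of r. Here is a small sketch of that reading; the function name `describe_association` is hypothetical, and the `strong_cutoff` value of 0.7 is an arbitrary assumption chosen for illustration, since the notes only say "closer to 1" and "closer to 0" without fixing a threshold.

```python
def describe_association(r, strong_cutoff=0.7):
    """Classify r by direction (sign) and strength (magnitude).

    strong_cutoff is an illustrative assumption, not a standard value.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    # Direction comes from the sign of r
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    # Strength comes from how close |r| is to 1
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return direction, strength


print(describe_association(0.9))   # ('positive', 'strong')
print(describe_association(-0.2))  # ('negative', 'weak')
```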

Correlation

The correlation coefficient r is always between -1 and 1. Correlation near 1 or -1 means tight clustering, and correlation near 0 means loose clustering. r is exactly -1 if all the points lie on a line with negative slope, and exactly 1 if all the points lie on a line with positive slope. As |r| gets closer to 1, the points cluster more tightly around a line.
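To see these boundary cases concretely, the sketch below computes r for points lying exactly on a rising line, exactly on a falling line, and scattered loosely. The helper `pearson_r` and all data values are made up for illustration; it uses the standard formula for the correlation coefficient.

```python
from math import isclose, sqrt


def pearson_r(xs, ys):
    """Correlation coefficient of paired data (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the averages
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # points on a line with positive slope
falling = [-3 * x + 10 for x in xs]  # points on a line with negative slope
loose = [3, 1, 4, 1, 5]              # made-up, loosely clustered values

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_loose = pearson_r(xs, loose)
print(r_rising, r_falling, r_loose)
```

The first two values are 1 and -1 regardless of how steep the line is, which shows that r measures clustering around a line, not the slope itself. The loosely clustered set gives an r of about 0.35.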

Terminology

| Term | Definition |
| --- | --- |
| r | Correlation coefficient |
| Linear correlation | Measures the strength of the linear association between two variables |