diff --git a/education/statistics/Correlation and Regression.md b/education/statistics/Correlation and Regression.md
index a29b604..f4db2e4 100644
--- a/education/statistics/Correlation and Regression.md
+++ b/education/statistics/Correlation and Regression.md
@@ -99,6 +99,13 @@ $$ \hat{y} = \frac{x-\bar{x}}{\sigma_x} * r * \sigma_y + \bar{y} $$
 - For a positive association, for every $\sigma_x$ above average we are in $x$, the line predicts $y$ to be $\sigma_y$ standard deviations above y.x
 - There are two separate regression lines, one for predicting $y$ from $x$, and one for predicting $x$ from $y$
 - Do not extrapolate outside of the graph
+
+(Ch 12, stat 1040)
+A $y$ value can be predicted for a given $x$ value from the regression equation.
+$$ y = mx + b $$
+Where $y$ is the predicted value, $m$ is the slope, $x$ is the given value, and $b$ is the intercept.
+$$ \text{slope} = \frac{r * \sigma_y}{\sigma_x} $$
+
 ### The Regression Effect
 - In a test-retest situation, people with low scores tend to improve, and people with high scores tend to do worse. This means that individuals score closer to the average as they retest.
 - The regression *fallacy* is contributing this to something other than chance error.
@@ -114,9 +121,13 @@ $$ \sqrt{1-r^2}(\sigma_y) $$
 - To approximate the R.M.S error for a scatter diagram, take a high value and a low value for a given $x$ coordinate, and divide by 4, because r.m.s error is within $2\sigma$ of either side of the line.
 - 68% = $2\sigma$, 95% = $4\sigma$
 - RMS can help determine which observations are outliers. Typically if a value is more than *2 r.m.s* away from the prediction estimate, it is considered to be an outlier
+- The RMS error is only appropriate for homoscedastic scatter diagrams (football shape)
+
+- Heteroscedastic scatter diagrams should not be used to make a prediction, because they do not follow a football shape
+- Homoscedastic scatter diagrams can be used to make predictions, because they follow a football shape
+
+
 
-- Heteroscedastic scatter diagrams should not be used to make a prediction
-- homoscedastic scatter diagrams can be used to make predictions, because they follow
 ---
 # Terminology
 | Term | Definition | |