diff --git a/education/statistics/Correlation and Regression.md b/education/statistics/Correlation and Regression.md
index e9f7451..a29b604 100644
--- a/education/statistics/Correlation and Regression.md
+++ b/education/statistics/Correlation and Regression.md
@@ -113,9 +113,14 @@ $$ \sqrt{1-r^2}(\sigma_y) $$
 - On a least squared regression line, the 1 r.m.s error away will contain $2\sigma$ of the data, and it should loosely mirror a normal curve.
 - To approximate the R.M.S error for a scatter diagram, take a high value and a low value for a given $x$ coordinate, and divide by 4, because r.m.s error is within $2\sigma$ of either side of the line.
 - 68% = $2\sigma$, 95% = $4\sigma$
-- RMS can help determine which observations are outliers. Typically if a value is more than *2 r.m.s* away from the prediction estimate, it is considered to be an outlier.
+- RMS can help determine which observations are outliers. Typically if a value is more than *2 r.m.s* away from the prediction estimate, it is considered to be an outlier
+
+- Heteroscedastic scatter diagrams should not be used to make a prediction
+- homoscedastic scatter diagrams can be used to make predictions, because they follow
 ---
 # Terminology
-| Term | Definition |
-| -- | -- |
-| $\hat{y}$ | The predicted value |
+| Term | Definition | |
+| ---- | ---- | ---- |
+| $\hat{y}$ | The predicted value | |
+| Homoscedastic | The scatter diagram will look the same above and below the LSRL | |
+| Heteroscedastic | Will have more variability on one side of the regression line | |
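The notes in this hunk rely on the r.m.s. error of the regression line, $\sqrt{1-r^2}\,\sigma_y$, and on the rule of thumb that a point more than 2 r.m.s. errors from the line is an outlier. The following is a minimal Python sketch of both ideas, assuming population SDs (dividing by $n$), under which $\sqrt{1-r^2}\,\sigma_y$ equals the r.m.s. of the residuals exactly. The function names and the sample data are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def regression_stats(xs, ys):
    """Fit the least squares regression line; return (slope, intercept, r, rms_error)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # population SDs (divide by n), assumed convention
    # r = average of the products of x and y in standard units
    r = sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / n
    slope = r * sy / sx
    intercept = my - slope * mx
    rms_error = math.sqrt(1 - r ** 2) * sy  # sqrt(1 - r^2) * sigma_y
    return slope, intercept, r, rms_error


def residual_rms(xs, ys, slope, intercept):
    """Compute the r.m.s. of the residuals y - y_hat directly, for comparison."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e ** 2 for e in residuals) / len(residuals))


def flag_outliers(xs, ys, slope, intercept, rms_error, k=2.0):
    """Return the (x, y) points lying more than k r.m.s. errors from the line."""
    return [
        (x, y)
        for x, y in zip(xs, ys)
        if abs(y - (slope * x + intercept)) > k * rms_error
    ]


# Hypothetical data: y = 2x everywhere except one point pushed far above the line.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2, 4, 6, 8, 30, 12, 14, 16, 18]

slope, intercept, r, rms = regression_stats(xs, ys)
print(abs(rms - residual_rms(xs, ys, slope, intercept)) < 1e-9)  # True
print(flag_outliers(xs, ys, slope, intercept, rms))  # [(5, 30)]
```

Here the r.m.s. error is about 6.29, so the 2 r.m.s. cutoff is about 12.6; the point at $x=5$ sits roughly 17.8 above the line and is flagged, while every other point is only about 2.2 below it.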