The intersection of the averages of $x$ and $y$ will be the center of an oval-shaped scatter diagram. Draw lines extending $2\sigma$ from the center along each axis (which will contain ~95% of all data) to generalize the shape of the scatter plot.
You can approximate $\bar{x}$ from a scatter diagram by finding the upper and lower bounds of the points along an axis (roughly $2\sigma$ to either side of the mean) and taking the midpoint of those two bounds. Dividing the range between the two bounds by 4 gives an approximation of $\sigma$.
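A rough sketch of this eyeballing method, assuming the points along the $x$-axis appear to run from about 40 to 100 (made-up bounds, not from any real data):

```python
# Rough sketch of estimating the mean and SD from the visible spread of a
# scatter diagram along one axis. The bounds are hypothetical eyeballed values.
lower_bound = 40   # eyeballed left edge of the cloud (~2 SDs below the mean)
upper_bound = 100  # eyeballed right edge of the cloud (~2 SDs above the mean)

x_bar_estimate = (lower_bound + upper_bound) / 2   # midpoint -> approximate mean
sigma_estimate = (upper_bound - lower_bound) / 4   # range / 4 -> approximate SD

print(x_bar_estimate, sigma_estimate)  # 70.0 15.0
```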
Correlation is between `-1` and `1`. A correlation with $|r|$ near 1 means tight clustering around a line, and a correlation near 0 means loose clustering. $r$ is $-1$ if the points lie exactly on a line with negative slope, and $r$ is $+1$ if the points lie exactly on a line with positive slope. As $|r|$ gets closer to 1, the points cluster more tightly around a line.
1. Convert each $x$ value in the list to standard units ($z$). Convert each $y$ value to standard units. This creates two new tables containing $z_x$ and $z_y$.
2. Multiply each pair $z_x \cdot z_y$, then take the average of those products; that average is $r$.
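A minimal sketch of this calculation, using made-up lists for $x$ and $y$:

```python
import numpy as np

# Sketch of computing the correlation coefficient r via standard units,
# using small hypothetical example lists for x and y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Step 1: convert each list to standard units (z-scores).
z_x = (x - x.mean()) / x.std()
z_y = (y - y.mean()) / y.std()

# Step 2: r is the average of the products of the paired z-scores.
r = np.mean(z_x * z_y)
print(r)  # 0.8; matches np.corrcoef(x, y)[0, 1]
```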
- The variable you are trying to predict is called the *response variable*. It is graphed along the *y-axis*. This is the thing being predicted/measured.
- The variable you have information about that you are using to make the prediction is called the *explanatory variable*. It is graphed along the *x-axis*. This is the treatment.
- Just because a relationship exists between $x$ and $y$ *does not* mean that changes in $x$ *cause* changes in $y$.
- If the graph is given to you already set up, you already know the response and explanatory variables.
- The SD line will always have a slope of $\pm\frac{\sigma_y}{\sigma_x}$ (positive when $r$ is positive, negative when $r$ is negative), and it passes through the point of averages.
Given a scatter diagram where the average of each set lies on the point $(75, 70)$, with a $\sigma_x$ of 10 and a $\sigma_y$ of 12, you can graph the SD line by going up $\sigma_y$ and right $\sigma_x$, then connecting that point (in this example, $(85, 82)$) with the point of averages.
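A small sketch of this example in code, assuming the correlation is positive so the SD line slopes upward:

```python
# Sketch of locating the SD line for the example above: point of averages
# (75, 70), sigma_x = 10, sigma_y = 12, and (assumed) positive correlation.
x_bar, y_bar = 75, 70
sigma_x, sigma_y = 10, 12

slope = sigma_y / sigma_x          # 1.2 here; use -sigma_y/sigma_x if r < 0
intercept = y_bar - slope * x_bar  # the line passes through the point of averages

# Going right one sigma_x and up one sigma_y lands on a second point of the line.
second_point = (x_bar + sigma_x, y_bar + sigma_y)  # (85, 82)
print(slope, intercept, second_point)              # 1.2 -20.0 (85, 82)
```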
### The Regression Line/Least Squares Regression Line (LSRL)
- This line has a more moderate slope than the SD line: its slope is $r \cdot \frac{\sigma_y}{\sigma_x}$, so it is flatter whenever $|r| < 1$, and it does not go through the ends of the "football" (see the sketch at the end of this list).
- In a test-retest situation, people with low scores tend to improve, and people with high scores tend to do worse. This means that individuals score closer to the average as they retest.
- The regression *fallacy* is attributing this improvement or decline to something other than chance error.
- On a least squares regression line, about 68% of the points fall within 1 r.m.s. error of the line and about 95% fall within 2 r.m.s. errors; the residuals should loosely mirror a normal curve.
- To approximate the r.m.s. error from a scatter diagram, take the vertical spread between a high and a low $y$ value at a given $x$ coordinate and divide by 4, because roughly 95% of the points fall within 2 r.m.s. errors on either side of the line.
- The r.m.s. error can help determine which observations are outliers. Typically, if a value is more than *2 r.m.s. errors* away from the prediction estimate, it is considered an outlier.
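A minimal sketch of these r.m.s.-error ideas, using made-up data with one deliberately odd point. It fits the least squares line from $r$, the SDs, and the averages (slope $r \cdot \frac{\sigma_y}{\sigma_x}$, as noted above), takes the r.m.s. error as the square root of the average squared residual, and flags points more than 2 r.m.s. errors from their predicted values:

```python
import numpy as np

# Sketch of the r.m.s. error of the least squares line and a simple
# 2-r.m.s. outlier check, using hypothetical data with one odd point.
x = np.array([50, 55, 60, 65, 70, 75, 80, 85, 90, 95], dtype=float)
y = np.array([45, 50, 55, 60, 65, 70, 75, 80, 115, 90], dtype=float)

# Fit the least squares line from r, the SDs, and the averages.
r = np.corrcoef(x, y)[0, 1]
slope = r * y.std() / x.std()
intercept = y.mean() - slope * x.mean()
residuals = y - (slope * x + intercept)

# r.m.s. error = square root of the average squared residual.
rms_error = np.sqrt(np.mean(residuals ** 2))

# Flag points lying more than 2 r.m.s. errors from their predicted value.
outliers = np.abs(residuals) > 2 * rms_error
print(round(rms_error, 2))                  # ~8.2 for this data
print(list(zip(x[outliers], y[outliers])))  # flags the odd point at x = 90
```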