
(Ch 26, stat 1040)
## z tests for percentages
This test can be used if:
- The data is a simple random sample from the population of interest
- The sample size is large ($>30$)
- The variable of interest is qualitative and summarized by percentages
- The population can be represented by a box of tickets marked 0 and 1
If an observed value is too many SEs away from the expected value, it is hard to explain by chance.

1. Start by stating the null and alternative hypotheses:
	- Null: *x* is *y*. This is often given in the problem.
	- Alternative: If you're asked whether something has *changed*, the alternative is that *x* is not equal to *y*, which is a two-sided test. If you're asked whether it is *more than* or *less than* *y*, it's a one-sided test.
2. Then find the SE. For a percentage, this is $SE_\% = \frac{SD_{box}}{\sqrt{num_{draws}}} \times 100\%$, where the SD of the 0-1 box comes from the null hypothesis: $SD_{box} = \sqrt{p(1-p)}$, with $p$ the fraction of 1s.
3. The EV (Expected Value) is usually given as the population %. With the above info, find the $z$ score with the formula $z = \frac{observed_\% - expected_\%}{SE_\%}$.
4. Use this $z$ score with something like $normalcdf$ to find the area beyond $z$ (in one tail or both, depending on the alternative). If that area is less than 5%, reject the null hypothesis. If it is more than 5%, the difference could plausibly be due to chance, so do not reject it.
Then state a conclusion in terms of either the null hypothesis or the alternative hypothesis.
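
A minimal Python sketch of steps 2 to 4, assuming a two-tailed alternative and percentages written on a 0 to 100 scale. The function name `z_test_percentage` and the example numbers (null of 50%, 55% observed in 400 draws) are hypothetical, and `statistics.NormalDist().cdf` stands in for a calculator's $normalcdf$.

```python
from math import sqrt
from statistics import NormalDist

def z_test_percentage(null_pct, observed_pct, n):
    """Two-tailed z test for a percentage; percentages are on a 0-100 scale."""
    p = null_pct / 100
    sd_box = sqrt(p * (1 - p))           # SD of a 0-1 box, taken from the null
    se_pct = sd_box / sqrt(n) * 100      # SE for the percentage, in % points
    z = (observed_pct - null_pct) / se_pct
    # area beyond |z| in both tails of the normal curve
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# hypothetical: null says 50%, and 55% of 400 draws were 1s
z, p_value = z_test_percentage(50, 55, 400)
print(z, p_value)
print("reject" if p_value < 0.05 else "fail to reject")
```

Here the SE is 2.5 percentage points, so $z$ is about 2 and the two-tailed area is roughly 4.6%, just under the 5% cutoff.
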

| Term | Description |
| ---- | ---- |
| Null Hypothesis | This is a statement about a *parameter*, and it's a statement of equality: the chance of getting *x* is *y%*. A null hypothesis is never proven true: you either find strong enough evidence against it (reject it), or you don't (fail to reject it). |
| Alternative/Research Hypothesis | What the researcher is out to show; a statement of inequality (less than, greater than, or not equal to). |
| One-tailed test | Use when the alternative hypothesis says that the % of 1s is *less than* or *greater than* expected. It's one-tailed because the region that counts as evidence against the null lies in only one tail of the curve, extending outward from $z$. |
| Two-tailed test | Use when the alternative says something is *not equal* to the expected value. It's two-tailed because the region of significance lies in both tails: find the area in each tail and add them together (or double one tail's area, since the normal curve is symmetric). |
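
To see how the choice of tails changes the answer, here is a hedged Python sketch that turns a $z$ score into the area counted for each kind of alternative. The function name `p_value_from_z`, the `tail` labels, and the example $z = 1.8$ are made up for illustration.

```python
from statistics import NormalDist

def p_value_from_z(z, tail):
    """Area under the standard normal curve counted as 'at least as extreme' as z."""
    phi = NormalDist().cdf
    if tail == "greater":      # one-tailed: alternative says 'more than'
        return 1 - phi(z)
    if tail == "less":         # one-tailed: alternative says 'less than'
        return phi(z)
    if tail == "two":          # two-tailed: alternative says 'not equal to'
        # the curve is symmetric, so both tails have the same area
        return 2 * (1 - phi(abs(z)))
    raise ValueError("tail must be 'greater', 'less', or 'two'")

z = 1.8
print(p_value_from_z(z, "greater"))
print(p_value_from_z(z, "two"))
```

For $z = 1.8$ the one-tailed area is about 3.6%, below 5%, while the two-tailed area is about 7.2%, above it, so the same data can lead to different conclusions depending on the alternative.
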
## z tests for averages
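This test looks very similar to a z test for percentages, and it still requires a large ($>30$) simple random sample. The SE for the average is $SE_{ave} = \frac{SD}{\sqrt{n}}$, and the test statistic is $z = \frac{obs_{ave} - EV_{ave}}{SE_{ave}}$.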
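
A minimal Python sketch of a two-tailed z test for an average, working from summary numbers. The function name `z_test_average` and the sample values (64 draws, average 51.5, SD 8, null average 50) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_average(sample_ave, sample_sd, n, null_ave):
    """Two-tailed z test for an average, from summary statistics."""
    se_ave = sample_sd / sqrt(n)          # SE for the average
    z = (sample_ave - null_ave) / se_ave
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# hypothetical sample: 64 draws, average 51.5, SD 8; null says the average is 50
z, p_value = z_test_average(51.5, 8, 64, 50)
print(z, p_value)
```

With $SE_{ave} = 8 / \sqrt{64} = 1$, $z$ is 1.5 and the two-tailed area is about 13%, so this sample would not lead to rejecting the null.
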

## Two sample z tests for averages
These tests are still very similar to a one-sample z test. To conduct a two-sample z test, the two samples must be independent of each other, and each must be a large ($>30$) simple random sample.

The two sample z-statistic is computed from:
- the sizes of the two samples
- the averages of the two samples
- the SDs of the two samples
$$ z = \frac{observed_{diff} - expected_{diff}}{SE_{diff}}$$
The diff is the difference between the two sample averages, found by subtracting one from the other. The null hypothesis usually says there is no difference, so $expected_{diff}$ is usually 0.
$$ SE_{diff} = \sqrt{a^2 + b^2} $$
Here $a$ and $b$ are the $SE_{ave}$ of each sample, each computed as $\frac{SD}{\sqrt{n}}$ for that sample.
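
A hedged Python sketch of the two-sample z test, assuming a two-tailed alternative and an expected difference of 0 by default. The function name `two_sample_z` and both samples' numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(ave1, sd1, n1, ave2, sd2, n2, expected_diff=0.0):
    """Two-tailed two-sample z test for the difference of two averages."""
    a = sd1 / sqrt(n1)                    # SE_ave of sample 1
    b = sd2 / sqrt(n2)                    # SE_ave of sample 2
    se_diff = sqrt(a**2 + b**2)
    z = ((ave1 - ave2) - expected_diff) / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# hypothetical independent samples: (average 71, SD 8, n 64) and (average 68, SD 4.5, n 36)
z, p_value = two_sample_z(71, 8, 64, 68, 4.5, 36)
print(z, p_value)   # SE_diff = sqrt(1**2 + 0.75**2) = 1.25, so z = 3 / 1.25
```

The SEs add in squares, not directly: $1 + 0.75 = 1.75$, but $SE_{diff}$ is only 1.25. Here $z = 2.4$ and the two-tailed area is about 1.6%.
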
## t tests for averages
This test is used when you have a small sample size ($<30$).
One major difference with a *t* test is that you use $SD_+$ instead of $SD$.

With a small sample, the SD of the sample tends to underestimate the SD of the population, so it is scaled up to $SD_+$:
$$ SD_+ = \sqrt{\frac{n}{n-1}} \times SD $$
where $n$ is the sample size. Use $SD_+$ in every later calculation where a z test would use $SD$; for example, $SE_{ave} = \frac{SD_+}{\sqrt{n}}$.
$$ t = \frac{obs_{ave} - EV_{ave}}{SE_{ave}} $$
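
A minimal Python sketch of the $SD_+$ correction and the $t$ statistic. The function name `t_statistic`, the five measurements, and the null average of 5 are hypothetical. It also checks that $SD_+$ equals Python's `statistics.stdev`, which divides by $n - 1$.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

def t_statistic(values, null_ave):
    """t statistic for a small sample, using SD+ in place of SD."""
    n = len(values)
    sd = pstdev(values)                # SD of the sample (divides by n)
    sd_plus = sqrt(n / (n - 1)) * sd   # SD+ = sqrt(n / (n - 1)) * SD
    se_ave = sd_plus / sqrt(n)         # SE for the average, using SD+
    return (mean(values) - null_ave) / se_ave

# hypothetical measurements, tested against a null average of 5
data = [4.8, 5.6, 5.1, 6.0, 5.4]
print(t_statistic(data, 5))
# SD+ matches statistics.stdev, which divides by n - 1
print(sqrt(len(data) / (len(data) - 1)) * pstdev(data), stdev(data))
```
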
The Student's t-curve is then used instead of the normal curve. It has a similar shape, but more area in the tails.

Degrees of freedom ($df$) are found by subtracting 1 from the sample size: $df = n - 1$. The fewer the degrees of freedom, the more the Student's curve differs from the normal curve.

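The $t$-test equivalent of $normalcdf$ is $tcdf$, which also takes the degrees of freedom. Like $normalcdf$, it returns an area as a decimal between 0 and 1; multiply by 100 to get a percentage.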
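
Python's standard library has no $tcdf$, so this hedged sketch builds one by integrating the height of the t-curve numerically with Simpson's rule. The function names are hypothetical, and the example $t = 1.85$ with $df = 4$ is made up; if SciPy is installed, `scipy.stats.t.sf(t, df)` gives the same upper-tail area.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Height of Student's t-curve with df degrees of freedom at x."""
    # math.gamma overflows once df reaches several hundred, far beyond small samples
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_area_between(lower, upper, df, steps=2000):
    """Area under the t-curve from lower to upper, via Simpson's rule."""
    h = (upper - lower) / steps   # steps must be even for Simpson's rule
    total = t_pdf(lower, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(lower + i * h, df)
    return total * h / 3

def t_upper_tail(t, df):
    """Area to the right of t: half the curve minus the area from 0 to t."""
    return 0.5 - t_area_between(0, t, df)

# hypothetical t statistic with 4 degrees of freedom, two-tailed
t, df = 1.85, 4
print(2 * t_upper_tail(abs(t), df))           # area under the t-curve
print(2 * (1 - NormalDist().cdf(abs(t))))     # same z under the normal curve
```

The t-curve gives a noticeably larger area than the normal curve for the same statistic, which is the "more area in the tails" described above.
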
## P Value
The chance of getting a sample statistic at least as extreme as the one observed, assuming the null hypothesis is true.
If the p-value is less than 5%, reject the null hypothesis.
If the p-value is greater than 5%, fail to reject the null hypothesis.
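
The definition can be checked by simulation. This hedged sketch assumes a 0-1 box where the null says half the tickets are 1s, and a hypothetical sample of 400 draws containing 225 ones (56.25%). It repeats the 400 draws under the null many times and counts how often the count lands at least as far from the expected value as the observed count did. It is a Monte Carlo estimate, so the exact number depends on the seed and the number of simulations; the function name `simulated_p_value` is made up.

```python
import random

def simulated_p_value(null_p, n, observed_count, sims=5000, seed=0):
    """Monte Carlo estimate of a two-tailed p-value for a count of 1s."""
    rng = random.Random(seed)
    expected = null_p * n
    observed_gap = abs(observed_count - expected)
    extreme = 0
    for _ in range(sims):
        # n draws with replacement from a 0-1 box whose fraction of 1s is null_p
        count = sum(rng.random() < null_p for _ in range(n))
        if abs(count - expected) >= observed_gap:
            extreme += 1
    return extreme / sims

# hypothetical: null says 50% of tickets are 1s; 225 ones seen in 400 draws
p_value = simulated_p_value(0.5, 400, 225)
print(p_value, "reject" if p_value < 0.05 else "fail to reject")
```

With these numbers the estimate should come out well under 5%, so the null hypothesis would be rejected.
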