Chapter 20

Stat 1040
Chapter 20 Solutions

1. 50,000.
2. Each ticket in the box shows a zero (for each form with gross income less than or equal to $50,000) or a one (for those forms with gross income over $50,000).
3. False. The SD is actually (1-0)*sqrt(0.20 * 0.80) = 0.4.
4. True.
5. The number of audited forms in the sample with income over $50,000 is like the sum of 900 draws from the 0-1 box described in item 2. The EV for the sum is 900 * 0.20 = 180, and the SE is sqrt(900) * (SD of the box) = 30 * 0.4 = 12. As a percentage of the number of draws, the EV is 20% and the SE is 12/900, or about 1.3%. This means that we expect the percentage of audited forms in the sample with income over $50,000 to be 20%, give or take 1.3% or so. The chance that the sample percentage falls between 19% and 21% is approximately the area under the normal curve from -0.75 to +0.75, which is about 55%.
6. We have no way of calculating this chance. We need to know the percentage of forms that have gross income over $75,000 in order to find an expected value and standard error. Besides, we cannot estimate that percentage from the average and SD using the normal curve, because the incomes probably do not follow the normal curve.
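The arithmetic in item 5 can be checked with a short Python sketch. It assumes, as the solution does, that the draws can be treated as if made with replacement (no correction factor), and the variable names are illustrative only.

```python
import math
from statistics import NormalDist

n_draws = 900        # forms chosen for audit
frac_ones = 0.20     # fraction of tickets marked 1 (income over $50,000)

# SD of a 0-1 box: (big - small) * sqrt(fraction of 1's * fraction of 0's)
sd_box = (1 - 0) * math.sqrt(frac_ones * (1 - frac_ones))   # about 0.4

# EV and SE for the sum of the draws (the number of 1's in the sample)
ev_sum = n_draws * frac_ones                # 900 * 0.20
se_sum = math.sqrt(n_draws) * sd_box        # 30 * 0.4

# Convert to percentages of the number of draws
ev_pct = 100 * ev_sum / n_draws
se_pct = 100 * se_sum / n_draws

# Standard units for 21% (19% is the mirror image)
z = (21 - ev_pct) / se_pct

# Area under the normal curve between -z and +z
chance = NormalDist().cdf(z) - NormalDist().cdf(-z)

print(f"EV = {ev_pct:.1f}%, SE = {se_pct:.2f}%, z = {z:.2f}, chance = {chance:.1%}")
```

The computed area is a little under 55%, which agrees with the rounded figure read from a normal table.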
The box we'll use in this question does not have zeroes and ones in it. The total gross income of the 900 audited forms is like the sum of 900 draws from a new box which has a gross income on each ticket, one ticket for each tax form, making 50,000 tickets in all. We know from the information given in the problem that the average of this box is $37,000 and the SD is $20,000.
1. 50,000 again.
2. A gross income.
3. True.
4. True.
5. The chance that the total gross income of the audited forms is over $33,000,000 is the same as the chance that our sample sum is over $33,000,000. We expect our sample sum to be (900)*($37,000) = $33,300,000. The SE of our sample sum is sqrt(900)*($20,000) = $600,000. In standard units, $33,000,000 is ($33,000,000 - $33,300,000)/$600,000 = -0.5, so the chance is approximately the area under the normal curve to the right of -0.5, which is about 70%.
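The same kind of check works for the sum of the incomes. This sketch uses a threshold of $33,000,000, which is the value consistent with the -0.5 in standard units and the 70% quoted in item 5; that choice and the variable names are assumptions made for illustration, and the draws are again treated as if made with replacement.

```python
import math
from statistics import NormalDist

n_draws = 900             # forms chosen for audit
box_avg = 37_000          # average gross income in the box, in dollars
box_sd = 20_000           # SD of gross income in the box, in dollars
threshold = 33_000_000    # assumed threshold for the sample total

# EV and SE for the sum of the draws
ev_sum = n_draws * box_avg               # 900 * $37,000
se_sum = math.sqrt(n_draws) * box_sd     # 30 * $20,000

# Standard units for the threshold
z = (threshold - ev_sum) / se_sum

# Area under the normal curve to the right of z
chance_over = 1 - NormalDist().cdf(z)

print(f"EV = ${ev_sum:,.0f}, SE = ${se_sum:,.0f}, z = {z:.2f}, chance = {chance_over:.1%}")
```

With a threshold of $30,000,000 instead, the standard-units value would be -5.5 and the chance would be essentially 100%.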
Statement (ii) is correct. The accuracy of California's sample will be higher because California's sample will be larger.
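To see why the larger sample is more accurate, here is a small sketch of how the SE for a sample percentage shrinks as the number of draws grows. The sample sizes and the 50% box are hypothetical values chosen only for illustration; they are not taken from the problem.

```python
import math

def se_percent(n_draws: int, frac_ones: float) -> float:
    """SE for the percentage of 1's among n_draws from a 0-1 box."""
    sd_box = math.sqrt(frac_ones * (1 - frac_ones))
    se_sum = math.sqrt(n_draws) * sd_box
    return 100 * se_sum / n_draws

# Hypothetical sample sizes for a small state and a large state
small_sample_se = se_percent(1_000, 0.5)
large_sample_se = se_percent(25_000, 0.5)

print(f"n = 1,000:  SE = {small_sample_se:.2f}%")
print(f"n = 25,000: SE = {large_sample_se:.2f}%")
```

Multiplying the sample size by 25 divides the SE by 5, since the SE for a percentage goes down as the square root of the number of draws.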
Statements (a), (c) and (e) are true, while (b), (d) and (f) are false. For (b) and (f), we know exactly the expected value for the percentage of 1's among the draws, and we know exactly the percentage of 1's in the population. We only attach measures of chance error to quantities we don't know, such as the percentage of 1's that come up in a sample we happen to take. That percentage will be different for each sample, whereas the values mentioned in (b) and (f) don't change from sample to sample.