Confidence Intervals

Q I ask 200 randomly selected Hofstra students how much money they spent on Internet purchases over the past week. The sample mean for the 200 students is $42.35. Therefore, I can make the following claim:
Q OK, then what is the point of taking a sample mean, since it tells us nothing?
A Slow down. The sample mean does tell us something; it just tells us nothing with absolute certainty (unless, of course, our sample consists of the whole population). However, the larger the sample size, the more confident we can be that the population mean lies "fairly close" to the sample mean we obtained. This idea of "confidence" as opposed to "certainty" is what we will make precise here.
To understand what this is all about, you should know something about sampling distributions, where we learned about the following result.
Central Limit Theorem If the population distribution has mean $\mu$ and standard deviation $\sigma$, then, for sufficiently large $n$, the sampling distribution of the sample mean $\bar{x}$ is approximately normal, with mean
$$\mu_{\bar{x}} = \mu$$
and standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$
Notice that, as the sample size $n$ gets larger, the standard deviation $\sigma/\sqrt{n}$ of the sampling distribution gets smaller. Thus, the sample means tend to be very close to the population mean, resulting in a single, narrow peak at $\mu_{\bar{x}} = \mu$, as shown in the distribution curves below.
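The shrinking spread can be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the helper `spread_of_sample_means` is a hypothetical name, and the population (uniform on [0, 1], so $\sigma = 1/\sqrt{12}$), the number of samples and the seed are illustrative choices. It compares the simulated standard deviation of many sample means with the Central Limit Theorem prediction $\sigma/\sqrt{n}$.

```python
import random
import statistics
from math import sqrt

def spread_of_sample_means(n: int, num_samples: int = 5000, seed: int = 1) -> float:
    """Hypothetical helper: standard deviation of num_samples sample means,
    each computed from n independent Uniform(0, 1) draws (a non-normal population)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n))
             for _ in range(num_samples)]
    return statistics.stdev(means)

SIGMA = 1 / sqrt(12)  # population standard deviation of Uniform(0, 1), about 0.2887

for n in (5, 30, 200):
    # simulated spread of the sample means versus the prediction sigma / sqrt(n)
    print(n, round(spread_of_sample_means(n), 4), round(SIGMA / sqrt(n), 4))
```

The two printed columns should agree closely for each $n$, and both shrink as $n$ grows.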

Note If the (population) distribution $X$ is already normal to begin with, then, no matter what the sample size, the sampling distribution of $\bar{x}$ is exactly normal. Thus, we need the Central Limit Theorem only when the original distribution of $X$ is not known to be normal, which is often the case in practice.
Q How large must the sample size n be before the Central Limit Theorem "kicks in"?
A In principle, there is no way to know this, but for most practical purposes, people use the following rule of thumb: If n > 30, then assume that n is sufficiently large, so that the sampling distribution is approximately normal.
Q OK, what has this got to do with the original question about student spending on the Internet?
A Since the sample size was n = 200 students, the Central Limit Theorem tells us that the sample means ($42.35 was one of those sample means) are approximately normally distributed. Now, from our knowledge about normal distributions, we can deduce:
95.45% of the sample means will lie within two standard deviations of the population mean (because, for a standard normal variable $Z$, $P(-2 \le Z \le 2) = 0.9545$).
Thus,
If we take a large number of sample means, then 95.45% of the time the distance between $\bar{x}$ and $\mu_{\bar{x}} = \mu$ will be less than two standard deviations (of the sampling distribution), that is, within a distance $2\sigma/\sqrt{n}$ of $\mu$.
Or,
If we take a large number of sample means, then 95.45% of the time the (unknown) population mean $\mu$ is between $\bar{x} - 2\sigma/\sqrt{n}$ and $\bar{x} + 2\sigma/\sqrt{n}$.
Thus, we call the interval $[\bar{x} - 2\sigma/\sqrt{n},\ \bar{x} + 2\sigma/\sqrt{n}]$ the 95.45% confidence interval for the population mean.
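Both the 0.9545 figure and the two-standard-deviation interval are easy to compute. Here is a small sketch, assuming Python 3.9+ and the standard-library `statistics.NormalDist`; `interval_95_45` is a hypothetical helper name, and it needs $\sigma$, which the Hofstra example does not supply.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution (mean 0, standard deviation 1)

# Area under the standard normal curve between z = -2 and z = 2
p_within_two_sd = Z.cdf(2) - Z.cdf(-2)

def interval_95_45(xbar: float, sigma: float, n: int) -> tuple[float, float]:
    """Hypothetical helper: [xbar - 2*sigma/sqrt(n), xbar + 2*sigma/sqrt(n)]."""
    margin = 2 * sigma / sqrt(n)  # two standard deviations of the sampling distribution
    return xbar - margin, xbar + margin

print(round(p_within_two_sd, 4))  # 0.9545
```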
Q OK. How do we get, say, the 90% confidence interval, or the 99% confidence interval?
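A All we need to know is how many standard deviations on either side of the mean will include 90% or 99% of the sample means. The following picture of the standard normal curve shows the z-value we want, so that a total area of 0.90 (or 90%) is included between $z = -1.645$ and $z = 1.645$:
We call this value of $z$ "$z_{.05}$", since the area of the tail to its right is 0.05, and we can use this value instead of 2 in the above formula:
90% confidence interval: $[\bar{x} - 1.645\,\sigma/\sqrt{n},\ \bar{x} + 1.645\,\sigma/\sqrt{n}]$
Similarly, for the 99% confidence interval, we can consult the following picture (the area between $z = -2.576$ and $z = 2.576$ is 0.99, so we use $z_{.005} = 2.576$)
and obtain:
99% confidence interval: $[\bar{x} - 2.576\,\sigma/\sqrt{n},\ \bar{x} + 2.576\,\sigma/\sqrt{n}]$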
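For a general formula, let us take $\alpha$ to be 100% minus the percentage of confidence, written as a decimal (so $\alpha = 0.10$ for 90% confidence and $\alpha = 0.01$ for 99% confidence).
Then the interval we want is given by the following formula:
Large Sample $100(1-\alpha)\%$ Confidence Interval
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
$\bar{x}$ = sample mean, $\sigma$ = population standard deviation, $n$ = sample size
Note: When (as is often the case) we don't know the population standard deviation $\sigma$, we can approximate it by the sample standard deviation $s$, and obtain the following (good) approximation of the confidence interval:
$$\left[\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right]$$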
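The general formula translates almost line for line into code. This is a sketch under stated assumptions: Python 3.9+, the standard-library `statistics.NormalDist.inv_cdf` to find $z_{\alpha/2}$ instead of a printed table, and a hypothetical function name `large_sample_ci`. Pass $\sigma$ when it is known, or $s$ for the approximate interval.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(xbar: float, sd: float, n: int, confidence: float) -> tuple[float, float]:
    """Hypothetical helper: large-sample 100(1 - alpha)% confidence interval for the mean.

    Pass the population standard deviation sigma as sd when it is known;
    otherwise pass the sample standard deviation s to get the approximation.
    """
    alpha = 1 - confidence                    # e.g. 0.10 for 90% confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}: right-tail area alpha/2
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin
```

For instance, `large_sample_ci(12, 2.5, 50, 0.95)` gives approximately `(11.307, 12.693)`. Computing $z_{\alpha/2}$ directly avoids rounding to three decimals, which is why results can differ from hand calculations in the last digit.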

Here is a little table of z-values:
z_{.1}  z_{.05}  z_{.025}  z_{.01}  z_{.005}  z_{.001}  z_{.0005} 
1.282  1.645  1.960  2.326  2.576  3.090  3.291 
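If no table is at hand, the same z-values can be regenerated from the inverse of the standard normal distribution function. A minimal sketch using Python's standard-library `statistics.NormalDist`; `z_value` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_value(tail_area: float) -> float:
    """Hypothetical helper: z such that the area under the standard normal
    curve to the right of z equals tail_area."""
    return NormalDist().inv_cdf(1 - tail_area)

TAILS = (0.1, 0.05, 0.025, 0.01, 0.005, 0.001, 0.0005)
for a in TAILS:
    print(f"z_{a}: {z_value(a):.3f}")
```

Rounded to three decimals, the printed values match the table above.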
Here is an example where you can put the above formula to use.
Your hot sauce company rates its sauce on a spiciness scale of 1 to 20. A sample of 50 bottles of hot sauce is taste-tested, resulting in a mean of 12 and a sample standard deviation of 2.5. Find a 95% confidence interval for the mean spiciness of your hot sauce.
Solution
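Here $n = 50$, $\bar{x} = 12$, and $s = 2.5$. Since $n > 30$, we can use the large-sample formula with $s$ in place of $\sigma$. For 95% confidence, $\alpha = 0.05$, so $z_{\alpha/2} = z_{.025} = 1.960$, and the confidence interval is
$$\left[12 - 1.960\frac{2.5}{\sqrt{50}},\ 12 + 1.960\frac{2.5}{\sqrt{50}}\right] \approx [11.307,\ 12.693].$$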
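The arithmetic for this example can be checked with a few lines of Python. This sketch simply plugs the example's numbers into the large-sample formula with $s$ in place of $\sigma$, using the tabulated value $z_{.025} = 1.960$.

```python
from math import sqrt

# Data from the hot sauce example: 50 bottles, sample mean 12, sample SD 2.5
n, xbar, s = 50, 12.0, 2.5
z_025 = 1.960                  # z_{.025} from the table (95% confidence)

margin = z_025 * s / sqrt(n)   # 1.960 * 2.5 / sqrt(50), about 0.693
lo, hi = xbar - margin, xbar + margin
print(f"[{lo:.3f}, {hi:.3f}]")  # [11.307, 12.693]
```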
Q How do I interpret this confidence interval?
A It says that, if you repeatedly test 50-bottle random samples of hot sauce and compute the confidence interval each time, the confidence intervals you get will include the population mean 95% of the time. In that sense, there is a 95% chance that any specific confidence interval (such as the one above) actually contains the population mean. So, you can be 95% "certain" that the mean spiciness of your hot sauce is somewhere between 11.307 and 12.693.
Following is a simulation that generates a number of random samples of size $n = 30$ from a random variable uniformly distributed between 0 and 1 (mean $\mu = 0.5$). For each sample, the mean and the 90% confidence interval are computed automatically. The standard deviation of a random variable uniformly distributed between $a$ and $b$ is $\sigma = (b - a)/\sqrt{12}$, so here $\sigma = (1 - 0)/\sqrt{12} \approx 0.2887$.
Each time a confidence interval is computed, the simulation checks whether the interval contains the mean. This should happen about 90% of the time.
Example 2 Illustration of Confidence Intervals
The simulation displays the requested number of samples of size $n = 30$, each with its 90% confidence interval and whether that interval contains the population mean 0.5. Approximately 90% of the confidence intervals should contain the population mean of 0.5; thus, you should average 18 "yes" results for every 20 samples.
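The simulation described above can be sketched offline. This is a minimal version, assuming Python 3.9+ and the standard-library `random` module; the function name `simulate`, the seed, and the sample counts are hypothetical choices. Like the original simulation, it uses the theoretical $\sigma = 1/\sqrt{12}$ and $z_{.05} = 1.645$ for each 90% interval.

```python
import random
from math import sqrt

MU = 0.5                 # mean of Uniform(0, 1)
SIGMA = 1 / sqrt(12)     # standard deviation of Uniform(0, 1), about 0.2887
Z_05 = 1.645             # z_{.05}, for 90% confidence

def simulate(num_samples: int, n: int = 30, seed: int = 0) -> list[bool]:
    """Hypothetical helper: for each of num_samples samples of size n, record whether
    its 90% confidence interval xbar +/- z_{.05} * sigma / sqrt(n) contains MU."""
    rng = random.Random(seed)
    margin = Z_05 * SIGMA / sqrt(n)
    results = []
    for _ in range(num_samples):
        xbar = sum(rng.random() for _ in range(n)) / n
        results.append(xbar - margin <= MU <= xbar + margin)
    return results

print(sum(simulate(20)), "yes out of 20")
hits = simulate(10000)
print(sum(hits) / len(hits))  # should be close to 0.90
```

With only 20 samples the count of "yes" results fluctuates around 18; with 10,000 samples the proportion settles very close to 0.90.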
Before we go on... Notice that, since the distribution we are sampling from is not normal (it is uniform), we need fairly large samples to ensure that the distribution of the sample means is approximately normal, as assumed in our formulation of confidence intervals. Notice also that we use the theoretical population standard deviation $\sigma$ in computing each interval rather than the sample standard deviation. We could equally well have used the sample standard deviations instead.
When we are dealing with small samples, we cannot invoke the Central Limit Theorem. Hence, we cannot use our confidence interval formula unless we are sampling from a normally distributed random variable.
However, there is one further issue. If we know the population standard deviation $\sigma$, then all is well and good, and we can go ahead and use the above formula for the confidence interval for small samples (assuming, of course, that we are sampling from a normally distributed variable). But if, as is usually the case, we do not know $\sigma$ and we go ahead and use the sample standard deviation $s$ instead, we will tend to obtain confidence intervals that are too small. The reason is that, while the sampling distribution of $(\bar{x} - \mu)/(\sigma/\sqrt{n})$ is normal (provided $X$ is normal), the sampling distribution of $(\bar{x} - \mu)/(s/\sqrt{n})$ is not normal (unless we are dealing with large samples, in which case it is approximately normal).
Q Why care about the sampling distribution of $(\bar{x} - \mu)/(s/\sqrt{n})$?
A The reason we must care is this: when we use $\sigma$, our computation of the confidence interval is based on the probability that $\bar{x}$ is within a certain number of standard deviations of the mean $\mu$. This number of standard deviations is $(\bar{x} - \mu)/(\sigma/\sqrt{n})$, since $\sigma/\sqrt{n}$ is the standard deviation of the sampling distribution. We then set that equal to a desired z-value and solve for $\mu$ to obtain the confidence interval. When we use $s$ instead of $\sigma$, we cannot use a z-value, since the distribution of $(\bar{x} - \mu)/(s/\sqrt{n})$ is not normal; it follows the "t-distribution" instead.
It follows that, instead of using $z_{\alpha/2}$ in our formula, we need to use $t_{\alpha/2}$. Furthermore, we get different t-distributions for different sample sizes, and we use the value of $t_{\alpha/2}$ corresponding to $n - 1$ "degrees of freedom", which we can get from a table.
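The claim that using $s$ with a z-value gives intervals that are too small can be tested by simulation. The sketch below assumes a normal population with mean 0 and standard deviation 1, samples of size $n = 8$, and the t-table value $t_{.025} = 2.365$ for 7 degrees of freedom; the function name `coverage`, the seed, and the trial count are hypothetical choices, and only the Python standard library is used.

```python
import random
from math import sqrt

def coverage(multiplier: float, trials: int = 20000, n: int = 8, seed: int = 0) -> float:
    """Hypothetical helper: fraction of intervals xbar +/- multiplier * s / sqrt(n)
    that contain the true mean, for samples of size n from a normal population
    with mean 0 and standard deviation 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample SD
        if abs(xbar - 0.0) <= multiplier * s / sqrt(n):
            hits += 1
    return hits / trials

z_cov = coverage(1.960)  # z_{.025} with s: intervals too narrow, coverage below 0.95
t_cov = coverage(2.365)  # t_{.025} with n - 1 = 7 degrees of freedom
print(round(z_cov, 3), round(t_cov, 3))
```

With $n = 8$, the z-based intervals should capture the mean only about 91% of the time, while the t-based intervals should come close to the intended 95%.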
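Small Sample $100(1-\alpha)\%$ Confidence Interval
When the Population Standard Deviation $\sigma$ Is Known:
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
$\bar{x}$ = sample mean, $\sigma$ = population standard deviation, $n$ = sample size
When Only the Sample Standard Deviation $s$ Is Known:
$$\left[\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right]$$
$\bar{x}$ = sample mean, $s$ = sample standard deviation, $n$ = sample size, $t_{\alpha/2}$ taken from the t-distribution with $n - 1$ degrees of freedom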
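Both small-sample cases can be combined into one function. This is a sketch under loudly stated assumptions: it handles only 95% confidence, it assumes a normal population, and instead of a statistics library it uses a hypothetical lookup table `T_025` of standard t-table values for 1 to 10 degrees of freedom (so it covers samples of size 2 to 11 only). The function name `small_sample_ci_95` is also hypothetical.

```python
from math import sqrt
from typing import Optional

# t_{.025} values (95% confidence) for 1 to 10 degrees of freedom, copied from a
# standard t-table. This hypothetical lookup only covers samples of size 2 to 11.
T_025 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}
Z_025 = 1.960

def small_sample_ci_95(xbar: float, n: int, sigma: Optional[float] = None,
                       s: Optional[float] = None) -> tuple[float, float]:
    """Hypothetical helper: 95% confidence interval for the mean of a normal population.

    Uses z_{.025} with sigma when sigma is known; otherwise uses t_{.025}
    with n - 1 degrees of freedom and the sample standard deviation s."""
    if sigma is not None:
        margin = Z_025 * sigma / sqrt(n)
    elif s is not None:
        margin = T_025[n - 1] * s / sqrt(n)
    else:
        raise ValueError("supply either sigma or s")
    return xbar - margin, xbar + margin
```

A real program would compute $t_{\alpha/2}$ from a statistics library rather than a hard-coded table, which would also lift the restrictions on confidence level and sample size.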

Let us try this out on the following variant of the "Hot Sauce" Example above.
Example 3 More Hot Sauce
When the CEO of your hot sauce company was informed that the spiciness of the hot sauce averages only 12, he was furious and ordered instant adjustments to the recipe, threatening to fire the whole sauce division unless the average spiciness increased to above 13. Yesterday, you randomly sampled 8 bottles of the new sauce and found an average spiciness of 13.5 with a sample standard deviation of 0.75.
(a) Compute the 95% confidence interval for the population mean. Based on the answer, can you be 95% sure that the mean spiciness of the new sauce is above 13?
(b) Repeat part (a) assuming the sample standard deviation was 0.58.
Solution
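(a) Here $n = 8$, $\bar{x} = 13.5$, and $s = 0.75$. Since the sample is small and $\sigma$ is unknown, we use the t-distribution with $n - 1 = 7$ degrees of freedom; for 95% confidence, $t_{.025} \approx 2.365$. The confidence interval is
$$\left[13.5 - 2.365\frac{0.75}{\sqrt{8}},\ 13.5 + 2.365\frac{0.75}{\sqrt{8}}\right] \approx [12.8729,\ 14.1271].$$
Since this interval contains values at or below 13, we cannot be 95% certain that the mean spiciness of the new sauce is above 13.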
(b) The calculation is almost identical to the one above, except for the value $s = 0.58$, which gives the new confidence interval $[13.0150, 13.9850]$. Since this interval lies entirely above 13, we can be 95% certain that the mean spiciness of all the sauce is above 13.
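Both parts of this example can be reproduced with a short script. It is a sketch that hard-codes the example's data and the t-table value $t_{.025} = 2.365$ for 7 degrees of freedom; the function name `interval` is hypothetical. The test "95% sure the mean is above 13" is expressed as "the whole interval lies above 13".

```python
from math import sqrt

N, XBAR = 8, 13.5     # Example 3: 8 bottles, sample mean 13.5
T_025_DF7 = 2.365     # t_{.025} with N - 1 = 7 degrees of freedom (t-table value)

def interval(s: float) -> tuple[float, float]:
    """Hypothetical helper: 95% t-interval for the mean spiciness, given sample SD s."""
    margin = T_025_DF7 * s / sqrt(N)
    return XBAR - margin, XBAR + margin

for part, s in (("(a)", 0.75), ("(b)", 0.58)):
    lo, hi = interval(s)
    # "95% sure the mean is above 13" requires the whole interval to lie above 13
    print(part, f"[{lo:.4f}, {hi:.4f}]", "above 13:", lo > 13)
```

This prints `(a) [12.8729, 14.1271] above 13: False` and `(b) [13.0150, 13.9850] above 13: True`, matching the conclusions of parts (a) and (b).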