Confidence Intervals Miscellaneous on-line topics for Finite Mathematics 2e


Confidence Intervals for Large Samples (n >= 30)

Q I ask 200 randomly selected Hofstra students how much money they spent on Internet purchases over the past week. The sample mean for the 200 students is \$42.35. Therefore, I can make the following claim:

Hofstra students spent an average of \$42.35 on Internet purchases last week.
Right?
A Wrong. It could be the case that the 200 students you selected just happened to be bigger Internet spenders than the other Hofstra students. In fact, the average for all Hofstra students (the population mean) could be very different from the sample mean of \$42.35. Indeed, one can never know with absolute certainty, even approximately, what the population mean is. For instance, what if one student not polled happened to spend \$10 million on the Internet last week? Including that student might raise the mean figure to over \$1,000.

Q OK, then what is the point of taking a sample mean, since it tells us nothing?
A Slow down. It does tell us something; it just doesn't tell us anything with absolute certainty (unless, of course, our sample consists of the whole population). However, the larger the sample size, the more confident we can be that the population mean lies "fairly close" to the sample mean we obtained. This idea of "confidence," as opposed to "certainty," is what we will make precise here.

To understand what this is all about, you should know something about sampling distributions, where we learned about the following result.

Central Limit Theorem If the population distribution of X has mean $\mu$ and standard deviation $\sigma$, then, for sufficiently large n, the sampling distribution of $\bar{x}$ is approximately normal, with mean

$$\mu_{\bar{x}} = \mu$$

and standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$

Notice that, as the sample size n gets larger, the standard deviation $\sigma/\sqrt{n}$ gets smaller. Thus, the sample means $\bar{x}$ tend to be very close to the population mean $\mu$, resulting in a single, narrow peak at $\mu$, as shown in the distribution curves below.
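To see the shrinking standard deviation numerically, here is a minimal Python sketch (standard library only). It assumes, purely for illustration, a population that is uniform on [0, 1], so that $\mu = 0.5$ and $\sigma = 1/\sqrt{12}$; the function name `sd_of_sample_means`, the trial count and the seed are hypothetical choices, not part of the text.

```python
import math
import random
import statistics

# Hypothetical population for illustration: uniform on [0, 1],
# so mu = 0.5 and sigma = 1/sqrt(12) (about 0.2887).
SIGMA = 1 / math.sqrt(12)

def sd_of_sample_means(n, trials=5000, seed=1):
    """Draw `trials` random samples of size n and return the standard
    deviation of their sample means (an estimate of sigma / sqrt(n))."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]
    return statistics.stdev(means)

for n in (4, 16, 64):
    print(f"n = {n:2d}: simulated = {sd_of_sample_means(n):.4f}, "
          f"sigma/sqrt(n) = {SIGMA / math.sqrt(n):.4f}")
```

The simulated values should come out close to $\sigma/\sqrt{n}$, roughly halving each time n is multiplied by 4.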

Note If the population distribution of X is already normal to begin with, then, no matter what the sample size, the sampling distribution of $\bar{x}$ is exactly normal. Thus, the Central Limit Theorem is most useful when the original distribution of X is not known to be normal, which is often the case in practice.

Q How large must the sample size n be before the Central Limit Theorem "kicks in"?
A There is no single answer, since it depends on the shape of the population distribution, but for most practical purposes people use the following rule of thumb: If n ≥ 30, then assume that n is sufficiently large, so that the sampling distribution is approximately normal.

Q OK, what has this got to do with the original question about student spending on the Internet?
A Since the sample size was n = 200 students, the Central Limit Theorem tells us that the sample means (\$42.35 was one of those sample means) are approximately normally distributed. Now, from our knowledge about normal distributions, we can deduce:

95.45% of the sample means will lie within two standard deviations of the population mean (because $P(-2 \le Z \le 2) = 0.9545$ for a standard normal variable Z). Thus:

If we take a large number of sample means, then 95.45% of the time the distance between $\bar{x}$ and $\mu$ will be less than two standard deviations (of the sampling distribution), that is, within a distance $2\sigma/\sqrt{n}$ of $\mu$.

Or, equivalently:

If we take a large number of sample means, then 95.45% of the time the (unknown) population mean $\mu$ lies between $\bar{x} - 2\sigma/\sqrt{n}$ and $\bar{x} + 2\sigma/\sqrt{n}$.

Thus, we call the interval $[\bar{x} - 2\sigma/\sqrt{n},\ \bar{x} + 2\sigma/\sqrt{n}]$ the 95.45% confidence interval for the population mean $\mu$.
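As a quick check of the 0.9545 figure, Python's standard-library `statistics.NormalDist` can compute the area under the standard normal curve between -2 and 2 (a minimal sketch; any normal table gives the same number).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

# P(-2 <= Z <= 2): the fraction of sample means within two standard
# deviations (of the sampling distribution) of the population mean.
p_within_2 = Z.cdf(2) - Z.cdf(-2)
print(round(p_within_2, 4))  # 0.9545
```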

Q OK. How do we get, say, the 90% confidence interval, or the 99% confidence interval?
A All we need to know is how many standard deviations on either side of the mean will include 90% or 99% of the sample means. The following picture of the standard normal curve shows the z-value we want, so that a total area of 0.90 (or 90%) is included between z = -1.645 and z = 1.645:

We call this value of z "$z_{.05}$" since the area of the tail to its right is 0.05, and we can use this value instead of 2 in the above formula:

90% confidence interval = $[\bar{x} - 1.645\sigma/\sqrt{n},\ \bar{x} + 1.645\sigma/\sqrt{n}]$

Similarly, for the 99% confidence interval, we can consult the following picture

and obtain:

99% confidence interval = $[\bar{x} - 2.576\sigma/\sqrt{n},\ \bar{x} + 2.576\sigma/\sqrt{n}]$

For a general formula, let us take $\alpha$ to be 100% minus the percentage of confidence (written as a decimal):

$\alpha = 1.00 - 0.90 = 0.10$ for 90% confidence
$\alpha = 1.00 - 0.99 = 0.01$ for 99% confidence

Then the interval we want is given by the following formula:

Large Sample $100(1-\alpha)\%$ Confidence Interval

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$\bar{x}$ = sample mean
n = sample size
$\sigma$ = population standard deviation
$z_{\alpha/2}$ = z-value with an area of $\alpha/2$ to its right (obtained from a table).

Note: When (as is often the case) we don't know the population standard deviation $\sigma$, we can approximate it by the sample standard deviation s, and obtain the following approximation of the confidence interval, which is good for large samples:

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
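The two formulas above can be wrapped in a single helper. The following is a minimal Python sketch, not a library API: the name `z_interval` is hypothetical, and it looks up $z_{\alpha/2}$ with the standard library's `statistics.NormalDist().inv_cdf` instead of a printed table. Pass $\sigma$ when it is known, or s as the large-sample approximation.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sd, n, confidence=0.95):
    """Large-sample confidence interval xbar ± z_{alpha/2} * sd / sqrt(n).

    sd is the population standard deviation sigma if known; otherwise the
    sample standard deviation s (a good approximation when n >= 30).
    """
    alpha = 1 - confidence                    # e.g. 0.10 for 90% confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-value with area alpha/2 to its right
    margin = z * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

# Half-width of the 90% interval in units of sd/sqrt(n):
lo, hi = z_interval(0.0, 1.0, 1, confidence=0.90)
print(round(hi, 3))  # 1.645
```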

Here is a little table of z-values:

| $\alpha/2$ | .1 | .05 | .025 | .01 | .005 | .001 | .0005 |
|---|---|---|---|---|---|---|---|
| $z_{\alpha/2}$ | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 | 3.090 | 3.291 |
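The table entries can be reproduced with the inverse of the standard normal cumulative distribution function. A minimal sketch using Python's `statistics.NormalDist` (the helper name `z_upper` is hypothetical):

```python
from statistics import NormalDist

# Upper-tail areas alpha/2 listed in the table.
tail_areas = [0.1, 0.05, 0.025, 0.01, 0.005, 0.001, 0.0005]

def z_upper(tail_area):
    """Return the z-value with an area of tail_area to its right
    under the standard normal curve."""
    return NormalDist().inv_cdf(1 - tail_area)

for a in tail_areas:
    print(f"z_{a} = {z_upper(a):.3f}")
```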

Here is an example where you can put the above formula to use.

Example 1 Hot Sauce

Your hot sauce company rates its sauce on a spiciness scale of 1 to 20. A sample of 50 bottles of hot sauce is taste-tested, resulting in a mean spiciness of 12 and a sample standard deviation of 2.5. Find a 95% confidence interval for the mean spiciness of your hot sauce.

Solution

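Here n = 50 ≥ 30, so we use the large-sample formula, with s in place of $\sigma$:

$\bar{x}$ = 12, n = 50, s = 2.5, $\alpha$ = 1.00 - 0.95 = 0.05, $z_{\alpha/2} = z_{.025}$ = 1.960

Therefore, the confidence interval is approximately (rounded to four decimal places):

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = \left[12 - 1.960\frac{2.5}{\sqrt{50}},\ 12 + 1.960\frac{2.5}{\sqrt{50}}\right] \approx [11.3070,\ 12.6930]$$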
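The arithmetic in this solution can be checked with a few lines of Python (a minimal sketch; the variable names are just labels for the quantities in the problem, and $z_{.025} = 1.960$ is the table value).

```python
import math

# Example 1 data, taken from the problem statement.
xbar = 12     # sample mean spiciness
n = 50        # number of bottles tested
s = 2.5       # sample standard deviation (used in place of sigma)
z = 1.960     # z_{.025} from the table, for 95% confidence (alpha = 0.05)

margin = z * s / math.sqrt(n)   # z_{alpha/2} * s / sqrt(n)
interval = (round(xbar - margin, 4), round(xbar + margin, 4))
print(interval)  # (11.307, 12.693)
```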

Q How do I interpret this confidence interval?
A It says that, if you repeatedly test 50-bottle random samples of hot sauce and compute the confidence interval each time, the confidence intervals you get will include the population mean 95% of the time. In that sense, the procedure that produced any specific confidence interval (such as the one above) captures the population mean 95% of the time. So, you can be 95% "confident" that the mean spiciness of your hot sauce is somewhere between 11.307 and 12.693.

Following is a simulation that generates a number of random samples of size n = 30 from a uniformly distributed random variable taking values between 0 and 1 (mean $\mu$ = 0.5). For each sample, the mean and the 90% confidence interval are computed automatically. The standard deviation of a random variable uniformly distributed on [a, b] is $\sigma = (b-a)/\sqrt{12}$, so here $\sigma = (1-0)/\sqrt{12} \approx 0.2887$.

Each time a confidence interval is computed, the simulation checks whether the interval contains the population mean. This should happen about 90% of the time.

Example 2 Illustration of Confidence Intervals

The simulation shows the indicated number of samples of size n = 30, together with each sample's 90% confidence interval and whether it contains the population mean 0.5. Approximately 90% of the confidence intervals should contain the population mean of 0.5; thus, you should average 18 "yes" answers for every 20 samples.


Before we go on... Notice that, since the distribution we are sampling from is not normal (it is uniform), we need fairly large samples to ensure that the distribution of the sample means is approximately normal, as assumed in our formulation of confidence intervals. Notice also that we used the theoretical population standard deviation $\sigma$ in computing each interval rather than the sample standard deviation. We could equally well have used the sample standard deviations instead.
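Since the interactive simulation cannot be reproduced here, the following Python sketch imitates it under the same assumptions stated above (uniform population on [0, 1], n = 30, known $\sigma = 1/\sqrt{12}$, 90% confidence). The function name `simulate` and the seed are hypothetical; with a different seed the individual intervals change, but the proportion of "yes" answers should stay near 90%.

```python
import math
import random
from statistics import NormalDist, fmean

MU = 0.5                            # mean of the uniform distribution on [0, 1]
SIGMA = 1 / math.sqrt(12)           # its standard deviation, about 0.2887
N = 30                              # sample size
Z_05 = NormalDist().inv_cdf(0.95)   # z_{.05}, about 1.645, for 90% confidence

def simulate(num_samples, seed=0):
    """Return a list of (xbar, lower, upper, contains_mu) tuples,
    one per random sample of size N."""
    rng = random.Random(seed)
    margin = Z_05 * SIGMA / math.sqrt(N)   # same width every time, since sigma is known
    results = []
    for _ in range(num_samples):
        xbar = fmean(rng.random() for _ in range(N))
        lower, upper = xbar - margin, xbar + margin
        results.append((xbar, lower, upper, lower <= MU <= upper))
    return results

for xbar, lo, hi, ok in simulate(20):
    print(f"mean = {xbar:.4f}  CI = [{lo:.4f}, {hi:.4f}]  contains 0.5? {'yes' if ok else 'no'}")
```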

Confidence Intervals for Small Samples (n < 30)

When we are dealing with small samples, we cannot invoke the Central Limit Theorem. Hence, we cannot use our formula for confidence intervals unless we are sampling from a normally distributed random variable.

However, there is one further issue. If we know the population standard deviation $\sigma$, then all is well and good, and we can go ahead and use the above formula for the confidence interval for small samples (assuming, of course, that we are sampling from a normally distributed variable). But if, as is usually the case, we do not know $\sigma$ and we use the sample standard deviation s instead, we will tend to obtain confidence intervals that are too small. The reason is that, while the sampling distribution of $(\bar{x}-\mu)/(\sigma/\sqrt{n})$ is normal (provided X is normal), the sampling distribution of $(\bar{x}-\mu)/(s/\sqrt{n})$ is not normal (unless we are dealing with large samples, in which case it is approximately normal).
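A small simulation makes the "too small" claim concrete. The sketch below (Python, standard library; the function name, trial count, seed and the normal population with $\mu = 0$, $\sigma = 1$ are hypothetical choices for illustration) draws normal samples, builds the interval $\bar{x} \pm z_{.025}\, s/\sqrt{n}$ using s in place of $\sigma$, and reports how often it actually contains $\mu$.

```python
import math
import random
import statistics
from statistics import NormalDist

def z_with_s_coverage(n, confidence=0.95, trials=10000, seed=2):
    """Fraction of intervals xbar ± z_{alpha/2} * s / sqrt(n) that contain mu,
    for samples of size n from a normal population with mu = 0, sigma = 1."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0                      # hypothetical population
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)          # sample standard deviation
        if abs(xbar - mu) <= z * s / math.sqrt(n):
            hits += 1
    return hits / trials

for n in (5, 10, 30):
    print(f"n = {n:2d}: coverage = {z_with_s_coverage(n):.3f} (nominal 0.95)")
```

For n = 5 the observed coverage should fall noticeably short of the nominal 95%, while for n = 30 it comes much closer, in line with the discussion above.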

Q Why care about the sampling distribution of $(\bar{x}-\mu)/(s/\sqrt{n})$?
A Our computation of the confidence interval is based on the probability that $\bar{x}$ is within a certain number of standard deviations of the mean $\mu$. That number of standard deviations is $(\bar{x}-\mu)/(\sigma/\sqrt{n})$. We set it equal to a desired z-value and solve for $\mu$ to obtain the confidence interval. When we use s instead of $\sigma$, we cannot use a z-value, since the distribution of $(\bar{x}-\mu)/(s/\sqrt{n})$ is not normal; instead, it follows the "t-distribution".

It follows that, instead of using $z_{\alpha/2}$ in our formula, we need to use $t_{\alpha/2}$. Furthermore, we get different t-distributions for different sample sizes, and we use the value of $t_{\alpha/2}$ corresponding to "n-1 degrees of freedom," which we can get from a table.

Small Sample $100(1-\alpha)\%$ Confidence Interval

When the Population Standard Deviation $\sigma$ is Known:

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad \text{(same as the large-sample formula)}$$

$\bar{x}$ = sample mean
n = sample size
$\sigma$ = population standard deviation
$z_{\alpha/2}$ = z-value with an area of $\alpha/2$ to its right (obtained from a table).

When Only the Sample Standard Deviation s is Known:

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} \qquad \text{(we use t instead of z)}$$

$\bar{x}$ = sample mean
n = sample size
s = sample standard deviation
$t_{\alpha/2}$ = t-value with n-1 degrees of freedom and an area of $\alpha/2$ to its right (obtained from a t-table).
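If no t-table is at hand, $t_{\alpha/2}$ can be computed numerically. The following is a stdlib-only Python sketch, not a library API: it integrates the t-distribution's density with Simpson's rule and finds the critical value by bisection. All function names are hypothetical; in practice a t-table or a statistics package (for example SciPy's `scipy.stats.t.ppf`) would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the area from 0 to x (Simpson's rule, even steps)."""
    b = abs(x)
    if b == 0:
        return 0.5
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_upper(tail_area, df):
    """t-value with an area of tail_area (< 0.5) to its right."""
    hi = 1.0
    while 1 - t_cdf(hi, df) > tail_area:   # widen until the answer is bracketed
        hi *= 2
    lo = 0.0
    for _ in range(50):                     # bisection on [lo, hi]
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, confidence=0.95):
    """Small-sample interval xbar ± t_{alpha/2} * s / sqrt(n), using n - 1 df."""
    alpha = 1 - confidence
    t = t_upper(alpha / 2, n - 1)
    margin = t * s / math.sqrt(n)
    return xbar - margin, xbar + margin

print(round(t_upper(0.025, 7), 3))  # close to the table value 2.365
```

Bisection is used because the t cumulative distribution function is increasing, so the search is guaranteed to home in on the single crossing point.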

Let us try this out on the following variant of the "Hot Sauce" Example above.

Example 3 More Hot Sauce

When the CEO of your hot sauce company was informed that the spiciness of the hot sauce averages only 12, he was furious and ordered instant adjustments to the recipe, threatening to fire the whole sauce division unless the average spiciness increased to above 13. Yesterday, you randomly sampled 8 bottles of the new sauce and found an average spiciness of 13.5 with a sample standard deviation of 0.75.
(a) Compute the 95% confidence interval for the population mean. Based on the answer, can you be 95% sure that the mean spiciness of the new sauce is above 13?
(b) Repeat part (a) assuming the sample standard deviation was 0.58.

Solution

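(a) Since n = 8 is small and we know only the sample standard deviation, we use the t-formula (assuming the spiciness of the new sauce is normally distributed):

$\bar{x}$ = 13.5, n = 8, s = 0.75, $\alpha$ = 0.05, df = n - 1 = 7 (degrees of freedom), $t_{\alpha/2} = t_{.025}$ ≈ 2.365

Therefore, the confidence interval is approximately (rounded to four decimal places):

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = \left[13.5 - 2.365\frac{0.75}{\sqrt{8}},\ 13.5 + 2.365\frac{0.75}{\sqrt{8}}\right] \approx [12.8729,\ 14.1271]$$

Since this confidence interval contains 13.0 (and values below it), we cannot be 95% sure that the mean spiciness is above 13.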

(b) The calculation is almost identical to the one above, except for the value s = 0.58, which gives the new confidence interval [13.0150, 13.9850]. Since this interval lies entirely above 13, we can be 95% confident that the mean spiciness of all the new sauce is above 13.
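Both parts of this example can be checked with a short Python sketch. It hard-codes $t_{.025} = 2.365$, the standard t-table value for 7 degrees of freedom; the helper name `small_sample_interval` is hypothetical.

```python
import math

T_025_DF7 = 2.365  # t_{.025} with n - 1 = 7 degrees of freedom, from a t-table

def small_sample_interval(xbar, s, n, t):
    """Return xbar ± t * s / sqrt(n), rounded to four decimal places."""
    margin = t * s / math.sqrt(n)
    return round(xbar - margin, 4), round(xbar + margin, 4)

results = {}
for part, s in (("a", 0.75), ("b", 0.58)):
    lo, hi = small_sample_interval(13.5, s, 8, T_025_DF7)
    results[part] = (lo, hi)
    # The mean is "95% surely above 13" only if the whole interval lies above 13.
    verdict = "above 13" if lo > 13 else "cannot conclude the mean is above 13"
    print(f"({part}) s = {s}: [{lo}, {hi}] -> {verdict}")
```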

Last Updated: September, 2000
Copyright © 2000 Stefan Waner and Steven R. Costenoble