# Calculus Applied to Probability and Statistics by Stefan Waner and Steven R. Costenoble

## This Section: 1. Continuous Random Variables and Histograms



Suppose that you have purchased stock in Colossal Conglomerate, Inc., and each day you note the closing price of the stock. The result each day is a real number X (the closing price of the stock) in the unbounded interval [0, +∞). Or, suppose that you time several people running a 50-meter dash. The result for each runner is a real number X, the race time in seconds. In both cases, the value of X is somewhat random. Moreover, X can take on essentially any real value in some interval, rather than, say, just integer values. For this reason we refer to X as a continuous random variable. Here is the official definition.

Continuous Random Variable

A random variable is a function X that assigns to each possible outcome in an experiment a real number. If X may assume any value in some given interval I (the interval may be bounded or unbounded), it is called a continuous random variable. If it can assume only a number of separated values, it is called a discrete random variable.

For instance, if X is the result of rolling a die (and observing the uppermost face), then X is a discrete random variable with possible values 1, 2, 3, 4, 5 and 6. On the other hand, if X is a random choice of a real number in the interval [1,6], then it is a continuous random variable.
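The two kinds of random variable in this example can be imitated with a short simulation. The following is only a sketch using Python's standard `random` module. The names `roll_die` and `pick_real` are illustrative and do not come from the text, and a computer can of course represent real numbers only up to floating-point precision.

```python
import random

def roll_die(rng):
    """Discrete: the result is one of the separated values 1, 2, ..., 6."""
    return rng.randint(1, 6)

def pick_real(rng):
    """Continuous: the result may be any real number in the interval [1, 6]
    (up to floating-point precision)."""
    return rng.uniform(1, 6)

rng = random.Random(0)  # fixed seed so that the run is repeatable
die_rolls = [roll_die(rng) for _ in range(5)]
real_picks = [pick_real(rng) for _ in range(5)]

print(die_rolls)   # five integers between 1 and 6
print(real_picks)  # five real numbers between 1 and 6
```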

If X is a random variable, we are usually interested in the probability that X takes on a value in a certain range. For instance, if X is the daily closing price of Colossal Conglomerate stock and we find that 60% of the time the price is between \$10 and \$20, we would say

The probability that X is between \$10 and \$20 is 0.6.

We write this statement mathematically as follows.

P(10 ≤ X ≤ 20) = 0.6.
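One way to arrive at such a probability from observations is to compute the fraction of recorded values that fall in the range. The sketch below does this in Python. The list `closing_prices` is hypothetical data made up for illustration, and `relative_frequency` is an illustrative helper name, not something defined in the text.

```python
def relative_frequency(values, low, high):
    """Fraction of the observations v that satisfy low <= v <= high."""
    hits = sum(1 for v in values if low <= v <= high)
    return hits / len(values)

# Hypothetical closing prices (in dollars) over ten trading days.
closing_prices = [8.5, 12.0, 15.25, 19.0, 22.5, 11.75, 25.0, 14.5, 9.0, 17.0]

# Six of the ten prices lie between 10 and 20.
estimate = relative_frequency(closing_prices, 10, 20)
print(estimate)
```

With real data one would use many more observations; ten values are used here only to keep the arithmetic easy to follow.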

We can use a bar chart, called a probability distribution histogram, to display the probabilities that X lies in selected ranges. This is shown in the following example.

Example 1 College Population by Age

The following table shows the distribution of U.S. residents (16 years old and over) attending college in 1980 according to age.†

| Age | 15-19 | 20-24 | 25-29 | 30-34 | 35-? |
|---|---|---|---|---|---|
| Number in 1980 (thousands) | 2,678 | 4,786 | 1,928 | 1,201 | 1,763 |
† Source: 1980 Census of Population, US Department of Commerce/Bureau of the Census.

Draw the probability distribution histogram for X = the age of a randomly chosen college student.

Solution

Summing the entries in the bottom row, we see that the total number of students in 1980 was 12.356 million. We can therefore convert all the data in the table to probabilities by dividing by this total.

| Age | 15-19 | 20-24 | 25-29 | 30-34 | 35-? |
|---|---|---|---|---|---|
| Probability | 0.22 | 0.39 | 0.16 | 0.10 | 0.13 |
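The division described in the solution can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the counts are copied from the 1980 table in the example.

```python
# Number of students in each age group, in thousands (1980 table).
counts = {"15-19": 2678, "20-24": 4786, "25-29": 1928, "30-34": 1201, "35-?": 1763}

total = sum(counts.values())  # 12356 thousand, that is, 12.356 million

# Divide each count by the total to convert it to a probability.
probabilities = {age: n / total for age, n in counts.items()}

for age, p in probabilities.items():
    print(f"{age}: {p:.4f}")
```

Rounding the last value to two decimals gives 0.14, whereas the table lists 0.13; the table's entry is presumably adjusted so that the rounded probabilities sum to exactly 1.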

In the category 15-19, we have actually included anyone at least 15 years old and less than 20 years old. For example, someone 19½ years old would be in this range. We would like to write 15-20 instead, but this would be ambiguous, since we would not know where to count someone who was exactly 20 years old. Now the probability that a college student is exactly 20 years old (and not, say, 20 years and 1 second) is essentially 0, so it doesn't matter (see the discussion after Example 2 below). We therefore rewrite the table with these ranges.

| Age | 15-20 | 20-25 | 25-30 | 30-35 | ≥35 |
|---|---|---|---|---|---|
| Probability | 0.22 | 0.39 | 0.16 | 0.10 | 0.13 |

The table tells us that, for instance,

P(15 ≤ X ≤ 20) = 0.22
and
P(X ≥ 35) = 0.13.

The probability distribution histogram is the bar graph we get from these data:

Before We Go On ...

Had the grouping into ranges been finer (for instance, into divisions of 1 year instead of 5), the histogram would appear smoother, with lower bars (why?). (See the figure below.)

This smoother-looking distribution suggests a smooth curve. It is this kind of curve that we shall be studying in the next section.
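The effect of a finer grouping on the bar heights can be sketched numerically. The code below splits each bounded five-year range of Example 1 into one-year ranges and, as a simplifying assumption that real data would not satisfy exactly, shares the probability equally among them. The helper name `split_evenly` is illustrative.

```python
# Probabilities for the bounded five-year ranges in Example 1.
five_year = {(15, 20): 0.22, (20, 25): 0.39, (25, 30): 0.16, (30, 35): 0.10}

def split_evenly(bins, parts):
    """Split each range into `parts` equal sub-ranges that share its
    probability equally (an assumption made only for illustration)."""
    finer = {}
    for (low, high), p in bins.items():
        width = (high - low) / parts
        for k in range(parts):
            finer[(low + k * width, low + (k + 1) * width)] = p / parts
    return finer

one_year = split_evenly(five_year, 5)

# The tallest one-year bar is much lower than the tallest five-year bar,
# but the total probability covered is unchanged.
print(max(five_year.values()), max(one_year.values()))
print(sum(five_year.values()), sum(one_year.values()))
```

Each narrower range captures only part of the probability of the wider range that contains it, which is why its bar is lower.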

Example 2

A survey finds the following probability distribution for the age of a rented car.

| Age | 0-1 | 1-2 | 2-3 | 3-4 | 4-5 | 5-6 | 6-7 |
|---|---|---|---|---|---|---|---|
| Probability | 0.20 | 0.28 | 0.20 | 0.15 | 0.10 | 0.05 | 0.02 |

Plot the associated probability distribution histogram, and use it to evaluate (or estimate) the following:

(a) P(0 ≤ X ≤ 4)
(b) P(X ≥ 4)
(c) P(2 ≤ X ≤ 3.5)
(d) P(X = 4)

Solution

The histogram is shown below.

(a) We can calculate P(0 ≤ X ≤ 4) from the table by adding the corresponding probabilities:

P(0 ≤ X ≤ 4) = 0.20 + 0.28 + 0.20 + 0.15 = 0.83

This corresponds to the shaded region of the histogram shown in the following figure.

Notice that since each rectangle has width equal to 1 unit and height equal to the associated probability, its area is equal to the probability that X is in the associated range. Thus P(0 ≤ X ≤ 4) is also equal to the area of the shaded region.

(b) Similarly, P(X ≥ 4) is given by the area of the unshaded portion of the above figure, so

P(X ≥ 4) = 0.10 + 0.05 + 0.02 = 0.17.

(Notice that P(0 ≤ X ≤ 4) + P(X ≥ 4) = 1. Why?)
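The sums in parts (a) and (b) can be reproduced directly from the table. The sketch below stores the probabilities in a list, with `prob[k]` holding the probability of the range from k to k+1; this layout and the variable names are choices made for illustration.

```python
# Probability of each one-year age range k to k+1, from the table in Example 2.
prob = [0.20, 0.28, 0.20, 0.15, 0.10, 0.05, 0.02]

p_a = sum(prob[0:4])  # P(0 <= X <= 4): ranges 0-1, 1-2, 2-3, 3-4
p_b = sum(prob[4:7])  # P(X >= 4): ranges 4-5, 5-6, 6-7

# Rounding hides the tiny floating-point error in the sums.
print(round(p_a, 2), round(p_b, 2), round(p_a + p_b, 2))
```

Together the two slices use every entry of the list exactly once.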

(c) To calculate P(2 ≤ X ≤ 3.5), we need to make an educated guess, since neither the table nor the histogram has subdivisions of width 0.5. Referring to the graph, we can approximate the probability by the shaded area shown below:

Thus,

P(2 ≤ X ≤ 3.5) ≈ 0.20 + (0.15)(1/2) = 0.275.

(d) To calculate P(X = 4), we would need to calculate P(4 ≤ X ≤ 4). But this would correspond to a region of the histogram with zero area (see the figure below) so we conclude that P(X = 4) = 0.
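The area reasoning used in parts (c) and (d) can be written as one small function. This is a sketch under the same assumption made in part (c), namely that the probability within each bar is spread evenly across its width; the function name `interval_probability` is illustrative.

```python
def interval_probability(prob, a, b):
    """Estimate P(a <= X <= b) as the histogram area between a and b.

    prob[k] is the probability of the range k to k+1, so every bar has
    width 1 and height prob[k].
    """
    area = 0.0
    for k, p in enumerate(prob):
        overlap = min(b, k + 1) - max(a, k)  # width of [a, b] lying inside bar k
        if overlap > 0:
            area += p * overlap
    return area

prob = [0.20, 0.28, 0.20, 0.15, 0.10, 0.05, 0.02]

print(interval_probability(prob, 2, 3.5))  # part (c): all of bar 2-3, half of bar 3-4
print(interval_probability(prob, 4, 4))    # part (d): a region of zero width
```

For part (d) no bar overlaps the interval with positive width, so the function returns 0, in agreement with the zero-area argument.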

Question

In the above example P(X = 4) was zero. Is it true that P(X = a) is zero for every number a in the interval associated with X?

As a general rule, yes. If X is a continuous random variable, then X can assume infinitely many values, and so it is reasonable that the probability of its assuming any specific value we choose beforehand is zero.

Caution

If you wish to use a histogram to calculate probability as area, make sure that the subdivisions for X have width 1; for instance, 1 ≤ X ≤ 2, 2 ≤ X ≤ 3, and so on. The first histogram in Example 1 had bars corresponding to larger ranges for X. The first bar has a width of 5 units, so its area is 5 × 0.22 = 1.1, which is 5 times the probability that 15 ≤ X ≤ 20. To recover the probability from such a histogram, divide the area of each bar by the width of its interval.

There is another way around the above problem that we shall not use, but which is used by working statisticians: Draw your histograms so that the heights are not necessarily the probabilities but are chosen so that the area of each bar gives the corresponding probability. This is necessary if, for example, the bars do not all have the same width.
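As a rough illustration of this alternative, the sketch below chooses each bar's height as probability divided by width, so that height times width gives back the probability. The function name `density_heights` and the tuple layout are illustrative. The open-ended range of 35 and over from Example 1 is left out because it has no finite width.

```python
def density_heights(bins):
    """Given (low, high, probability) triples, return (low, high, height)
    triples with height chosen so that height * width = probability."""
    return [(low, high, p / (high - low)) for low, high, p in bins]

# Bounded ranges from Example 1.
bins = [(15, 20, 0.22), (20, 25, 0.39), (25, 30, 0.16), (30, 35, 0.10)]
heights = density_heights(bins)

for low, high, height in heights:
    # The last column is the bar's area, which equals the range's probability.
    print(low, high, height, height * (high - low))
```

The same function works unchanged when the ranges have different widths, which is the situation in which this scaling becomes necessary.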


We would welcome comments and suggestions for improving this resource.

Mail us at:
 Stefan Waner (matszw@hofstra.edu) Steven R. Costenoble (matsrc@hofstra.edu)

Last Updated: September, 1996