## Summary of Chapter 7 in Finite Mathematics / Finite Mathematics & Applied Calculus

Topic: Probability


Sample Space and Events | Combining Events | Estimated Probability | Some Properties of Estimated Probability | Theoretical Probability | Computing Theoretical Probability for Equally Likely Outcomes | Probability Distributions | Addition Principle | Further Properties of Probability | Conditional Probability | Independent Events | Bayes' Theorem | Markov Systems, State Transition Diagram, Transition Matrix | Distribution Vectors and Powers of the Transition Matrix | Long Term Behavior of Markov Systems

Sample Space and Events

An experiment is an occurrence whose result, or outcome, is uncertain.

The set of all possible outcomes is called the sample space for the experiment.

Given a sample space S, an event E is a subset of S. The outcomes in E are called the favorable outcomes.

We say that E occurs in a particular experiment if the outcome of that experiment is one of the elements of E, that is, if the outcome of the experiment is favorable.


Examples

1. Experiment: Cast a die and observe the number facing up.
Outcomes: 1, 2, 3, 4, 5, 6
Sample Space: S = {1, 2, 3, 4, 5, 6}
An Event: E: the outcome is even; E = {2, 4, 6}

2. Consider an experiment in which we toss three fair distinguishable coins and record the occurrences of heads and tails. The sample space is the set of eight possible outcomes:

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let E be the event that heads comes up at least twice.

E = {HHH, HHT, HTH, THH}

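The sample space and event above can be listed programmatically. Here is a minimal Python sketch (not from the text) using `itertools.product`:

```python
# List the sample space for tossing three distinguishable coins,
# and the event E that heads comes up at least twice.
from itertools import product

S = ["".join(toss) for toss in product("HT", repeat=3)]
E = [outcome for outcome in S if outcome.count("H") >= 2]

print(S)  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(E)  # ['HHH', 'HHT', 'HTH', 'THH']
```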

Combining Events

If E and F are events in an experiment, then:

E' is the event that E does not occur.

E ∪ F is the event that either E occurs or F occurs (or both).

E ∩ F is the event that both E and F occur.

E and F are said to be disjoint or mutually exclusive if E ∩ F is empty.

Examples

Let S be the sample space for the coin tossing experiment in the box above, so that

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let E be the event that heads comes up at least twice;

E = {HHH, HHT, HTH, THH}

and let F be the event that tails comes up at least once;

F = {HHT, HTH, HTT, THH, THT, TTH, TTT}.

Then:

E' = {HTT, THT, TTH, TTT}
E ∪ F = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} = S
E ∩ F = {HHT, HTH, THH}

E and F are not mutually exclusive, since E ∩ F ≠ ∅.
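These operations translate directly into Python, where `|`, `&`, and `-` on sets play the roles of ∪, ∩, and complement. A sketch (the variable names are ours, not the text's):

```python
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}
E = {o for o in S if o.count("H") >= 2}  # at least two heads
F = {o for o in S if "T" in o}           # at least one tail

print(sorted(S - E))   # E': ['HTT', 'THT', 'TTH', 'TTT']
print(E | F == S)      # True: E union F is the whole sample space
print(sorted(E & F))   # ['HHT', 'HTH', 'THH']
print(E & F != set())  # True: E and F are not mutually exclusive
```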

Estimated Probability

If an experiment is performed N times, and the event E occurs fr(E) times, then the ratio

    P(E) = fr(E)/N

is called the relative frequency or estimated probability of E.

The number fr(E) is called the frequency of E. N, the number of times that the experiment is performed, is called the number of trials or the sample size.

If E consists of a single outcome s, then we refer to P(E) as the estimated probability of the outcome s, and write P(s).


Example

In the above experiment (toss three coins), take E to be the event that heads comes up at least twice. Tossing the coins repeatedly, recording the frequency fr(E) and the number of trials N, and computing fr(E)/N gives the estimated probability of E.
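A quick simulation of this estimated probability, as a sketch (the seed and trial count are arbitrary choices of ours):

```python
import random

random.seed(0)  # for reproducibility
N = 10_000      # number of trials
fr_E = sum(
    1 for _ in range(N)
    if "".join(random.choice("HT") for _ in range(3)).count("H") >= 2
)
print(fr_E, "/", N, "=", fr_E / N)  # estimated P(E), close to 0.5
```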

Some Properties of Estimated Probability

Let S = {s1, s2, ... , sn} be a sample space and let P(si) be the estimated probability of the event {si}. Then

(a) 0 ≤ P(si) ≤ 1
(b) P(s1) + P(s2) + ... + P(sn) = 1
(c) If E = {e1, e2, ..., er}, then P(E) = P(e1) + P(e2) + ... + P(er).

In words:

(a) The estimated probability of each outcome is a number between 0 and 1.
(b) The estimated probabilities of all the outcomes add up to 1.
(c) The estimated probability of an event E is the sum of the estimated probabilities of the individual outcomes in E.

Theoretical Probability

The theoretical probability, P(E), of an event E is a probability determined from the nature of the experiment rather than through actual experimentation.

The estimated probability approaches the theoretical probability as the number of trials gets larger and larger.

Notes

1. We write P(E) for both estimated and theoretical probability. Which one we are referring to should always be clear from the context.
2. Theoretical probability satisfies the same properties (shown above) as estimated probability.

Examples

In the above experiment, there are eight outcomes in S, and half of them are in E. Therefore, we expect E to occur half the time. In other words, the theoretical probability of E is 0.5.

If you toss the coins a large number of times, the estimated probability should approach the theoretical probability of 0.5.

Computing Theoretical Probability for Equally Likely Outcomes

In an experiment in which all outcomes are equally likely, the probability of an event E is given by

    P(E) = (number of favorable outcomes)/(total number of outcomes) = n(E)/n(S)
where n(E) is the number of elements in E, and n(S) is the number of elements in S.

Examples

In the above experiment (toss three coins) there are eight equally likely outcomes in S, and half of them are in E (the event that heads comes up at least twice). Therefore,

    P(E) = n(E)/n(S) = 4/8 = 1/2.
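The counting formula can be checked exactly with Python's `fractions.Fraction`; a sketch:

```python
from fractions import Fraction
from itertools import product

S = ["".join(t) for t in product("HT", repeat=3)]
E = [o for o in S if o.count("H") >= 2]  # heads at least twice

P_E = Fraction(len(E), len(S))  # n(E)/n(S), kept as an exact fraction
print(P_E)  # 1/2
```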

Probability Distributions

Since the different kinds of probability share similar properties, we refer to estimated and theoretical probability collectively as just probability. What they have in common is the idea of a probability distribution:

A finite sample space is just a finite set S. A probability distribution is an assignment of a number P(si) to each outcome si in a sample space S ={s1, s2, ... , sn} so that

(a) 0 ≤ P(si) ≤ 1
(b) P(s1) + P(s2) + ... + P(sn) = 1.

P(si) is called the probability of si. Given a probability distribution, we obtain the probability of an event E by adding up the probabilities of the outcomes in E.

If P(E) = 0, we call E an impossible event. The empty event ∅ is always impossible, since something must happen.

Notes

1. All the above properties apply to both estimated and theoretical probability. Thus, when we speak only of "probability," we could mean either estimated or theoretical probability, depending on the context.


Examples

1. All the examples of estimated and theoretical probability above give examples of probability distributions.

2. Let us take S = {H, T} and make the assignments P(H) = 0.2, P(T) = 0.8. Since these numbers are between 0 and 1, and add to 1, they specify a probability distribution.

3. With S = {H, T} again, we could also take P(H) = 1, P(T) = 0, so that T is an impossible event.

4. The following table gives a probability distribution for the sample space S = {1, 2, 3, 4, 5, 6}.

| Outcome     | 1   | 2   | 3 | 4   | 5   | 6   |
|-------------|-----|-----|---|-----|-----|-----|
| Probability | 0.3 | 0.3 | 0 | 0.1 | 0.2 | 0.1 |

It follows that

P({1, 6}) = 0.3 + 0.1 = 0.4
P({2, 3}) = 0.3 + 0 = 0.3
P(3) = 0   (an impossible event)

5. Suppose we toss three coins as above, but this time, we only look at the number of heads that come up. In other words, S = {0, 1, 2, 3}. The (theoretical) probability distribution is given by counting the number of combinations that give 0, 1, 2, or 3 heads:

| Outcome     | 0     | 1     | 2     | 3     |
|-------------|-------|-------|-------|-------|
| Probability | 0.125 | 0.375 | 0.375 | 0.125 |

If you simulate many coin tosses and compute the estimated probability distribution, you will find that these probabilities converge to the theoretical ones.
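The theoretical distribution above can be obtained by counting outcomes; a sketch using `collections.Counter`:

```python
from collections import Counter
from itertools import product

S = ["".join(t) for t in product("HT", repeat=3)]
counts = Counter(o.count("H") for o in S)              # frequency of 0..3 heads
dist = {k: counts[k] / len(S) for k in sorted(counts)}  # divide by n(S) = 8
print(dist)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```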


Mutually Exclusive Events
If E and F are mutually exclusive events, then

P(E ∪ F) = P(E) + P(F).
This holds true also for more events: If E1, E2, . . . , En are mutually exclusive events (that is, the intersection of any pair of them is empty) and E is the union of E1, E2, . . . , En, then
P(E) = P(E1) + P(E2) + . . . + P(En).

If E and F are any two events, then

P(E ∪ F) = P(E) + P(F) - P(E ∩ F).
Examples

Let S be the original sample space for the experiment above;

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let E be the event that heads comes up exactly once;

E = {HTT, THT, TTH}
and let F be the event that heads comes up exactly twice;
F = {HHT, HTH, THH}.

Then E and F are mutually exclusive, and

P(E ∪ F) = P(E) + P(F) = 3/8 + 3/8 = 6/8.

Now let E be as above, and let F be the event that heads comes up at most once.

F = {HTT, THT, TTH, TTT}
Then E and F are not mutually exclusive, and E ∩ F is the event that heads comes up exactly once (= E). Therefore,

P(E ∪ F) = P(E) + P(F) - P(E ∩ F) = 3/8 + 4/8 - 3/8 = 4/8.
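The general addition rule for this example can be verified with exact fractions; a sketch (the helper `P` is ours):

```python
from fractions import Fraction
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}
E = {o for o in S if o.count("H") == 1}  # exactly one head
F = {o for o in S if o.count("H") <= 1}  # at most one head

def P(event):
    """Probability under equally likely outcomes: n(event)/n(S)."""
    return Fraction(len(event), len(S))

# General addition rule: P(E ∪ F) = P(E) + P(F) - P(E ∩ F)
print(P(E | F))                # 1/2
print(P(E) + P(F) - P(E & F))  # 1/2
```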

Further Properties of Probability

The following are true for any sample space S and any event E.

| Property         | In words                                                            |
|------------------|---------------------------------------------------------------------|
| P(S) = 1         | The probability of something happening is 1.                        |
| P(∅) = 0         | The probability of nothing happening is 0.                          |
| P(E') = 1 - P(E) | The probability of E not happening is 1 minus the probability of E. |

Example

Continuing with

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let E be the event that heads comes up exactly once;

E = {HTT, THT, TTH}.

Then E' is the event that heads does not come up exactly once, and

P(E') = 1 - P(E) = 1 - 3/8 = 5/8.

Conditional Probability

If E and F are two events, then the conditional probability, P(E|F), is the probability that E occurs, given that F occurs, and is defined by

    P(E|F) = P(E ∩ F)/P(F).

We can rewrite this formula in a form known as the multiplication principle:

P(E ∩ F) = P(F)P(E|F).

Conditional Estimated Probability
If E and F are events and P is the estimated probability, then

    P(E|F) = fr(E ∩ F)/fr(F).

Conditional Probability for Equally Likely Outcomes
If all the outcomes in S are equally likely, then

    P(E|F) = n(E ∩ F)/n(F).
Example

Let S be the original sample space for the experiment above;

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let E be the event that heads comes up exactly once;

E = {HTT, THT, TTH}
and let F be the event that the first coin comes up heads;
F = {HHH, HHT, HTH, HTT}.

Then the probability that heads comes up exactly once, given that the first coin comes up heads, is

    P(E|F) = P(E ∩ F)/P(F) = P({HTT})/P(F) = (1/8)/(1/2) = 1/4.

Since the outcomes in this experiment are equally likely, we could also use the formula

    P(E|F) = n(E ∩ F)/n(F) = 1/4.
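The same conditional probability can be computed by direct counting; a sketch:

```python
from fractions import Fraction
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}
E = {o for o in S if o.count("H") == 1}  # exactly one head
F = {o for o in S if o[0] == "H"}        # first coin is heads

# Equally likely outcomes: P(E|F) = n(E ∩ F)/n(F)
print(Fraction(len(E & F), len(F)))  # 1/4
```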

Independent Events

The events E and F are independent if

P(E|F) = P(E)
or, equivalently (assuming that P(F) is not 0), we have the following:

Test for Independence: The events E and F are independent if and only if P(E ∩ F) = P(E)P(F).

If two events E and F are not independent, then they are dependent.

Given any number of mutually independent events, the probability of their intersection is the product of the probabilities of the individual events.

Examples

As in the example immediately above, let S be the original sample space for the experiment;

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let E be the event that heads comes up exactly once;

E = {HTT, THT, TTH}
and let F be the event that the first coin comes up heads;
F = {HHH, HHT, HTH, HTT}.

To test these two events for independence, we check the formula

P(E ∩ F) = P(E)P(F).

Here, P(E) = 3/8, P(F) = 1/2, and E ∩ F = {HTT}, so that P(E ∩ F) = 1/8. Since

(3/8)(1/2) ≠ 1/8,

the events E and F are not independent.

Bayes' Theorem

The short form of Bayes' Theorem states that if E and F are events, then

    P(F|E) = P(E|F)P(F) / [P(E|F)P(F) + P(E|F')P(F')].

We can often compute P(F|E) by instead constructing a probability tree.

An expanded form of Bayes' Theorem states that if E is an event, and if F1, F2, and F3 form a partition of the sample space S, then

    P(F1|E) = P(E|F1)P(F1) / [P(E|F1)P(F1) + P(E|F2)P(F2) + P(E|F3)P(F3)].

A similar formula works for a partition of S into four or more events.

Example

If P(E|F) = 0.95, P(E|F') = 0.15, P(F) = 0.1, and P(F') = 0.9, then

    P(F|E) = P(E|F)P(F) / [P(E|F)P(F) + P(E|F')P(F')]
           = (0.95)(0.1) / [(0.95)(0.1) + (0.15)(0.9)]
           ≈ 0.4130.
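The short form of Bayes' Theorem is a one-line function; a sketch (the function name is ours):

```python
def bayes_short(p_e_given_f, p_e_given_fc, p_f):
    """Short form of Bayes' theorem: returns P(F|E)."""
    p_fc = 1 - p_f
    numerator = p_e_given_f * p_f
    return numerator / (numerator + p_e_given_fc * p_fc)

# The worked example: P(E|F) = 0.95, P(E|F') = 0.15, P(F) = 0.1
print(round(bayes_short(0.95, 0.15, 0.1), 4))  # 0.413
```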


Markov Systems, State Transition Diagram, Transition Matrix

A Markov system (or Markov process or Markov chain) is a system that can be in one of several (numbered) states, and can pass from one state to another at each time step according to fixed probabilities.

If a Markov system is in state i, there is a fixed probability, pij, of its going into state j at the next time step; pij is called a transition probability.

A Markov system can be illustrated by means of a state transition diagram, which is a diagram showing all the states and transition probabilities. (See the example below.)

The matrix P whose ijth entry is pij is called the transition matrix associated with the system. The entries in each row add up to 1. Thus, for instance, a 2×2 transition matrix has the form

    P = [p11  p12
         p21  p22]

with p11 + p12 = 1 and p21 + p22 = 1.

Example

[The original page showed a state transition diagram here, together with its transition matrix; missing arrows in the diagram indicate zero probability.]
Distribution Vectors and Powers of the Transition Matrix

A distribution vector is a row vector with one nonnegative entry for each state in the system. The entries can represent the number of individuals in each state of the system.

A probability vector is a row vector in which the entries are nonnegative and add up to 1. The entries in a probability vector can represent the probabilities of finding a system in each of the states.

If v is an initial distribution vector and P is the transition matrix for a Markov system, then the distribution vector after 1 step is the matrix product, vP.

Distribution After 1 Step:   vP

The distribution one step later, obtained by again multiplying by P, is given by (vP)P = vP².

Distribution After 2 Steps:   vP²

Similarly, the distribution after n steps can be obtained by multiplying v on the right by P n times, that is, by multiplying v by Pⁿ.

Distribution After n Steps:   vPⁿ

P² is the two-step transition matrix for the system. Similarly, P³ is the three-step transition matrix, and Pⁿ is the n-step transition matrix. This means that the ijth entry in Pⁿ is the probability that the system will pass from state i to state j in n steps.

Try our on-line utility which computes powers of the transition matrix and its action on the distribution vector. Even better (and far more flexible) is our Matrix Algebra Tool, (updated in November 2003) where you can compute several things simultaneously, have the answers shown in fraction or decimal form, and also compute inverses and even complicated expressions involving several matrices.

Examples

Let

    P = [0.2  0.8  0
         0.4  0    0.6
         0.5  0.5  0]

and let v = [100   200   300] be an initial distribution. Then the distribution after one step is given by

    vP = [100   200   300] [0.2  0.8  0
                            0.4  0    0.6
                            0.5  0.5  0]  =  [250   230   120].

The distribution one step later is given by

    vP² = (vP)P = [250   230   120] P = [202   260   138].

To obtain the two-step transition matrix, we calculate

    P² = [0.2  0.8  0     [0.2  0.8  0
          0.4  0    0.6    0.4  0    0.6
          0.5  0.5  0]     0.5  0.5  0]

       = [0.36  0.16  0.48
          0.38  0.62  0
          0.3   0.4   0.3]

Thus, for example, the probability of going from State 3 to State 1 in two steps is given by the 3,1-entry in P2, namely 0.3.
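The matrix arithmetic above can be checked with a few lines of Python; a sketch using a small hand-rolled multiply (so no external libraries are assumed):

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

P = [[0.2, 0.8, 0.0],
     [0.4, 0.0, 0.6],
     [0.5, 0.5, 0.0]]
v = [[100, 200, 300]]  # initial distribution as a 1x3 row vector

print([round(x) for x in mat_mul(v, P)[0]])              # [250, 230, 120]
print([round(x) for x in mat_mul(mat_mul(v, P), P)[0]])  # [202, 260, 138]

P2 = mat_mul(P, P)  # the two-step transition matrix
print(round(P2[2][0], 6))  # 0.3 (State 3 to State 1 in two steps)
```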

Long Term Behavior of Markov Systems

If P is a transition matrix for a Markov system, and if v is a distribution vector with the property that vP = v, then we refer to v as a steady state (distribution) vector.

To find a steady state distribution for a Markov System with transition matrix P, we solve the system of equations given by

    x + y + z + ... = 1
    [x   y   z ...]P = [x   y   z ...]

where we use as many unknowns as there are states in the Markov system. A steady state probability vector is then given by

    v = [x   y   z ...].
A regular Markov system is one for which some power of its transition matrix has no zero entries. A regular Markov system has only one steady state vector.

Long Term Behavior: If higher and higher powers of P approach a fixed matrix P∞, we refer to P∞ as the steady state or long-term transition matrix. If a Markov system is regular, then its long-term transition matrix is the square matrix whose rows are all the same and equal to the steady state probability vector

[x   y   z . . .].

Computing the Steady State Matrix Numerically

Using technology, it is often possible to approximate P∞ with great accuracy by simply computing a very large power of P. How large? It is large enough when the rows are all the same to within the accuracy you desire. For small matrices like the ones in the textbook, P²⁵⁶ is usually more than good enough. (256 is a power of 2, and so the computation of P²⁵⁶ is especially fast on our Matrix Algebra Tool. Try it!) Warning: If the matrix does not stabilize fairly quickly, then there is a danger of significant computational error: the larger the number of computations required, the more serious this problem becomes.

Example

The transition matrix

    P = [0.2  0.8  0
         0.4  0    0.6
         0.5  0.5  0]

that we looked at above is regular, since P³ contains only non-zero entries. (Check this on your graphing calculator or on our on-line utility.)

The steady state probability vector is given by

v = [35/99   40/99   24/99],

so that

    vP = [35/99   40/99   24/99] P = [35/99   40/99   24/99] = v.

The long-term transition matrix is therefore

    P∞ = [35/99  40/99  24/99
          35/99  40/99  24/99
          35/99  40/99  24/99].

(To check this, calculate P²⁵⁶ on the on-line utility.)
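The same check can be done by repeated squaring (squaring 8 times gives P raised to the power 2⁸ = 256); a sketch reusing a hand-rolled multiply:

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

P = [[0.2, 0.8, 0.0],
     [0.4, 0.0, 0.6],
     [0.5, 0.5, 0.0]]

M = P
for _ in range(8):  # square 8 times: P^2, P^4, ..., P^256
    M = mat_mul(M, M)

# Each row should now agree (to high accuracy) with the steady state
# vector [35/99, 40/99, 24/99].
for row in M:
    print([round(x, 6) for x in row])
```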

Using the Matrix Algebra Tool

Open the Matrix Algebra Tool and enter the matrix P using the following format (note the commas between entries in each row). If you like, you can just copy the text below and paste it into the window:

P = [0.2, 0.8, 0
0.4, 0, 0.6
0.5, 0.5, 0]
Then, in the formula field, enter the formula P^256 and press "Compute". If you want to see the answer as fractions rather than decimals, make sure "Fraction Mode" is checked (it is located just under the buttons).


Last Updated: September, 2006