ENGG 181/BIO 179

Dr. S.Y. Rabbany

Statistics


Two kinds of statistical measures:

Measures of the "center" of a distribution and measures of the "spread" of a distribution.

I. Measures of Central Tendency

In statistics, one needs to describe the typical term of a given set of data.
Three measures are commonly used: the Mean, the Median, and the Mode.

The Mean (arithmetic average) is the number obtained by adding all the data and dividing that sum by the number of data.
For n data values x_1, ..., x_n, the mean can also be expressed by the formula:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
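
As a minimal sketch of the definition of the mean given above, the calculation can be written in a few lines of Python. The function name `mean` and the sample values are illustrative assumptions, not part of the course material.

```python
def mean(data):
    """Arithmetic mean: the sum of the data divided by the number of data."""
    if not data:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(data) / len(data)


scores = [2, 4, 4, 5, 10]   # hypothetical data
print(mean(scores))         # 25 / 5 = 5.0
```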

The Median is the middle value of the data: half of the data are larger and half are smaller.

a) For an odd number of terms, the median is the middle term of the distribution, after the terms have been arranged in numerical order.
b) For an even number of terms, it is the mean of the two middle terms of the distribution, after they have been arranged in numerical order.
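
The two cases above can be sketched in Python as follows. The function name `median` and the sample values are assumptions made for illustration.

```python
def median(data):
    """Middle term for an odd count; mean of the two middle terms for an even count."""
    if not data:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(data)          # arrange the terms in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # case a): odd number of terms
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # case b): even number of terms


print(median([7, 1, 5]))      # ordered: 1, 5, 7    -> 5
print(median([7, 1, 5, 3]))   # ordered: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```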

The Mode of a distribution is the term that occurs most frequently.
A distribution may have more than one mode.
A distribution that has two modes is called bimodal.
When all values occur exactly the same number of times, the mode is undefined.
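
A short Python sketch of these rules is given below. The function name `modes` is hypothetical, and returning an empty list to signal an undefined mode is a design choice made here for illustration.

```python
from collections import Counter


def modes(data):
    """Return the sorted list of modes, or [] when the mode is undefined."""
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    # If every distinct value occurs equally often, the mode is undefined.
    if all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))      # [2]     one mode
print(modes([1, 1, 2, 3, 3]))   # [1, 3]  bimodal
print(modes([1, 2, 3]))         # []      every value occurs once: undefined
```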

For different distributions, different choices among these statistics are most appropriate.

II. Measures of Spread

It is useful to know the average value of your data, but this number alone is not sufficient to get a good sense of the distribution. We also need some measure of how spread out the data are around the center.

Variability (sometimes called dispersion or spread) describes how far the data lie from the center.
Common measures of variability are the range, the variance, and the standard deviation (SD).
The Range = largest value - smallest value
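
As an illustration of the range, the sketch below computes it for two made-up data sets. The function name `data_range` and both data sets are assumptions; they were chosen so that the ranges match even though the data are distributed differently between the extremes.

```python
def data_range(data):
    """Range = largest value - smallest value."""
    if not data:
        raise ValueError("the range of an empty data set is undefined")
    return max(data) - min(data)


clustered = [0, 5, 5, 5, 10]    # hypothetical: most values near the center
split = [0, 0, 5, 10, 10]       # hypothetical: most values at the extremes
print(data_range(clustered), data_range(split))   # 10 10
```

Both sets give the same range because only the smallest and largest values enter the calculation.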
Variance and SD: The range involves only the two extremes, so we need a measure that quantifies the variability of all the data. Choose a center of the distribution and determine the variation from it. A natural choice for the center is the mean, and the variance is built from the squared deviations of the data from the mean. A disadvantage is that the variance is measured in squared units, but the distribution is not!

Therefore, we need to express the variability in the same units as the distribution. Taking the square root of the variance does this; the resulting measure of dispersion is called the standard deviation (SD).
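
The sketch below shows one way to compute the variance and the SD in Python. Whether the sum of squared deviations should be divided by n (population form) or by n - 1 (sample form) is not stated in these notes, so the `sample` flag, the function names, and the data are all assumptions made for illustration.

```python
import math


def variance(data, sample=False):
    """Mean squared deviation from the mean (divide by n, or by n - 1 if sample=True)."""
    n = len(data)
    if n < (2 if sample else 1):
        raise ValueError("not enough data to compute the variance")
    m = sum(data) / n
    squared_devs = sum((x - m) ** 2 for x in data)
    return squared_devs / (n - 1 if sample else n)


def standard_deviation(data, sample=False):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(data, sample))


values = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical data, mean = 5
print(variance(values))                 # 32 / 8 = 4.0 (squared units)
print(standard_deviation(values))       # 2.0 (original units)
```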

Note: When the center is measured using the mean, the usual way of reporting the spread is via the standard deviation. When the mean is not a whole number, computations may be cumbersome. An alternative computational formula for the SD is as follows:
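
One common computational form, offered here as an assumption and not necessarily the form used in this course, is SD = sqrt((sum of x^2) / n - mean^2) for the population (divide-by-n) definition. It avoids subtracting a non-integer mean from every data value. The Python sketch below checks, on made-up data, that it agrees with the definition; the function names are hypothetical.

```python
import math


def sd_definition(data):
    """Population SD from the definition: root of the mean squared deviation."""
    n = len(data)
    m = sum(data) / n
    return math.sqrt(sum((x - m) ** 2 for x in data) / n)


def sd_computational(data):
    """Population SD from the shortcut: root of (mean of squares - square of mean)."""
    n = len(data)
    m = sum(data) / n
    mean_of_squares = sum(x * x for x in data) / n
    return math.sqrt(mean_of_squares - m * m)


data = [1, 2, 4, 6]    # hypothetical data; the mean is 3.25, not a whole number
print(math.isclose(sd_definition(data), sd_computational(data)))   # True
```

In floating-point arithmetic the shortcut can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in software.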

III. Normal Approximation
The normal curve is a symmetric, "bell-shaped" curve with a high point in the center, falling rapidly to each side. Because the distribution is symmetric, the mean and median coincide. By analyzing the normal curve, it is possible to show:

Nearly all data (about 99.7%) fall within 3 SDs of the mean.
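
The 99.7% figure can be checked numerically. For a normal curve, the fraction of data within k SDs of the mean equals erf(k / sqrt(2)), where erf is the standard error function. The sketch below, with the hypothetical helper name `fraction_within`, evaluates this for k = 1, 2, and 3.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * fraction_within(k):.1f}%")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```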

In summary, we can use the process of approximation from the normal curve to compute the percent of data falling into various ranges, provided we know three things:

 
