Geol 135 Sedimentation
J Bret Bennington
Statistical Analysis: The Basics
Mean azimuth of the current indicators
Imagine a bedding surface covered with tool marks and other paleocurrent indicators. We want to know the direction of the current that made the paleocurrent indicators. We realize that the current was not always flowing in precisely the same direction, so there will be variability in the azimuths given by the current indicators. Still, we would like to know what the average azimuth is, that is, the consensus of all the current indicators.
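The notes do not spell out how the average is computed. A minimal sketch, assuming the usual vector-mean approach for directional data (each azimuth is treated as a unit vector and the vectors are summed), might look like this. The function name and the readings are hypothetical.

```python
import math

def mean_azimuth(azimuths_deg):
    """Vector-mean azimuth, in degrees clockwise from north (0 to 360)."""
    # Azimuths are measured clockwise from north, so the east component
    # of each unit vector is sin(azimuth) and the north component is cos(azimuth).
    east = sum(math.sin(math.radians(a)) for a in azimuths_deg)
    north = sum(math.cos(math.radians(a)) for a in azimuths_deg)
    # atan2(east, north) returns the direction of the summed vector.
    return math.degrees(math.atan2(east, north)) % 360.0

# Hypothetical readings scattered on either side of north
readings = [350, 20, 355, 15, 10]
print(round(mean_azimuth(readings), 1))    # close to 6 degrees
print(sum(readings) / len(readings))       # naive arithmetic mean: 150.0
```

The comparison in the last two lines is the reason for summing vectors: a plain arithmetic mean breaks down when azimuths straddle north (0/360).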
Deviation of the azimuths
We might also want to know how much variation there is about the mean azimuth. In other words, how sloppy are the paleocurrent indicators? This can be calculated using the consistency ratio (R), which ranges from 0 to 1. The consistency ratio plays a role analogous to that of the standard deviation in univariate statistics, except that it runs in the opposite sense: a higher ratio means less scatter. If the consistency ratio is multiplied by 100 it yields a percent value that is called the vector magnitude (L). The closer L is to 100%, the more clumped the azimuth values are.
L = R x 100
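As a rough sketch of the calculation, assuming R is computed in the usual way as the length of the summed unit vectors divided by the number of observations (the notes do not give the formula, and the function names here are hypothetical):

```python
import math

def consistency_ratio(azimuths_deg):
    """Length of the resultant of the unit vectors, divided by n (0 to 1)."""
    n = len(azimuths_deg)
    east = sum(math.sin(math.radians(a)) for a in azimuths_deg)
    north = sum(math.cos(math.radians(a)) for a in azimuths_deg)
    return math.hypot(east, north) / n

def vector_magnitude_percent(azimuths_deg):
    """L = R x 100, as defined in the notes."""
    return 100.0 * consistency_ratio(azimuths_deg)

# Hypothetical readings: a tight cluster versus a widely scattered set
tight = [350, 20, 355, 15, 10]
scattered = [0, 90, 180, 270]
print(round(vector_magnitude_percent(tight), 1))
print(round(vector_magnitude_percent(scattered), 1))
```

The tight cluster should give L near 100%, while the four azimuths pointing in opposite directions cancel and give L near 0%.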
Population mean vs. sample mean
One issue that we have to deal with is the fact that we cannot measure every single paleocurrent indicator on the bed. There are too many and most of the bed surface is probably inaccessible.
Population mean (μ): not known
We can estimate μ by sampling some lesser number (n) of paleocurrent indicators. From this sample we get a sample mean (X), which is an estimate of the population mean.
Some questions we might want to ask:
Is our sample mean significant? For azimuth data, significance indicates that the azimuths sampled are not random but are clumped in a particular direction. Intuitively, this question must hinge in some way on sample size. Even if the population of all current indicators were randomly distributed, if we only sample a few we might, by accident, choose ones that happen to point in a similar direction. Therefore, the fewer observations we have in our sample, the more clumped we want them to be before we believe them to be nonrandom. On the other hand, if we have a lot of observations it should not surprise us if a few are off the mark.
H0 = null hypothesis = azimuths are randomly distributed, mean is not significant
Ha = alternative hypothesis = azimuths are clumped, mean is significant
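The sample-size argument can be illustrated with a small simulation. This is only a sketch, assuming "randomly distributed" means azimuths drawn uniformly from 0 to 360 degrees and assuming the usual resultant-length definition of the consistency ratio. The function names, trial count, and seed are arbitrary choices.

```python
import math
import random

def consistency_ratio(azimuths_deg):
    """Length of the summed unit vectors divided by n (0 to 1)."""
    n = len(azimuths_deg)
    east = sum(math.sin(math.radians(a)) for a in azimuths_deg)
    north = sum(math.cos(math.radians(a)) for a in azimuths_deg)
    return math.hypot(east, north) / n

def mean_ratio_of_random_samples(n, trials=2000, seed=0):
    """Average consistency ratio of many samples of n purely random azimuths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.uniform(0.0, 360.0) for _ in range(n)]
        total += consistency_ratio(sample)
    return total / trials

for n in (3, 10, 50):
    print(n, round(mean_ratio_of_random_samples(n), 2))
```

Even though every sample here is drawn from a population with no preferred direction, the small samples tend to look noticeably clumped, and the apparent clumping shrinks as n grows.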
To test the above hypotheses, we generate a test statistic that is usually some measure of the spread of observations around the mean. For azimuth data the test statistic is the consistency ratio. For a given number of observations we compare the consistency ratio to a critical value: the value that the ratio would exceed only with probability α (the significance level) if H0 were true. If our ratio is larger than the critical value, we reject H0.
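In place of a table of critical values, one common way to carry out this comparison is the Rayleigh test. The sketch below assumes the simple large-sample approximation p ≈ exp(-n R²) for the probability of getting a consistency ratio at least this large when H0 is true. It is not necessarily the procedure used in this course, and the approximation is rough for small n.

```python
import math

def consistency_ratio(azimuths_deg):
    """Length of the summed unit vectors divided by n (0 to 1)."""
    n = len(azimuths_deg)
    east = sum(math.sin(math.radians(a)) for a in azimuths_deg)
    north = sum(math.cos(math.radians(a)) for a in azimuths_deg)
    return math.hypot(east, north) / n

def rayleigh_p_value(azimuths_deg):
    """Approximate p-value under H0 (random azimuths): exp(-n * R^2)."""
    n = len(azimuths_deg)
    r = consistency_ratio(azimuths_deg)
    return math.exp(-n * r * r)

def mean_is_significant(azimuths_deg, alpha=0.05):
    """Reject H0 when the approximate p-value falls below alpha."""
    return rayleigh_p_value(azimuths_deg) < alpha

# Hypothetical readings
clumped = [350, 20, 355, 15, 10, 5, 340, 30]
spread = [0, 90, 180, 270, 45, 135, 225, 315]
print(mean_is_significant(clumped))   # True: strongly clumped
print(mean_is_significant(spread))    # False: azimuths cancel out
```

Because n appears in the exponent, the same consistency ratio is more convincing when it comes from a larger sample.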
Why choose the significance level (α) to be 0.05? Why not 0.01 or 0.001?
Type I error: rejecting H0 when H0 is true
Type II error: accepting H0 when Ha is true
| |H0 is true|Ha is true|
|Accept H0|correct|Type II error|
|Reject H0|Type I error|correct|
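The link between the significance level and Type I error can be checked by simulation. The sketch below is an illustration under stated assumptions: H0 is modeled as azimuths drawn uniformly from 0 to 360 degrees, and the test uses the approximate Rayleigh p-value exp(-n R²). The function names, sample size, trial count, and seed are hypothetical choices.

```python
import math
import random

def rayleigh_p_value(azimuths_deg):
    """Approximate p-value under H0 (random azimuths): exp(-n * R^2)."""
    n = len(azimuths_deg)
    east = sum(math.sin(math.radians(a)) for a in azimuths_deg)
    north = sum(math.cos(math.radians(a)) for a in azimuths_deg)
    r = math.hypot(east, north) / n
    return math.exp(-n * r * r)

def type_i_error_rate(alpha, n=20, trials=2000, seed=1):
    """Fraction of truly random samples for which H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.uniform(0.0, 360.0) for _ in range(n)]
        if rayleigh_p_value(sample) < alpha:
            rejections += 1
    return rejections / trials

for alpha in (0.05, 0.01):
    print(alpha, type_i_error_rate(alpha))
```

The rejection rate should come out close to the chosen significance level. Lowering the level reduces Type I errors, but it also makes a genuinely clumped population harder to detect, which raises the risk of a Type II error.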
We might also like to know how good our estimate of the population mean is. We determine this by calculating a 95% confidence interval around the sample mean. What this confidence interval indicates is the range of values around the sample mean that we can be 95% certain contains the population mean. In other words, if we repeated our sampling 100 times, we would expect about 95 of our samples to have confidence intervals that contain the population mean.
Confidence intervals can be expressed graphically by showing a mean vector bracketed by the CI vectors.
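The notes do not say how the confidence interval is calculated. As one possible illustration, the sketch below swaps in a percentile bootstrap: the sample is resampled with replacement many times, the mean azimuth of each resample is computed, and the middle 95% of the resulting deviations from the sample mean gives the two CI vectors. All names, the resample count, and the seed are hypothetical choices.

```python
import math
import random

def mean_azimuth(azimuths_deg):
    """Vector-mean azimuth, in degrees clockwise from north (0 to 360)."""
    east = sum(math.sin(math.radians(a)) for a in azimuths_deg)
    north = sum(math.cos(math.radians(a)) for a in azimuths_deg)
    return math.degrees(math.atan2(east, north)) % 360.0

def signed_difference(a, b):
    """Smallest signed angle from b to a, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def bootstrap_ci(azimuths_deg, level=0.95, resamples=2000, seed=0):
    """Return (lower azimuth, mean azimuth, upper azimuth) in degrees."""
    rng = random.Random(seed)
    center = mean_azimuth(azimuths_deg)
    n = len(azimuths_deg)
    deviations = []
    for _ in range(resamples):
        resample = [rng.choice(azimuths_deg) for _ in range(n)]
        # Work with deviations from the sample mean so that wrapping
        # across north (0/360) does not scramble the sort order.
        deviations.append(signed_difference(mean_azimuth(resample), center))
    deviations.sort()
    tail = (1.0 - level) / 2.0
    lo = deviations[int(tail * resamples)]
    hi = deviations[int((1.0 - tail) * resamples) - 1]
    return (center + lo) % 360.0, center, (center + hi) % 360.0

# Hypothetical readings clustered near north
readings = [350, 355, 5, 10, 15, 358, 2, 20, 345, 8]
lower, mean, upper = bootstrap_ci(readings)
print(round(lower, 1), round(mean, 1), round(upper, 1))
```

The three returned azimuths correspond to the mean vector and the two CI vectors that bracket it on a plot.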