Lab 5: Frequency Distributions





Frequency Distribution Tables


For this lab we'll use a new data file that contains hypothetical course grade information. Download the file Students.sav and save it somewhere you can find it again for future use. You may need to name the file "students.sav" so that SPSS will recognize it (all SPSS data file names must end in .sav). After saving the file, open SPSS and then open students.sav from where you saved it.


In this file there are a number of variables. For now we'll just look at quiz1 and quiz2. Your task for this part of the lab is to create a frequency distribution table for each of these variables and to compare them to get a feel for some of the features of distributions.


Go to SPSS (which should already be open if you followed the instructions above). The students.sav file should be open already.

It is hard to answer these last 4 questions just by looking at the numbers as they are. Instead we can use some statistical procedures to organize the data and make it easier to understand.


A frequency distribution is an organized tabulation of the number of individuals located in each category on the scale of measurement.


STEP 1: What is the range of responses (highest and lowest numbers)? The X column has been filled in for you based on the range of responses. (Your book often uses Y to refer to scores in a data set. X and Y can be used interchangeably.)



  X f p % c%

Scores on quiz1 range from 0 to 10 so we list these values in the X column starting with the highest value and listing each value down to the lowest.


STEP 2: How many of each did we get?


Fill in the f column. This is the frequency of occurrence. For each X value, write in the f column next to it how many times that score appears in the quiz1 column of the students.sav file.


This tells you how many of each response we got. Note that there may be 0's in the f column if no one got that particular score.
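If you'd like to check your hand counts, the tallying in STEP 2 can be sketched in a few lines of Python. This is only an illustration: the scores below are made up (they are NOT the quiz1 values from students.sav), and the names `scores` and `freq_table` are ours.

```python
from collections import Counter

# Hypothetical quiz scores on a 0-10 scale (NOT the real students.sav data).
scores = [7, 9, 10, 5, 7, 8, 9, 7, 6, 10, 4, 8, 7, 9, 8, 6, 7, 10, 9, 8]

# Tally how many times each score occurs.
counts = Counter(scores)

# List every possible X from highest (10) to lowest (0),
# keeping a frequency of 0 for scores that nobody got.
freq_table = [(x, counts.get(x, 0)) for x in range(10, -1, -1)]

print(" X   f")
for x, f in freq_table:
    print(f"{x:2d}  {f:2d}")
```

Note that the table lists all eleven X values, including those with a frequency of 0.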


Notice that if you add up the frequency column, you get the total number of observations.
Σf = N


If you wanted to know what the total of all of the X's was, how would you do it? The easiest way would be to multiply the (X) & (f) columns and then add (sum) the results.
Σ(Xf)


Calculate the sum of all the scores using this formula
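As a rough check on the two sums above, here is a small Python sketch. The frequency table is hypothetical (it is not the real quiz1 table), and the variable names are ours.

```python
# Hypothetical frequency table stored as {X: f} (NOT the real students.sav data).
freq = {10: 3, 9: 4, 8: 4, 7: 5, 6: 2, 5: 1, 4: 1}

# Adding up the f column gives the number of observations, N.
n = sum(freq.values())

# Multiplying each X by its f and summing gives the total of all the scores.
total = sum(x * f for x, f in freq.items())

print("N =", n)
print("Sum of all scores =", total)
```

Multiplying X by f works because each score X shows up f times in the raw data, so X times f is the same as adding that score to itself f times.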


Now let's work on the other columns in the table.


STEP 3: Proportions. How much of the total group got this value for X? How do you get this information?


p = f / N


Recall that N = the total number of observations.

Fill in the p column for each X value by dividing the frequency you listed in the f column by N.
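The p column can be sketched the same way. Again, the frequencies below are hypothetical rather than taken from students.sav, and the names are ours.

```python
# Hypothetical frequency table stored as {X: f} (NOT the real students.sav data).
freq = {10: 3, 9: 4, 8: 4, 7: 5, 6: 2, 5: 1, 4: 1}

n = sum(freq.values())  # N, the total number of observations

# p = f / N for each X value.
proportions = {x: f / n for x, f in freq.items()}

print(" X   f     p")
for x in sorted(freq, reverse=True):
    print(f"{x:2d}  {freq[x]:2d}  {proportions[x]:.2f}")
```

A handy check: the proportions should add up to 1 (allowing for rounding).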


STEP 4: Percentages. What percentage of the group got each value of X? To get this, convert the proportions to percentages.


% = p * 100


Fill these values into the % column in the table.


STEP 5: c%. The c% column is the cumulative percentage. Start from the lowest score and work up the table, adding the percentages together as you go. Think back to getting your ACT scores. You may remember something like "your score is in the 76th percentile." This means that 76% of the people who took the test got your score or worse. Notice that the final c% (at the top of the table) should always equal 100 (because 100% of the people got the maximum score or worse).


Fill in the c% column in the table by adding each c% to the next % value, starting from the bottom (X value of 0).
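Here is a minimal Python sketch of STEPS 4 and 5 together, assuming a made-up frequency table (not the real quiz1 data). The running total starts at the lowest score and moves up, just as described above.

```python
# Hypothetical frequency table stored as {X: f} (NOT the real students.sav data).
freq = {10: 3, 9: 4, 8: 4, 7: 5, 6: 2, 5: 1, 4: 1, 3: 0, 2: 0, 1: 0, 0: 0}

n = sum(freq.values())  # N, the total number of observations

rows = {}      # X -> (percentage, cumulative percentage)
running = 0.0  # running total of the percentages seen so far

for x in sorted(freq):           # start at the lowest X and work up
    pct = freq[x] * 100 / n      # same as p * 100
    running += pct               # add this row's % to the total below it
    rows[x] = (pct, running)

# Print with the highest score at the top, like the hand-made table.
print(" X      %     c%")
for x in sorted(rows, reverse=True):
    pct, cum = rows[x]
    print(f"{x:2d}  {pct:5.1f}  {cum:5.1f}")
```

With these made-up numbers, the c% next to a score of 7 tells you what percentage of the class scored 7 or lower.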


From a frequency distribution table you can "see" the distribution more easily. At a glance you can see what the highest and lowest scores are, whether some scores are "outside" of the rest (that is, did a few people really bomb the test or did a few ace it), what the most common score was, where most of the scores were, etc.


Click here to see what your quiz1 frequency distribution table should look like when it's completed. Note that this table is in reverse order to the one you were asked to make (i.e., 10 is at the bottom). This is a feature specific to SPSS so be careful when interpreting frequency distribution tables created by SPSS.


Now look at your finished frequency distribution table and answer the following questions:


(2) What percentage of the scores is at or below a score of 7?
(3) Where does it appear that most of the scores are located?
(4) What does your answer to (2) tell you about the difficulty of the quiz?

Grouped Frequency Distribution Tables


When there are too many different response categories to list every category in a frequency distribution table, we can group the scores into class intervals and use the intervals as the X values in our table. For example, think of a percentage grading scale (A = 90%-100%, B = 80%-89%, etc.). Percentage grades can be any value between 0% and 100%. We'll use the percent variable in the students.sav file to make a grouped frequency distribution table, grouping the scores into these typical grading categories.


I've set up the table below for you, with class intervals as the X values.


(5) Please finish the table below for the variable percent, which represents final course grades for the students.sav file. You will need to count frequencies for each interval and then follow the steps you did above for filling in the p, %, and c% columns. Note that you may need to round some scores to place them into categories. Round anything below .5 down to the full percentage below, and round .5 and above up to the full percentage above.


  X	    f	    p	    % 	    c%
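The grouping and rounding rules can be sketched in Python as follows. Everything here is an assumption for illustration: the percentages are made up (not the percent variable from students.sav), and the intervals below 80% (70-79, 60-69, 0-59) are our guess at a typical grading scale, since the lab only spells out the top two.

```python
import math

# Hypothetical final percentages (NOT the real students.sav data).
percents = [92.4, 89.5, 79.5, 84.0, 67.2, 59.4, 59.5, 95.0, 73.8, 88.1]

# Class intervals as (label, low, high). Only 90-100 and 80-89 come from
# the lab text; the lower three intervals are assumed.
intervals = [
    ("90-100", 90, 100),
    ("80-89", 80, 89),
    ("70-79", 70, 79),
    ("60-69", 60, 69),
    ("0-59", 0, 59),
]

def round_half_up(value):
    """Round below .5 down and .5 and above up, as the lab describes."""
    return math.floor(value + 0.5)

# Count how many rounded scores fall into each interval.
freq = {label: 0 for label, _, _ in intervals}
for p in percents:
    rounded = round_half_up(p)
    for label, low, high in intervals:
        if low <= rounded <= high:
            freq[label] += 1
            break

print("     X  f")
for label, _, _ in intervals:
    print(f"{label:>6}  {freq[label]}")
```

We use a small helper instead of Python's built-in round() because round() sends exact halves to the nearest even number (so round(88.5) gives 88), which is not the rule given in this lab.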


Making Frequency Distribution Tables with SPSS


SPSS will also create this table for you. Go to the "Analyze" menu, select "Descriptive statistics", and within that sub menu select "Frequencies".



SPSS will then ask you which variable you want the table for.



For quiz 1 the frequency table output should look something like this:



(6) Please create a frequency distribution table using SPSS for quiz2. You may either print out the output created and staple it to your lab worksheet to hand in, or you can try to "cut" and "paste" the graphs directly into this worksheet.  


Compare the two frequency distributions (quiz 1 and quiz 2)  and answer the following questions:

(7) For which quiz do the scores appear to be more evenly distributed across the scale?


(8) Which quiz appeared to be harder? How do you know this?



Graphing Frequency Distributions


In the previous sections we saw that one way to summarize and simplify an entire distribution of scores is by organizing the scores in a frequency distribution table. In this section we will learn about several other ways to represent distributions, focusing primarily on graphic displays: bar charts, histograms, and stem-and-leaf plots.

Bar graphs

(9) Make a bar graph of the counts of the final grades (variable "grades" in the file) in the class (i.e. A, B, C,...). What was the most common grade in the course? Copy into worksheet.

(10) Make a bar graph of the counts of the final grades in the class (i.e. A, B, C,...), further broken down by whether they attended the review session or not. Copy into worksheet. Based on the graph, would you conclude that attending the review session had an impact on final grades? Why?
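To see what a bar graph broken down by a second variable is actually counting, here is a rough Python sketch that tallies grades separately for students who did and did not attend the review session, and draws text bars. The records are invented (not from students.sav), and the name `records` is ours.

```python
from collections import Counter

# Hypothetical (grade, attended_review) pairs (NOT the real students.sav data).
records = [
    ("A", True), ("A", True), ("B", True), ("B", False), ("C", False),
    ("C", False), ("B", True), ("D", False), ("A", False), ("C", True),
]

# One count per (grade, attended) combination; each count is one bar.
counts = Counter(records)

for grade in "ABCDF":
    for attended in (True, False):
        n = counts[(grade, attended)]
        label = "review" if attended else "no review"
        print(f"{grade}  {label:<9}  {'#' * n} ({n})")
```

Each grade gets two bars side by side, one per group, which is what lets you compare the groups at a glance.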




For quiz 1 the histogram output should look something like this:


This is because the above histogram is based on a grouped frequency distribution table of quiz 1 (see previous lab for discussion). Go ahead and group scores 10 & 9, 8 & 7, 6 & 5, etc., and see whether the histogram now looks as you'd expect.


An important lesson from this is that the size of the interval that you plot may influence the overall shape of the histogram. Below is a histogram of the quiz 1 scores. Use the sliding arrow to change the bin width and observe how the apparent shape of the distribution changes.
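The effect of interval size can also be sketched in Python. The scores here are hypothetical (not the real quiz 1 data), and `grouped_counts` is a helper we made up; it groups whole-number scores from the top down, so a width of 2 gives 10 & 9, 8 & 7, and so on.

```python
# Hypothetical quiz scores on a 0-10 scale (NOT the real students.sav data).
scores = [7, 9, 10, 5, 7, 8, 9, 7, 6, 10, 4, 8, 7, 9, 8, 6, 7, 10, 9, 8]

def grouped_counts(values, bin_width, low=0, high=10):
    """Count whole-number scores in intervals of `bin_width`, starting at the top."""
    bins = {}
    top = high
    while top >= low:
        bottom = max(top - bin_width + 1, low)  # lowest interval may be narrower
        bins[(bottom, top)] = sum(1 for v in values if bottom <= v <= top)
        top -= bin_width
    return bins

# Same data, three different interval sizes.
for width in (1, 2, 3):
    print(f"Interval width = {width}")
    for (bottom, top), count in grouped_counts(scores, width).items():
        print(f"  {bottom:2d}-{top:2d}  {'#' * count}")
```

The data never change, but the tallest bar and the overall outline shift as the intervals get wider.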

        (11) Make histograms of quiz 2, 3, and 4.

        Note: They should all be added to the same output window (so don't close the output window until you're done with the lab).



Frequency Distributions and Their Properties


Now that we have a feel for how to look at distributions of variables let's return to our three quizzes (quiz 2, 3, & 4).


(12) Which quiz was the hardest? Which was the easiest? Why do you come to that conclusion?


(13) Which quiz(zes) was/were positively skewed? Which quiz(zes) was/were negatively skewed? Are there any that are not skewed (i.e. are roughly symmetric)?
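For this lab you should judge skew by looking at your histograms. If you are curious, though, the direction of skew can also be put into a number. The sketch below uses one common measure that is not part of this lab (the average cubed deviation from the mean, divided by the standard deviation cubed) on made-up scores that are NOT from students.sav.

```python
def skewness(values):
    """Moment-based skewness: negative = tail toward low scores,
    positive = tail toward high scores, near 0 = roughly symmetric."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # average squared deviation
    m3 = sum((v - mean) ** 3 for v in values) / n  # average cubed deviation
    return m3 / m2 ** 1.5

# Hypothetical quizzes (NOT the real students.sav data).
easy_quiz = [10, 9, 9, 10, 8, 9, 10, 7, 9, 3]    # most scores high, tail to the left
hard_quiz = [1, 2, 1, 0, 2, 3, 1, 2, 1, 9]       # most scores low, tail to the right
even_quiz = [4, 5, 5, 6, 6, 6, 7, 7, 8]          # balanced around the middle

for name, data in [("easy", easy_quiz), ("hard", hard_quiz), ("even", even_quiz)]:
    print(f"{name}: skewness = {skewness(data):.2f}")
```

The sign matches what you see in a histogram: the skew is named for the side where the long tail is, not the side where most of the scores pile up.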


(14) Are there any scores that may be potential outliers?


Hint: Three characteristics are used to completely describe a distribution: shape, central tendency, and variability (we'll also consider outliers). We'll be talking about central tendency (roughly, the center of the distribution) and variability (how spread out the distribution is) in future labs.