Statistics for Scientists

Material for Session 5: Analyzing Categorical Data



Probability Distribution Functions:

Some Random Variates arise naturally:

Others arise from computations done on other random variates:


Probability Tables:

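Simply tabulate how often (what fraction of the time) a random variate will be larger than some specified value. For the Normal row, the tabulated values are two-sided: they give how often the variate falls farther than that distance from the mean in either direction.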

Consolidated Probability Tables

Distribution   df    0.5    0.2    0.1    0.05   0.02   0.01   0.005
Normal         -     0.67   1.28   1.64   1.96   2.33   2.58   2.81
Chi Square     1     0.46   1.64   2.71   3.84   5.41   6.64   7.88
Chi Square     2     1.39   3.22   4.60   5.99   7.82   9.21   10.60
Chi Square     3     2.37   4.64   6.25   7.82   9.84   11.34  12.84
Chi Square     4     3.36   5.99   7.78   9.49   11.67  13.28  14.86
Chi Square     5     4.35   7.29   9.24   11.07  13.39  15.09  16.75
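
The tabulated values can be spot-checked by computing the tail probability for an entry and comparing it with its column heading. The following is a minimal sketch using only the Python standard library. The function names are illustrative and not part of the course material, and the Normal row is treated as two-sided (the chance of falling more than the tabulated distance from the mean in either direction), which is what its values correspond to.

```python
import math
from statistics import NormalDist


def normal_two_sided_tail(z):
    """Chance that a Standard Normal variate lies more than z from 0 (either side)."""
    return 2.0 * (1.0 - NormalDist().cdf(z))


def chi_square_upper_tail(x, df):
    """Chance that a Chi Square variate with integer df >= 1 exceeds x.

    Starts from the closed forms for df = 1 and df = 2, then steps df up
    by 2 using the standard recurrence for the incomplete gamma function.
    """
    half = x / 2.0
    if df % 2 == 1:
        a, tail = 0.5, math.erfc(math.sqrt(half))  # closed form for df = 1
    else:
        a, tail = 1.0, math.exp(-half)             # closed form for df = 2
    while 2 * a < df:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return tail


# Entries taken from the 0.05 column of the table.
print(f"Normal            1.96 -> {normal_two_sided_tail(1.96):.3f}")
for df, x in [(1, 3.84), (2, 5.99), (3, 7.82), (4, 9.49), (5, 11.07)]:
    print(f"Chi Square df={df}  {x:5.2f} -> {chi_square_upper_tail(x, df):.3f}")
```

Each printed probability should come out close to 0.05, the heading of the column the values were taken from.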


Comparing Observed vs Expected Values

For a single type of event:

Historically, we expect 9 fatal accidents in any given week. Last week we had 15 fatal accidents. How likely is this?

Use the Poisson Distribution:

If the average is 9 per week, the Poisson distribution predicts the following probability for each possible weekly accident count:

# Accidents   Probability     # Accidents   Probability
0             0.0 %           10            11.9 %
1             0.1 %           11            9.7 %
2             0.5 %           12            7.3 %
3             1.5 %           13            5.0 %
4             3.4 %           14            3.2 %
5             6.1 %           15            1.9 %
6             9.1 %           16            1.1 %
7             11.7 %          17            0.6 %
8             13.2 %          18            0.3 %
9             13.2 %          19            0.1 %
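
The probabilities in this table follow from the Poisson formula, P(k) = e^(-m) * m^k / k!, where m is the expected count. A minimal sketch that regenerates the table for m = 9 (the function name poisson_pmf is illustrative, not from the course material):

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k events when `mean` are expected."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Print the same counts as the table: 0-9 on the left, 10-19 on the right.
for k in range(10):
    left = 100.0 * poisson_pmf(k, 9)
    right = 100.0 * poisson_pmf(k + 10, 9)
    print(f"{k:2d}  {left:5.1f} %     {k + 10:2d}  {right:5.1f} %")
```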

The probability of getting 15 events is 1.9%.

But the probability of getting 15 or more events is 1.9 + 1.1 + 0.6 + 0.3 + 0.1, or about 4.0%.

Getting 6 fewer events than expected (that is, 3 or fewer) would be just as far from the expected value as getting 6 more, so it counts as equally surprising.

The probability of getting 3 or fewer events is 1.5 + 0.5 + 0.1, or about 2.1%.

So we would expect a weekly fatal accident count as "surprising" as ours to occur about 4.0 + 2.1 = 6.1% of the time (roughly three weeks a year).

So, it might be due to chance (a borderline case).
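
The two tail sums above can also be computed directly from the Poisson formula rather than read off the rounded table. This is a sketch with illustrative names. Because it sums unrounded probabilities and includes counts above 19, its totals may differ slightly from the hand-added figures.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k events when `mean` are expected."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_cdf(k, mean):
    """Probability of observing k or fewer events."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))


expected, observed = 9, 15
deviation = observed - expected                       # 6 more than expected

upper = 1.0 - poisson_cdf(observed - 1, expected)     # 15 or more events
lower = poisson_cdf(expected - deviation, expected)   # 3 or fewer events

print(f"P(15 or more) = {100 * upper:.1f} %")
print(f"P(3 or fewer) = {100 * lower:.1f} %")
print(f"Combined      = {100 * (upper + lower):.1f} %")
```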

OR:

Use the Normal approximation to the Poisson distribution:

A Poisson variable with a mean of N is distributed approximately like
a Normal variable with a mean of N and a SD of Sqrt(N).

So a Poisson variable with a mean of 9 is distributed approximately like a Normal variable with a mean of 9 and a SD of 3.

15 is 6 larger than 9 (the mean).

6 is twice as much as 3 (the SD).

So having 15 events when you expect 9 is like drawing the number 2.0 from a Standard Normal distribution (m=0, sd=1).

A value this far from 0, in either direction, only happens about 5% of the time.
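
The same arithmetic can be written out as a short sketch using the Python standard library's NormalDist (the variable names are illustrative). It counts values at least 2 SDs from the mean in either direction, matching the two-sided reasoning used in the Poisson calculation.

```python
import math
from statistics import NormalDist

expected, observed = 9, 15

sd = math.sqrt(expected)          # SD of a Poisson variable is Sqrt(mean) = 3
z = (observed - expected) / sd    # (15 - 9) / 3 = 2.0

# Chance of a Standard Normal value at least this far from 0, on either side.
two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"z = {z:.1f}")
print(f"two-sided probability = {100 * two_sided:.1f} %")
```

The result should land a little under 5%, consistent with the Normal row of the probability table, where 1.96 corresponds to 0.05.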