Statistics for Scientists

Material for Session 4: The Statistical Probability Distribution Functions



Probability Distribution Functions:

All our observations and measurements are affected by Random Processes

Random Processes produce Random Variates

Random Variates are described by Probability Distribution Functions (PDFs)

PDFs allow you to predict the probability of getting different outcomes of the random process

PDFs can be discrete or continuous

PDFs have adjustable parameters to allow them to fit real-world situations

Statistical tables simply give the probability of a random variate exceeding a certain size


Common Random Variates and their Probability Distribution Functions

Each entry gives the Random Variate, the Process that produces this kind of Variate, and what its Function is characterized by.

Binary
Process: A single flip of a coin, or: if Uniform < p, Binary = 0, otherwise 1
Function characterized by: P(H) = p, P(T) = 1-p, where p = probability of Heads (p = 0.5 for a fair coin)

Binomial
Process: The number of successes in N attempts
Function characterized by: N = # of Trials (Flips), x = # of Successes (Heads), p = probability of Success in 1 Trial

Poisson
Process: Count the # of sporadic events occurring in a unit time (or space)

Uniform
Process: Computer-generated (pseudo-random)

Normal
Process: Add a lot of other random variates together (Normality arises from the additive combination of many independent random effects)
Function characterized by: Mean, Standard Deviation (Standard Normal has mean = 0, SD = 1)

Log-normal
Process: Multiply a lot of other random variates together (Log-normality arises from the multiplicative combination of many independent random effects), or: raise e to the power of a Normal variate
Function characterized by: Mean, Standard Deviation

Chi Square
Process: The sum of 1 or more squared Standard Normal variates
Function characterized by: degrees of freedom (# of independent variates combined)

Student t
Process: Division of a Standard Normal by the square root of a Chi Square that has been divided by its degrees of freedom
Function characterized by: degrees of freedom (of the Chi Square variate in the denominator)

Fisher F
Process: Division of one Chi Square by another Chi Square, each first divided by its own degrees of freedom
Function characterized by: DFnum, DFdenom, the degrees of freedom of the Chi Square variates in the numerator and denominator
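
The additive and multiplicative origins of the Normal and Log-normal variates can be checked with a small simulation. What follows is only a sketch in Python: the choice of 30 effects, each drawn uniformly between 0.5 and 1.5, the seed, and the helper names are illustrative assumptions, not part of the course material.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable (arbitrary choice)

def skewness(values):
    """Sample skewness: near 0 for a symmetric, bell-shaped set of numbers."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def random_effects(n_effects=30):
    """Assumed example: independent effects, each uniform between 0.5 and 1.5."""
    return [rng.uniform(0.5, 1.5) for _ in range(n_effects)]

# Additive combination of many effects -> roughly Normal
sums = [sum(random_effects()) for _ in range(5000)]

# Multiplicative combination of many effects -> roughly Log-normal
products = [math.prod(random_effects()) for _ in range(5000)]

# The logarithm of a Log-normal variate is a Normal variate
log_products = [math.log(p) for p in products]

print("skewness of sums:        ", round(skewness(sums), 2))
print("skewness of products:    ", round(skewness(products), 2))
print("skewness of log products:", round(skewness(log_products), 2))
```

The sums and the logarithms of the products should both come out close to symmetric, while the raw products should show a long right-hand tail.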


How to Create the different kinds of Random Variates using a penny

Random Binary Digits
(equal probability of 0 or 1)
Flip a coin once:
H becomes 0,
T becomes 1.
Random Decimal Digits
(all digits equally likely)
Flip a coin 4 times:
H-H-H-H becomes 0,
H-H-H-T becomes 1,
H-H-T-H becomes 2,
H-H-T-T becomes 3,
H-T-H-H becomes 4,
H-T-H-T becomes 5,
H-T-T-H becomes 6,
H-T-T-T becomes 7,
T-H-H-H becomes 8,
T-H-H-T becomes 9,
For any other outcome, flip the coin four more times and try again.
Standard Uniform Random Numbers
(range from 0 to 1)
Generate 5 or 6 random decimal digits, e.g.: 4 8 6 2 5 9,
write them with a decimal point in front: 0.486259
General Uniform Random Numbers
(range from a to b)
Generate a 0-to-1 Uniform number, e.g.: 0.486259,
Multiply it by (b-a), then add a.
Standard Normal Variate
(mean=0, Std Dev=1)
Generate 12 Uniform (0-to-1) numbers,
Add them together,
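Subtract 6.
(The sum of 12 such numbers has mean 6 and Std Dev 1, so the result is approximately Standard Normal.)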
General Normal Variate
(mean = m, Std Dev = s)
Generate a Standard (m=0, s=1) variate,
Multiply by s,
Add m.
Chi Square Variate
(N degrees of freedom)
Generate N Standard Normal Variates,
Square each one,
Add them up.
Student t Variate
(N degrees of freedom)
Generate a Standard Normal Variate,
Generate a Chi Square variate with N degrees of freedom and divide it by N,
Divide the Normal by the Square Root of Chi Square/N.
Fisher F Variate
(N1, N2 degrees of freedom)
Generate a Chi Square with N1 d.f., and divide it by N1,
Generate another Chi Square with N2 d.f., and divide it by N2
Divide the first by the second.
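
The penny recipes above can be chained together in code, each variate built from the one before it. This is a minimal sketch in Python, not a recommended generator: a seeded pseudo-random bit stands in for the physical coin, and the function names and the seed are assumptions made for illustration.

```python
import math
import random

rng = random.Random(2024)  # stands in for the penny; the seed is arbitrary

def flip():
    """One coin flip as a binary digit: Heads -> 0, Tails -> 1."""
    return rng.getrandbits(1)

def decimal_digit():
    """Four flips read as a binary number; outcomes above 9 are re-flipped."""
    while True:
        value = 8 * flip() + 4 * flip() + 2 * flip() + flip()
        if value <= 9:
            return value

def standard_uniform(n_digits=6):
    """Random decimal digits written after a decimal point, e.g. 0.486259."""
    digits = [decimal_digit() for _ in range(n_digits)]
    return int("".join(map(str, digits))) / 10 ** n_digits

def uniform(a, b):
    """Multiply a 0-to-1 Uniform by (b - a), then add a."""
    return standard_uniform() * (b - a) + a

def standard_normal():
    """Add 12 Uniform (0-to-1) numbers and subtract 6 (approximately Normal)."""
    return sum(standard_uniform() for _ in range(12)) - 6

def normal(m, s):
    """Multiply a Standard Normal by s, then add m."""
    return standard_normal() * s + m

def chi_square(n):
    """Sum of n squared Standard Normal variates."""
    return sum(standard_normal() ** 2 for _ in range(n))

def student_t(n):
    """Standard Normal divided by the square root of (Chi Square / n)."""
    return standard_normal() / math.sqrt(chi_square(n) / n)

def fisher_f(n1, n2):
    """(Chi Square / n1) divided by (another Chi Square / n2)."""
    return (chi_square(n1) / n1) / (chi_square(n2) / n2)

print("decimal digits:", [decimal_digit() for _ in range(10)])
print("uniform(10, 20):", uniform(10, 20))
print("normal(100, 15):", normal(100, 15))
print("student_t(5):", student_t(5))
print("fisher_f(3, 12):", fisher_f(3, 12))
```

Every function calls only the ones defined above it, so the whole chain rests on the single `flip()` at the top, just as the recipes rest on the penny.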


How to construct a statistical table for a PDF

Generate a lot of random variates from the PDF

Arrange them from smallest to largest

Find the value that chops off the top 5% of the numbers. This is the value for which the top tail area is 5%.

Similarly, find the values for other tail areas.

Repeat this process many times, and average the results.
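
The table-building steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the standard library's `random.gauss` is swapped in for the coin-based Normal recipe purely for speed, and the function names, sample sizes, seed, and choice of tail areas are hypothetical.

```python
import random

rng = random.Random(7)  # arbitrary seed so the run is repeatable

def standard_normal():
    """Library generator used in place of the coin recipe (an assumption)."""
    return rng.gauss(0.0, 1.0)

def chi_square(n):
    """Sum of n squared Standard Normal variates."""
    return sum(standard_normal() ** 2 for _ in range(n))

def tail_table(generate, tail_areas, n_variates=10000, n_repeats=10):
    """Average, over n_repeats runs, the values that chop off each top tail."""
    totals = {area: 0.0 for area in tail_areas}
    for _ in range(n_repeats):
        # Generate a lot of variates and arrange them from smallest to largest
        values = sorted(generate() for _ in range(n_variates))
        for area in tail_areas:
            n_top = max(1, round(n_variates * area))
            totals[area] += values[-n_top]  # smallest value inside the top tail
    return {area: total / n_repeats for area, total in totals.items()}

tail_areas = [0.10, 0.05, 0.01]

print("Standard Normal:", tail_table(standard_normal, tail_areas))
for df in (1, 2):
    print(f"Chi Square, {df} d.f.:", tail_table(lambda: chi_square(df), tail_areas))
```

With enough variates, the 5% entry for the Standard Normal should land near the familiar tabulated value of 1.645; larger `n_variates` and `n_repeats` tighten the agreement.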


How Random Variates and PDFs are used in Statistical Inference

Steps in the Logic Chain:

The Null Hypothesis (H0) says that the apparent "effect" you observed is due only to random fluctuations, not to any real effect in the population.

Don't reject H0 unless it's unreasonable to believe in it (that is, unless there is very little chance that your observed "effect" could have arisen solely from random fluctuations).

Find the Random Process that corresponds to H0.

Find the Random Variable that results from the Random Process.

Find the Distribution Function for the Random Variable.

Fit the Distribution Function to your data.

Find the probability of the random variable being at least as large as the value you observed (the p-value).

If p<0.05, it is unlikely that random fluctuations could have produced your "effect", so you can reject H0 and claim that the effect is real.
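
As a worked illustration of this logic chain, suppose a coin comes up Heads 60 times in 100 flips. These numbers are hypothetical, chosen only for the example. H0 says the coin is fair, the random process is 100 flips of a fair coin, the random variable is the number of Heads, and its distribution function is the Binomial with N = 100 and p = 0.5. The Python sketch below computes the one-sided p-value from the Binomial formula and also checks it by simulation; the function names and the seed are assumptions.

```python
import math
import random

n_flips = 100         # hypothetical experiment
observed_heads = 60   # hypothetical result
p_null = 0.5          # H0: the coin is fair

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def upper_tail_p_value(observed, n, p):
    """Probability of observing at least `observed` successes under H0."""
    return sum(binomial_pmf(x, n, p) for x in range(observed, n + 1))

def simulated_p_value(observed, n, p, n_experiments=20000, seed=3):
    """Same probability, estimated by repeating the random process under H0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        heads = sum(rng.random() < p for _ in range(n))
        if heads >= observed:
            hits += 1
    return hits / n_experiments

p_value = upper_tail_p_value(observed_heads, n_flips, p_null)
print(f"p-value from the Binomial: {p_value:.4f}")
print(f"p-value by simulation:     {simulated_p_value(observed_heads, n_flips, p_null):.4f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

For these hypothetical numbers the p-value comes out a little under 0.03, so under the p<0.05 rule H0 would be rejected. Note that this is a one-sided calculation, counting only outcomes with too many Heads.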


How Practical Statistical Tests are Developed

Find a formula that expresses the size of the effect, relative to the size of random fluctuations.

Determine how this number is distributed.

Construct a table showing how often this number will exceed a certain size.
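
These three steps can be tried out on one familiar case, the one-sample t statistic: the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean. The Python sketch below is an illustration only. It assumes Normally distributed data, uses the library's `random.gauss` as the generator, and its function names, sample size of 10, seed, and tail areas are all arbitrary choices.

```python
import math
import random

rng = random.Random(11)  # arbitrary seed so the run is repeatable

def t_statistic(sample, mu0):
    """Size of the effect (mean - mu0) relative to its random fluctuation."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(variance / n)

def null_distribution(n, n_experiments=20000):
    """Sorted t statistics from samples where H0 is true (no real effect)."""
    return sorted(
        t_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], 0.0)
        for _ in range(n_experiments)
    )

def exceedance_table(sorted_stats, tail_areas):
    """Value that the statistic exceeds with each given probability."""
    n = len(sorted_stats)
    return {area: sorted_stats[-max(1, round(n * area))] for area in tail_areas}

stats = null_distribution(n=10)
table = exceedance_table(stats, [0.10, 0.05, 0.01])
for area, value in table.items():
    print(f"exceeded with probability {area:.2f}: {value:.2f}")
```

With samples of 10, the simulated entries should sit close to the tabulated Student t values for 9 degrees of freedom (about 1.83 for the 5% upper tail).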