Gamma Distributions
Gamma distributions show up frequently in DNA and many
other fields including cell biology, insurance claims, rainfall, particle
physics, radio waves and signal
detection. These probability distributions describe a wide variety of
natural phenomena.
Gammas are a class of probability distributions of which
exponential, Poisson and chisquare are special cases. TMRCA calculators
based on ySTR matches, for example, employ gamma distributions with different parameter values.
Author's note: If you are mathphobic, you can skip this page.
But, please recognize that these are important aspects of genetic
genealogy.
HTML is (strangely) not kind to presentation of mathematical expressions,
which this topic demands. We've done our best to present the equations
as clearly as possible given the limitations.
Here on the right, for example, is a figure from our study of yhaplotype rarity: The curves represent percentages (by category) of actual observed scores on a particular scale. As will be seen later,
these curves bear strong resemblance to gamma distributions with increasing
α parameters as we go from 12 markers to more.
Gamma distributions look odd because they
involve factorials. The gamma function (symbol Γ) is a divisor in the
gamma distribution equation; Γ(x) = (x1)!. Gamma functions
start small (Γ[1] = 1) but grow rapidly (Γ[6] = 120).
Gamma distributions derive from
Σ(β^{α}x^{α1}e^{βx})
dx = Γ(a), where Σ[f(x)]dx
represent the integral from 0 to ∞ with respect to x.
Aside: Looking into gamma distributions more closely got
interesting after noticing so much of the data generated in the Taylor
Family Genes project appeared to match these
curves.
Then, they got fascinating for their applicability to so many uses.
About the Gamma Distribution
Gamma distributions can be parameterized in terms of a
shape
parameter α = k and
an inverse scale parameter β =
1/θ, called a
rate
parameter. A random variable X that
is gammadistributed with shape X ~ Γ(xα,β) ≡ Gamma(α,β). The parameters (k,θ or α,β)
must be positive, real numbers  that is, not less than 0 and not involving
√(1).
Only in some instances must they be integers  that is, not less than 0 and
not involving √(1). Only in some instances must they be integers..
Which of the parameter pairs  α,β or k,θ  is chosen depends on
convenience for the particular application. Both are common because either can be more convenient depending
on the situation. We'll place more emphasis on the α,β form;
it is more convenient in genetic genealogy.
Of most interest to us are
 PDF: The probability
density function describes the probability for a particular x
value.
 CDF: The cumulative
distribution function is the sum of probabilities for
a range of x values.:
Properties of the Gamma
{adapted from Wikipedia)
Property 
Equations 
with α, β parameters 
with k, θ parameters 
Parameters 
Shape: α>0
Scale: β>0, β=1/θ 
Shape: k>0
Scale: θ>0, θ
= 1/β 
Probability density
function (probability
at x) 
PDF[x] = (β^{α}x^{α1}e^{βx })
/ Γ(a)
= (β^{α}x^{α1})
/ e^{βx} Γ(a) 
PDF[x] = x^{k1}e^{x/θ }/
[Γ(k)θk] 
Cumulative distribution
function (probability
from 0 to x) 
CDF[x] = 1 γ(α,βx)/Γ(a) 
CDF[x] = 1  [Γ(k, x/θ] / Γ(k) 
Mean: 
E[x] = α/β 
E[x] = kθ 
Mode (PDF): (where PDF
peak occurs) 
Mode[PDF] = (α1)/β for α
>= 1 
Mode[PDF] = (k1)θ for k >= 1 
Median: 
No simple closed form 
No simple closed form 
Variance,
standard deviation: 
Var[y] = α/β^{2},
s[y] = sqrt(α)/β 
Var[y] = αθ^{2},
s[y] = θsqrt(α) 
Skew: (asymmetry) 
Sk[f(x)] = 2/√α
= 2/sqrt(α) 
Sk[x] = 2/√k
= 2/sqrt(k) 
Kurtosis (tails): 
K[x] = 6/α + 3 
K[x] = 6/k + 3 
Entropy
(Information
in distribution): 
H[x] = α  ln(β) + ln[Γ(α)] +
(1α)ψ(a),
where ψ(a) = Γ'(α)/Γ'(α) 
digamma function
or derivative
(slope) of the gamma function
at α 
H[x] = k  ln(1/θ) + ln[Γ(k)] +
(1k)ψ(k) 
Slope: (dy/dx) 
See below 

Moment generating
function 
MGF(x) = (1t/β}^{α} for
t < β 

Characteristic
function 
CF(x) = (1it/β}^{α}
for 

What is e?
The number symbolized by e is the base of natural logarithms: ~2,7183.
It is a constant, neither a parameter nor a variable.
What is skewness?
Skew is "bunching up" or peaking of the distribution to either the left or right of
the mean. An unskewed curve is symmetrical, looking the same on both sides
of the mode. The further the mode occurs from the mean, the more skewed is
the distribution curve. For comparison, a normal (AKA, Gaussian)
distribution has a skew of 0; the mean is at the curve's peak.
Notice that the skew of a gamma distribution depends only on α, by
definition a real number >0. Thus, its skew is always >0.
What is kurtosis?
Kurtosis (from a Greek word meaning "curved, arching") is a measure of the "tailedness"
of the probability distribution for a realvalued random variable. The
higher the kurtosis, the fatter are the tails and the more outlier (extreme)
values are found. For comparison, a normal (Gaussian) distribution has kurtosis = 3; K3>0
is considered "excess".
An alternative interpretation of kurtosis (by Moor in 1986) defined kurtosis
as dispersion around the mean and standard deviation (e.g., as in bimodal
distributions).
Kurtosis, too, of a gamma depends only on α. For α<2, kurtosis is greater
than that of a Gaussian (normal) distribution.
What is entropy?
Entropy measures the unpredictability of information content, how much
new information the data contain. Large entropy is more informative, less
predictable, than small
entropy. For
example, a firsttime opinion poll has a higher entropy value (new information) than a repetition of the same poll
because the first largely predicts the second.
Important!
The probability density function (PDF) is more informative of the
particular distribution's characteristics than the
cumulative distribution function (CDF). The gamma's CDF curve will take on only two shapes: hockeystick
(J) or inclined S, both eventually reaching 100%.
The PDF curve will have either a hockeystick or skewed bell
(rollercoaster) shape; the x/y
values for the peak and slopes of the tails say much about the
distribution.
It is therefore essential to derive or graph the PDF to see the
distribution's parameters. Our discussion focuses on the PDF, not the CDF.
Behavior:
Some things we can say about how gamma distributions behave:
 When α<2, β=1 & x>=1, the gamma appears as a classic
power distribution (e.g., y=c/e^{x}) with maximum
PDF at x<=1, The curve shape (for both PDF & CDF) is like a J or hockeystick.
This happens because of the factorial in the divisor:
1/Γ[int(α)1] = 1/Γ(11) =
1/(11)! = 1/0! = 1/1 =1 and because the β power has no effect: (e^{βx }
= e^{1x }= e^{x}),
The formula thus reduces to PDF[x] = [x^{(α1)}e^{x}]
= [x^{(α1)}]/(e^{x}); the only nonconstant
terms are α
& x.
In the case of α=1, the formula further reduces to PDF[x] = 1/e^{x};
x is the only nonconstant term.
.
 When α=>2, the PDF distribution has a rollercoaster shape, an
increasingly steep rise until near the peak, followed by steep &
gradually slowing decline.
 The x value for the PDF maximum is often found near x=(α1)*β.
If such a point exists, it will be where the slope is zero, that is, where
x^{(α1)}/[e^{βx}] = Γ(α)/β^{α}
 The gamma converges to the normal distribution as α → ∞.
For a normal curve, skew = 0 & kurtosis = 3.
At α>10, skew[Γ(x}α,β)] < 0.6 & kurtosis[Γ(x,α,β)] < 3.6;
at α>30, skew[Γ(x,α,β)] < 0.4 & kurtosis[Γ(α,β)] < 3.2; these
gammas are close approximations to the normal.
 When θ=2 (β=1/2) with k=v/2 (α=v/2), the gamma is a chisquare
distribution with v degrees of freedom. (v is an integer.)
 A Poisson distribution is a gamma for discrete events (integers) and time. It is
typically used to assess expectations for rare events, i.e., on the
right tail of the distribution. In this region, slope (i.e., dy/dt) is ~0,
so can be disregarded.
Slope:
The slope of a function's curve is also known as its "instantaneous rate
of change", differential or derivative.
Here, α & β are parameters, not variables; so.
 f'(x) = ^{d}/_{dx}[(β^{α}x^{α1}e^{βx}) / Γ(a)] = [β^{α}/Γ(a)] *^{ d}/_{dx}[x^{α1}e^{βx}] (Note: Bringing nonvariables together)
But, by the Product Rule: (fg)' = (f'g)*(fg')
with respect to x
>
(Notes: e^{βx} = 1/e^{βx}, β^{α}/β = β^{α1})
^{ d}/_{dx}[x^{α1}e^{βx}] = ^{ d}/_{dx}[x^{α1}]e^{βx}
* ^{ d}/_{dx}[e^{βx}]x^{α1} = (αx^{α2})e^{βx}
* (x^{α1})(e^{βx}/β) = (α/β)(x^{α2}/e^{βx})(x^{α1}/e^{βx})
=
= (α/β)(x^{α2}x^{α1}/(e^{βx}e^{βx})
=
(α/β)(x^{2α3})(e^{2βx})
 Therefore:
= [β^{α}/Γ(a)] * [αβ(x^{α2}/e^{βx})]
= [αβ^{α}/{Γ(a)β}] * (x^{2α3})(e^{2βx})
= [αβ^{α1}/Γ(a)] * [(x^{2α3}/e^{2βx})]

= [αβ^{α1}/Γ(a)] * [(x^{2α3}/e^{2βx})]
= αβ^{α1}/(α1)! * (x^{2α3}/e^{2βx}) .
 The slope is in two parts: A function of x,
(x^{2α3} / e^{2βx}), multiplied by a function
of the parameters, [αβ^{α1})/(a1)!]
In Excel
The Excel function is =GAMMADIST(x,α,β,cumulative) where
 x is the value at which to evaluate the function,
 α is the shape parameter for the function,
 β is the scale parameter and
 "cumulative" is either TRUE (1) or FALSE (0).
TRUE yields the cumulative distribution function (CDF) value for Σ(0
to x); FALSE yields the PDF value at x.
Other spreadsheet programs will also have a gamma distribution function,
though syntax may vary.
For an entire distribution, use various x values and constant α,β parameters.
Or, one may develop "test" distributions and vary α,β for a range of x
values.
Comment: Many of the graphs displayed here were created using Excel.
Applications
The gamma is used as a model in yDNA interpretations, such as
TMRCA
MacDonald TMRCA
This popular TMRCA calculator
apparently uses gamma distributions for its
model. The graph below shows calculated results for 0 to 7 mismatching
markers of 37 compared:
MacDonald TMRCAs for 37 Markers 


Exact match:
α=1, β~ 1.5
One mismatch: α~2, β~2.6
Two mismatches α~3, β~3 
37 mkrs: α=1, β~ 0.267
67 mkrs: α=1, β~ 0.322
111 mrks: α=1, β~ 0.439 
Note: The MacDonald calculator cuts off distributions when PDF(Gen)
<= 0.05. Graphs for more or less than 37 markers would display
approximately the same patterns, though with different parameters.
While β values (for other than exact matches) are somewhat hard to infer, α values appear to follow
the rule mismatches plus 1.
FTDNA TiP^{©}
TiP probabilities graph
This algorithm displays cumulative probabilities, whose graph looks
similar to the one to the right. This graph depicts TMRCA cumulative
probabilities by generation for a single mismatch on three markers, all
other markers matching.
But, we analyzed
the numbers into pergeneration results. it is a simple matter of subtracting the
cumulative of each generation from that of the prior. For example to get the
probability for the fifth generation (counting backward in time), the
cumulative probability for four generations is subtracted from that for
five.
The graphs below illustrate TMRCA probabilities for some matches. (As TiP
predictions rely on which specific markers differ, there are many solutions.).
The "per 3 gen" curve sums probabilities for Generation(x) +/ 1, resulting in
higher peak & greater slope at peak.




Exact match,
37 markers 
GD=1 @ CDY, 37 markers 
GD=1 @ DYS576, 37 markers 
GD=1 @ DYS464, 67 markers 
One purpose
of pulling cumulative probabilities apart into pergeneration
probabilities is to enable recombination into more focused
categories. For example, 57 generations is more focused than 17.
Note: The FTDNA TiP calculator cuts off distributions at Gen = 25;
Gen >24 is not considered genealogically useful. TiP does not calculate the
entire curve.
Exact match, no mismatches, GD=0
For an exact match, the α (alpha) parameter equals 1 and the β (beta) parameter equals the probability for generation 1,
making it possible to extend the curve to more than 24 generations in the
past though these probabilities will be very small. The curve for an exact match is the same
by TiP as MacDonald's. Parameters are α=1 and β=1/Max(PDF), a function of
the number & volatility of markers.
Single Mismatch, CDY, 37 markers
CDY, multicopy markers #s 33 & 34, with an average mutation frequency of 1 per ~28 generations, is the most volatile STR marker tested by FTDNA.
Being a multicopy marker, all copies are considered together and treated by
the
infinite alleles model. A mismatch only on CDY also yields an exponential curve. (TiP will count CDY
in the infinite alleles manner; values either match or they don't.) The α
parameter is less than 2 and close enough to 1 that Γ(α) = 1;
β=1.8.
Single Mismatch, DYS576, 37 markers
DYS576, marker #32, has an average mutation rate of 1 per 127
generations. Approximate parameters for the gamma distribution are α=2.1 and
β =2.1.
Single Mismatch, DYS464, 67 markers
DYS464 is also a multicopy marker, #s 2225, with an average mutation
rate of 1 per 177 generations. Approximate gamma distribution parameters are
α=2.0 and β=2.2.
More than one mismatch
Parameters for multiple mismatches are less obvious.
Haplotype rarity
The graph to the right depicts the distribution of "Wheaton scores" found
in a study of haplotype rarity.
After collecting the data, summarizing and graphing histograms, the
distribution curves resemble those of gamma distributions. This was an unintentional
byproduct; the resemblance was noticed only after the fact.
As the resolution (12 markers, 25, 37, 67) increase the α,β parameters of the
corresponding gammas also increase. Apparent parameters are approximately
 !2marker distribution: α = 1.00, β = 0.955 yields a curve
which best fits the actual data;
χ^{2} = 0.00808. However, the gamma does
not reach the actual percentages found in the lowest categories.
 25marker distribution: α = 2.70, β = 1.35 yields a curve which
best fits the data; χ^{2} = 0.00304.
 37marker distribution: α = 3.21, β = 2.20 yields a curve which
best fits the data; χ^{2} = 0.0064.
 67marker distribution: α = 3.81, β = 3.20 yields a curve
which best fits the data; χ^{2} = 0.0171.
The chisquare (χ^{2}) "goodness of fit" test was used to find
parameters. The lower the χ^{2} value, the better the
calculated curve fits the actual data.
Finding parameters by inspection
We are mostly interested in small value of α and β, that is α=k <=10 and
β<=10, θ>=1/10.
Here, α= 5 while β takes values of 1, 2, & 3
To find α, β
This method yields approximate values. From a data table or graph, estimate the mode (peak probability) and
variance (sd^2).
 From Mode = (α1)/β: α = 1 + mode*β,
 From Var[x] = α/β^{2}, β = α/sqrt(Var[x]); {Var(x) = Σ(xẍ)/N;
For TiP: Var(x=1, 2,3,..23, 24) = 50; thus
β = α/sqrt(50)}
 Solve as simultaneous equations.
For the special case of α=1 and x=1,
the formula reduces to
PDF[Γ(x,α,β)] = β/e^{β}.
This function reaches a peak value of 1/e (~0.3679) when β=1; for β<>1,
the value is less.
Graphical method
 Graph x,y values for the probability distribution of interest.
(Note: Do not graph the cumulative distribution; it's less informative.)

A spreadsheet program's x,y graph is recommended, with one xvalue column and two
yvalue columns.
 One y column is for the distribution to inspect; the
other y for a gammadistribution model based on α & β values you'll input as
trials.
 Estimate peak y value and its corresponding x value (mode) this
yields the relationship: Mode[PDF] = (α1)/β for α
>= 1, which can be algebraically transformed to α = β*Mode(PDF) +1
and β = (α1)/Mode[PDF]
Hints:
 If x for the highest y value = 0 (an exponential
distribution); α,k <= 1 and β
= 1/Max(y), θ = Max(y).
 If x for peak y <10, try limiting the x scale to 20 or less.
 Look at the curve, remembering that skew and kurtosis are both inverse
functions of α. A small α produces highly skewed, curve with a
fat left tail (i.e., big y for small x) and thin right tail (small y for
big x).
 When α>1, β affects dispersion. Peak y shifts right and is
less.
 Try various α & β values; when the graph lines merge, you're close.
 Bear in mind that the distribution you're inspecting may use some
unknown function for its α,β parameters.
 The table below gives some examples. These will be approximate; tinker.
Max
y 
x @
Max y,
β=1 
Trial α for β 
β=1,
α= 
β=2,
α= 
β=3,
α= 
1.00 
0 
1 
N/A 
N/A 
0.50 
0 
N/A 
1 
N/A 
0.40 
1.0 
1.75 
N/A 
N/A 
0.30 
1.7 
2.6 
1.25 
1.01 
0.25 
2.4 
2.5 
1.45 
1.1 
0.20 
3.7 
4.7 
1.8 
1.25 
0.15 
7 
7.9 
4.8 
2.5 
0.10 
7.5 
16.5 
4.75 
1.6 
0.05 
>30 
35 
15 
7.5 
Note: In the special case of α=1, the maximum pointvalue probability occurs at x=0.
The β parameter can be
found by PDF(x=0) = θ =1/β.
Caution: Not the whole truth
It is unusual for natural phenomena to exactly follow a simple
mathematical model. There are too many variables, many of which are unknown
or incompletely known. It is best to interpret results from these models as
most likely estimates, only roughly approximating reality.
Math symbols used
 Γ = Greek letter Gamma; refers to either (depending on
usage) the gamma function or gamma distribution
 ! = Factorial, if after a number or number symbol; a number's integer
value times one less than the integer times one more less times one
more less, etc., until the number one is reached. n! =
n*(n1)*(n2)*(n3)...*(1), where n is an integer.1!=1, 2!=2*1=2,
3!=3*2*1=6,
4!=4*3*2*1=24. By convention, 0!=1.
 (). [], {} = Enclosures for equation terms;
indicate order of operations. Parentheses are used first, then square
brackets, then curly brackets. May also mean "a function of", as below.
 Γ(x) = Gamma function: Γ(α) = gamma
function of α = (α1)!,
i.e., factorial of α1
 Γ(x,α,β) = Gamma probability distribution, of which the
gamma function is one term.
 x = An independent variable; a value for evaluation of a
function or distribution;
 y = A dependent variable whose value is determined by a
given x and the given function.
 α= Greek letter alpha, the "shape" parameter for the
gamma distribution
 β = Greek letter beta, the "scale" parameter for the
gamma distribution
 k = either Latin k or Greek kappa; an alternate shape
parameter for the gamma; k=α
 θ = Greek letter theta. an alternate scale parameter for the
gamma; θ=1/β
 PDF = Probability Density Function, the probability of an x
value given distribution parameters
 CDF = Cumulative Distribution Function. the cumulative
probability from 0 to x. Where the PDF represents the probabilities of
specific x values, the CDF represents the area under the PDF curve, the
integral of the PDF. From
calculus: CDF[Γ(x,α,β)] =
∑PDF[Γ(x,α,β)]
from 0 to x.
 f(x), g(x), means any functions of x. When g(x)
is used in combination with f(x), it denotes a separate function of x.
 ^{ d}/_{dx}f(x), also f'(x),
represents the derivative, differential, rate of change or slope
(these are synonyms) of the function f(x).