On this page:

Gamma Distributions

Gamma distributions show up frequently in DNA and many other fields including cell biology, insurance claims, rainfall, particle physics, radio waves and signal detection. These probability distributions describe a wide variety of natural phenomena.

Gammas are a class of probability distributions of which exponential, Poisson and chi-square are special cases. TMRCA calculators based on ySTR matches, for example, employ gamma distributions with different parameter values.

Author's note: If you are math-phobic, you can skip this page. But, please recognize that these are important aspects of genetic genealogy. 

HTML is (strangely) not kind to presentation of mathematical expressions, which this topic demands. We've done our best to present the equations as clearly as possible given the limitations.

Distributions of Wheaton Scores

Here on the right, for example, is a figure from our study of y-haplotype rarity: The curves represent percentages (by category) of actual observed scores on a particular scale. As will be seen later, these curves bear strong resemblance to gamma distributions with increasing α parameters as we go from 12 markers to more.

Gamma distributions look odd because they involve factorials. The gamma function (symbol Γ) is a divisor in the gamma distribution equation; Γ(x) = (x-1)!.  Gamma functions start small (Γ[1] = 1) but grow rapidly (Γ[6] = 120).

Gamma distributions derive from Σαxα-1e-βx) dx = Γ(a), where Σ[f(x)]dx represent the integral from 0 to ∞ with respect to x.

Aside: Looking into gamma distributions more closely got interesting after noticing so much of the data generated in the Taylor Family Genes project appeared to match these curves. Then, they got fascinating for their applicability to so many uses.

About the Gamma Distribution

Gamma distributions can be parameterized in terms of a  shape parameter α = k and an inverse scale parameter β = 1/θ, called a rate parameter. A random variable X that is gamma-distributed with shape  X ~ Γ(x|α,β) ≡ Gamma(α,β). The parameters (k,θ or α,β) must be positive, real numbers -- that is, not less than 0 and not involving √(-1). Only in some instances must they be integers - that is, not less than 0 and not involving √(-1). Only in some instances must they be integers..

Which of the parameter pairs -- α,β  or k,θ -- is chosen depends on convenience for the particular application. Both are common because either can be more convenient depending on the situation. We'll place more emphasis on the α,β  form; it is more convenient in genetic genealogy.

Of most interest to us are

Properties of the Gamma

{adapted from Wikipedia)
Property Equations
with α, β parameters with k, θ parameters
Parameters Shape: α>0
Scale:  β>0, β=1/θ
Shape: k>0
Scale:  θ>0, θ = 1/β
Probability density
  function (probability
  at x)
PDF[x] = (βαxα-1e-βx ) / Γ(a)
    = (βαxα-1) / eβx Γ(a)
PDF[x] = xk-1e-x/θ / [Γ(k)θk]
Cumulative distribution
  function (probability
  from 0 to x)
CDF[x] = 1- γ(α,βx)/Γ(a) CDF[x] = 1 - [Γ(k, x/θ] / Γ(k)
Mean: E[x] = α/β E[x] = kθ
Mode (PDF): (where PDF
  peak occurs)
Mode[PDF] = (α-1)/β for α >= 1 Mode[PDF] = (k-1)θ for k >= 1
Median: No simple closed form No simple closed form
  standard deviation:
 Var[y] = α/β2, s[y] = sqrt(α)/β Var[y] = αθ2, s[y] = θsqrt(α)
Skew: (asymmetry)  Sk[f(x)] = 2/α = 2/sqrt(α) Sk[x] = 2/√k = 2/sqrt(k)
Kurtosis (tails): K[x] = 6/α + 3 K[x] = 6/k + 3
Entropy (Information
 in distribution):
 H[x] = α - ln(β) + ln[Γ(α)] + (1-α)ψ(a),
where ψ(a) = Γ'(α)/Γ'(α) -- digamma function
or derivative (slope) of the gamma function
at α
H[x] = k - ln(1/θ) + ln[Γ(k)] + (1-k)ψ(k)
Slope: (dy/dx) See below  
Moment generating
MGF(x) = (1-t/β} for t < β  
CF(x) = (1-it/β} for  

What is e?

The number symbolized by e is the base of natural logarithms: ~2,7183. It is a constant, neither a parameter nor a variable.

What is skewness?

Skew is "bunching up" or peaking of the distribution to either the left or right of the mean. An un-skewed curve is symmetrical, looking the same on both sides of the mode. The further the mode occurs from the mean, the more skewed is the distribution curve. For comparison, a normal (AKA, Gaussian) distribution has a skew of 0; the mean is at the curve's peak.

Notice that the skew of a gamma distribution depends only on α, by definition a real number >0. Thus, its skew is always >0.

What is kurtosis?

Kurtosis (from a Greek word meaning "curved, arching") is a measure of the "tailedness" of the probability distribution for a real-valued random variable. The higher the kurtosis, the fatter are the tails and the more outlier (extreme) values are found. For comparison, a normal (Gaussian) distribution has kurtosis = 3; K-3>0 is considered "excess".

An alternative interpretation of kurtosis (by Moor in 1986) defined kurtosis as dispersion around the mean and standard deviation (e.g., as in bimodal distributions).

Kurtosis, too, of a gamma depends only on α. For α<2, kurtosis is greater than that of a Gaussian (normal) distribution.

What is entropy?

Entropy measures the unpredictability of information content, how much new information the data contain. Large entropy is more informative, less predictable, than small entropy. For example, a first-time opinion poll has a higher entropy value (new information) than a repetition of the same poll because the first largely predicts the second.


The probability density function (PDF) is more informative of the particular distribution's characteristics  than the cumulative distribution function (CDF). The gamma's CDF curve will take on only two shapes: hockey-stick (J) or inclined S, both eventually reaching 100%. The PDF curve will have either a hockey-stick or skewed bell (roller-coaster) shape; the x/y values for the peak and slopes of the tails say much about the distribution.

It is therefore essential to derive or graph the PDF to see the distribution's parameters. Our discussion focuses on the PDF, not the CDF.


Some things we can say about how gamma distributions behave:


The slope of a function's curve is also known as its "instantaneous rate of change", differential or derivative.

Here, α & β are parameters, not variables; so.

In Excel

The Excel function is =GAMMADIST(x,α,β,cumulative) where

Other spreadsheet programs will also have a gamma distribution function, though syntax may vary.

For an entire distribution, use various x values and constant α,β parameters. Or, one may develop "test" distributions and vary α,β for a range of x values.

Comment: Many of the graphs displayed here were created using Excel.


The gamma is used as a model in yDNA interpretations, such as TMRCA

MacDonald TMRCA

This popular TMRCA calculator apparently uses gamma distributions for its model. The graph below shows calculated results for 0 to 7 mismatching markers of 37 compared:

MacDonald TMRCAs for 37 Markers
Exact match:   α=1, β~ 1.5
One mismatch: α~2, β~2.6
Two mismatches α~3, β~3
  37 mkrs: α=1, β~ 0.267
  67 mkrs: α=1, β~ 0.322
111 mrks: α=1, β~ 0.439

Note: The MacDonald calculator cuts off distributions when PDF(Gen) <= 0.05. Graphs for more or less than 37 markers would display approximately the same patterns, though with different parameters.

While β values (for other than exact matches) are somewhat hard to infer, α values appear to follow the rule mismatches plus 1.


TiP probabilities graph

This algorithm displays cumulative probabilities, whose graph looks similar to the one to the right. This graph depicts TMRCA cumulative probabilities by generation for a single mismatch on three markers, all other markers matching.

But, we analyzed the numbers into per-generation results. it is a simple matter of subtracting the cumulative of each generation from that of the prior. For example to get the probability for the fifth generation (counting backward in time),  the cumulative probability for four generations is subtracted from that for five.

The graphs below illustrate TMRCA probabilities for some matches. (As TiP predictions rely on which specific markers differ, there are many solutions.). The "per 3 gen" curve sums probabilities for Generation(x) +/- 1, resulting in higher peak & greater slope at peak.

Exact match, 37 markers GD=1 @ CDY, 37 markers GD=1 @ DYS576, 37 markers GD=1 @ DYS464, 67 markers
One purpose of pulling cumulative probabilities apart into per-generation probabilities is to enable recombination into more focused categories. For example, 5-7 generations is more focused than 1-7.

Note: The FTDNA TiP calculator cuts off distributions at Gen = 25; Gen >24 is not considered genealogically useful. TiP does not calculate the entire curve.

Exact match, no mismatches, GD=0

For an exact match, the α (alpha) parameter equals 1 and the β (beta) parameter equals the probability for generation 1, making it possible to extend the curve to more than 24 generations in the past though these probabilities will be very small. The curve for an exact match is the same by TiP as MacDonald's. Parameters are α=1 and β=1/Max(PDF), a function of the number & volatility of markers.

Single Mismatch, CDY, 37 markers

CDY, multi-copy markers #s 33 & 34, with an average mutation frequency of 1 per ~28 generations, is the most volatile STR marker tested by FTDNA. Being a multi-copy marker, all copies are considered together and treated by the infinite alleles model. A mismatch only on CDY also yields an exponential curve. (TiP will count CDY in the infinite alleles manner; values either match or they don't.) The α  parameter is less than 2 and close enough to 1 that Γ(α) = 1; β=1.8.


Single Mismatch, DYS576, 37 markers

DYS576, marker #32, has an average mutation rate of 1 per 127 generations. Approximate parameters for the gamma distribution are α=2.1 and β =2.1.

Single Mismatch, DYS464, 67 markers

DYS464 is also a multi-copy marker, #s 22-25, with an average mutation rate of 1 per 177 generations. Approximate gamma distribution parameters are α=2.0 and β=2.2.


More than one mismatch

Parameters for multiple mismatches are less obvious.

Haplotype rarity

The graph to the right depicts the distribution of "Wheaton scores" found in a study of haplotype rarity. After collecting the data, summarizing and graphing histograms, the distribution curves resemble those of gamma distributions. This was an unintentional byproduct; the resemblance was noticed only after the fact.

As the resolution (12 markers, 25, 37, 67) increase the α,β parameters of the corresponding gammas also increase. Apparent parameters are approximately

The chi-square (χ2) "goodness of fit" test was used to find parameters. The lower the χ2 value, the better the calculated curve fits the actual data.

Finding parameters by inspection

We are mostly interested in small value of α and β, that is α=k <=10 and β<=10, θ>=1/10.

Here, α= 5 while β takes values of 1, 2, & 3

To find α, β

This method yields approximate values. From a data table or graph, estimate the mode (peak probability) and variance (sd^2).

For the special case of α=1 and x=1, the formula reduces to  PDF[Γ(x,α,β)] = β/eβ. This function reaches a peak value of 1/e (~0.3679) when β=1; for β<>1, the value is less.

Graphical method

  1. Graph x,y values for the probability distribution of interest. (Note: Do not graph the cumulative distribution; it's less informative.)
  2. Estimate peak y value and its corresponding x value (mode) this yields the relationship: Mode[PDF] = (α-1)/β for α >= 1, which can be algebraically transformed to α = β*Mode(PDF) +1 and β = (α-1)/Mode[PDF]   
    1. If x for the highest y value = 0 (an exponential distribution); α,k <= 1 and β = 1/Max(y), θ = Max(y).
    2. If x for peak y <10, try limiting the x scale to 20 or less.
  3. Look at the curve, remembering that skew and kurtosis are both inverse functions of α. A small α produces highly skewed, curve with a fat left tail (i.e., big y for small x) and thin right tail (small y for big x).
  4. When α>1, β affects dispersion. Peak y shifts right and is less.
  5. Try various α & β values; when the graph lines merge, you're close.
  6. Bear in mind that the distribution you're inspecting may use some unknown function for its α,β parameters.
  7. The table below gives some examples. These will be approximate; tinker.
x @
Max y,
Trial α for β
1.00 0 1   N/A     N/A  
0.50 0 N/A 1 N/A
0.40 1.0 1.75 N/A N/A
0.30 1.7 2.6  1.25 1.01
0.25 2.4 2.5 1.45 1.1
0.20 3.7 4.7 1.8 1.25
0.15 7 7.9 4.8 2.5
0.10 7.5 16.5 4.75 1.6
0.05 >30   35 15  7.5  
Note: In the special case of α=1, the maximum point-value probability occurs at  x=0. The β parameter can be found by PDF(x=0) = θ =1/β.

Caution: Not the whole truth

It is unusual for natural phenomena to exactly follow a simple mathematical model. There are too many variables, many of which are unknown or incompletely known. It is best to interpret results from these models as most likely estimates, only roughly approximating reality.

Math symbols used