We are often asked what the chances are for a Y-STR haplotype to remain
unchanged. The answer depends on how many markers are being considered and
for how many DNA Transmission events (TE).
We worked this out for the FTDNA 37 and 67 marker panels.
Do not be misled into thinking the information here is more than
"most likely estimates". Calculations are based on averages of published average mutation rates,
with estimates for unpublished rates. They
are indicative in the aggregate but may not apply in specific instances. As
one expert (Susan Hedeen) wrote: "Some families show great haplotype
stability over many generations; others are fire-breathing mutants."
Imagine a "jars of marbles" experiment. Say that you have a jar in
front of you, containing 1,000 marbles, 999 white and one red. What are the
chances of randomly picking the red marble from the jar? Now say that there are
37 (or 67) such jars and you'll pick one marble from each. What are the
chances that at least one marble will be red? What are the chances of two
Now, for the advanced experiment, say that you pick from the jars
multiple times, replacing the marbles each time (but recording their
colors). What are the chances of no red (all white) marbles? One red marble?
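The marble questions can be worked directly. The following is a minimal Python sketch, assuming every draw is independent and each jar holds exactly one red marble in 1,000; the function names are illustrative only.

```python
def chance_at_least_one_red(jars, p_red=1 / 1000):
    """Chance that at least one of `jars` independent single draws is red."""
    return 1 - (1 - p_red) ** jars


def chance_all_white(jars, rounds, p_red=1 / 1000):
    """Chance that every draw is white when each jar is sampled `rounds`
    times, replacing the marble after each draw."""
    return (1 - p_red) ** (jars * rounds)


for jars in (37, 67):
    print(jars, "jars, at least one red in one round:",
          round(chance_at_least_one_red(jars), 4))
    print(jars, "jars, all white after 4 rounds:",
          round(chance_all_white(jars, 4), 4))
```

The jars stand in for markers and the rounds for transmission events; the real per-marker chances differ from 1 in 1,000.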
One DNA Transmission Event
For one TE, this is a matter of the probability of multiple independent
events (PMIE). Any marker is free to mutate, though the chances of it doing so are
small. The mathematical expressions are
p = chance of a mutation
q = chance of no mutation = 1 - p
For simplicity of the mathematics, it is more convenient to use a constant,
average probability of marker mutation than to calculate each marker
individually. See note 1 for details.
Those chances range
The average represents the chance of each marker in a haplotype of 37 or 67
markers undergoing a mutation in one TE vs. none. For the entire haplotype, we multiplied by
the number of markers to derive a
probability of any of the markers mutating. See note 2 for
To account for the chance of a marker undergoing a forward-and-back pair of
mutations, resulting in no observable change, we adjusted the mutation rate
as described in note 3.
That gives us the chance, q = 1-p, of no observable (see note 3)
For one TE, the chance of no mutations (or a forward/back pair) is large; the
chance of one or more observable mutations is smaller.
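As a rough illustration of the one-TE step, the sketch below scales a per-marker chance up to a whole haplotype in two ways: by multiplying by the number of markers, as described above, and by treating the markers as independent events. The per-marker rate `P_MARKER` is a hypothetical placeholder, not the figure used for this page, and the back-mutation adjustment of note 3 is not applied.

```python
def haplotype_mutation_chance(p_marker, markers):
    """Return (linear, independent) estimates of the chance that any
    marker mutates in one transmission event."""
    linear = p_marker * markers                  # multiply by the marker count
    independent = 1 - (1 - p_marker) ** markers  # 1 - chance no marker mutates
    return linear, independent


P_MARKER = 0.005  # hypothetical average per-marker rate, illustration only

for markers in (37, 67):
    linear, independent = haplotype_mutation_chance(P_MARKER, markers)
    print(f"{markers} markers: p ~ {linear:.3f} (multiplied) "
          f"or {independent:.3f} (independent events); "
          f"q ~ {1 - linear:.3f} or {1 - independent:.3f}")
```

For small per-marker chances the two estimates are close; the multiplied figure is always the slightly larger of the two.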
Although the chance of a haplotype experiencing no observable change in
one transmission event is large, the chances decrease with each additional
transmission. This is a problem in binomial probability. (k occurrences in n
We will let:
P(k) = the probability of k mutations;
p = the probability (from above) of any one of the markers mutating;
k = the number of mutation occurrences (sometimes called "successes") and;
n = the number of trials (transmission events).
The binomial probability formula is:
P(k) = [n! / (k! (n-k)!)] * p^k * (1-p)^(n-k)
(The exclamation point,
!, means to take the factorial of the number. 2!=2*1, 3!=3*2*1, 4!=4*3*2*1,
In Excel, the worksheet functions are
P(k) =BINOMDIST(k,n,p,FALSE) for
probability of the exact number of mutations (k) and
P(i≤k) =BINOMDIST(k,n,p,TRUE) for probability of the cumulative, k or fewer mutations.
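For readers without Excel, the two worksheet calls can be approximated in Python. This is a sketch using only the standard library; the function names `binom_pmf` and `binom_cdf` are our own, and the value of p in the usage lines is a made-up example.

```python
from math import comb


def binom_pmf(k, n, p):
    """Like BINOMDIST(k, n, p, FALSE): probability of exactly k mutations
    in n transmission events."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """Like BINOMDIST(k, n, p, TRUE): probability of k or fewer mutations."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


p = 0.2  # hypothetical per-TE chance, illustration only
print(binom_pmf(0, 4, p))  # exactly zero mutations in four TE
print(binom_cdf(2, 4, p))  # two or fewer mutations in four TE
```

Note that `math.comb(n, k)` evaluates to zero when k>n, so this sketch returns a probability of zero in that case rather than an estimate.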
Note: Calculations and graphs were done with an Excel spreadsheet. Other spreadsheet
programs can perform the same function but syntax may differ.
We recognize that a marker may mutate away from an initial allele value and
then back to that value; two changes have occurred but appear as no change.
(They are not "observable".) We have attempted to adjust the probabilities
downward to account for this.
We are not concerned here with degrees of mutation, only with the fact of occurrence (or non-occurrence).
Whether a marker follows the step-wise or infinite alleles model is a
No Observable Mutations
The special case of k=0 simplifies the formula to
P(k=0) = (1-p)^n.
The chance of no mutations shrinks exponentially with each
Fortunately for k=0, the factorial of zero is 1 (0! = 1), any number raised to the power
of zero is 1 (x^0 = 1), and n-0 = n, allowing us to remove all the k terms.
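The k=0 case is simple enough to tabulate in a few lines. In the sketch below, the per-TE chance of no observable change is back-calculated from the four-TE figures quoted on this page (43% for 37 markers, 29% for 67); that back-calculation is our assumption for illustration, not the spreadsheet input actually used.

```python
# Per-TE chance of no observable change (q = 1 - p), back-calculated from
# the four-TE figures quoted on this page. Illustrative assumption only.
Q_PER_TE = {37: 0.43 ** (1 / 4), 67: 0.29 ** (1 / 4)}


def chance_unchanged(q, n):
    """P(k=0) = (1-p)^n, where q = 1-p is the per-TE chance of no change."""
    return q ** n


print("TE   37 markers   67 markers")
for n in range(1, 9):
    row = [chance_unchanged(Q_PER_TE[m], n) for m in (37, 67)]
    print(f"{n:2d}   {row[0]:10.0%}   {row[1]:10.0%}")
```

Each added transmission multiplies the previous row by the same factor q, which is why the decline is exponential.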
The graph and table below show the probabilities for a haplotype
remaining observably unchanged over a series of transmission events. (Here, k
= 0 or zero)
Figure 1: Chance of No Observable Mutations
P(k= 0 mutations)
The calculations show these patterns:
The probability of no observable mutations decreases with each additional
By four transmissions, the probability of a 37- or 67-marker haplotype
remaining observably unchanged is less than 50%; it is more probable than
not that a marker will mutate.
A 67-marker haplotype has a lesser chance than a 37-marker haplotype of
remaining observably unchanged, due to
the greater number of markers.
The calculations show one reason for exact matches between two living men being relatively uncommon.
Between two men sharing the same 2nd-great-grandfather as a MRCA, five
DNA transmissions ("trials") have happened; the probabilities of the haplotype remaining unchanged are ~1:3 for 37
markers and ~1:5 for 67.
One or More Mutations
We next turn to the chances of one or more mutations (k≥1), again a matter of binomial
probability -- except when k>n, it is a matter of multiple dependent
events where P(n,k)≈ pn/k.
One or Two Mutations
The graph and table below show the probabilities of exactly k=1 & k=2.
Probabilities for k>n (e.g., k=2, n=1; see note 6)
are estimated; the binomial distribution
does not solve because the factorial of (n-k)<0 is not defined.
Figure 2: Chance of one or two
(Irrespective of back-mutations.)
P(k= 1, 2 mutations)
The probability of a specific number of mutations
increases to a maximum and then decreases as
greater numbers of mutations become more likely.
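The rise-and-fall pattern can be checked numerically. The sketch below uses the plain binomial formula (so it gives zero, not an estimate, when k>n) and a per-TE rate for 37 markers back-calculated from the 43% four-TE figure quoted on this page; both choices are assumptions made for illustration, and `peak_te` is a hypothetical helper name.

```python
from math import comb

# Assumed per-TE chance of a mutation for 37 markers, back-calculated from
# the 43% "unchanged after four TE" figure. Illustration only.
P_37 = 1 - 0.43 ** (1 / 4)


def binom_pmf(k, n, p):
    """Probability of exactly k mutations in n transmission events."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def peak_te(k, p, max_te=40):
    """Number of TE (1..max_te) at which P(exactly k mutations) is largest."""
    return max(range(1, max_te + 1), key=lambda n: binom_pmf(k, n, p))


for k in (1, 2):
    n = peak_te(k, P_37)
    print(f"k={k}: most likely at n={n} TE, P={binom_pmf(k, n, P_37):.3f}")
```

Under these assumptions the peak for two mutations falls later than the peak for one, matching the shape described above.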
Three or Four Mutations
The graph and table below show the probabilities of exactly k=3 & k=4.
Probabilities for k>n are estimated by probability of multiple dependent
events; the binomial distribution function does not solve because the factorial of (n-k)<0 is undefined.
Figure 3: Three to four
(Irrespective of back-mutations.)
P(k= 3, 4 mutations)
The stability of a haplotype is revealed by how closely it remains to its
original pattern as it experiences
more transmission events. This is shown in the graph and table below using
k≤2 & k≤4.
Cumulative Probability of k≤2 & k≤4
(Irrespective of back-mutations.)
Σ P(k≤2, k≤4)
Note that the cumulative distributions have a distinctly different
appearance from those for a specific number of mutations.
A 37- or 67-marker haplotype is almost certain to undergo two or fewer
mutations over four transmissions and four or fewer over five transmissions. After
that, the probabilities decrease with each transmission.
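The cumulative figures can be approximated by summing the binomial terms. As before, this is only a sketch: the per-TE rates are back-calculated from the four-TE figures quoted on this page (43% and 29%), which is our assumption, and the plain binomial is used throughout.

```python
from math import comb

# Assumed per-TE chance of a mutation, back-calculated from the four-TE
# "unchanged" figures (43% for 37 markers, 29% for 67). Illustration only.
P_PER_TE = {37: 1 - 0.43 ** (1 / 4), 67: 1 - 0.29 ** (1 / 4)}


def binom_cdf(k, n, p):
    """Probability of k or fewer mutations in n transmission events."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


for n in (4, 5, 8, 12):
    for markers in (37, 67):
        p = P_PER_TE[markers]
        print(f"n={n:2d}, {markers} markers: "
              f"P(k<=2)={binom_cdf(2, n, p):.3f}, "
              f"P(k<=4)={binom_cdf(4, n, p):.3f}")
```

Unlike the curves for an exact number of mutations, these cumulative values start near 1 and only fall as transmissions are added.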
As the number of transmission events increases, it becomes increasingly unlikely that a haplotype will retain its exact
pattern (i.e., undergo no observable mutations). By four TE, the probability of a 37-marker
haplotype retaining its exact identity has declined to 43% and a 67-marker
haplotype to 29%.
Corollary: One should not expect to find exact matches between two
descendants of a common ancestor if the generations of separation between the
descendants is two or more, i.e., TE≥4 (the most recent common ancestor is their
grandfather or more distant). It is more likely than not that there will
have been at least one mutation in a 37- or 67-marker haplotype.
However, it is very likely that a haplotype will retain a high degree of
similarity to its original pattern through six or fewer TE. One can expect to
find matches of ≤2 mutations (see note 5) for four generations of separation, TE≥8, and perhaps
Revised: 15 Dec 2016
Mutation probability calculations
We used published data on individual
marker mutation frequencies and calculated the geometric mean for 37 markers. For 67 markers, since markers 38-67
are less volatile than 26-37, we took the geometric mean of the rates for markers 1-25 and 1-37
(including 1-25 twice).
For 37 markers, we took, as p, the geometric mean of the marker rates. For 67 markers, since
the rates for #s 38-67 are unpublished but believed to be less volatile, we took the geometric mean
of #s 1-25 & #s 1-37 (1-25 included twice). To calculate P(n,k,p) we used the
Excel binomial distribution formula.
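The averaging step can be sketched as follows. The rates below are placeholders, not published values, and pooling markers 1-25 with markers 1-37 is only one reading of "1-25 included twice"; both are assumptions made for illustration.

```python
from statistics import geometric_mean

# Placeholder per-marker rates: NOT published values. Markers 26-37 are
# given a higher rate only to mimic their greater volatility.
rates_1_25 = [0.002] * 25
rates_26_37 = [0.008] * 12
rates_1_37 = rates_1_25 + rates_26_37

# 37-marker panel: geometric mean of all 37 marker rates.
p_marker_37 = geometric_mean(rates_1_37)

# 67-marker panel, under our reading: pool markers 1-25 with markers 1-37,
# so the first 25 rates are counted twice and pull the average down.
p_marker_67 = geometric_mean(rates_1_25 + rates_1_37)

print(f"average per-marker rate, 37 markers: {p_marker_37:.5f}")
print(f"average per-marker rate, 67 markers: {p_marker_67:.5f}")
```

Counting the less volatile markers twice lowers the average, which is the intended stand-in for the unpublished 38-67 rates.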
The chance of a pair of mutations on the same marker in a number of TE is the square
of the mutation rate but the second of the pair may more likely be toward
the initial value than away from it. We arbitrarily assigned a
weight of 3:1 so that, in a mutation pair, the second would more likely be toward the
Because the impact of this weighting is a function of the square of a small number,
selecting a weighting of, say, 5:1 would only adjust p in the 4th place after the decimal
(The chance of a mutation pair is slight, ~2.5*10^-6,
as compared to ~2.3*10^-3 for a single mutation.)
No observable mutation
"Observable" refers to the possible effect of a back-mutation disguising a
change from the prior haplotype. If a marker mutates to a new value and
then back to its initial value, no change will be observed. This is more
relevant to apparent non-change of the haplotype than to instances with
observable mutations.
This is roughly equivalent to genetic distance (GD) ≤2. However, GD is
calculated in a manner in which a mutation is not always a step of GD.
Our page on this complicated
subject covers GD in more detail.
k>n -- e.g., k=2, n=1
The factorial of (n-k)<0 -- e.g., (-1)! = ∞ -- is not defined, rendering
the binomial probability function without a solution. Therefore, we used the
following method to estimate the probabilities.
While a mutation cannot occur twice on the same marker in one TE, it is
possible for more than one marker to mutate in one TE. We applied the
probability of multiple dependent events in these cases because a 2nd
mutation depends on there having been a 1st, a 3rd depends on a 2nd, etc.