Haplotype Stability
We are often asked what the chances are for a Y-STR haplotype to remain
unchanged. The answer depends on how many markers are being considered and
for how many DNA Transmission Events (TE).
We worked this out for the FTDNA 37- and 67-marker panels.
Caution:
Do not be misled into thinking the information here is more than
"most likely estimates". Calculations are based on averages of published average mutation rates,
with estimates for unpublished rates. They
are indicative in the aggregate but may not apply in specific instances. As
one expert (Susan Hedeen) wrote: "Some families show great haplotype
stability over many generations; others are firebreathing mutants."
Analogy
Imagine a "jars of marbles" experiment. Say that you have a jar in
front of you, containing 1,000 marbles, 999 white and one red. What are the
chances of randomly picking the red marble from the jar? Now say that there are
37 (or 67) such jars and you'll pick one marble from each. What are the
chances that at least one marble will be red? What are the chances of two
red marbles?
Now, for the advanced experiment, say that you pick from the jars
multiple times, replacing the marbles each time (but recording their
colors). What are the chances of no red (all white) marbles? One red marble?
Two? More?
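The questions posed in the analogy have closed-form answers if each jar is treated as an independent draw. The following is a minimal Python sketch of that arithmetic. The function name `red_marble_probs` is ours, chosen for illustration; the 1-in-1,000 chance of red comes from the jar described above.

```python
from math import comb

def red_marble_probs(jars, p_red=1 / 1000, max_red=2):
    """Return [P(0 red), P(1 red), ..., P(max_red red)] when one marble
    is drawn from each of `jars` independent jars."""
    return [
        comb(jars, k) * p_red**k * (1 - p_red) ** (jars - k)
        for k in range(max_red + 1)
    ]

for jars in (37, 67):
    p0, p1, p2 = red_marble_probs(jars)
    print(f"{jars} jars: no red {p0:.3%}, at least one red {1 - p0:.3%}, "
          f"exactly one {p1:.3%}, exactly two {p2:.4%}")
```

Repeating the whole experiment several times (the "advanced experiment") multiplies the chance of seeing no red marble at all by itself once per repetition.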
One DNA Transmission Event
For one TE, this is a matter of the probability of multiple independent
events (PMIE). Any marker is free to mutate, though the chances of it doing so are
small. The mathematical expressions are
 p = chance of a mutation
 q = chance of no mutation = 1 - p
For simplicity of the mathematics, it is more convenient to use a constant,
average probability of marker mutation than to calculate each marker
individually. See note 1 for details.
Those chances range as follows:

                            µ         No µ
Slowest marker              0.0090%   99.991%
Fastest marker              3.4375%   96.563%
Average (geometric mean)    0.4889%   99.511%
Average (arithmetic mean)   0.4932%   99.507%
The average represents the chance of each marker in a haplotype of 37 or 67
markers undergoing a mutation in one TE vs. none. For the entire haplotype, we multiplied by
the number of markers to derive a
probability of any of the markers mutating. See note 2 for
more.
To account for the chance of a marker undergoing a forward-and-back pair of
mutations, resulting in no observable change, we adjusted the mutation rate
as described in note 3.
That gives us the chance, q = 1 - p, of no observable (see note 3)
mutations occurring:

               37 mkrs   67 mkrs
Mutation       ~16.61%   ~40.29%
No Mutations   ~83.38%   ~59.71%
For one TE, the chance of no mutations (or a forward/back pair) is large; the
chance of one or more observable mutations is smaller.
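As a rough illustration of the step from a per-marker rate to a whole-haplotype probability, the sketch below compares the multiply-by-marker-count approach described above with the standard "at least one of m independent events" formula, 1 - (1 - p)^m. This is a hedged example only: the function name is hypothetical, the input is the unadjusted geometric-mean rate from the table above, and the results are not expected to reproduce the page's published figures, which also include the back-mutation adjustment and estimated rates for markers 38-67.

```python
def any_marker_mutates(p_marker, markers):
    """Return two estimates of P(at least one marker mutates in one TE).

    linear: markers * p_marker (simple multiplication, slightly too high)
    exact:  1 - (1 - p_marker) ** markers (assumes independent markers)
    """
    linear = markers * p_marker
    exact = 1 - (1 - p_marker) ** markers
    return linear, exact

P_GEOMETRIC_MEAN = 0.004889  # 0.4889% per marker per TE, from the table above

for markers in (37, 67):
    linear, exact = any_marker_mutates(P_GEOMETRIC_MEAN, markers)
    print(f"{markers} markers: linear {linear:.2%}, independent-events {exact:.2%}")
```

The two estimates stay close while the per-marker rate is small, and drift apart as the number of markers grows.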
More Transmissions
Although the chance of a haplotype experiencing no observable change in
one transmission event is better than even, the chances decrease with each additional
transmission. This is a problem in binomial probability (k occurrences in n
trials).
We will let:
 P(k) = the probability of k mutations;
 p = the probability (from above) of any one of the markers mutating;
 k = the number of mutation occurrences (sometimes called "successes") and;
 n = the number of trials (transmission events).
The binomial probability formula is:
 P(k) = [n! / (k! (n-k)!)] * p^{k} * (1-p)^{n-k}
(The exclamation point, !, means to take the factorial of the number: 2!=2*1,
3!=3*2*1, 4!=4*3*2*1, etc.)
Excel Formulas
In Excel, the worksheet functions are
 P(k) =BINOMDIST(k,n,p,FALSE) for
probability of the exact number of mutations (k) and
 P(i≤k) =BINOMDIST(k,n,p,TRUE) for probability of the cumulative, k or fewer mutations.
Note: Calculations and graphs were done with an Excel spreadsheet. Other spreadsheet
programs can perform the same function but syntax may differ.
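For readers without a spreadsheet, the two worksheet calls above can be approximated in a few lines of Python. This is a sketch rather than the calculation actually used for this page: the function names `binom_pmf` and `binom_cdf` are ours, and the sample value p = 0.1868 is an assumption read from the one-transmission, 37-marker row of the page's tables.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k mutations in n transmission events).
    Plays the role of BINOMDIST(k, n, p, FALSE)."""
    if k < 0 or k > n:
        return 0.0  # a plain binomial assigns zero probability when k > n
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(k or fewer mutations in n transmission events).
    Plays the role of BINOMDIST(k, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))

p = 0.1868  # assumed per-TE chance that a 37-marker haplotype changes
print(f"P(k=1, n=5)  = {binom_pmf(1, 5, p):.2%}")
print(f"P(k<=2, n=5) = {binom_cdf(2, 5, p):.2%}")
```

Note that this plain binomial returns zero whenever k exceeds n, whereas the page estimates those cases separately.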
Back mutations
We recognize that a marker may mutate away from an initial allele value and
then back to that value; two changes have occurred but appear as no change.
(They are not "observable".) We have attempted to adjust the probabilities
downward to account for this.
Mutation steps
We are not concerned here with degrees of mutation, only with the fact of occurrence (or nonoccurrence).
Whether a marker follows the stepwise or infinite alleles model is a
separate matter.
No Observable Mutations
The special case of k=0 simplifies the formula to
P(k=0) = (1-p)^{n}.
The chance of no mutations shrinks exponentially with each
new transmission.
Fortunately, for k=0, the factorial of zero is 1, any number raised to the power
of zero is 1 (x^{0} = 1), and n-0 = n, allowing us to remove all the k terms.
The graph and table below show the probabilities for a haplotype
remaining observably unchanged over a series of transmission events. (Here, k
= 0 or zero)
Figure 1: Chance of No Observable Mutations
P(k= 0 mutations) 
TE  37 mkrs  67 mkrs 
n= 1  81.32%  73.32% 
n= 2  66.13%  53.75% 
n= 3  53.78%  39.41% 
n= 4  43.74%  28.89% 
n= 5  35.57%  21.18% 
n= 6  28.93%  15.53% 
n= 7  23.52%  11.39% 
n= 8  19.13%  8.35% 
n= 9  15.56%  6.12% 
n=10  12.65%  4.49% 
n=11  10.29%  3.29% 
n=12  8.37%  2.41% 
n=13  6.80%  1.77% 
n=14  5.53%  1.30% 
n=15  4.50%  0.95% 
n=16  3.66%  0.70% 
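The columns above follow directly from raising the one-transmission value to the power n. The sketch below regenerates them and reports the first transmission at which the chance of no observable change drops below 50%. It is illustrative only: the names are hypothetical, and the starting values (81.32% and 73.32%) are simply read from the n=1 row of the table.

```python
# Chance of no observable mutation in ONE transmission event,
# read from the n=1 row of the table above (an assumption of this sketch).
Q_ONE_TE = {37: 0.8132, 67: 0.7332}

def p_unchanged(q, n):
    """P(k=0) after n transmission events: (1-p)^n, with q = 1-p."""
    return q**n

def first_te_below(q, threshold=0.5):
    """Smallest n for which q**n falls below `threshold`."""
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    n = 1
    while q**n >= threshold:
        n += 1
    return n

for markers, q in Q_ONE_TE.items():
    row = ", ".join(f"{p_unchanged(q, n):.2%}" for n in (1, 4, 8, 16))
    print(f"{markers} mkrs: n=1,4,8,16 -> {row}; "
          f"below 50% from n={first_te_below(q)}")
```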
The calculations show these patterns:
 The probability of no observable mutations decreases with each additional
transmission.
 By four transmissions, the probability of a 37- or 67-marker haplotype
remaining observably unchanged is less than 50%; it is more probable than
not that a marker will mutate.
 A 67-marker haplotype has a smaller chance than a 37-marker haplotype of
remaining observably unchanged, due to the greater number of markers.
The calculations show one reason why exact matches between two living men are relatively uncommon.
Between two men sharing the same 2nd-great-grandfather as their MRCA, eight
DNA transmissions ("trials") have happened, four on each line; the probabilities of the haplotype remaining unchanged are ~1:5 for 37
markers and ~1:12 for 67.
One or More Mutations
We next turn to the chances of one or more mutations (k≥1), again a matter of binomial
probability, except that when k>n it is treated as a matter of multiple dependent
events, where P(n,k) ≈ p^{n/k}.
One or Two Mutations
The graph and table below show the probabilities of exactly k=1 & k=2.
Probabilities for k>n (e.g., k=2, n=1; see note 6)
are estimated; the binomial distribution
does not solve because the factorial of (n-k)<0 is not defined.
Figure 2: Chance of one or two observable mutations
(Irrespective of back-mutations.)
P(k = 1, 2 mutations)
        k=1                  k=2
TE      37 mkrs   67 mkrs    37 mkrs   67 mkrs
n= 1    18.68%    26.68%     ~3.49%    ~7.1%
n= 2    30.38%    39.13%     39.13%    35.6%
n= 3    37.06%    43.03%     43.03%    43.1%
n= 4    40.18%    42.06%     42.06%    34.7%
n= 5    40.84%    38.55%     38.55%    23.3%
n= 6    39.86%    33.92%     33.92%    14.1%
n= 7    37.82%    29.01%     29.01%    7.9%
n= 8    35.15%    24.31%     24.31%    4.3%
n= 9    32.16%    20.05%     20.05%    2.2%
n=10    29.06%    16.33%     16.33%    1.1%
n=11    25.99%    13.17%     13.17%    0.5%
n=12    23.06%    10.53%     10.53%    0.3%
n=13    20.32%    8.37%      8.37%     0.1%
n=14    17.79%    6.61%      6.61%     0.06%
n=15    15.50%    5.19%      5.19%     0.03%
n=16    13.45%    4.06%      4.06%     ~0.00%
The probabilities of a specific number of mutations
increase to a maximum, and then decrease as
the chances of greater numbers of mutations are increasing.
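The rise-then-fall pattern can be checked numerically by scanning the binomial probability over successive transmissions and noting where it peaks. The sketch below does this for k=1. It is a hedged illustration: the helper names are hypothetical, and the per-TE probabilities (18.68% and 26.68%) are taken from the n=1, k=1 row of the table above.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k mutations in n transmission events)."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def peak_te(k, p, max_n=50):
    """Return the n in k..max_n at which P(exactly k mutations) is largest."""
    return max(range(max(k, 1), max_n + 1), key=lambda n: binom_pmf(k, n, p))

# Assumed per-TE probability that the haplotype changes, from the n=1 row.
for markers, p in ((37, 0.1868), (67, 0.2668)):
    n_peak = peak_te(1, p)
    print(f"{markers} mkrs: P(k=1) peaks at n={n_peak} "
          f"({binom_pmf(1, n_peak, p):.2%})")
```

With these inputs the k=1 peaks fall at n=5 for 37 markers and n=3 for 67, matching the largest k=1 entries in the table.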
Three or Four Mutations
The graph and table below show the probabilities of exactly k=3 & k=4.
Probabilities for k>n are estimated by probability of multiple dependent
events; the binomial distribution function does not solve because the factorial of (n-k)<0 is undefined.
Figure 3: Three to four observable mutations
(Irrespective of back-mutations.)
P(k = 3, 4 mutations)
        k=3                  k=4
TE      37 mkrs   67 mkrs    37 mkrs   67 mkrs
n= 1    ~0.0%     ~0.0%      ~0.0%     ~0.0%
n= 2    ~0.6%     ~1.9%      ~0.5%     ~0.5%
n= 3    0.7%      ~1.9%      ~3.0%     ~3.0%
n= 4    2.1%      5.6%       0.1%      7.1%
n= 5    4.3%      10.2%      0.5%      21.2%
n= 6    7.0%      15.0%      1.2%      15.5%
n= 7    10.0%     19.2%      2.3%      11.4%
n= 8    13.0%     22.5%      3.7%      8.3%
n= 9    15.8%     24.8%      5.5%      6.1%
n=10    18.4%     26.0%      7.4%      4.5%
n=11    20.6%     26.2%      9.4%      3.3%
n=12    22.3%     25.6%      11.5%     2.4%
n=13    23.6%     24.4%      13.5%     1.8%
n=14    24.4%     22.8%      15.4%     1.3%
n=15    24.8%     20.8%      17.1%     1.0%
n=16    24.8%     18.8%      18.5%     0.7%
Cumulative Changes
The stability of a haplotype is revealed by how closely it remains to its
original pattern as it experiences
more transmission events. This is shown in the graph and table below using
k≤2 & k≤4.
Figure 4: Cumulative Probability of k≤2 & k≤4
(Irrespective of back-mutations.)
Note that the cumulative distributions have a distinctly different
appearance from those for a specific number of mutations.
Σ P(k≤2, k≤4)
        k≤2                  k≤4
TE      37 mkrs   67 mkrs    37 mkrs   67 mkrs
n= 1    ~100%     ~100%      ~100%     ~100%
n= 2    ~100%     ~100%      ~100%     ~100%
n= 3    ~100%     ~100%      ~100%     ~100%
n= 4    ~100%     ~100%      ~100%     ~100%
n= 5    97.8%     93.9%      ~100%     ~100%
n= 6    95.2%     87.8%      99.98%    99.9%
n= 7    91.7%     80.3%      99.9%     99.4%
n= 8    87.4%     72.1%      99.7%     98.3%
n= 9    82.5%     63.6%      99.2%     96.4%
n=10    77.2%     55.4%      98.5%     93.7%
n=11    71.7%     47.6%      97.5%     90.1%
n=12    66.1%     40.4%      96.1%     85.6%
n=13    60.6%     34.0%      94.4%     80.6%
n=14    55.1%     28.4%      92.2%     75.0%
n=15    49.9%     23.5%      89.7%     69.1%
n=16    44.9%     19.4%      86.8%     63.0%
n=17    40.3%     15.8%      83.6%     56.9%
n=18    35.9%     12.9%      80.2%     51.0%
n=19    32.0%     10.4%      76.5%     45.3%
n=20    28.3%     8.39%      72.6%     39.9%
n=21    25.0%     6.73%      68.7%     34.9%
n=22    22.0%     5.38%      64.6%     30.3%
n=23    19.3%     4.29%      60.6%     26.2%
n=24    16.9%     3.40%      56.6%     22.5%
A 37- or 67-marker haplotype is almost certain to undergo two or fewer
mutations over four transmissions, and four or fewer over five transmissions.
Beyond those points, the probabilities decrease with each transmission.
Summary
As the number of transmission events increases, it becomes increasingly unlikely that a haplotype will retain its exact
pattern (i.e., undergo no observable mutations). By four TE, the probability of a 37-marker
haplotype retaining its exact identity has declined to 43%, and that of a 67-marker
haplotype to 29%.
Corollary: One should not expect to find exact matches between two
descendants of a common ancestor if the generations of separation between the
descendants number two or more, i.e., TE≥4 (the most recent common ancestor is their
grandfather or more distant). It is more likely than not that there will
have been at least one mutation in a 37- or 67-marker haplotype.
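The corollary can be turned into a small calculation: count one transmission event per generation on each line of descent from the common ancestor, then take the complement of the no-change probability. The sketch below assumes that counting convention (it gives TE=4 for a shared grandfather, as stated above), uses hypothetical function names, and takes its one-transmission values from the n=1 row of Figure 1.

```python
# Chance of no observable change in one TE, read from Figure 1 (n=1 row).
Q_ONE_TE = {37: 0.8132, 67: 0.7332}

def te_between(gens_to_mrca_a, gens_to_mrca_b):
    """Total transmission events separating two men: one per generation
    on each line back to the most recent common ancestor (MRCA)."""
    return gens_to_mrca_a + gens_to_mrca_b

def p_at_least_one_mutation(q, te):
    """Complement of the no-change probability q**te."""
    return 1 - q**te

te = te_between(2, 2)  # two grandsons of the same grandfather
for markers, q in Q_ONE_TE.items():
    print(f"{markers} mkrs, TE={te}: "
          f"P(at least one mutation) = {p_at_least_one_mutation(q, te):.1%}")
```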
However, it is very likely that a haplotype will retain a high degree of
similarity to its original pattern through six or fewer TE. One can expect to
find matches of ≤2 mutations (see note 5) for four generations of separation, TE≥8, and perhaps
more.
Revised: 15 Dec 2016
Notes
1. Mutation probability calculations
We used published data on individual
marker mutation frequencies and calculated the geometric mean for 37 markers. For 67 markers,
since markers 38-67 are less volatile than markers 26-37, we took the geometric mean of the
values for markers 1-25 and 1-37 (including 1-25 twice).
2. Calculations
For 37 markers, we took, as p, the geometric mean of the marker rates. For 67 markers, as the rates for #s
38-67 are unpublished but believed to be less volatile, we took the geometric mean
of #s 1-25 & #s 1-37 (1-25 included twice). To calculate P(n,k,p) we used the
Excel binomial distribution formula.
3. Back-mutations
The chance of a pair of mutations on the same marker in a number of TE is the square
of the mutation rate but the second of the pair may more likely be toward
the initial value than away from it. We arbitrarily assigned a
weight of 3:1 so that, in a mutation pair, the second would more likely be toward the
original value.
Because the impact of this weighting is a function of the square of a small number,
selecting a weighting of, say, 5:1 would only adjust p in the 4th place after the decimal
point.
(The chance of a mutation pair is slight, ~2.5*10^{-6},
as compared to ~2.3*10^{-3} for a single mutation.)
4. No observable mutation
"Observable" refers to the possible effect of a back-mutation disguising a
change from the prior haplotype. If a marker mutates to a new value and
then back to its initial value, no change will be observed. This is more
relevant to apparent non-change of the haplotype than to instances with
observable mutations.
5. ≤2 mutations
This is roughly equivalent to genetic distance (GD) ≤2. However, GD is
calculated in a manner in which a mutation is not always a step of GD.
Our page on this complicated
subject covers GD in more detail.
6. k>n, e.g., k=2, n=1
The factorial of (n-k)<0, e.g., (-1)! = ∞, is not defined, rendering
the binomial probability function without a solution. Therefore, we used the
following method to estimate the probabilities.
While a mutation cannot occur twice on the same marker in one TE, it is
possible for more than one marker to mutate in one TE. We applied the
probability of multiple dependent events in these cases because a 2nd
mutation depends on there having been a 1st, a 3rd depends on a 2nd, etc.