
Haplotype Stability

We are often asked what the chances are for a Y-STR haplotype to remain unchanged. The answer depends on how many markers are being considered and over how many DNA transmission events (TE).

We worked this out for the FTDNA 37 and 67 marker panels.


  Caution:

Do not be misled into thinking the information here is more than "most likely estimates". Calculations are based on averages of published average mutation rates, with estimates for unpublished rates. They are indicative in the aggregate but may not apply in specific instances. As one expert (Susan Hedeen) wrote: "Some families show great haplotype stability over many generations; others are fire-breathing mutants."


Analogy

Imagine a "jars of marbles" experiment. Say that you have a jar in front of you, containing 1,000 marbles, 999 white and one red. What are the chances of randomly picking the red marble from the jar? Now say that there are 37 (or 67) such jars and you'll pick one marble from each. What are the chances that at least one marble will be red? What are the chances of two red marbles?

Now, for the advanced experiment, say that you pick from the jars multiple times, replacing the marbles each time (but recording their colors). What are the chances of no red (all white) marbles? One red marble? Two? More?
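The marble questions can be answered with a few lines of arithmetic. The sketch below is illustrative only: it assumes every jar is identical (one red marble in 1,000) and that draws are independent, and the function names are our own invention.

```python
from math import comb

P_RED = 1 / 1000  # assumed: one red marble among 1,000 in every jar

def chance_at_least_one_red(jars: int, p: float = P_RED) -> float:
    """Chance that at least one draw is red when drawing once from each jar."""
    return 1 - (1 - p) ** jars

def chance_exactly_k_red(jars: int, k: int, p: float = P_RED) -> float:
    """Chance of exactly k red marbles when drawing once from each jar."""
    return comb(jars, k) * p**k * (1 - p) ** (jars - k)

for jars in (37, 67):
    at_least_one = chance_at_least_one_red(jars)
    exactly_two = chance_exactly_k_red(jars, 2)
    print(f"{jars} jars: at least one red {at_least_one:.2%}, exactly two red {exactly_two:.4%}")
```

Repeating the whole experiment several times (the "advanced" version) corresponds to the multiple transmission events discussed further down.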

One DNA Transmission Event

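For one TE, this is a matter of the probability of multiple independent events (PMIE). Any marker is free to mutate, though the chances of it doing so are small. With µ_i as the chance that marker i mutates in one TE and m as the number of markers, the standard PMIE expressions are

$$P(\text{no mutation}) = \prod_{i=1}^{m} (1-\mu_i), \qquad P(\text{one or more mutations}) = 1 - \prod_{i=1}^{m} (1-\mu_i)$$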

To simplify the mathematics, it is more convenient to use a constant, average probability of marker mutation than to calculate each marker individually. See note 1 for details.

Those chances range as follows:

                             µ          No µ
Slowest marker               0.0090%    99.991%
Fastest marker               3.4375%    96.563%
Average (geometric mean)     0.4889%    99.511%
Average (arithmetic mean)    0.4932%    99.507%
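To illustrate the difference between the two averages, here is a small sketch. The list of rates is hypothetical (it is not the published per-marker data); only its smallest and largest entries are taken from the table above.

```python
from statistics import fmean, geometric_mean

# HYPOTHETICAL per-marker mutation rates per transmission event.
# Only the slowest (0.0090%) and fastest (3.4375%) values come from the page.
rates = [0.000090, 0.0015, 0.0040, 0.0075, 0.034375]

arithmetic = fmean(rates)          # sum of the rates divided by their count
geometric = geometric_mean(rates)  # n-th root of the product of the rates

print(f"arithmetic mean: {arithmetic:.4%}")
print(f"geometric mean:  {geometric:.4%}")
```

The geometric mean is pulled less by a few very fast markers, which is one reason it can be a reasonable choice when rates span several orders of magnitude.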

The average represents the chance of each marker in a haplotype of 37 or 67 markers undergoing a mutation in one TE vs. none. For the entire haplotype, we multiplied by the number of markers to derive a probability of any of the markers mutating. See note 2 for more.

To account for the chance of a marker undergoing a forward-and-back pair of mutations, resulting in no observable change, we adjusted the mutation rate as described in note 3.

That gives us the chance, q = 1-p, of no observable (see note 3) mutations occurring:

                37 mkrs    67 mkrs
Mutation        ~16.61%    ~40.29%
No Mutations    ~83.38%    ~59.71%
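A minimal sketch of the step from a per-marker chance to a whole-haplotype chance. It assumes independent markers that all share one constant rate, and it compounds the per-marker chance of no mutation across the markers; that is our reading of the calculation, not necessarily the exact spreadsheet method. It also ignores the back-mutation adjustment of note 3, so its output need not agree exactly with the figures tabulated here.

```python
def chance_no_mutation(mu: float, markers: int) -> float:
    """Chance that none of `markers` independent markers mutates in one TE."""
    return (1 - mu) ** markers

MU_37 = 0.004889  # geometric-mean per-marker rate quoted earlier on this page

q = chance_no_mutation(MU_37, 37)  # no marker mutates
p = 1 - q                          # at least one marker mutates
print(f"37 markers: mutation {p:.2%}, no mutation {q:.2%}")
```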

For one TE, the chance of no mutations (or a forward/back pair) is large; the chance of one or more observable mutations is smaller.

More Transmissions

Although the chance of a haplotype experiencing no observable change in one transmission event is high, it decreases with each additional transmission. This is a problem in binomial probability (k occurrences in n trials).

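We will let:

  n = the number of transmission events (trials)
  k = the number of observable mutations (occurrences)
  p = the chance of an observable mutation in one TE, with q = 1-p

The binomial probability formula is:

$$P(k) = \frac{n!}{k!\,(n-k)!}\; p^{k}\,(1-p)^{n-k}$$

  (The exclamation point, !, means to take the factorial of the number. 2!=2*1, 3!=3*2*1, 4!=4*3*2*1, etc.)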
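For readers who prefer code to a spreadsheet, the binomial probability formula can be sketched in Python as follows. The function name is our own, and the example value of p is simply the 37-marker figure implied by the n=1 row of the tables later on this page. The formula only covers 0 ≤ k ≤ n, so the sketch refuses other inputs rather than guessing.

```python
from math import comb

def binomial_probability(k: int, n: int, p: float) -> float:
    """Chance of exactly k occurrences in n trials, each with chance p."""
    if not 0 <= k <= n:
        # The formula is only defined for 0 <= k <= n.
        raise ValueError("binomial formula requires 0 <= k <= n")
    # comb(n, k) is n! / (k! * (n-k)!)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p37 = 0.1868  # assumed per-TE chance of a mutation, 37 markers
print(f"exactly one mutation in two TE: {binomial_probability(1, 2, p37):.2%}")
```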
 

Excel Formulas

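In Excel, the worksheet function for the binomial distribution is BINOM.DIST(k, n, p, cumulative), or BINOMDIST in older versions. Setting cumulative to FALSE gives the chance of exactly k occurrences; TRUE gives the chance of k or fewer.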

Note: Calculations and graphs were done with an Excel spreadsheet. Other spreadsheet programs can perform the same function but syntax may differ.

Back mutations

We recognize that a marker may mutate away from an initial allele value and then back to that value; two changes have occurred but appear as no change. (They are not "observable".) We have attempted to adjust the probabilities downward to account for this.

Mutation steps

We are not concerned here with degrees of mutation, only with the fact of occurrence (or non-occurrence). Whether a marker follows the step-wise or infinite alleles model is a separate matter.

No Observable Mutations

The special case of k=0 simplifies the formula to P(k=0) = (1-p)^n. The chance of no mutations shrinks exponentially with each new transmission.

Fortunately for k=0, the factorial of zero is 1 (0! = 1), any number raised to the power of zero is 1 (x^0 = 1), and n-0 = n, allowing us to remove all the k terms.
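Written out step by step, using n, k, and p as in the binomial probability formula, the simplification is:

```latex
\begin{aligned}
P(k=0) &= \frac{n!}{0!\,(n-0)!}\; p^{0}\,(1-p)^{n-0} \\
       &= \frac{n!}{1 \cdot n!} \cdot 1 \cdot (1-p)^{n} \\
       &= (1-p)^{n}
\end{aligned}
```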

The graph and table below show the probabilities for a haplotype remaining observably unchanged over a series of transmission events. (Here, k = 0.)


 
 
Figure 1: Chance of No Observable Mutations

 

P(k= 0 mutations)
TE 37 mkrs 67 mkrs
n= 1 81.32% 73.32%
n= 2 66.13% 53.75%
n= 3 53.78% 39.41%
n= 4 43.74% 28.89%
n= 5 35.57% 21.18%
n= 6 28.93% 15.53%
n= 7 23.52% 11.39%
n= 8 19.13% 8.35%
n= 9 15.56% 6.12%
n=10 12.65% 4.49%
n=11 10.29% 3.29%
n=12 8.37% 2.41%
n=13 6.80% 1.77%
n=14 5.53% 1.30%
n=15 4.50% 0.95%
n=16 3.66% 0.70%
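The table can be regenerated from its own first row. The sketch below assumes only that P(k=0) = (1-p)^n and takes the per-TE chances from the n=1 row above; the names are ours, and small rounding differences from the tabulated figures are to be expected.

```python
# Per-TE chance of at least one observable mutation, taken as
# 100% minus the n=1 row of the table above.
P_MUTATION = {"37 mkrs": 1 - 0.8132, "67 mkrs": 1 - 0.7332}

def chance_unchanged(p: float, n: int) -> float:
    """P(k=0): haplotype observably unchanged after n transmission events."""
    return (1 - p) ** n

print("TE    " + "  ".join(P_MUTATION))
for n in range(1, 17):
    row = "  ".join(f"{chance_unchanged(p, n):7.2%}" for p in P_MUTATION.values())
    print(f"n={n:2d}  {row}")
```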

The calculations show these patterns:

The calculations show one reason why exact matches between two living men are relatively uncommon. Between two men sharing the same 2nd-great-grandfather as their MRCA, five DNA transmissions ("trials") have happened; the probabilities of the haplotype remaining unchanged are about 1 in 3 for 37 markers and about 1 in 5 for 67.

One or More Mutations

We next turn to the chances of one or more mutations (k≥1), again a matter of binomial probability -- except when k>n, it is a matter of multiple dependent events where P(n,k)≈ pn/k.

One or Two Mutations

The graph and table below show the probabilities of exactly k=1 & k=2. Probabilities for k>n (e.g., k=2, n=1; see note 6) are estimated; the binomial distribution does not solve because the factorial of (n-k)<0 is not defined.



 
 
Figure 2: Chance of one or two observable mutations (Irrespective of back-mutations.)

P(k= 1, 2 mutations)
TE k=1 k=2
 37 mkrs  67 mkrs  37 mkrs  67 mkrs
n= 1 18.68% 26.68% ~3.49% ~7.1%
n= 2 30.38% 39.13% 39.13% 35.6%
n= 3 37.06% 43.03% 43.03% 43.1%
n= 4 40.18% 42.06% 42.06% 34.7%
n= 5 40.84% 38.55% 38.55% 23.3%
n= 6 39.86% 33.92% 33.92% 14.1%
n= 7 37.82% 29.01% 29.01% 7.9%
n= 8 35.15% 24.31% 24.31% 4.3%
n= 9 32.16% 20.05% 20.05% 2.2%
n=10 29.06% 16.33% 16.33% 1.1%
n=11 25.99% 13.17% 13.17% 0.5%
n=12 23.06% 10.53% 10.53% 0.3%
n=13 20.32% 8.37% 8.37% 0.1%
n=14 17.79% 6.61% 6.61% 0.06%
n=15 15.50% 5.19% 5.19% 0.03%
n=16 13.45% 4.06% 4.06% ~0.00%

The probability of a specific number of mutations increases to a maximum and then decreases as larger numbers of mutations become more likely.
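The rise-and-fall pattern can be checked numerically. This sketch assumes the plain binomial model and the per-TE chances implied by the n=1 row of the k=1 columns (18.68% and 26.68%); the helper names are hypothetical.

```python
from math import comb

def binomial_probability(k: int, n: int, p: float) -> float:
    """Chance of exactly k occurrences in n trials (requires 0 <= k <= n)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def peak_transmission(k: int, p: float, max_n: int = 24) -> int:
    """The n in k..max_n at which the chance of exactly k mutations is largest."""
    return max(range(k, max_n + 1), key=lambda n: binomial_probability(k, n, p))

for label, p in (("37 mkrs", 0.1868), ("67 mkrs", 0.2668)):
    n_peak = peak_transmission(1, p)
    print(f"{label}: P(k=1) peaks at n={n_peak} ({binomial_probability(1, n_peak, p):.2%})")
```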

Three or Four Mutations

The graph and table below show the probabilities of exactly k=3 & k=4. Probabilities for k>n are estimated by probability of multiple dependent events; the binomial distribution function does not solve because the factorial of (n-k)<0 is undefined.

 
 
Figure 3: Three to four observable mutations (Irrespective of back-mutations.)

P(k= 3, 4 mutations)
TE k=3 k=4
 37 mkrs  67 mkrs  37 mkrs  67 mkrs
n= 1 ~0.0% ~0.0% ~0.0% ~0.0%
n= 2 ~0.6% ~1.9% ~0.5% ~0.5%
n= 3 0.7% ~1.9% ~3.0% ~3.0%
n= 4 2.1% 5.6% 0.1% 7.1%
n= 5 4.3% 10.2% 0.5% 21.2%
n= 6 7.0% 15.0% 1.2% 15.5%
n= 7 10.0% 19.2% 2.3% 11.4%
n= 8 13.0% 22.5% 3.7% 8.3%
n= 9 15.8% 24.8% 5.5% 6.1%
n=10 18.4% 26.0% 7.4% 4.5%
n=11 20.6% 26.2% 9.4% 3.3%
n=12 22.3% 25.6% 11.5% 2.4%
n=13 23.6% 24.4% 13.5% 1.8%
n=14 24.4% 22.8% 15.4% 1.3%
n=15 24.8% 20.8% 17.1% 1.0%
n=16 24.8% 18.8% 18.5% 0.7%

Cumulative Changes

The stability of a haplotype is revealed by how close it remains to its original pattern as it experiences more transmission events. This is shown in the graph and table below using k≤2 & k≤4.

 
 
 

Figure 4: Cumulative Probability of k≤2 & k≤4 (Irrespective of back-mutations.)

Note that the cumulative distributions have a distinctly different appearance from those for a specific number of mutations.

Σ P(k≤2, k≤4)
TE      k≤2                  k≤4
        37 mkrs   67 mkrs    37 mkrs   67 mkrs
n= 1    ~100%     ~100%      ~100%     ~100%
n= 2    ~100%     ~100%      ~100%     ~100%
n= 3    ~100%     ~100%      ~100%     ~100%
n= 4    ~100%     ~100%      ~100%     ~100%
n= 5    97.8%     93.9%      ~100%     ~100%
n= 6    95.2%     87.8%      99.98%    99.9%
n= 7    91.7%     80.3%      99.9%     99.4%
n= 8    87.4%     72.1%      99.7%     98.3%
n= 9    82.5%     63.6%      99.2%     96.4%
n=10    77.2%     55.4%      98.5%     93.7%
n=11    71.7%     47.6%      97.5%     90.1%
n=12    66.1%     40.4%      96.1%     85.6%
n=13    60.6%     34.0%      94.4%     80.6%
n=14    55.1%     28.4%      92.2%     75.0%
n=15    49.9%     23.5%      89.7%     69.1%
n=16    44.9%     19.4%      86.8%     63.0%
n=17    40.3%     15.8%      83.6%     56.9%
n=18    35.9%     12.9%      80.2%     51.0%
n=19    32.0%     10.4%      76.5%     45.3%
n=20    28.3%     8.39%      72.6%     39.9%
n=21    25.0%     6.73%      68.7%     34.9%
n=22    22.0%     5.38%      64.6%     30.3%
n=23    19.3%     4.29%      60.6%     26.2%
n=24    16.9%     3.40%      56.6%     22.5%

A 37- or 67-marker haplotype is almost certain to undergo two or fewer mutations over four transmissions and four or fewer over five transmissions; after that, the probabilities decrease with each transmission.
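A cumulative probability is just the sum of the exact-k probabilities from zero up to the limit. The sketch below assumes the plain binomial model, which is what Excel's BINOM.DIST returns with its cumulative argument set to TRUE; the function name is ours. Its output may differ from the tabulated values, which also involve the k>n estimates described in note 6.

```python
from math import comb

def cumulative_probability(k_max: int, n: int, p: float) -> float:
    """Chance of at most k_max occurrences in n trials (binomial model)."""
    top = min(k_max, n)  # the model cannot produce more than n occurrences
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(top + 1))

p37 = 0.1868  # assumed per-TE chance of a mutation, 37 markers
for n in (4, 8, 16, 24):
    at_most_2 = cumulative_probability(2, n, p37)
    at_most_4 = cumulative_probability(4, n, p37)
    print(f"n={n:2d}  k<=2: {at_most_2:.1%}  k<=4: {at_most_4:.1%}")
```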

Summary

As the number of transmission events increases, it becomes increasingly unlikely that a haplotype will retain its exact pattern (i.e., undergo no observable mutations). By four TE, the probability of a 37-marker haplotype retaining its exact identity has declined to about 44% and a 67-marker haplotype to about 29%.

Corollary: One should not expect to find exact matches between two descendants of a common ancestor if the descendants are separated by two or more generations, i.e., TE≥4 (the most recent common ancestor is their grandfather or more distant). It is more likely than not that there will have been at least one mutation in a 37- or 67-marker haplotype.
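The "more likely than not" threshold can be located directly. This sketch assumes P(no change) = (1-p)^n with the per-TE chances implied by the n=1 row of the no-mutation table (18.68% and 26.68%); the function name is hypothetical.

```python
def first_te_below_half(p: float) -> int:
    """Smallest n at which the chance of no observable change drops below 50%.

    Assumes 0 < p < 1, so that the loop terminates.
    """
    n = 1
    while (1 - p) ** n >= 0.5:
        n += 1
    return n

for label, p in (("37 mkrs", 0.1868), ("67 mkrs", 0.2668)):
    print(f"{label}: a change becomes more likely than not at n={first_te_below_half(p)}")
```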

However, it is very likely that a haplotype will retain a high degree of similarity to its original pattern through six or fewer TE. One can expect to find matches of ≤2 mutations (see note 5) for four generations of separation, TE≥8, and perhaps more.

Revised: 15 Dec 2016



Notes

  1. Mutation probability calculations
    We used published data on individual marker mutation frequencies and calculated the geometric mean for 37 markers. For 67 markers, since markers 38-67 are less volatile than 26-37, we took the geometric mean of the values for markers 1-25 and 1-37 (including 1-25 twice).
     
  2. Calculations
    For 37 markers, we took, as p, the geometric mean of the marker rates. For 67 markers, as the rates for #s 38-67 are unpublished but believed to be less volatile, we took the geometric mean of #s 1-25 & #s 1-37 (1-25 included twice). To calculate P(n,k,p) we used the Excel binomial distribution formula.
     
  3. Back-mutations
    The chance of a pair of mutations on the same marker in a number of TE is the square of the mutation rate, but the second of the pair may more likely be toward the initial value than away from it. We arbitrarily assigned a weight of 3:1 so that, in a mutation pair, the second would more likely be toward the original value. Because the impact of this weighting is a function of the square of a small number, selecting a weighting of, say, 5:1 would only adjust p in the 4th place after the decimal point. (The chance of a mutation pair is slight, ~2.5*10^-6, as compared to ~2.3*10^-3 for a single mutation.)
     
  4. No observable mutation
    "Observable" refers to the possible effect of a back-mutation disguising a change from the prior haplotype. If a marker mutates to a new value and then back to its initial value, no change will be observed. This is more relevant to apparent non-change of the haplotype than to instances with observable mutations.
     
  5. ≤2 mutations
    This is roughly equivalent to genetic distance (GD) ≤2. However, GD is calculated in a manner in which a mutation is not always a step of GD. Our page on this complicated subject covers GD in more detail.
     
  6. k>n -- e.g., k=2, n=1
    The factorial of a negative number, (n-k)<0, e.g., (-1)!, is not defined, leaving the binomial probability function without a solution. Therefore, we used the following method to estimate the probabilities. While a mutation cannot occur twice on the same marker in one TE, it is possible for more than one marker to mutate in one TE. We applied the probability of multiple dependent events in these cases because a 2nd mutation depends on there having been a 1st, a 3rd depends on a 2nd, etc.