We are often asked what the chances are for a Y-STR haplotype to remain
unchanged. The answer depends on how many markers are being considered and
for how many DNA Transmission events (TE).

We worked this out for the FTDNA 37 and 67 marker panels.

Caution:

Do not be misled into thinking the information here is more than
"most likely estimates". Calculations are based on averages of published average mutation rates,
with estimates for unpublished rates. They
are indicative in the aggregate but may not apply in specific instances. As
one expert (Susan Hedeen) wrote: "Some families show great haplotype
stability over many generations; others are fire-breathing mutants."

Analogy

Imagine a "jars of marbles" experiment. Say that you have a jar in
front of you, containing 1,000 marbles, 999 white and one red. What are the
chances of randomly picking the red marble from the jar? Now say that there are
37 (or 67) such jars and you'll pick one marble from each. What are the
chances that at least one marble will be red? What are the chances of two
red marbles?

Now, for the advanced experiment, say that you pick from the jars
multiple times, replacing the marbles each time (but recording their
colors). What are the chances of no red (all white) marbles? One red marble?
Two? More?
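The marble analogy can be worked directly. The following is a minimal Python sketch, assuming every jar is drawn independently and holds exactly one red marble in 1,000; the function names are illustrative and are not part of the original calculations.

```python
def p_at_least_one_red(jars, p_red=1 / 1000):
    """Chance that at least one of `jars` independent draws is red."""
    return 1 - (1 - p_red) ** jars


def p_no_red_over_rounds(jars, rounds, p_red=1 / 1000):
    """Chance of drawing only white marbles in every one of `rounds` rounds."""
    return (1 - p_red) ** (jars * rounds)


for jars in (37, 67):
    one_round = p_at_least_one_red(jars)
    ten_rounds = p_no_red_over_rounds(jars, 10)
    print(f"{jars} jars, one round: P(at least one red) = {one_round:.2%}")
    print(f"{jars} jars, 10 rounds: P(no red at all)    = {ten_rounds:.2%}")
```

The second function covers the "advanced experiment": because the marbles are replaced, every round is a fresh, independent draw, so the all-white chance is simply multiplied by itself once per round.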

One DNA Transmission Event

For one TE, this is a matter of the probability of multiple independent
events (PMIE). Any marker is free to mutate, though the chances of it doing so are
small. The mathematical expressions are

p = chance of a mutation

q = chance of no mutation = 1 - p

For simplicity of the mathematics, it is more convenient to use a constant,
average probability of marker mutation than to calculate each marker
individually. See note 1 for details.

Those chances range as follows:

                             µ          No µ
Slowest marker               0.0090%    99.991%
Fastest marker               3.4375%    96.563%
Average (geometric mean)     0.4889%    99.511%
Average (arithmetic mean)    0.4932%    99.507%

The average represents the chance of each marker in a haplotype of 37 or 67
markers undergoing a mutation in one TE vs. none. For the entire haplotype, we multiplied by
the number of markers to derive a
probability of any of the markers mutating. See note 2 for
more.
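As an illustration of how a per-marker rate scales up to a whole haplotype, the sketch below compares the simple product (markers × p) with the complement form 1 - (1 - p)^m, which assumes that markers mutate independently. The complement form and the function names are assumptions added here for comparison. Neither calculation reproduces the back-mutation adjustment or the blended 67-marker rate described in the notes, so the output should not be expected to match the table figures.

```python
PER_MARKER_RATE = 0.004889  # geometric-mean rate quoted above (0.4889%)


def p_any_linear(markers, p=PER_MARKER_RATE):
    """Simple product: number of markers times the per-marker rate."""
    return markers * p


def p_any_independent(markers, p=PER_MARKER_RATE):
    """Complement of 'no marker mutates', assuming independent markers."""
    return 1 - (1 - p) ** markers


for m in (37, 67):
    print(f"{m} markers: product {p_any_linear(m):.2%}, "
          f"complement {p_any_independent(m):.2%}")
```

The two forms agree closely while markers × p is small and drift apart as it grows, because the product counts transmissions with several mutated markers more than once.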

To account for the chance of a marker undergoing a forward-and-back pair of
mutations, resulting in no observable change, we adjusted the mutation rate
as described in note 3.

That gives us the chance, q = 1-p, of no observable (see note 3)
mutations occurring:

                37 mkrs    67 mkrs
Mutation        ~16.61%    ~40.29%
No Mutations    ~83.38%    ~59.71%

For one TE, the chance of no mutations (or a forward/back pair) is large; the
chance of one or more observable mutations is smaller.

More Transmissions

Although the chance of a haplotype experiencing no observable change in
one transmission event is high, it decreases with each additional
transmission. This is a problem in binomial probability (k occurrences in n
trials).

We will let:

P(k) = the probability of k mutations;

p = the probability (from above) of any one of the markers mutating;

k = the number of mutation occurrences (sometimes called "successes") and;

n = the number of trials (transmission events).

The binomial probability formula is:

P(k) = [n! / (k! (n-k)!)] p^{k} (1-p)^{n-k}

(The exclamation point, !, means to take the factorial of the number:
2! = 2*1, 3! = 3*2*1, 4! = 4*3*2*1, etc.)

Excel Formulas

In Excel, the worksheet functions are

P(k) =BINOMDIST(k,n,p,FALSE) for
probability of the exact number of mutations (k) and

P(i≤k) =BINOMDIST(k,n,p,TRUE) for probability of the cumulative, k or fewer mutations.

Note: Calculations and graphs were done with an Excel spreadsheet. Other spreadsheet
programs can perform the same function but syntax may differ.
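For readers without a spreadsheet, the two worksheet calls can be approximated in Python. This is a minimal sketch for whole numbers with 0 ≤ k ≤ n; the function name `binom_dist` is hypothetical, and the example probability of 0.1868 is an assumption inferred from the n = 1 row of the Figure 1 table (1 - 81.32%), not a published rate.

```python
from math import comb


def binom_dist(k, n, p, cumulative=False):
    """Rough equivalent of Excel's BINOMDIST(k, n, p, cumulative)."""
    def exact(i):
        # n! / (i! (n-i)!) * p^i * (1-p)^(n-i)
        return comb(n, i) * p**i * (1 - p) ** (n - i)

    if cumulative:
        # k or fewer mutations: add up the exact terms for 0..k
        return sum(exact(i) for i in range(k + 1))
    return exact(k)


p37 = 0.1868  # assumed per-transmission probability for 37 markers
print(f"P(k=1, n=5)  = {binom_dist(1, 5, p37):.2%}")
print(f"P(k<=2, n=5) = {binom_dist(2, 5, p37, cumulative=True):.2%}")
```

Passing `cumulative=True` corresponds to the TRUE argument in the worksheet function; the default corresponds to FALSE.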

Back mutations

We recognize that a marker may mutate away from an initial allele value and
then back to that value; two changes have occurred but appear as no change.
(They are not "observable".) We have attempted to adjust the probabilities
downward to account for this.

Mutation steps

We are not concerned here with degrees of mutation, only with the fact of occurrence (or non-occurrence).
Whether a marker follows the step-wise or infinite alleles model is a
separate matter.

No Observable Mutations

The special case of k=0 simplifies the formula to
P(k=0) = (1-p)^{n}.
The chance of no mutations shrinks exponentially with each
new transmission.

Fortunately for k=0, the factorial of zero = 1, any number raised to the power
of zero, x^{0}= 1 and n-0=n, allowing us to remove all the k terms.
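The k=0 case is simple enough to tabulate in a few lines. The sketch below is illustrative only: the per-transmission probabilities are assumptions inferred from the n = 1 row of the Figure 1 table (1 - 81.32% and 1 - 73.32%), so later rows may differ from the published figures in the last decimal place.

```python
# Per-transmission probability of at least one observable mutation,
# inferred from the n = 1 table row (an assumption, not source data).
P_MUTATION = {37: 1 - 0.8132, 67: 1 - 0.7332}


def p_unchanged(n, p):
    """P(k=0) after n transmission events: (1 - p)^n."""
    return (1 - p) ** n


print(" TE   37 mkrs   67 mkrs")
for n in range(1, 17):
    row = [p_unchanged(n, P_MUTATION[m]) for m in (37, 67)]
    print(f"{n:>3}   {row[0]:>7.2%}   {row[1]:>7.2%}")
```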

The graph and table below show the probabilities of a haplotype
remaining observably unchanged over a series of transmission events (here,
k = 0).

Figure 1: Chance of No Observable Mutations

P(k = 0 mutations)

TE      37 mkrs    67 mkrs
n= 1    81.32%     73.32%
n= 2    66.13%     53.75%
n= 3    53.78%     39.41%
n= 4    43.74%     28.89%
n= 5    35.57%     21.18%
n= 6    28.93%     15.53%
n= 7    23.52%     11.39%
n= 8    19.13%      8.35%
n= 9    15.56%      6.12%
n=10    12.65%      4.49%
n=11    10.29%      3.29%
n=12     8.37%      2.41%
n=13     6.80%      1.77%
n=14     5.53%      1.30%
n=15     4.50%      0.95%
n=16     3.66%      0.70%

The calculations show these patterns:

The probability of no observable mutations decreases with each additional
transmission.

By four transmissions, the probability of a 37- or 67-marker haplotype
remaining observably unchanged is less than 50%; it is more probable than
not that a marker will mutate.

A 67-marker haplotype has a lesser chance than a 37-marker haplotype of
remaining observably unchanged because of the greater number of markers.

The calculations show one reason for exact matches between two living men being relatively uncommon.
Between two men sharing the same 2nd-great-grandfather as a MRCA, five
DNA transmissions ("trials") have happened; the probabilities of the haplotype remaining unchanged are ~1:3 for 37
markers and ~1:5 for 67.

One or More Mutations

We next turn to the chances of one or more mutations (k≥1), again a matter of binomial
probability -- except when k>n, it is a matter of multiple dependent
events where P(n,k)≈ p^{n/k}.

One or Two Mutations

The graph and table below show the probabilities of exactly k=1 and k=2.
Probabilities for k>n (e.g., k=2, n=1; see note 6) are estimated; the
binomial formula cannot be evaluated there because the factorial of a
negative number, (n-k)<0, is not defined.

Figure 2: Chance of one or two observable mutations (irrespective of back-mutations)

P(k = 1, 2 mutations)

        k=1                   k=2
TE      37 mkrs    67 mkrs    37 mkrs    67 mkrs
n= 1    18.68%     26.68%     ~3.49%     ~7.1%
n= 2    30.38%     39.13%     39.13%     35.6%
n= 3    37.06%     43.03%     43.03%     43.1%
n= 4    40.18%     42.06%     42.06%     34.7%
n= 5    40.84%     38.55%     38.55%     23.3%
n= 6    39.86%     33.92%     33.92%     14.1%
n= 7    37.82%     29.01%     29.01%     7.9%
n= 8    35.15%     24.31%     24.31%     4.3%
n= 9    32.16%     20.05%     20.05%     2.2%
n=10    29.06%     16.33%     16.33%     1.1%
n=11    25.99%     13.17%     13.17%     0.5%
n=12    23.06%     10.53%     10.53%     0.3%
n=13    20.32%     8.37%      8.37%      0.1%
n=14    17.79%     6.61%      6.61%      0.06%
n=15    15.50%     5.19%      5.19%      0.03%
n=16    13.45%     4.06%      4.06%      ~0.00%

The probability of a specific number of mutations increases to a maximum
and then decreases as greater numbers of mutations become more likely.
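The rise-and-fall pattern can be checked numerically. The sketch below searches for the number of transmissions at which the chance of exactly k mutations is largest. The helper names are hypothetical, and the probabilities 0.1868 and 0.2668 are assumptions read from the n = 1, k = 1 row of the table above.

```python
from math import comb


def p_exact(k, n, p):
    """Binomial probability of exactly k mutations in n transmissions."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def peak_transmissions(k, p, max_n=16):
    """Return the n in k..max_n at which P(exactly k) is largest."""
    return max(range(k, max_n + 1), key=lambda n: p_exact(k, n, p))


# Assumed per-transmission probabilities for 37 and 67 markers.
for markers, p in ((37, 0.1868), (67, 0.2668)):
    n_peak = peak_transmissions(1, p)
    print(f"{markers} markers: P(k=1) peaks at n={n_peak} "
          f"({p_exact(1, n_peak, p):.2%})")
```

Under these assumptions the k=1 peak falls at n=5 for 37 markers and n=3 for 67 markers, in line with the k=1 columns of the table.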

Three or Four Mutations

The graph and table below show the probabilities of exactly k=3 and k=4.
Probabilities for k>n are estimated by the probability of multiple dependent
events; the binomial formula cannot be evaluated there because the factorial of a negative number, (n-k)<0, is undefined.

Figure 3: Three to four observable mutations (irrespective of back-mutations)

P(k = 3, 4 mutations)

        k=3                   k=4
TE      37 mkrs    67 mkrs    37 mkrs    67 mkrs
n= 1    ~0.0%      ~0.0%      ~0.0%      ~0.0%
n= 2    ~0.6%      ~1.9%      ~0.5%      ~0.5%
n= 3    0.7%       ~1.9%      ~3.0%      ~3.0%
n= 4    2.1%       5.6%       0.1%       7.1%
n= 5    4.3%       10.2%      0.5%       21.2%
n= 6    7.0%       15.0%      1.2%       15.5%
n= 7    10.0%      19.2%      2.3%       11.4%
n= 8    13.0%      22.5%      3.7%       8.3%
n= 9    15.8%      24.8%      5.5%       6.1%
n=10    18.4%      26.0%      7.4%       4.5%
n=11    20.6%      26.2%      9.4%       3.3%
n=12    22.3%      25.6%      11.5%      2.4%
n=13    23.6%      24.4%      13.5%      1.8%
n=14    24.4%      22.8%      15.4%      1.3%
n=15    24.8%      20.8%      17.1%      1.0%
n=16    24.8%      18.8%      18.5%      0.7%

Cumulative Changes

The stability of a haplotype is revealed by how closely it remains to its
original pattern as it experiences
more transmission events. This is shown in the graph and table below using
k≤2 & k≤4.

Figure 4: Cumulative Probability of k≤2 & k≤4 (irrespective of back-mutations)

Note that the cumulative distributions have a distinctly different
appearance from those for a specific number of mutations.

Σ P(k≤2, k≤4)

        k≤2                   k≤4
TE      37 mkrs    67 mkrs    37 mkrs    67 mkrs
n= 1    ~100%      ~100%      ~100%      ~100%
n= 2    ~100%      ~100%      ~100%      ~100%
n= 3    ~100%      ~100%      ~100%      ~100%
n= 4    ~100%      ~100%      ~100%      ~100%
n= 5    97.8%      93.9%      ~100%      ~100%
n= 6    95.2%      87.8%      99.98%     99.9%
n= 7    91.7%      80.3%      99.9%      99.4%
n= 8    87.4%      72.1%      99.7%      98.3%
n= 9    82.5%      63.6%      99.2%      96.4%
n=10    77.2%      55.4%      98.5%      93.7%
n=11    71.7%      47.6%      97.5%      90.1%
n=12    66.1%      40.4%      96.1%      85.6%
n=13    60.6%      34.0%      94.4%      80.6%
n=14    55.1%      28.4%      92.2%      75.0%
n=15    49.9%      23.5%      89.7%      69.1%
n=16    44.9%      19.4%      86.8%      63.0%
n=17    40.3%      15.8%      83.6%      56.9%
n=18    35.9%      12.9%      80.2%      51.0%
n=19    32.0%      10.4%      76.5%      45.3%
n=20    28.3%      8.39%      72.6%      39.9%
n=21    25.0%      6.73%      68.7%      34.9%
n=22    22.0%      5.38%      64.6%      30.3%
n=23    19.3%      4.29%      60.6%      26.2%
n=24    16.9%      3.40%      56.6%      22.5%

A 37- or 67-marker haplotype is almost certain to undergo two or fewer
mutations over four transmissions and four or fewer over five transmissions.
Beyond that, the probabilities decrease with each transmission.

Summary

As the number of transmission events increases, it becomes increasingly unlikely that a haplotype will retain its exact
pattern (i.e., undergo no observable mutations). By four TE, the probability of a 37-marker
haplotype retaining its exact identity has declined to 43% and a 67-marker
haplotype to 29%.

Corollary: One should not expect to find exact matches between two
descendants of a common ancestor if the generations of separation between the
descendants are two or more, i.e., TE≥4 (the most recent common ancestor is their
grandfather or more distant). It is more likely than not that there will
have been at least one mutation in a 37- or 67-marker haplotype.

However, it is very likely that a haplotype will retain a high degree of
similarity to its original pattern through six or fewer TE. One can expect to
find matches of ≤2 mutations (see note 5) for four generations of separation, TE≥8, and perhaps
more.

Revised: 15 Dec 2016

Notes

Mutation probability calculations

We used published data on individual
marker mutation frequencies and calculated the geometric mean for 37 markers. For 67 markers, since markers 38-67
are less volatile than markers 26-37, we took the geometric mean of the values for markers 1-25 and 1-37
(which includes 1-25 twice).
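As a rough illustration of the averaging step, the sketch below computes a geometric and an arithmetic mean for a short list of rates. The rates are hypothetical placeholders chosen only to show the arithmetic; they are NOT the published per-marker values, and the function names are assumptions.

```python
from math import exp, log


def geometric_mean(rates):
    """n-th root of the product of the rates, computed via logarithms."""
    return exp(sum(log(r) for r in rates) / len(rates))


def arithmetic_mean(rates):
    """Ordinary average of the rates."""
    return sum(rates) / len(rates)


# Hypothetical per-marker rates for illustration only; not published data.
example_rates = [0.00009, 0.0012, 0.0031, 0.0075, 0.034375]

print(f"geometric mean:  {geometric_mean(example_rates):.4%}")
print(f"arithmetic mean: {arithmetic_mean(example_rates):.4%}")
```

The geometric mean is never larger than the arithmetic mean, so it is less affected by a few fast-mutating markers.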

Calculations

For 37 markers, we took, as p, the geometric mean of the marker rates. For 67 markers, because rates for markers
38-67 are unpublished but believed to be less volatile, we took the geometric mean
of markers 1-25 and 1-37 (1-25 included twice). To calculate P(n,k,p) we used the
Excel binomial distribution formula.

Back-mutations

The chance of a pair of mutations on the same marker in a number of TE is the square
of the mutation rate but the second of the pair may more likely be toward
the initial value than away from it. We arbitrarily assigned a
weight of 3:1 so that, in a mutation pair, the second would more likely be toward the
original value.
Because the impact of this weighting is a function of the square of a small number,
selecting a weighting of, say, 5:1 would only adjust p in the 4th place after the decimal
point.
(The chance of a mutation pair is slight, ~2.5*10^{-6},
as compared to ~2.3*10^{-3} for a single mutation.)

No observable mutation

"Observable" refers to the possible effect of a back-mutation disguising a
change from the prior haplotype. If a marker mutates to a new value and
then back to its initial value, no change will be observed. This is more
relevant to apparent non-change of the haplotype than to instances with
observable mutations.

≤2 mutations

This is roughly equivalent to genetic distance (GD) ≤2. However, GD is
calculated in a manner in which a mutation is not always a step of GD.
Our page on this complicated
subject covers GD in more detail.

k>n -- e.g., k=2, n=1

The factorial of (n-k)<0 -- e.g., (-1)! = ∞ -- is not defined, rendering
the binomial probability function without a solution. Therefore, we used the
following method to estimate the probabilities.
While a mutation cannot occur twice on the same marker in one TE, it is
possible for more than one marker to mutate in one TE. We applied the
probability of multiple dependent events in these cases because a 2nd
mutation depends on there having been a 1st, a 3rd depends on a 2nd, etc.