The theory is the probability of multiple independent events (PMIE), combined
with the probability of multiple dependent events (PMDE).
Each time Y-DNA is passed from a father to a son, each marker has an opportunity to
change value independently from what's happening with any other marker;
these are multiple independent events.
Whether a marker changes or not is a random event described by the rules of PMIE.
However, each transmission event is affected by the outcomes of prior
events. These are dependent events described by the rules of PMDE.
The hardest part about the theory is picking the right end of the problem
to begin with. What are the fundamental questions we need to ask?
What are the chances of no mutation in a marker?
What are the chances of no mutations in any of many markers?
What is the probability that the observed match between two sets of
Y-DNA is not due to chance?
What is the probability of a common male ancestor within a certain time frame?
There are such things as
"coincidental matches", which tend to occur most often among the
more common haplotypes. The patterns of marker/allele values may be similar,
but they don't spring from a common source. The similarity is more likely
due to "convergent evolution", in which organisms of different ancestral
heritage evolve into similar forms.
For example, you might see that you have hundreds of reported close matches
at 37 markers. Not only are these too many to follow up, but you are probably
unrelated in genealogical time to most of them.
In such instances, SNP testing is recommended to eliminate matches of
obviously different DNA inheritance. SNP testing in some depth can tell you who
you are less related to.
The passing of Y-DNA from a father to a son,
subject to uncertainty.
The set of all possible outcomes of one or more transmission events.
Whether or not a marker changes.
Whenever we look at Y-DNA that differs, we need to compare it to the
probability of it not changing. The probability of
something not happening is the complement of the probability of its
happening, i.e., 1 - "happening". Statisticians use "p(x)" to
indicate the probability of an event and "q(x)" for the non-event's probability.
The sum of the probabilities of all possible events is always 1 (or 100%).
The basic formulas are:
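A sketch of the complement and product rules described above, written out as equations. Here p and q are the probabilities of change and no change for one marker in one transmission event; the symbols n (number of markers compared) and t (number of transmission events), and the use of a single rate for every marker, are illustrative assumptions rather than values taken from the text.

```latex
\begin{align}
  p + q &= 1, \qquad q = 1 - p \\
  P(\text{no change in any of } n \text{ markers, one transmission}) &= q^{n} \\
  P(\text{no change in any of } n \text{ markers over } t \text{ transmissions}) &= q^{\,n t}
\end{align}
```

If the markers have different rates, the single q^n is replaced by the product of the individual q values, one per marker.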
In the project, we observe that 67/67 matches are rare, even
in matches where both donors have excellent documentation of the CMA; 37/37
matches are a little less rare, and 25/25 less rare still. 12/12 matches are common.
Intuitively, this suggests that the chances of Y-DNA remaining completely unchanged across many markers through multiple transmission events are small.
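That intuition can be checked with a few lines of arithmetic. The sketch below assumes every marker has the same chance of changing in each transmission event and that the events are independent; the rate of 1 in 300 and the 10 transmission events are placeholders chosen for illustration, and the function name `prob_no_change` is hypothetical.

```python
def prob_no_change(n_markers, n_transmissions, p_change):
    """Probability that none of n_markers changes in any of n_transmissions.

    Assumes independent events and one shared per-marker rate p_change
    (an illustrative simplification, not a measured value).
    """
    q = 1.0 - p_change  # complement: chance a marker does NOT change
    return q ** (n_markers * n_transmissions)


if __name__ == "__main__":
    rate = 1 / 300       # assumed rate, for illustration only
    transmissions = 10   # assumed number of father-to-son transmissions
    for markers in (12, 25, 37, 67):
        chance = prob_no_change(markers, transmissions, rate)
        print(f"{markers} markers: {chance:.3f}")
```

Under these assumptions the chance of a perfect match falls steadily as more markers are compared, which is the pattern described above.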
Here are some tangible examples whose probabilities mirror those we want to consider.
Buckets of balls
Picture 12 to 67 buckets, each containing 250 to 400 colored balls;
in each bucket, all the balls -- except a red one -- are the same color,
blue. We'll go through the buckets taking one ball out of each bucket
without looking. If it's not a red ball, we'll replace it; if it is red, we
replace the bucket with the same number of balls, but change the colors to
one green and the rest red.
Then we repeat the trial, again and again. Every time we draw an "odd ball", we
replace the bucket with balls of different colors -- yellow, purple, black, white,
tangerine, striped, etc.
What this illustrates is that:
The outcome of each pick from a bucket (a possible mutation) is independent of the others;
but each trial (set of buckets) depends on the outcomes of the prior trials.
The math is the same as that used to predict the odds of rolling dice,
where you can tell each die from the other.
Years ago, I needed to generate random numbers for sampling purposes.
To get the most random sets of numbers possible, I used a set of 10-sided dice whose sides were numbered 0 to 9.
But these are some strange dice;
some have 250 sides and some have as many as 400 sides. For each die, one side says
"change" and all the other sides say "same".
We'll roll 12 to 67 dice at a time, read the top side & record the results.
Then we'll roll again and again.
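The dice experiment can be tried out as a small simulation. This is a minimal sketch, assuming each die has between 250 and 400 sides chosen at random and that a marker counts as "unchanged" only if it never shows "change" (a change followed by a change back is not modeled). The function names `roll_generations` and `fraction_unchanged` are hypothetical.

```python
import random


def roll_generations(sides_per_die, n_rolls, rng):
    """Roll every die n_rolls times; count how often each shows 'change'."""
    changes = [0] * len(sides_per_die)
    for _ in range(n_rolls):
        for i, sides in enumerate(sides_per_die):
            # Side 0 stands for the single "change" side; all others say "same".
            if rng.randrange(sides) == 0:
                changes[i] += 1
    return changes


def fraction_unchanged(n_dice, n_rolls, n_trials, seed=0):
    """Estimate how often no die at all shows 'change' across n_rolls."""
    rng = random.Random(seed)
    unchanged = 0
    for _ in range(n_trials):
        # Assumed: each die has between 250 and 400 sides.
        sides = [rng.randint(250, 400) for _ in range(n_dice)]
        if sum(roll_generations(sides, n_rolls, rng)) == 0:
            unchanged += 1
    return unchanged / n_trials


if __name__ == "__main__":
    for dice in (12, 25, 37, 67):
        print(dice, "dice:", fraction_unchanged(dice, n_rolls=10, n_trials=2000))
```

Each roll stands for one transmission event, and carrying the running counts from roll to roll is what makes later results depend on earlier ones.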
Bayesian (conditional) processes can be used to narrow your search.
Bayes' Theorem holds that additional information can modify the expected
probability of an event.
Bayes' Rule: P(A|B) = P(B|A) x P(A) / P(B)
The Bayes process here is used to eliminate outcomes known to be
impossible by virtue of documentation. The probability of an impossible outcome is
zero, p = 0. For example, assume that you know this much about the CMA for donors A & B:
He cannot be fewer than 6 TE back in donor A's line, and
He cannot be fewer than 7 TE back in donor B's line.
Together these rule out any CMA fewer than 13 TE (6 + 7) from the two donors combined.
This additional genealogical information allows you to reassign the probabilities for TE <= 12
to TE >= 13. One may think of this as eliminating TE 1 to 12 from the cumulative
probability graph and counting as though 1 was 13, 2 was 14, etc.
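One way to carry out this reassignment is to set the impossible probabilities to zero and rescale the rest so they again sum to 1. The sketch below takes that approach, which is an assumption about what "reassign" means here; it is not the same as the shortcut of renumbering the graph. The function name `condition_on_min_te` and the input list are hypothetical, with `probs[i]` standing for the prior probability that the CMA lies i + 1 TE back.

```python
def condition_on_min_te(probs, min_te):
    """Zero out TE values below min_te and rescale the rest to sum to 1.

    probs[i] is the prior probability for TE = i + 1 (assumed layout).
    """
    kept = [p if te >= min_te else 0.0
            for te, p in enumerate(probs, start=1)]
    total = sum(kept)
    if total == 0:
        raise ValueError("no probability remains at or beyond min_te")
    return [p / total for p in kept]


if __name__ == "__main__":
    # Toy prior over TE = 1..4, for illustration only.
    prior = [0.4, 0.3, 0.2, 0.1]
    print(condition_on_min_te(prior, min_te=3))
```

For the example in the text, `min_te` would be 6 + 7 = 13, applied to whatever prior distribution over TE is being used.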