Y-DNA Match Determination
We keep encountering new and different ways of determining whether a match
exists between two sets of Y-DNA results (i.e., two men); some strike us as
bizarre. This page catalogues the methods we've found and evaluates their
effectiveness and efficiency.
What is a "Match"?
Perhaps the best first step is to define what we mean by the term "match".
- A similarity between two or more sets of DNA results which indicates, to
a high degree of probability, a shared ancestor within a period of time when documentary
research may be able to identify him or her by name, dates, places and/or other characteristics.
Note that this definition is not specific to Y-DNA STR testing, though that will be our focus
here. It allows us to reframe the question as "Does the similarity of the sets of results allow us
to say with confidence that these men share a common male ancestor within
The best method for determining a Y-DNA match proceeds directly from the
definition and uses what is known about mutation frequencies of STR
markers. It relies on direct comparison of the two haplotypes as measured by
allele values of markers tested in common.
- Define "high probability" quantitatively. Is it 80%? 90%? Or some other
- Determine a time period amenable to research and estimate its length in
- Using marker-mutation frequencies, calculate the TMRCA for the match and
see if it fits the probability and time windows. One may use either of these calculation models:
- Infinite alleles -- assumes that markers are free to change by any
amount in either direction. Therefore, a particular marker either agrees
between two men or it doesn't.
- Stepwise -- assumes that the amount of any difference is also
significant. A difference of one is one mutation, a difference of two is
two mutations, etc.
- Combination of the two -- most markers stepwise, some infinite alleles
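The difference between the counting models can be illustrated with a short Python sketch. It is only an illustration: the function name `count_mutations`, the allele values for the two men, and the choice of DYS390 as the marker treated under infinite alleles are all hypothetical and are not taken from any real calculator.

```python
MODELS = ("infinite", "stepwise", "hybrid")


def count_mutations(hap_a, hap_b, model="stepwise", infinite_markers=frozenset()):
    """Count mutations between two haplotypes over markers tested in common.

    hap_a, hap_b     -- dicts mapping marker name to allele value (repeat count)
    model            -- "infinite": any difference on a marker counts as one
                        "stepwise": a difference of d repeats counts as d
                        "hybrid":   stepwise, except markers in infinite_markers
    infinite_markers -- markers counted under infinite alleles in "hybrid" mode
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")

    total = 0
    # Only markers present in both sets of results can be compared.
    for marker in hap_a.keys() & hap_b.keys():
        diff = abs(hap_a[marker] - hap_b[marker])
        as_infinite = model == "infinite" or (
            model == "hybrid" and marker in infinite_markers
        )
        # min(diff, 1) is 0 for agreement and 1 for any disagreement.
        total += min(diff, 1) if as_infinite else diff
    return total


# Hypothetical allele values, for illustration only.
man_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS388": 12}
man_b = {"DYS393": 13, "DYS390": 26, "DYS19": 16, "DYS391": 11}

print(count_mutations(man_a, man_b, "infinite"))            # 2
print(count_mutations(man_a, man_b, "stepwise"))            # 4
print(count_mutations(man_a, man_b, "hybrid", {"DYS390"}))  # 3
```

The same pair of men yields two, four, or three mutations depending on the model, so the model needs to be fixed before any probability is calculated. DYS388 is ignored because only one man was tested on it.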
The calculations use the binomial probability formula, which is laborious to apply by hand. Alternatives
to manually calculating the probabilities include:
- Any of the TMRCA calculators on the Web (Turner, McDonald, etc.)
- The FTDNA TiP feature, proprietary software of Family Tree DNA. Using specific marker
frequencies with a combination of the stepwise and infinite alleles models, it is the most precise and
powerful of the TMRCA calculators.
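For readers who want to see the shape of the manual calculation, here is a minimal Python sketch under simplifying assumptions. It uses the infinite alleles model only, a single average mutation rate for every marker (the default of 0.004 per marker per generation is an assumed placeholder, not a figure from this page), and a flat prior over the number of generations. The function name `prob_mrca_within` is hypothetical, and this is not how FTDNA TiP or any of the Web calculators is implemented.

```python
from math import comb


def prob_mrca_within(n_markers, n_mismatches, generations,
                     mu=0.004, max_generations=2000):
    """Probability that the MRCA lived within `generations` generations.

    n_markers       -- markers compared between the two men
    n_mismatches    -- markers on which they differ (infinite alleles count)
    mu              -- ASSUMED average mutation rate per marker per generation
    max_generations -- cutoff for the flat prior on generations to the MRCA
    """
    def likelihood(t):
        # Two lines of descent, t generations each: 2 * t transmissions.
        # q is the chance that a given marker no longer agrees.
        q = 1.0 - (1.0 - mu) ** (2 * t)
        # Binomial probability of exactly n_mismatches among n_markers.
        return (comb(n_markers, n_mismatches)
                * q ** n_mismatches
                * (1.0 - q) ** (n_markers - n_mismatches))

    weights = [likelihood(t) for t in range(1, max_generations + 1)]
    # With a flat prior, the posterior is the normalized likelihood.
    return sum(weights[:generations]) / sum(weights)


# Example: does a 37-marker comparison with two mismatches reach an
# assumed 90% threshold within an assumed window of 24 generations?
p = prob_mrca_within(37, 2, 24)
print(f"{p:.3f}", p >= 0.90)
```

Both the probability threshold and the time window in the example are placeholders; the point is only that, once they are chosen, the test reduces to a single comparison.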
Pre-set rules
The method above can be used to establish standards for qualifying as a match.
For example, two mismatches across 37 markers qualify, but three do not.
- Simplicity -- compare the genetic distance or mismatches to a table
of qualifying matches
- Calculations need be made only in order to derive the table.
- Approximation -- it essentially uses the infinite alleles model.
- May result in false positives and false negatives
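A pre-set rule amounts to a table lookup, as the following Python sketch suggests. The only entry in the table is the 37-marker example given above; thresholds for other panel sizes would have to be derived separately and are deliberately not guessed here. The names `MAX_MISMATCHES` and `qualifies` are hypothetical.

```python
# Maximum number of mismatches that still qualifies, keyed by the number
# of markers compared. Only the 37-marker entry comes from the example in
# the text; other panel sizes would need their own derived thresholds.
MAX_MISMATCHES = {37: 2}


def qualifies(markers_compared, mismatches, table=MAX_MISMATCHES):
    """Return True if the comparison meets the pre-set rule."""
    if markers_compared not in table:
        raise KeyError(f"no rule defined for {markers_compared} markers")
    return mismatches <= table[markers_compared]


print(qualifies(37, 2))  # True
print(qualifies(37, 3))  # False
```

The lookup is trivially cheap, which is the appeal of the approach. The cost is that a hard cutoff treats two mismatches and three mismatches as categorically different, which is one source of the false positives and false negatives noted above.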
Comparison of haplotype to haplogroup modals
- Another layer of complexity
- Computation of modal values for haplogroups may be in error.
Percentage of matching or mismatching