Measuring Haplotype Rarity

Conclusions

Measuring the commonness or rarity of Y-STR haplotypes is possible but can be difficult and has no ready answers. There may be no perfect measurement tool but merely close approximations.

Measurement, though, is essential to understanding the situations a project faces. Where do haplotypes fit on a spectrum from common to rare? How likely is it that reported close matches are due to coincidence? Our analysis leads to the following observations and recommendations.

We posit that multi-origin surnames (those with many founders) display a wide variety of haplotypes and that they will range from very common to exceedingly rare. This hypothesis is borne out by the expanded data set but needs further testing. We invite other projects with wide diversity in Y-DNA (such as for common and multi-origin surnames) to conduct their own assessments.

Plain-language Scale

A five-point scale (very common, common, average, uncommon, rare) appears to be a reasonable way to translate the scores resulting from any such measurement into an easily digested, common-sense meaning. However, the scale’s categories must be based on a sufficiently diverse sample to have general application.

The scale for measuring commonness vs. rarity must be adjusted to the particular marker set. Scores on any of these systems depend on the markers used and will be higher with more markers than with fewer.

Wheaton

The specific method proposed by Kelly Wheaton (as conceptualized by Robert B. Casey) is the mathematically simplest of the methods discussed here and appears appropriate to haplogroup R1b -- provided that interpretations of the scores into the five-point scale are modified as in Table 3 above. Though involving another calculation step, we believe that the “Wheaton Average per Marker” method is better; it yields numbers more comparable across marker sets.
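The extra calculation step can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes, from the method's name, that the per-marker figure is a haplotype's total Wheaton score divided by the number of markers scored. The function name and the example scores are hypothetical.

```python
def average_per_marker(total_score: float, marker_count: int) -> float:
    """Divide a total rarity score by the number of markers scored.

    Assumption: the "Average per Marker" is the total score over the
    marker count, which offsets the tendency of totals to grow as
    more markers are tested.
    """
    if marker_count <= 0:
        raise ValueError("marker_count must be a positive integer")
    return total_score / marker_count


# Hypothetical totals for one haplotype scored on two marker sets.
# The raw totals differ widely, but the per-marker averages line up.
score_37 = average_per_marker(74, 37)
score_67 = average_per_marker(134, 67)
print(score_37, score_67)
```

Under this assumption, the averaged figures can be set side by side across marker sets, whereas the raw totals cannot.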

Ratio Index

The ratio index method is simple to interpret but more mathematically challenging to compute. The author is working to develop a Web-based tool to handle the math; in the meantime, an Excel tool is available on request.

Deviation Index

The Deviation Index is the most difficult of the three, both to obtain and to interpret. It performs poorly at discriminating between average and common haplotypes. Its sole advantage is that it highlights uncommon and rare haplotypes.

Comparability

For each of the systems, comparability across marker sets is reasonable, considering that the mix of “fast” (volatile) and slow markers varies from set to set.

Correlation with numbers of close matches

Though the data (Taylor project only) yielded significant correlations, the patterns are not entirely clear. As shown in the scatter plots, there is much "noise". Ultimately, we are unable to answer satisfactorily how much haplotype commonness vs. rarity contributes to the number of haplotype matches.

A reliable answer will require a larger, more representative data set, coupled with robust analysis.
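Projects wishing to repeat the correlation test on their own data could start from a sketch like the one below. It computes a Pearson correlation coefficient between rarity scores and close-match counts. The choice of Pearson's r is an assumption, since the statistic used is not named here, and the function name and sample values are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented illustrative values, not project data:
# one rarity score and one close-match count per haplotype.
rarity_scores = [1.2, 1.8, 2.5, 3.1, 3.9]
close_matches = [140, 95, 60, 22, 5]
print(round(pearson_r(rarity_scores, close_matches), 3))
```

A coefficient near -1 or +1 indicates a strong linear relationship, while a value near 0 indicates the kind of scatter described above. A rank-based coefficient may suit skewed match counts better.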