Background to
Measuring Haplotype Rarity

The author first became interested in this subject as a newly minted DNA project administrator. Project participants were asking “Is something unusual in my DNA?” Guidance for answering was sparse, but the questions persisted.

He also noticed that a sizable fraction of men had hundreds of “close matches” at high resolution levels, while another fraction had none. Over time, he came to view “rare” and “common” haplotypes not as distinct phenomena but possibly as points on a spectrum.

Investigating on the Internet[17], he came across a discussion by Kelly Wheaton[18], citing concepts of Robert Brooks Casey[19]. Casey appears to have been the first to propose quantifying haplotype rarity by marker/allele values. He briefly mentioned his method on several pages of his (former) website but did not fully explain it. The essence appears to have been that:

  1. A rare haplotype is one containing rare values in its markers; the rarer the values and/or the more markers with rare values, the rarer the haplotype.[20]
  2. Rare values are defined by the percentage of men who do not have those values. For example, a value held by only 2% yields a “Casey score” of 98 for that marker.
  3. The marker scores are then summed across the total number of markers.
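
The three steps above can be sketched in a few lines of Python. This is only an illustration of the method as summarized here, not Casey's own implementation: the marker names, the frequency table, and the function names are all hypothetical, and the percentages are invented rather than taken from any published frequency tables.

```python
# Hypothetical allele frequencies: percent of men carrying each value.
# These numbers are invented for illustration only.
FREQUENCIES = {
    "marker_1": {12: 2.0, 13: 90.0, 14: 8.0},
    "marker_2": {23: 25.0, 24: 60.0, 25: 15.0},
    "marker_3": {10: 5.0, 11: 70.0, 12: 25.0},
}


def marker_score(marker, value, frequencies=FREQUENCIES):
    """Percentage of men who do NOT carry this value at this marker."""
    # Assumption: a value missing from the table is treated as held by 0%.
    return 100.0 - frequencies[marker].get(value, 0.0)


def haplotype_score(haplotype, frequencies=FREQUENCIES):
    """Sum of the per-marker scores across all markers in the haplotype."""
    return sum(marker_score(m, v, frequencies) for m, v in haplotype.items())


def minimum_score(frequencies=FREQUENCIES):
    """Score of the most common possible haplotype (modal value everywhere)."""
    return sum(100.0 - max(dist.values()) for dist in frequencies.values())


rare = {"marker_1": 12, "marker_2": 23, "marker_3": 10}
modal = {"marker_1": 13, "marker_2": 24, "marker_3": 11}

print(marker_score("marker_1", 12))  # 98.0 (a value held by only 2%)
print(haplotype_score(rare))         # 268.0
print(haplotype_score(modal))        # 80.0
print(minimum_score())               # 80.0
```

With this invented table, the haplotype built from the least common values scores 268, while the all-modal haplotype scores 80, which is the lowest score attainable for this set of markers.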

This results in “minimum scores” for each set of markers, representing the most common possible haplotype -- i.e., modal values at every marker -- for haplogroup R1b[21]:

Casey appears to have based his work on that of Leo Little, who compiled tables of frequency distributions for marker values and published them at http://freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/yfreq.htm.[22]

Further investigation demonstrated that this method, as interpreted by Kelly Wheaton, provided a legitimate means of quantifying the phenomenon and led to other approaches.

