Measuring Haplotype Rarity
In Kelly Wheaton’s interpretation of Casey’s method, the percentage of men who do not hold the modal value is subtracted from the percentage who do not hold the individual’s particular value.
For example, if the “Casey score” is 98 (i.e., only 2% have that value) and 33% have the modal value (so 67% do not), the “Wheaton score” is 98 - 67 = 31. Again, these marker scores are summed across all markers.
Calculate this metric as follows:
- A: For each marker, determine the percentage of men who do not have the individual’s particular value. This is unity (1, or 100%) minus the percentage who do have it.
- B: For each marker, determine the percentage of men who do not have the modal value.
- Subtract B from A to get a “net” marker score. If the frequencies are decimal fractions, multiply by 100 to express the net score in percentage points. If the individual has the modal value, B = A and the net score is 0.
- Sum the net scores over the set of tested markers.
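The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ own tooling: the input format (for each marker, a mapping from allele value to its population frequency as a decimal fraction), the function names, and the marker names are all hypothetical, and the frequencies in the example are made up.

```python
from typing import Mapping

# Hypothetical input format: for each marker, a mapping from allele value to the
# fraction (0..1) of men in the reference population who carry that value.
FrequencyTable = Mapping[str, Mapping[int, float]]


def marker_net_score(freqs: Mapping[int, float], value: int) -> float:
    """Net (Wheaton) score for one marker, in percentage points."""
    modal_freq = max(freqs.values())    # frequency of the modal value
    value_freq = freqs.get(value, 0.0)  # frequency of this man's value (0 if unseen)
    a = 1.0 - value_freq                # A: share of men NOT having his value
    b = 1.0 - modal_freq                # B: share of men NOT having the modal value
    return 100.0 * (a - b)              # decimal fractions -> percentage points


def wheaton_score(haplotype: Mapping[str, int], table: FrequencyTable) -> float:
    """Sum of net marker scores over the tested markers that have frequency data."""
    return sum(
        marker_net_score(table[marker], value)
        for marker, value in haplotype.items()
        if marker in table
    )


if __name__ == "__main__":
    # Made-up frequencies: on "marker1" the modal value 13 is held by 33% of men
    # and value 11 by only 2% (so A = 98 and B = 67 for a man with value 11).
    table = {
        "marker1": {13: 0.33, 12: 0.30, 14: 0.25, 15: 0.10, 11: 0.02},
        "marker2": {24: 0.40, 23: 0.35, 25: 0.25},
    }
    print(round(wheaton_score({"marker1": 11, "marker2": 24}, table), 6))  # 31.0
    print(wheaton_score({"marker1": 13, "marker2": 24}, table))            # 0.0
```

Because B is taken from the modal frequency, a man who carries the modal value on every scored marker gets exactly zero.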
Figure 4 shows the distributions of the resulting scores for four levels of resolution (12, 25, 37, and 67 markers).
See Appendix B for details.
As more markers contribute to the scores, the distributions “march” from left to right, their peaks get lower, and their spreads get wider; these trends also reflect the greater haplotype variety revealed by more genetic information. (The apparent bumps near the
right tail of the 67-marker curve are due to collapsing of categories at the
Wheaton Score Summary Statistics
Advantages of Wheaton’s method include
- The score is determined not by the allele values themselves, but by the frequencies of the particular allele values in a haplotype relative to the frequencies of the modal values.
- Summing component marker scores yields a composite measure of haplotype commonness vs. rarity.
- The metric is relatively insensitive to minor problems in the distribution frequency data used.
- Scores for 12, 25, 37, or 67 markers need less adjustment to interpret than raw frequency (“Casey”) scores, because the most common possible haplotype always scores zero.
However, there are disadvantages:
- Scores cannot be interpreted without understanding how a particular score compares with others.
- Total scores are cumulative and thus depend on the size of the marker set.
- Haplotypes of fewer than 67 markers were not considered.
- Scores are not comparable across haplogroups, because each haplogroup has a different frequency distribution of marker values.
Following an e-mail message from Casey, Wheaton partially overcame the first disadvantage with this categorization (for R1b, 67 markers):
- 100 or less: very common marker values
- 100 to 300: common marker values
- 300 to 500: average marker values
- 500 to 700: uncommon marker values
- 700 and up: rare marker values
As discussed in the “Evaluation” section below, we found these interpretations to be in error.
The interpretations were not applicable to the larger (and presumably more diverse) data set examined here.
Lacking Casey’s or Wheaton’s data, we presume they found this system of categories appropriate
within their projects. However, assessment using all R1b STR results in the data set showed,
more generally, a somewhat different picture.
Using the methods outlined above, we calculated “Wheaton scores” for all 1,953 STR results of the eight projects in the R1b haplogroup. We then compared the theoretical distribution with the actual 67-marker score distribution, as shown in Figure
The actual distribution of scores did not match the Casey/Wheaton model. Few of the eight projects’ 1,953 participants had so-called “common” haplotypes, and more than a third had “rare” haplotypes, an implausible result: a category holding over a third of all haplotypes can hardly be called rare.
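A tally of this kind can be sketched as follows. It is an illustration only: the list of scores is assumed input, the names are hypothetical, and because the published bins share their endpoints (for example “100 or less” and “100 to 300”), the code assigns a score that falls exactly on an endpoint to the lower category. That tie-breaking rule is our assumption, not part of the Casey/Wheaton scheme.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, Iterable

# Casey/Wheaton categories for R1b, 67 markers. Assumption: a score equal to a
# bin endpoint (100, 300, 500, 700) falls into the lower category.
UPPER_BOUNDS = [100, 300, 500, 700]
LABELS = ["very common", "common", "average", "uncommon", "rare"]


def casey_wheaton_category(score: float) -> str:
    # bisect_left returns the index of the first bound that is >= score
    return LABELS[bisect_left(UPPER_BOUNDS, score)]


def category_shares(scores: Iterable[float]) -> Dict[str, float]:
    """Fraction of scores falling in each Casey/Wheaton category."""
    counts = Counter(casey_wheaton_category(s) for s in scores)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}
```

Applied to a list of 67-marker Wheaton scores, a “rare” share above one third would reproduce the mismatch described above.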
The most important statistics in Table 2 above are the median scores: one-half of scores are higher and one-half lower, so the median marks the mid-point of any scale. What Casey and Wheaton proposed as “uncommon” (500-700) for 67 markers includes the median score (637) when more haplotypes of greater diversity are examined.
A median score should be classified as of average commonness or rarity.
We can use the median and differences from it to assign plain-language interpretations to the scores. An “Average” category must include the median and roughly 25% of scores immediately above and below it (about half of all scores). The categories on either side, “more rare” and “less rare”, will each include another ~20%, and the categories for most and least rare will contain ~5% each. For a fuller discussion, see
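One way to turn these proportions into cut points is to take the 5th, 25th, 75th, and 95th percentiles of the observed scores, so that the middle 50% is “average”, the next 20% on each side is “less rare” or “more rare”, and the outer 5% on each side is “least rare” or “most rare”. The sketch below does this with Python’s standard `statistics.quantiles`; the category labels, function names, and the choice of the `inclusive` interpolation method are our assumptions, not necessarily what lies behind Table 3.

```python
import statistics
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical labels, ordered from lowest to highest score.
LABELS = ["least rare", "less rare", "average", "more rare", "most rare"]


def percentile_cut_points(scores: Sequence[float]) -> List[float]:
    """5th, 25th, 75th and 95th percentiles of the observed scores."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%
    q = statistics.quantiles(scores, n=20, method="inclusive")
    return [q[0], q[4], q[14], q[18]]


def interpret(score: float, cut_points: Sequence[float]) -> str:
    # bisect_right counts how many cut points are <= score, so a score exactly
    # on a cut point goes to the higher (rarer) category
    return LABELS[bisect_right(cut_points, score)]


if __name__ == "__main__":
    scores = list(range(1, 101))  # stand-in data, not real Wheaton scores
    cuts = percentile_cut_points(scores)
    print("median:", statistics.median(scores))
    for s in (3, 20, 50, 90, 99):
        print(s, interpret(s, cuts))
```

Deriving the cut points from the data, rather than fixing them in advance, keeps the median inside the “average” band by construction.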
This leads us to propose the five-category interpretation in
Table 3, based on the observed data. It differs significantly from the one proposed by Casey and interpreted by Wheaton.
Revised Wheaton Score Interpretation
The lack of comparability across marker sets led us to consider an “average per marker” method, the Wheaton Average per Marker (WApM). This metric, displayed in Figure 6, was derived by dividing the haplotype Wheaton score by the smaller of
- the number of markers actually tested and scored, and
- the nominal size of the marker set (12, 25, 37, or 67).
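A minimal sketch of this division follows; the function and parameter names are hypothetical. The divisor is the smaller of the two counts, so a haplotype with some unscored markers is averaged only over the markers that actually contributed to its score.

```python
NOMINAL_SET_SIZES = (12, 25, 37, 67)


def wheaton_average_per_marker(score: float, markers_scored: int, nominal_size: int) -> float:
    """Wheaton score divided by the smaller of markers scored and nominal set size."""
    if nominal_size not in NOMINAL_SET_SIZES:
        raise ValueError(f"unexpected marker set size: {nominal_size}")
    divisor = min(markers_scored, nominal_size)
    if divisor <= 0:
        raise ValueError("at least one marker must be scored")
    return score / divisor


if __name__ == "__main__":
    # A fully scored 67-marker haplotype with a score of 637 (the 67-marker
    # median reported above) averages about 9.5 per marker.
    print(round(wheaton_average_per_marker(637, 67, 67), 2))
```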
See Appendix B2 for details of the distributions. The effect is to bring the distribution curves into similar ranges on the X axis, so the “marching” is reduced. Note that the peaks occur at approximately the same X value (8-12) for each marker set. Also note that peaks are higher and variances (“spreads”) smaller with increased marker set size.
Summary statistics are
Wheaton Average per Marker
Again, the median defines the mid-point of the scale, and the categories are determined from it. We can interpret the scores roughly as follows:
WApM Score Interpretation
Despite the additional calculation step, this measurement is easier to interpret.
Differences between marker sets are smaller, and those that remain reflect differences in the frequency distributions of the markers making up each set.