Measuring Haplotype Rarity

Deviation Index

We also considered an index based on standard deviations[33] from the mode. This metric was developed specifically to highlight uncommon and rare haplotypes.

It is calculated by:

  1. Determining a standard deviation for the frequency distribution of each marker; this will be less than 1, ranging from near zero to ~0.17. (To avoid division-by-zero errors, use the larger of the standard deviation and 0.001, i.e. 10⁻³.)
  2. Determining the absolute difference between the marker modal value and the subject value, resulting in numbers ≥0.
  3. Dividing the difference (step 2) by the standard deviation (step 1); the result is again ≥0.
  4. Summing across all markers in the marker set.
  5. Dividing sums by nominal marker set size.
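The five steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the results reported here: the function name `deviation_index`, its parameters, and the constant `SD_FLOOR` are hypothetical, and the per-marker standard deviations are assumed to have been computed beforehand from the reference frequency distributions (step 1).

```python
from typing import Optional, Sequence

SD_FLOOR = 0.001  # floor from step 1; avoids division by zero


def deviation_index(
    subject: Sequence[float],
    modal: Sequence[float],
    std_devs: Sequence[float],
    nominal_size: Optional[int] = None,
) -> float:
    """Hypothetical sketch of the Deviation Index for one haplotype.

    subject      -- the subject's value at each marker
    modal        -- the modal value at each marker
    std_devs     -- precomputed standard deviation for each marker (step 1)
    nominal_size -- nominal marker set size; assumed to default to the
                    number of markers supplied
    """
    if not (len(subject) == len(modal) == len(std_devs)):
        raise ValueError("all inputs must cover the same markers")
    if nominal_size is None:
        nominal_size = len(subject)
    if nominal_size <= 0:
        raise ValueError("nominal marker set size must be positive")

    total = 0.0
    for value, mode, sd in zip(subject, modal, std_devs):
        sd = max(sd, SD_FLOOR)       # step 1: apply the floor
        diff = abs(value - mode)     # step 2: absolute difference, >= 0
        total += diff / sd           # steps 3 and 4: divide, then sum
    return total / nominal_size      # step 5: divide by marker set size


# Illustrative values only (not taken from the study data).
example = deviation_index([13, 25, 14], [13, 24, 14], [0.17, 0.10, 0.05])
```

A haplotype that matches the modal value at every marker scores exactly zero under this sketch, which agrees with the minimum of 0.00 reported for the 12- and 25-marker sets.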

The rationale for the method is

Figure 8: Deviation Index

Figure 8 displays the distribution graphs; see Appendix D for details. Note the additional peaks, at ≥16, for the 37- and 67-marker scores; these were due to this method’s tendency to emphasize less-common values. 19% of 12-marker scores, 2% of 25-marker scores, 27% of 37-marker scores and 28% of 67-marker scores are greater than 16.

Resulting scores, however, do not directly relate to probabilities. They simply indicate how common or rare the haplotype is: lower scores indicate more common haplotypes, higher scores rarer ones. If scores seem abnormally high, it is because each allele difference from a modal value represents many standard deviations.
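As a rough worked illustration of why scores run high, assume the difference in step 2 is measured in whole allele repeats (an assumption; the unit is not stated explicitly). Writing d for that difference and σ for the marker's standard deviation from step 1 (both symbols are introduced here for illustration), a single one-repeat difference contributes:

```latex
\[
\frac{d}{\sigma} = \frac{1}{0.17} \approx 5.9
\qquad\text{and}\qquad
\frac{d}{\max(\sigma,\,0.001)} = \frac{1}{0.001} = 1000
\]
```

That is roughly six standard deviations at the largest standard deviation quoted above (~0.17), and as much as 1000 when the standard deviation sits at the 0.001 floor, in both cases before the sum is divided by the marker set size.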

Summary statistics of the Deviation Index:

Table 8
Deviation Index Summary Statistics
Statistic              12 mkr.   25 mkr.   37 mkr.   67 mkr.
Average (mean)           10.31      7.27     16.57      9.52
Standard Deviation        9.41      6.48     19.26      7.53
Minimum                   0.00      0.00      1.00      1.05
Maximum                 371.50    292.56    372.09     50.93
Median                    9.19      6.50      9.62      6.07
Mode                       N/A       N/A       N/A       N/A
n=                       4,940     4,203     3,824     1,953

Comment: The deviation method highlights uncommon and rare haplotypes. It performs poorly at distinguishing common from average-rarity haplotypes.

Interpretation of the Deviation Index:

Table 9
Deviation Index Interpretation
Category      12 markers   25 markers   37 markers   67 markers
Very common   =0           0-2          0-4          0-2
Common        >0, <4       2-4          4-6          2-4
Average       4-14         4-10         6-16         4-10
Uncommon      14-22        10-20        16-40        10-22
Rare          >22          >20          >40          >22
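The interpretation table can be applied mechanically, as in the sketch below. The names `THRESHOLDS` and `classify` are hypothetical. Because adjacent ranges in Table 9 share their end points (for example, 4 closes "Common" and opens "Average"), the sketch assumes that a shared boundary belongs to the higher category, except that "Rare" is strictly greater than its bound, as the table states.

```python
# Upper bounds taken from Table 9, per marker set:
# (very common, common, average, uncommon)
THRESHOLDS = {
    12: (0, 4, 14, 22),
    25: (2, 4, 10, 20),
    37: (4, 6, 16, 40),
    67: (2, 4, 10, 22),
}


def classify(score: float, marker_set: int) -> str:
    """Map a Deviation Index score to a Table 9 category.

    Assumption (not stated in the table): a score on a shared boundary
    falls into the higher category; "Rare" requires score > bound.
    """
    if marker_set not in THRESHOLDS:
        raise ValueError(f"no thresholds for a {marker_set}-marker set")
    if score < 0:
        raise ValueError("Deviation Index scores are never negative")

    very_common, common, average, uncommon = THRESHOLDS[marker_set]
    if score > uncommon:
        return "Rare"
    if score >= average:
        return "Uncommon"
    if score >= common:
        return "Average"
    # For 12 markers, "Very common" means exactly zero.
    if score == 0 or score < very_common:
        return "Very common"
    return "Common"
```

For instance, `classify(50.93, 67)`, the 67-marker maximum from Table 8, falls in the "Rare" category, since it exceeds the bound of 22.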


Advantages:

  • This method most clearly identifies uncommon to rare haplotypes.
  • Uncommon and rare marker values are highlighted; they contribute more heavily to the final index score than in the other systems considered here.
  • Very common haplotypes score at or near zero.
  • Scores may be rounded to the nearest integer without meaningful loss of precision.
  • Theoretically, the method is more accurate.

Disadvantages:

  • Arcane; unlikely to be easily understood.
  • Complex mathematics.
  • Possible lack of comparability across haplogroups.
  • Stretches mathematical concepts beyond their intended applicability.
  • Sensitive to inadequacies or errors in the reference distributions used [36].

Disadvantages appear to outweigh the advantages.