Measuring Haplotype Rarity
Taylor Ratio Index
We considered a modification of Casey’s concept to attain greater comparability
between marker sets and, hopefully, between haplogroups: using a ratio of an
individual’s score to the modal score. The most common possible haplotypes (e.g.,
WAMH) will score exactly one (1), and less common haplotypes will have higher scores.
However, a haplotype score of 5.0 is not five times as rare as one scoring 1.0.
This metric is calculated by
- Determining, for each marker, the percentage of men who do not have the particular value carried by the individual (A).
- Determining, for the same marker, the percentage of men who do have the modal value (B).
- Dividing A by B to obtain a ratio for that marker.
- Averaging the ratios across the markers in the set.
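The calculation can be illustrated with a minimal Python sketch. The function name `ratio_index`, the toy `population` and its marker values are hypothetical and are not taken from the study data. One assumption should be noted: the denominator is taken here as the share of men who lack the modal value (the complement of the figure named in the second step), because that is the reading under which an all-modal haplotype scores exactly 1.0; if the literal definition is intended, the denominator would need to be swapped.

```python
from collections import Counter

def ratio_index(haplotype, population):
    """Average per-marker ratio for one haplotype (a sketch, not the authors' code)."""
    n = len(population)
    ratios = []
    for i, value in enumerate(haplotype):
        counts = Counter(h[i] for h in population)
        modal_count = max(counts.values())
        lacking_value = n - counts[value]   # A: men without this man's value
        lacking_modal = n - modal_count     # B (assumed): men without the modal value
        if lacking_modal == 0:
            raise ValueError(f"marker {i} shows no variation in the population")
        # Percentages cancel, so the ratio of counts equals the ratio of shares.
        ratios.append(lacking_value / lacking_modal)
    return sum(ratios) / len(ratios)

# Hypothetical population: 10 men typed on 3 markers.
population = [
    (13, 24, 14), (13, 24, 14), (13, 24, 14), (13, 24, 14), (13, 24, 14),
    (13, 23, 14), (14, 23, 14), (14, 23, 15), (14, 25, 15), (12, 25, 15),
]

modal_score = ratio_index((13, 24, 14), population)  # all-modal haplotype
rare_score = ratio_index((12, 25, 15), population)   # least common value at each marker
print(modal_score)           # 1.0
print(round(rare_score, 3))  # 2.061
```

In this toy data the rare haplotype scores about 2.06 even though nobody but its carrier matches it on all three markers, which illustrates why the index ranks rarity without measuring it proportionally.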
Figure 7: Taylor Ratio Index Scores
Figure 7 displays the distributions for the ratio indices. See
Appendix C for distributions.
The distributions for 25, 37 and 67 markers are highly similar; their
frequency peaks all occur at the same index value, as this system intends.
(The 12-marker distribution is irregular and bi-modal.) This metric is useful for distinguishing average, uncommon and rare haplotypes at all
However, note the steep slopes on the left tails and the peaks at low indices (<1.35); these
features suggest the ratio index is imprecise in distinguishing very common from common
haplotypes, especially at lower resolutions (<37 markers).
Summary statistics of the Ratio Index:
Ratio Index Summary Statistics
Interpretation of the Ratio Index:
Ratio Index Interpretation
Advantages of this “Taylor ratio index” are
- Simplicity of interpretation; an index of 1.0 indicates the most common
possible haplotype; higher scores indicate increasing degrees of rarity.
- Indices are unit-free as a result of dividing one frequency by another frequency.
- Indices are comparable across marker sets.
- Indices may be more comparable across haplogroups.
Disadvantages are
- The ratio index doesn’t sufficiently discriminate degrees of commonness, especially for the 12-marker set; too many scores are <1.1.
- The mathematics, requiring divisions then averaging, are more difficult.
In an attempt to differentiate scores more clearly, we tried a modification: squaring the ratios and then taking the square roots of the sums.
We also experimented with higher powers. These variants proved not worth the complications; only minor differentiation was seen.
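A power-based variant of this kind might be sketched as follows. The function `power_index` and the sample ratios are hypothetical. The text does not state how the sums were normalized, so this sketch assumes the sum of powered ratios is divided by the number of markers before the root is taken, which keeps an all-modal haplotype at exactly 1.0.

```python
def power_index(ratios, power=2):
    """Root of the mean of powered ratios (assumed normalization, see lead-in)."""
    if not ratios:
        raise ValueError("at least one per-marker ratio is required")
    mean_of_powers = sum(r ** power for r in ratios) / len(ratios)
    return mean_of_powers ** (1 / power)

# Hypothetical per-marker ratios for one haplotype.
sample_ratios = [2.25, 1.6, 7 / 3]

plain = power_index(sample_ratios, power=1)    # ordinary average of the ratios
squared = power_index(sample_ratios, power=2)  # squares, then square root
cubed = power_index(sample_ratios, power=3)    # a higher power
print(round(plain, 3), round(squared, 3), round(cubed, 3))
```

With `power=1` the function reduces to the plain average; larger powers give more weight to the markers with the highest ratios.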