Measuring Haplotype Rarity

The data presented in this section comes entirely from the Taylor project. It was not available for the other seven projects and is cumbersome to collect.

Do the measurements correlate with number of matches?

We would expect low-scoring haplotypes to have more matches in the FTDNA database than high-scoring ones. Is the expectation borne out?

We were able to analyze this question only for Taylor project participants; we did not have access to the needed data for the seven other projects.[40]

Looking further, we analyzed match data for the 325 Taylor R1b project participants who’d tested 37 or more markers. (These data took years to collect and record.) We offer them warily because we suspect they are biased, particularly by participants who joined the project because of a maternal or other indirect relationship to the project surname rather than direct paternal descent.

For each participant, the following items were recorded:

  1. Number of intra-project matches (with other project participants);
  2. Number of extra-project matches (with non-participants);
  3. Whether the participant had no matches at all in the FTDNA database.

The sum of items 1 and 2 above represents the participant’s total number of matches in the FTDNA database.

The data is summarized in Table 13.

Table 13: Taylor Project
Number of Matches per Participant at 37 Markers

                              Average                     Maximum
Category        Count   Intra-   Extra-    Total   Intra-   Extra-    Total
Very Common        24     4.62     8.10    12.71        7       39       46
Common             80     4.75    39.13    43.38       16      390      398
Average           124     1.72    17.12    18.84        8      232      234
Uncommon           76     1.35    14.52    15.87        1      128      128
Rare               21     1.19    10.71    11.90        0       56       56
All categories    325     1.94    21.05    22.91       16      390      398
Correlation             -0.943   -.0310   -0.437   -0.710   -0.248   -0.269
Significance            <.0001   <.0001   <.0001   <.0001   <.0001   <.0001


Correlations between match numbers and rarity categories are significant at the p<.0001 level.

Distribution Graphs

Figure 19: Average Matches vs. Rarity

Figure 20: Maximum Matches vs. Rarity

For better visualization in Figures 19 and 20, intra-project matches are graphed on the left vertical axis; extra-project and total matches on the right vertical axis.

In Figure 19 the curves differ for average number of intra-project and extra-project matches per participant. Extra-project and total matches decline consistently as rarity increases. Intra-project matches increase until haplotypes reach average rarity, then decline. In Figure 20, the three curves follow similar patterns.

Scatter diagrams

Scatter plots of the 37- and 67-marker scores[41] also illustrate the relationship:

Figure 21: Wheaton score vs. number matches

Figure 22: Ratio Index vs. number matches

Figure 23: Deviation Index vs. number matches
37-marker data points are represented by magenta squares, ■; 67-marker points by blue diamonds, ◆. The linear trend lines are orange.

If the hypothesis were that rarer haplotypes have fewer matches and more common haplotypes have more matches, one would expect to see a pattern in which matches increased as scores decreased, i.e., a downward slope from left to right.

Such a pattern is not evident in the plots; we see, instead, linear trend lines with positive slopes. The data appear to refute the hypothesis, though not to statistical significance.
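The sign of a trend line's slope is what decides between the two readings above. As a minimal sketch, assuming an ordinary least-squares fit (the method used to draw the plotted trend lines is not stated here), the slope can be computed as follows. The function name and the sample points are hypothetical, invented purely for illustration; they are not Taylor project data.

```python
def least_squares_slope(scores, matches):
    """Slope of the ordinary least-squares line of matches on scores."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(matches) / n
    # Sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, matches))
    sxx = sum((x - mean_x) ** 2 for x in scores)
    return sxy / sxx


# HYPOTHETICAL points (rarity score, number of matches), not project data.
demo_scores = [100, 200, 300, 400]
demo_matches = [40, 25, 20, 5]

slope = least_squares_slope(demo_scores, demo_matches)
# A negative slope is what the hypothesis predicts; a positive one runs against it.
print("slope:", round(slope, 4), "supports hypothesis:", slope < 0)
```

In this invented sample the slope is negative, the pattern the hypothesis would predict; the trend lines in the actual plots slope the other way.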

However, those with high numbers of matches (>100) do fit under certain score limits: 800 for Wheaton, 2.0 for Ratio and 100 for Deviation.

No Matches

More illuminating, perhaps, are the percentages of men whose haplotypes have no matches at all reported in the FTDNA database[44], as shown in Table 14 and its accompanying graphic, Figure 21.

Table 14: No Matches

Category          Count   With no matches   Percent
1. Very Common       24                 2      8.3%
2. Common            80                 7      8.8%
3. Average          124                 8      6.5%
4. Uncommon          76                12     15.8%
5. Rare              21                 4     19.0%
All Categories      325                33     10.2%

Figure 21: Participants with no matches

The data show a correlation coefficient of 0.828 between rarity category (given a numerical rating) and absence of matches. This correlation accounts for 68.5% of the variation in the data and is significant at p<0.0001.

A man with a rare haplotype is roughly twice as likely (19.0% versus 8.3%) to have no matches at resolutions >12 markers as one with a very common haplotype.
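The coefficient, the share of variation and the rare-to-very-common comparison can be checked against the Table 14 counts. The sketch below assumes the categories are rated 1 through 5 and that a Pearson correlation is taken over the five category-level percentages. This is our reading of "given a numerical rating", not a documented procedure, and the variable and function names are illustrative. It does not attempt to reproduce the significance level.

```python
from math import sqrt

# Counts copied from Table 14: category -> (participants, participants with no matches)
table_14 = {
    "Very Common": (24, 2),
    "Common": (80, 7),
    "Average": (124, 8),
    "Uncommon": (76, 12),
    "Rare": (21, 4),
}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# ASSUMPTION: rarity categories are rated 1 (Very Common) through 5 (Rare).
ratings = list(range(1, len(table_14) + 1))
pct_no_match = [100 * none / count for count, none in table_14.values()]

r = pearson_r(ratings, pct_no_match)
r_squared = r * r
# Rare percentage divided by Very Common percentage.
rare_vs_very_common = pct_no_match[-1] / pct_no_match[0]

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, rare/very common = {rare_vs_very_common:.2f}")
```

Under these assumptions the computed values agree with the 0.828 and 68.5% quoted above, and the ratio of the Rare to the Very Common percentage comes out a little above 2.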

Summary of measurement correlation with number of matches:

We’ve looked at scatter plots, intra-project and extra-project matches, as well as no matches, but were limited to just those for whom we had match data. These data provide tantalizing hints, but do not satisfactorily answer the question posed.

The hints include:

A clearer signal might be gained with a larger and broader sample. We suspect inherent noise in the data obscures a clear message.