
Haplotype Rarity

We are often asked whether a man's Y-chromosome haplotype is rare. We are also often asked why some men have so many matches.

Until recently there was no good way to answer such questions; we were reduced to vague generalities, based on subjective impressions. Now, we've developed methods for measuring the rarity or commonness of a Y-chromosome haplotype, explained here.

Study

We conducted an extensive study of this aspect of Y-DNA in the Spring of 2015. The study and its findings are described at this link.

The Concept

Kelly Wheaton mentioned an idea by Robert Brooks Casey (see link) that a haplotype could be assessed by how many men in its haplogroup share its marker values. We've taken the basic idea and developed it further.

For each Y-STR marker there is a most frequent value, that is, a repeat count (allele value) that more men of a haplogroup carry than any other value. This is called the modal value, and it varies from one haplogroup to another. Some markers are highly concentrated around particular values; for others the concentration is weaker. Taking two markers as examples:

 
Haplogroup R1b

DYS393
Value          Diff. from mode   Freq.
≤11            ≤-2               ~0%
12             -1                4%
13             0 (mode)          91%
14             +1                5%
15             +2                <0.05%
≥16            ≥+3               ~0%
Other variants                   ~0%

CDYa
Value          Diff. from mode   Freq.
≤31            ≤-5               ~0%
32             -4                <0.05%
33             -3                1%
34             -2                4%
35             -1                14%
36             0 (mode)          30%
37             +1                29%
38             +2                16%
39             +3                4%
40             +4                1%
≥41            ≥+5               ~0%
Other variants                   ~0%

The concentration is greater for DYS393 than for CDYa: only 9% of R1b men have a value other than the modal 13 for DYS393, whereas 70% have a value other than the modal 36 for CDYa.
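The comparison above can be reproduced from the two frequency tables: find each marker's modal value, then subtract its frequency from 100%. This is a minimal sketch; the dictionary and function names are our own, and the tail entries shown as "~0%" or "<0.05%" are simply left out (treated as zero).

```python
# Percent of R1b men carrying each value, copied from the tables above.
# Entries shown there as "~0%" or "<0.05%" are omitted (treated as 0).
DYS393 = {12: 4.0, 13: 91.0, 14: 5.0}
CDYA = {33: 1.0, 34: 4.0, 35: 14.0, 36: 30.0, 37: 29.0,
        38: 16.0, 39: 4.0, 40: 1.0}


def modal_value(freqs):
    """Return the value carried by more men than any other (the mode)."""
    return max(freqs, key=freqs.get)


def off_modal_share(freqs):
    """Percent of men whose value differs from the modal value."""
    return 100.0 - freqs[modal_value(freqs)]


for name, freqs in (("DYS393", DYS393), ("CDYa", CDYA)):
    print(name, modal_value(freqs), off_modal_share(freqs))
```

The lower the off-modal share, the more tightly the marker is concentrated on its mode.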

In practice, our method measures how many men do not share your marker values. The higher your score, the rarer your haplotype; the lower your score, the more closely it resembles the modal haplotype for your haplogroup.

"Casey" Scores

Mr. Casey did not explain his method fully, but the idea -- as we've adapted it -- is this:

  1. First, calculate how uncommon the value is for a particular marker.
  2. Then sum the scores for all markers considered. This gives a composite score for the haplotype.
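Mr. Casey's exact formula isn't given, but the figures on this page are consistent with scoring each marker as 100 minus the percentage of haplogroup men who share your value: DYS393's modal 13 (91%) scores 9, the DYS393 minimum quoted in the Taylor Index section, and a value no one shares scores 100, so 12 markers top out at 1,200. The sketch below assumes that rule; the two-marker frequency table and the function names are illustrative only.

```python
# Percent of R1b men carrying each value (from the DYS393/CDYa tables above).
# Values not listed are assumed to be shared by ~0% of men.
R1B_FREQS = {
    "DYS393": {12: 4.0, 13: 91.0, 14: 5.0},
    "CDYa": {33: 1.0, 34: 4.0, 35: 14.0, 36: 30.0, 37: 29.0,
             38: 16.0, 39: 4.0, 40: 1.0},
}


def casey_marker_score(freqs, value):
    """Step 1 (assumed rule): percent of men who do NOT share this value."""
    return 100.0 - freqs.get(value, 0.0)


def casey_score(haplotype, table=R1B_FREQS):
    """Step 2: sum the marker scores over every marker considered."""
    return sum(casey_marker_score(table[marker], value)
               for marker, value in haplotype.items())


print(casey_score({"DYS393": 13, "CDYa": 36}))  # modal on both markers: 9 + 70
print(casey_score({"DYS393": 14, "CDYa": 38}))  # off-modal on both: 95 + 84
```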
Casey Scores by FTDNA Marker Panel
Haplogroup:            R1b (1)                           I (2)
Markers:               1-12    1-25    1-37    1-67      1-12    1-25    1-37    1-67
Minimum (3)            254     592     1,033   1,414     462     948     1,408   2,108
Maximum (4)            1,200   2,500   3,700   6,700     1,200   2,500   3,700   6,700
Average (5)            387     872     1,458   2,866     ?       ?       ?       ?
99th Percentile (6)    910     1,960   3,010   6,600     ?       ?       ?       ?
Notes
  1. Includes all subclades, but heavily influenced by R-M269 (R1b1a2); appears strongly weighted toward R-P312 (R1b1a2a1a2).
  2. Includes I*, I1, I2 and all subclades
  3. "Spread" of values varies by haplogroup, so minimum possible scores will vary. The Western Atlantic Modal Haplotype (WAMH) scores 254 for its 12 defined markers.
  4. Maximum is the same for all haplogroups. At the maximum value, none of the haplotype's marker values are shared by any others.
  5. Average observed in Taylor Family Genes project
  6. Average plus 2 standard deviations.

A total score equal to the minimum indicates that your haplotype exactly matches the haplogroup modal pattern; a score near the minimum indicates that your haplotype diverges slightly from the haplogroup modal. A high score indicates that your haplotype diverges more from the haplogroup modal.
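To place a total score on this scale, one could compare it with the R1b benchmarks in the Casey table. The sketch below copies those figures; the function name and the wording of its labels are hypothetical, and the I columns are left out because their averages are shown as "?".

```python
# R1b Casey-score benchmarks copied from the table above.
R1B_BENCHMARKS = {
    # panel: (minimum, average, 99th percentile, maximum)
    "1-12": (254, 387, 910, 1200),
    "1-25": (592, 872, 1960, 2500),
    "1-37": (1033, 1458, 3010, 3700),
    "1-67": (1414, 2866, 6600, 6700),
}


def describe_casey(score, panel):
    """Place a total Casey score relative to the R1b benchmarks.

    The labels are illustrative, not an official classification.
    """
    minimum, average, p99, maximum = R1B_BENCHMARKS[panel]
    if not minimum <= score <= maximum:
        raise ValueError("score outside the possible range for this panel")
    if score == minimum:
        return "matches the R1b modal haplotype"
    if score <= average:
        return "close to the modal; at or below the project average"
    if score <= p99:
        return "above average divergence"
    return "beyond the 99th percentile: unusually rare"


print(describe_casey(254, "1-12"))
print(describe_casey(1000, "1-12"))
```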

Wheaton Interpretation

Kelly Wheaton implemented Casey's idea by subtracting the frequency for the particular value from the modal frequency.

In this system, each marker scores zero when its value matches the modal value, and at most the modal frequency itself (when no other man shares the value). As before, the haplotype score is the sum of the marker scores.
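A minimal sketch of the subtraction described above, using the two R1b markers from the earlier tables. It implements only the simple "modal frequency minus your value's frequency" step; Ms. Wheaton's own method may refine it, and the names here are illustrative.

```python
# Percent of R1b men carrying each value (from the DYS393/CDYa tables above).
# Values not listed are assumed to be shared by ~0% of men.
R1B_FREQS = {
    "DYS393": {12: 4.0, 13: 91.0, 14: 5.0},
    "CDYa": {33: 1.0, 34: 4.0, 35: 14.0, 36: 30.0, 37: 29.0,
             38: 16.0, 39: 4.0, 40: 1.0},
}


def wheaton_marker_score(freqs, value):
    """Modal frequency minus the frequency of this value (0 at the mode)."""
    modal_freq = max(freqs.values())
    return modal_freq - freqs.get(value, 0.0)


def wheaton_score(haplotype, table=R1B_FREQS):
    """Sum the per-marker scores, as with the Casey score."""
    return sum(wheaton_marker_score(table[marker], value)
               for marker, value in haplotype.items())


print(wheaton_score({"DYS393": 13, "CDYa": 36}))  # modal haplotype: 0
print(wheaton_score({"DYS393": 14, "CDYa": 38}))  # (91 - 5) + (30 - 16)
```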

We think the math is more complicated, but leave it to her website, https://sites.google.com/site/wheatonsurname/beginners-guide-to-genetic-genealogy/lesson-14-more-with-the-y, to explain further.

Taylor Index

The Casey &/or Wheaton scores are a bit hard to interpret and are particular to the haplogroup. (Distributions for each marker vary by haplogroup.) So we developed an index which does much the same thing and has the advantage of being roughly comparable across haplogroups. We call it the Taylor Index and it works like this:

  1. Calculate the Casey scores for each marker as above but do not sum them.
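  2. Divide each marker's score by the minimum Casey score for the marker. For example, on DYS393 the minimum Casey score is 9, so the Taylor Index for that marker is your DYS393 Casey score divided by 9. (A man with the modal value of 13 scores 9 ÷ 9 = 1.0.)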
  3. Repeat for each marker. (Unless your haplotype is very rare, most indexes will be 1.0.)
  4. Average the index scores across all markers considered.
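The four steps above can be sketched as follows. As with the Casey sketch, this assumes each marker's Casey score is 100 minus the percentage of men sharing your value, which matches the DYS393 minimum of 9; the frequency table and function names are illustrative.

```python
# Percent of R1b men carrying each value (from the DYS393/CDYa tables above).
# Values not listed are assumed to be shared by ~0% of men.
R1B_FREQS = {
    "DYS393": {12: 4.0, 13: 91.0, 14: 5.0},
    "CDYa": {33: 1.0, 34: 4.0, 35: 14.0, 36: 30.0, 37: 29.0,
             38: 16.0, 39: 4.0, 40: 1.0},
}


def casey_marker_score(freqs, value):
    """Step 1 (assumed rule): percent of men who do NOT share this value."""
    return 100.0 - freqs.get(value, 0.0)


def taylor_index(haplotype, table=R1B_FREQS):
    """Steps 2-4: divide each marker score by its minimum, then average."""
    ratios = []
    for marker, value in haplotype.items():
        freqs = table[marker]
        # The minimum Casey score is the score of the modal value.
        minimum = 100.0 - max(freqs.values())
        if minimum == 0:
            raise ValueError(f"{marker}: every man shares the modal value")
        ratios.append(casey_marker_score(freqs, value) / minimum)
    return sum(ratios) / len(ratios)


print(taylor_index({"DYS393": 13, "CDYa": 36}))  # modal on both markers
print(taylor_index({"DYS393": 12, "CDYa": 36}))  # (96/9 + 70/70) / 2
```

Because each ratio is relative to that marker's own minimum, a marker with a tightly concentrated distribution (like DYS393) pushes the index up sharply when it departs from the mode.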

A Taylor Index score of 1.00 indicates that your haplotype matches the haplogroup modal pattern; higher scores indicate greater divergence from it.

Taylor Index by FTDNA Marker Panel
Haplogroup:            R1b                               I (incl. I*, I1 & I2)
Markers:               1-12    1-25    1-37    1-67      1-12    1-25    1-37    1-67
Minimum                1.0     1.0     1.0     1.0       1.0     1.0     1.0     1.0
Maximum                12.8    11.8    9.7     16.8      10.9    7.6     7.5     13.3
Average                1.6     1.6     1.6     6.0       ?       ?       ?       ?
99th Percentile        5.9     5.9     5.2     8.4       ?       ?       ?       ?

To assess your haplotype's Casey score and Taylor Index, click the button below to get the Excel spreadsheet tool.

Implications

So what does all this mathematical folderol mean for you?