
Measuring Haplotype Rarity


Measurement Problems

Most men’s haplotypes consist primarily of modal values for their haplogroup. Only a few of their STR markers[9] will display unusual values. Measurement of commonness vs. rarity depends on highlighting the less-common values.


Among the problems in measuring commonness vs. rarity is haplotype diversity.

There is also diversity among the STR markers tested; they vary in how tightly observed values concentrate around their modes[12]. Some markers have “tight distributions” with small variances and some have “loose distributions” with relatively large variances[13]. For example, in haplogroup R1b[14]:

Figure 1: Frequency distribution YCAIIb
Figure 2: Frequency distribution DYS449
Figure 3: Frequency distribution DYS390
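
The contrast between tight and loose distributions can be sketched in a few lines of Python. This is only an illustration: the function name `marker_summary` and the two lists of repeat counts are invented for this example and are not taken from the reference data or from any real marker.

```python
from collections import Counter
from statistics import pvariance


def marker_summary(values):
    """Return (mode, share of samples at the mode, population variance)
    for one marker's observed repeat counts."""
    counts = Counter(values)
    mode, n_at_mode = counts.most_common(1)[0]
    return mode, n_at_mode / len(values), pvariance(values)


# Invented toy repeat counts (assumption: not real observations).
tight_marker = [24, 24, 24, 23, 24, 24, 25, 24]
loose_marker = [29, 30, 31, 28, 32, 30, 29, 33]

tight_mode, tight_share, tight_var = marker_summary(tight_marker)
loose_mode, loose_share, loose_var = marker_summary(loose_marker)

print("tight:", tight_mode, tight_share, tight_var)
print("loose:", loose_mode, loose_share, loose_var)
```

In the toy data, an off-modal value on the tight marker is unusual, while an off-modal value on the loose marker is almost the norm. Any rarity measure has to weigh the two differently.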

A third measurement problem concerns haplogroup and subclade diversity. Frequency distributions (and modal values[15]) vary from one haplogroup or subclade to another. A measurement scale for R-U106 is not necessarily appropriate for R-P312.
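
A minimal sketch of why one scale does not fit all subclades, assuming samples arrive as pairs of a group label and a marker-to-value mapping. The labels `clade_A` and `clade_B`, the marker names `m1` and `m2`, and every repeat count are hypothetical placeholders, not values for R-U106, R-P312 or any real group.

```python
from collections import Counter, defaultdict


def modal_haplotypes(samples):
    """Compute the per-marker modal value separately for each group.

    samples: iterable of (group_label, {marker_name: repeat_count}).
    Returns {group_label: {marker_name: modal_repeat_count}}.
    """
    by_group = defaultdict(lambda: defaultdict(Counter))
    for group, haplotype in samples:
        for marker, value in haplotype.items():
            by_group[group][marker][value] += 1
    return {
        group: {m: c.most_common(1)[0][0] for m, c in markers.items()}
        for group, markers in by_group.items()
    }


# Invented toy data (assumption: purely illustrative).
samples = [
    ("clade_A", {"m1": 23, "m2": 11}),
    ("clade_A", {"m1": 23, "m2": 11}),
    ("clade_A", {"m1": 24, "m2": 11}),
    ("clade_B", {"m1": 24, "m2": 11}),
    ("clade_B", {"m1": 24, "m2": 12}),
    ("clade_B", {"m1": 24, "m2": 11}),
]

modes = modal_haplotypes(samples)
print(modes)
```

Here a value of 24 on `m1` is modal in one group and off-modal in the other, so the same haplotype would score as common against one reference and less common against the other.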


And, we have a definitions problem: How alike must haplotypes be to qualify as “similar”? Exactly matching? One marker differing? Two? Without a definition of similarity, we can not compare matches to commonness or rarity. When required for this article, we take the “close match” genetic-step reporting windows of FTDNA as our definition of similarity. That is


There is, too, a dimensional problem. Each marker is free to independently vary (up or down) and thus represents a separate dimension. A haplotype may be viewed as having as many dimensions as markers tested. Attempting to reduce this space to a one-dimensional scale ignores this complexity. A thing’s shadow is not the thing.
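
The loss of information can be seen in a small sketch. It assumes a plain stepwise count (the sum of absolute repeat-count differences), which is a simplification chosen for illustration and not necessarily how any testing company counts genetic steps. The helper names, marker names and values are all hypothetical.

```python
def off_modal(haplotype, modal):
    """Return {marker: signed step difference} for markers that depart from the modal value."""
    return {m: haplotype[m] - modal[m] for m in modal if haplotype[m] != modal[m]}


def step_distance(a, b):
    """Sum of absolute repeat-count differences over the markers both haplotypes share."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


# Invented toy haplotypes (assumption: purely illustrative).
modal = {"m1": 13, "m2": 24, "m3": 14}
hap_x = {"m1": 14, "m2": 24, "m3": 14}  # one step up on m1
hap_y = {"m1": 13, "m2": 24, "m3": 13}  # one step down on m3

print(off_modal(hap_x, modal), step_distance(hap_x, modal))
print(off_modal(hap_y, modal), step_distance(hap_y, modal))
print(step_distance(hap_x, hap_y))
```

Both toy haplotypes sit one step from the modal haplotype, so a one-dimensional scale gives them the same score, yet they differ from each other by two steps and in different markers.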

Therefore, regard this study as abstracting just one aspect of Y-chromosomal DNA - the extent to which haplotypes resemble others (commonness) or are distinctive (rare).


Biases surely exist -- both in the reference data we use as a standard and in the sample data we compare to the reference data. Sources of bias include, but are not limited to,

We try, in this study, to balance biases by broadening the comparison sample but can not fully correct for inherent biases. We can merely recognize that conclusions drawn are tentative and subject to revision.


Last but not least, there are issues of perspective and interest. A project administrator can fully examine only the results of his or her own project[16] and can not readily analyze, and may not care to analyze, other projects’ results.

To perceive large patterns, we can not retain blinders but must take a broad view.
