Y-DNA and Surname Association
by Ralph Taylor, project administrator for Taylor Family Genes
with assistance of co-administrators,
Lalia Wilson and George West
Revised to: 15 October 2011
This paper examines the association between surnames and Y-chromosome DNA STR matches for
participants in the Taylor Family Genes project. Its goal was to test the degree of truth in a
commonly expressed belief that these phenomena are strongly associated with each other.1
An initial phase, of matches by participant, established that a positive association exists but is
weak. A second phase tested the association more rigorously, by type and quality of match, and found
a generally weak association that is stronger for participants with small numbers of matches.
The data show that Y-chromosome DNA is less strongly associated with specific surnames than is
often believed. Surname differences may account for less than 25% of variance in Y-DNA haplotype
patterns for those with more-common surnames.
The paper also presents concepts and methodologies for other investigations of the
association.
Turi E. King’s and Mark A. Jobling’s article, “Founders, Drift, and Infidelity: The Relationship
between Y Chromosome Diversity and Patrilineal Surnames”2 is a ground-breaking development on this
subject. They studied the relationship of surnames to Y-haplotype similarity in Britain and found an
inverse correlation between surname frequency and haplotype similarities, a strong relationship for
rarer surnames and a weaker relationship for more common surnames. However, they did not find a
similar correlation in reviewing the 2006 McEvoy and Bradley study of Irish surnames and Y-DNA.3
The current study differs from King’s and Jobling’s in that it focuses primarily on one high-frequency
surname.
Also, in a 1999 article for the Florida State University Law Review4, Chris W. Altenbernd reviewed
legal ambiguities and pointed to increased genetic testing as indicating a need for statutory reform.
He proposed new terminology to replace pejoratives – “marital child” for a child whose biological
father is the mother’s husband, “non-marital child” for the child of an unmarried mother and
“quasi-marital child” for a child with a biological father other than the mother’s lawful husband at
the child’s birth. He contends that “The term ‘paternity’ should only refer to biological fatherhood.”
That surname and Y-DNA do not have a generally strong association for a common surname is a
statement which may provoke incredulity within the genetic genealogy community. Such statements
require proof. However, data from one DNA surname project shows only weak association between
surnames of participants and DNA matches found. There is, in fact, more disassociation than
association at most match qualities.
Analyses of data from Taylor Family Genes (a DNA surname project of Family Tree DNA) were
conducted relative to Y-DNA matches and both project membership and members’ surnames. These are
described under Methods.
The data show that disassociation between surname and DNA goes beyond the allowances usually
made for non-paternity events (NPE) for matches of lesser quality than 37 markers with a genetic
distance (GD) of one or zero and 25 markers with a GD of 0. For all 25-marker
matches and for 37-marker matches with GD >2, disassociation was stronger than could be
accounted for by NPE. At 37 markers, GD<2, and 25 markers, GD=0, the data became inconclusive as to
association.
The study is limited by two factors:
- A sample size of one DNA surname project is statistically inadequate; it does not
demonstrate variation. It is, however, the only data source available to the author in
sufficient detail to permit the required analyses. Other project administrators are invited
to conduct independent analyses to see if these findings hold up.
- The data source may not be representative of the surname studied. The project participants
are highly concentrated in North America and other unknown selection may be at work.
A father passing down his surname to his children and his sons passing down the same surname to
their children, etc. is now a staple practice in Europe, the New World and other places.5
This practice, though widely believed to be so, is neither universal nor ancient.6
It dates, in England, to the 11th century for some families and to no earlier than the mid-14th
century for most commoners. Some areas of the world do not use surnames; in some, family names
precede given names; and in a few, matriarchal family names are the practice.7
A characteristic of surnames is variety and diversity. Even the highest-frequency name in the
English-speaking world, Smith, was carried by just over 1% of the US population in 1990; the 50
most-frequent names together accounted for less than 14% of the population. Further, the trend
appears to be toward increasing diversity. Smith declined to 0.88% in 2000 and the most common 50
declined to 12.4% cumulative.6,7

Figure 1: Frequency of 50 most common surnames
King and Jobling8 found “a remarkably strong relationship between these patrilinearly inherited
cultural markers and Y-chromosomal haplotypes” and that “a clear genetic signal of coancestry can be
observed.” They attributed this largely to multiple founders for common names, versus single or few
founders for rare names. Taylor – 4th in surname frequency in England, 13th in USA – would fall
between the names included in their study of Smith (the most common) and King (37th in England).
"Common" names vs. "Rare"
One might ask "What is a 'common' surname, as compared to a 'rare' one?"
One measure could be the median frequency; at the median, half the
population bears names that are more frequent and half have less frequent
names. This mid-point frequency -- in the United States -- is ~70 per million
(0.007%). The two US names which come closest to this median point are Varner
(49.996% have more frequent names) and Spangler (50.003% have more frequent
names).
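The median idea above can be sketched in a few lines of Python. This is only an illustration: the function name `median_surname` and the toy counts are hypothetical and are not census figures; the sketch walks surnames from most to least frequent and stops where the cumulative share of the population reaches one half.

```python
def median_surname(counts):
    """Return (surname, frequency) at which cumulative population share reaches 50%.

    `counts` maps surname -> number of bearers (toy data in this sketch).
    """
    total = sum(counts.values())
    running = 0
    # Walk from the most frequent surname to the least frequent one.
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        if running * 2 >= total:
            return name, n / total
    raise ValueError("no surnames supplied")


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
toy_counts = {"Smith": 40, "Taylor": 25, "Varner": 15, "Spangler": 12, "Zaun": 8}
median_name, median_freq = median_surname(toy_counts)
print(median_name, median_freq)
```

With real census counts the same walk would land on a much rarer name than it does in this toy population.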
Taylor is not the most common surname in English-speaking countries; Smith is more than twice
as popular. But Taylor has a high frequency -- among the top few; it is carried by more than one
of every 200 English subjects and almost one of every 300 US citizens (331 per 100,000). It ranked
4th in frequency in England in the 1998 Electoral Register and 13th in the US 2000 census.10
It has an occupational origin, coming from the French tailleur for a cutter of cloth. In written
records, it is seen as a sobriquet as early as the late 12th century and as an inherited family name
from the late 14th century. It has several spelling variants, but most are rare; Tyler is the most
common variant (415th in the census at 0.027%).
The number of founders of the Taylor surname is unknown, but is estimated (by a variety of
techniques) to range from a few hundred to, perhaps, as many as 2,500.11
Some of those paternal Taylor lines would undoubtedly have been extinguished over the centuries and
no longer exist today.
A biological father transmits his Y-chromosome to his son (only males have Y-chromosomes)
almost without change. Short-tandem repeat alleles on Y-chromosome loci (markers) change
infrequently, at an average frequency of once in every 250 to 400 transmission events.12
The organizing principle of DNA surname projects is that common paternal ancestors may be revealed
by means of high-quality Y-DNA matches.
In the course of administering the Taylor Family Genes project, the author observed that, while
many of the participants had matches, large numbers of these were with neither Taylor-named
individuals13 nor non-Taylors who had joined the project.15
He formed a general impression that the numbers were too great to be accounted for by recent
generations’ non-paternity events. Nor were they accounted for by a multiple-founders theory.
In short-tandem repeat (STR) testing, the number of STR motif repetitions is counted for a
number of loci (markers) on the Y-chromosome; the allele value for each locus is its STR count. To
determine whether a match between two men exists, and its quality, the marker-allele values tested
in common are compared and their absolute differences summed (or, for some markers, the fact of a
difference is given a value of 1). The resulting sum is referred to as genetic distance (GD) and
indicates the dissimilarity between two haplotypes to the extent measured.
Another way to think of this is as strings of STR results being words and the marker/allele
values being the words’ letters; for example, “tailor” and “taylor” disagree in only one letter.
If we assign the value 9 to “i” and 10 to “y”, their distance is 1.
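The genetic-distance rule and the word analogy can be sketched as follows. This is a minimal illustration, not FTDNA's implementation: the function names, the `infinite_allele` parameter (for markers where any difference counts as 1) and the position labels such as `pos0` are hypothetical, and the letter values other than i=9 and y=10 are arbitrary.

```python
def genetic_distance(hap_a, hap_b, infinite_allele=frozenset()):
    """Sum allele differences over the markers two haplotypes share.

    hap_a, hap_b: dicts mapping marker name -> allele value (repeat count).
    infinite_allele: markers where any difference counts as exactly 1.
    Returns (genetic distance, number of markers compared).
    """
    shared = hap_a.keys() & hap_b.keys()
    gd = 0
    for marker in shared:
        diff = abs(hap_a[marker] - hap_b[marker])
        if marker in infinite_allele:
            gd += 1 if diff else 0   # only the fact of a difference counts
        else:
            gd += diff               # step-wise: count every repeat of difference
    return gd, len(shared)


def word_to_haplotype(word, letter_values):
    """Treat each letter position as a marker and each letter as an allele."""
    return {f"pos{i}": letter_values[ch] for i, ch in enumerate(word)}


# i=9 and y=10 follow the text; the other letter values are arbitrary.
letter_values = {"t": 20, "a": 1, "i": 9, "y": 10, "l": 12, "o": 15, "r": 18}
gd, compared = genetic_distance(
    word_to_haplotype("tailor", letter_values),
    word_to_haplotype("taylor", letter_values),
)
print(gd, compared)  # distance 1 over 6 "markers"
```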
The number of markers it is possible to compare (i.e., letters in the word) and genetic
distance work together to determine match quality (similarity of the words). In general, the more
markers tested in common, the more confident one can be about statements of a shared paternal
ancestor, and the smaller the genetic distance, the greater the likelihood that a shared ancestor was
more recent. A pair of men for whom one can compare 37 markers and arrive at GD=2 are more likely to
share a common ancestor more recently than a pair with 25 markers, GD=2 or 37 markers, GD=4.
Non-paternity events (NPE):
It is recognized that non-paternity events (an event or series of events resulting in a child
not carrying the surname his or her biological father was born with) cause disassociation between
a son’s Y-haplotype and his surname. Abbreviated NPE, they include adoption, name change,
illegitimacy, etc.15
The rate per hundred births appears to vary, depending on culture and economics. King & Jobling16
found it to be 1.00% to 4.54% for certain British surnames, with a median nearer the lower figure.
A phenotype study in the state of Nuevo Leon, Mexico found it between 9.8% and 13.8% (0.118 ± 0.020).17
In Switzerland, another study18 put it at 0.3% to 1.3%. A Michigan, USA study19
had it at 1.4% of white children and 10.1% among black children.
NPE are often undocumented, presenting difficult problems in genealogical research. Genealogical
effects of NPE are cumulative through generations, often estimated at 35-40% of participants for
many projects, and a possible reason for growth of genetic genealogy as paper trails turn cold.
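How a small per-generation rate accumulates can be illustrated with a short calculation. This is a sketch under stated assumptions, not the method behind the 35-40% estimates: it assumes a constant rate that acts independently in each generation, and the choice of 20 generations is arbitrary.

```python
def cumulative_npe(rate_per_generation, generations):
    """Probability that at least one NPE occurs somewhere along a paternal line.

    Assumes the same rate in every generation and independence between generations.
    """
    return 1.0 - (1.0 - rate_per_generation) ** generations


# Per-generation rates chosen from within the range quoted in the text above.
for rate in (0.01, 0.02, 0.045):
    print(f"rate {rate:.3f}: {cumulative_npe(rate, 20):.1%} after 20 generations")
```

Under these assumptions a 2% rate per generation compounds to roughly one line in three over 20 generations.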
This section describes the methods and procedures employed in the study.
Choice of measurements
Correlation is not possible
Correlation – a specific type of association between variables – can exist only for
scalar quantities.
Weight can correlate with height, but hair color (a qualitative variable) can only
associate with eye color or other variables.
If there did exist one Y-DNA haplotype for the Taylor surname, another
for Smith and yet another for Anderson -- we still could not say that Y-DNA
"correlates" with surname. It is not statistically possible.
Measuring association is possible
Names, “cultural markers”, are categorical (nominal or qualitative) variables, rather than numeric
(ordinal). The name Adams is neither more nor less than Zaun; they are merely different
distinguishing labels, as apples and oranges. We cannot measure a linear correlation between
names and other variables; we can, though, measure the broader concept – association20 – and test
it statistically. For these measures and tests, we need non-parametric statistics.
Also, any individual European surname is much less frequent than other markers, such as eye
color. For example, about 10% of European-ancestry persons have the rarest eye color, green, and
less than 1% of English persons have the most common surname, Smith. This fact will have its
effect when it comes time to measure.
Note: Re-work using TiP scores
In the second phase, we developed quantified variables, amenable to
correlation -- a rank order of match types by quality and percent of matches in
agreement with surname.
Measuring Y-chromosome DNA
Y-DNA haplotypes may be regarded either as nominal variables or as sets of marker/allele values
which can, in turn, be treated in either nominal or ordinal fashion. The analysis can be taken
beyond Adams ≠ Zaun; similarity and dissimilarity can be quantified. To quantify Y-DNA similarity,
we used the “match” concept and considered only limited degrees of similarity as qualifying; in
Phase 2, we ranked match types by quality.
June 2013 note: Subsequent information suggests that TiP (a FTDNA
mutation-adjusted TMRCA calculator) scores would be a better measure of
haplotype similarity than genetic distances.
Study phases
- In Phase 1 we measured association between the Taylor surname and participants’ Y-DNA
matches in the Taylor Family Genes project.
- In Phase 2 we tested association by fraction of matches’ names agreeing with the
participant’s surname with respect to the quality of Y-DNA matches.
The data was gathered from the Y-DNA matches (and absence of matches) of participants in
Taylor Family Genes – a DNA surname project sponsored by Family Tree DNA (FTDNA). It is an “open
membership” project, meaning that no prior approval or proof is required to join; membership is
self-selected. However, the fact that monetary payment is required to purchase a test from FTDNA
may bias the membership toward those who have “brick-walled” in their documentary genealogical
research.
Total project membership at the time of data collection was 437, of whom 96% are USA residents.
382 had Y-DNA tests and 264 had available results for 37 or more STR markers. These 264 participants
formed the basis of the study.
The restriction of project membership to FTDNA clients enabled us to use FTDNA tools for
quickly finding matches, without respect to surname, throughout a large database of comparable
Y-DNA results.
King and Jobling observe that “sample ascertainment bias (in particular self-selection of men
who may be closely related and self-reporting of data) remains a serious and unquantified problem
that could affect interpretation.”21
In comment, self-reporting of results is absent here and self-selection bias toward closely-related
men would tend to strengthen associations beyond those found. (Self-selection may tend to bias in
the opposite direction.)
The study began by eliminating from sampling those participants without Y-DNA results or with
fewer than 37 markers of results. Results of fewer than 25 markers were considered unreliable for
assessing match quality.
To summarize the population from which samples were drawn:
| | Taylor surname? Yes | Taylor surname? No | ≥ 37 mkrs | Any Match | Taylor or In-project | non-Taylor, non-project | Taylor only | Non-Taylor only | Both | Neither |
|---|---|---|---|---|---|---|---|---|---|---|
| Total | 229 | 34 | 263 | 245 | 148 | 188 | 57 | 97 | 91 | 19 |
| Pct. | 87.1% | 12.9% | 100% | 92.8% | 56.1% | 71.0% | 21.7% | 36.7% | 34.3% | 7.3% |

(The last seven columns are the Y-DNA Matches categories.)
Table 1: Population Description

Figure 2: Percent of matches
- Two separate classification schemes were used:
- In-project vs. out-of-project; and
- By surname matched
- Classifications:
- “Any Match” means 1+ matches of any type (Taylor, non-Taylor, in-project, out-of-project)
- “In-project” means 1+ matches with a project member or Taylor-surnamed man
- “non-Taylor, non-project” (x-proj in graph) means 1+ matches with a non-Taylor who is
not a project member.
- “Taylor only” means 1+ matches with Taylor-surnamed persons, but none with any other surname.
- “non-Taylor only” means 1+ matches with persons not named Taylor but none with a Taylor.
- “Both” means 1+ matches with both Taylors and non-Taylors.
- “Neither” or “None” means no matches of any classification
For the surname matched category, the most frequent value (mode) was “non-Taylor only”.
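The surname-matched classification above can be expressed as a small function. This is an illustrative sketch rather than the procedure actually used: the name `classify_matches` is hypothetical, and the variant list is taken from the spelling variants the paper treats as equivalent.

```python
# Spelling variants treated as equivalent to Taylor in this study.
TAYLOR_VARIANTS = {"taylor", "tailor", "tayler", "taler", "talor"}


def classify_matches(match_surnames, variants=TAYLOR_VARIANTS):
    """Place one participant in a surname-matched category.

    match_surnames: surnames of everyone the participant matched (may be empty).
    """
    normalized = [s.strip().lower() for s in match_surnames]
    has_taylor = any(s in variants for s in normalized)
    has_other = any(s not in variants for s in normalized)
    if has_taylor and has_other:
        return "Both"
    if has_taylor:
        return "Taylor only"
    if has_other:
        return "Non-Taylor only"
    return "Neither"


print(classify_matches(["Taylor", "Tayler"]))  # Taylor only
print(classify_matches(["Smith", "Taylor"]))   # Both
print(classify_matches([]))                    # Neither
```

Because each participant falls into exactly one of the four labels, the categories are mutually exclusive, which is what a chi-squared test on these counts requires.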

Figure 3: Haplogroups within project
The data set was restricted to those with at least 37 markers in order to facilitate
comparisons in number of matches at successively higher quality levels; of a total of 436
participants, 54 were eliminated for having no Y-STR results, 96 for having only 12 markers and
another 23 for having only 25 markers. This gave a qualifying population of 263 participants at
the time of the study.
“Index person”: The study involved searching for matches one participant at a time and
counting the number of matches whose surnames agreed and disagreed with the participant’s name.
Each participant for whom a search was conducted was designated the index person for that search.
Name Variants: Spelling variations (e.g., Taylor, Tailor, Tayler, Taler, Talor) were treated
as equivalent. No instances were encountered of foreign-language words with the same meaning (e.g.,
Schneider). Similarly, spelling variants of other surnames were accepted as equivalent and no
foreign-language versions were encountered.
Match: Matches recorded were those reported by FTDNA while conducting member-by-member searches.
These were for 25 markers, GD<3; for 37 markers, GD<5; and, for 67 markers, GD<8.
- 25 markers – genetic distances respectively of two (2), one (1) and zero (0);
- 37 markers – genetic distances respectively of four (4), three (3), two (2), one (1) and zero (0).
- 67 markers – genetic distances respectively of seven (7), six (6), five (5), four (4),
three (3), two (2), one (1) and zero (0). Only 150 participants had tested 67 markers at the
time of study.
- For Phase 2, these match types were ranked by quality from worst to best according to the
number of generations required to reach a MRCMA probability as follows: 2:25, 4:37, 1:25,
7:67, 6:67, 3:37, 5:67, 2:37, 4:67, 0:25, 3:67, 1:37, 2:67, 1:67, 0:37, 0:67.22
- Then, to show patterns more clearly, the match types were collapsed into four quality ranks:
- Quality 1 – 2:25, 4:37, 1:25, 7:67;
- Quality 2 – 6:67, 3:37, 5:67, 2:37, 4:67;
- Quality 3 – 0:25, 3:67, 1:37, 2:67;
- Quality 4 – 0:37, 1:67, 0:67.
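The match thresholds and quality bands listed above lend themselves to a simple lookup. The sketch below is illustrative only; the names `QUALITY_BANDS`, `MAX_GD` and `quality_band` are hypothetical, while the band contents and genetic-distance limits are copied from the lists above.

```python
# Match types written as "GD:markers", grouped into the four quality ranks.
QUALITY_BANDS = {
    1: ["2:25", "4:37", "1:25", "7:67"],
    2: ["6:67", "3:37", "5:67", "2:37", "4:67"],
    3: ["0:25", "3:67", "1:37", "2:67"],
    4: ["0:37", "1:67", "0:67"],
}

# Largest genetic distance reported as a match at each marker level.
MAX_GD = {25: 2, 37: 4, 67: 7}


def quality_band(gd, markers):
    """Return the quality rank (1-4) of a match, or None if it is not a reported match."""
    if markers not in MAX_GD or gd < 0 or gd > MAX_GD[markers]:
        return None
    key = f"{gd}:{markers}"
    for band, match_types in QUALITY_BANDS.items():
        if key in match_types:
            return band
    return None


print(quality_band(2, 25))  # 1
print(quality_band(0, 67))  # 4
print(quality_band(3, 25))  # None: beyond the 25-marker threshold
```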
Name Agreement: This is a “surname match”. Matches for each sampled participant at various
quality levels were counted by whether the matching person bore the same surname as the
participant (“Agree”) or a different surname (“Disagree”). Thirty-four (34) project participants
bear a surname other than Taylor; comparison was to the surname the kit donor bears.
Phase 1: By Participant
An initial survey took place from 5th to 8th August 2011 with the object of quantifying the
relationship, if any, between the Taylor surname &/or project participation and Y-DNA matches.23
Data was gathered as to two questions:
- Did the participant (index person) have a match with one or more project participants
(or non-participants bearing the Taylor surname) sufficient to yield a probability of 90% or
better of a common paternal ancestor within the past 55 transmission events? (This is the
standard adopted by the project for declaring a high-quality match and translates to a 25-marker
match with genetic distance less than two or a 37-marker match with genetic distance less than
three.)
And
- Did the participant have one or more matches reported by FTDNA with persons who neither
were project participants nor bore the surname? (This equates to a 25-marker match with
genetic distance less than three or a 37-marker match with genetic distance less than five.)
The chi-squared (Χ2) statistic24, sometimes described as a “badness of fit” test, yields a
confidence level; the worse the data fit a hypothesis, the higher the chi-squared value will be.
It is calculated by the equation below, where O represents observed values and E expected values:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

The Χ2 distribution (for more than one degree of freedom) looks something like the graph to
the right.25
The small colored area on the right tail represents the remaining probability.
The critical values for a one-tailed test are:

| Critical Χ2 values | df = 1 | df = 2 | df = 3 | df = 4 | df = 5 |
|---|---|---|---|---|---|
| p < 0.10 | > 2.706 | > 4.605 | > 6.251 | > 7.779 | > 9.236 |
| p < 0.05 | > 3.841 | > 5.991 | > 7.815 | > 9.488 | > 11.070 |
| p < 0.01 | > 6.635 | > 9.210 | > 11.345 | > 13.280 | > 15.090 |
| p < 0.001 | > 10.827 | > 13.815 | > 16.268 | > 18.465 | > 20.517 |

(df = degrees of freedom)
For p < 0.001, we will reject the null hypothesis.
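For one and two degrees of freedom, the right-tail probability of a chi-squared value has a simple closed form, which makes it easy to check critical values without statistical tables. The sketch below uses only the Python standard library; the function name `chi2_p_value` is hypothetical and other degrees of freedom are deliberately not handled.

```python
import math


def chi2_p_value(x, df):
    """Right-tail probability of a chi-squared value for df = 1 or df = 2.

    df = 1: the statistic is a squared standard normal, so p = erfc(sqrt(x / 2)).
    df = 2: the distribution is exponential with mean 2, so p = exp(-x / 2).
    """
    if x < 0:
        raise ValueError("chi-squared values cannot be negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 2")


print(chi2_p_value(3.841, 1))   # close to 0.05
print(chi2_p_value(5.991, 2))   # close to 0.05
print(chi2_p_value(13.815, 2))  # close to 0.001
```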
When Χ2 is inconclusive: A chi-squared test is intended to disprove a hypothesis, but not to
prove it. If a null hypothesis is not rejected, that doesn’t necessarily mean it is accepted.
Statistical Measures
The following statistical tools are available for measuring associations. Names being nominal
variables, parametric statistics (such as Pearson’s correlation) are typically not applicable;
non-parametric tools to measure the association include:
- Cramer's V, calculated by Φc = √[Χ2/(N(k-1))], with k the lesser of the number of rows or
columns. It ranges from 0 to 1.
- Lambda (λ) test (Goodman-Kruskal lambda) yields a “proportionate reduction of
error”, a measure of dependence between the variables; it indicates the extent to which the
independent variable reduces error of predicting a dependent variable. It also ranges from 0
to 1 and, multiplied by 100, represents percent reduction in error.
- The odds ratio (OR), used for dichotomous measurements, is the ratio of the odds of an event
occurring in one group to the odds of it occurring in another group. OR = p1q2/p2q1, where
q = 1-p. OR=1 indicates the event is equally likely to happen in either group; OR>1 indicates more
likelihood of occurrence in the first group; OR<1 indicates more likelihood in the second group.
Calculating an odds ratio would require data from more than one surname project.
- Contingency coefficient (C), C = √(Χ2/(n+Χ2)), is not applicable because our table is
not symmetric; the number of rows (3) does not equal the number of columns (2).
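The measures listed above are straightforward to compute once a chi-squared value or a contingency table is in hand. The following is a minimal sketch with hypothetical function names; for lambda it assumes the table is laid out with the dependent variable in rows and the independent variable in columns, and the example table is invented for illustration.

```python
import math


def cramers_v(chi2, n, rows, cols):
    """Cramer's V from a chi-squared value, sample size and table shape."""
    k = min(rows, cols)
    return math.sqrt(chi2 / (n * (k - 1)))


def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient C."""
    return math.sqrt(chi2 / (n + chi2))


def odds_ratio(p1, p2):
    """Odds of the event in group 1 divided by the odds in group 2."""
    return (p1 * (1.0 - p2)) / (p2 * (1.0 - p1))


def goodman_kruskal_lambda(table):
    """Proportionate reduction in error when predicting the row variable.

    table[i][j] is the count for row category i and column category j.
    """
    n = sum(sum(row) for row in table)
    best_without = max(sum(row) for row in table)  # always guess the modal row
    best_with = sum(max(col) for col in zip(*table))  # guess the modal row per column
    return (best_with - best_without) / (n - best_without)


print(cramers_v(0.1364, 263, 3, 2))
print(odds_ratio(0.5, 0.5))
print(goodman_kruskal_lambda([[30, 10], [10, 30]]))  # invented 2x2 example
```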
Question 1: Does any association exist between surname and Y-DNA?
If there were no association, we would expect matches by members of the project with other
Taylors to be no more frequent than the name’s frequency in the general population. Counting the
two most common variants (Taylor at 311 per 100,000 US residents and Tyler at 27 per 100,000),
this is about 0.338%.
The conceptual problem was to construct mutually exclusive categories for the name/match
variable. These categories fit the requirement:
- Taylor-only matches – The member has matches only with Taylor-surnamed persons.
- Non-Taylor only matches – The member has matches only with persons who do not have the
Taylor surname.
- Both Taylor and non-Taylor matches – The member has matches with both Taylors and non-Taylors.
We can assess the hypothesis of no association with a chi-squared test, but it is necessary to
first define the expected values:
- Taylor-only matches – no more frequent than the occurrence of the name in the
population,
- Non-Taylor only matches – the remainder after subtracting Taylor-only and Both from total
matches observed.
- Both – Taylor-only times 1.5.26
Chi-Square Calculation
Odds of 1+ matches = 92.8%
Odds of a match with a Taylor = 0.338% | 0.314%

| Category | Expected | Observed | abs(O-E)-0.5 (note 27) | (O-E)^2 | (O-E)^2/E |
|---|---|---|---|---|---|
| Taylor-only matches | 0.83 | 57 | 55.7 | 3,099 | 3,743 |
| Non-Taylor only | 262.93 | 105 | 157.4 | 24,784 | 94 |
| Both Taylor & non-Taylor | 1.24 | 91 | 89.3 | 7,967 | 6,414 |
| | | | | chi-squared = | 10,251 |

df = 2; p < 0.00001
Table 2: Phase 1, probability of no association
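The arithmetic in Table 2 can be retraced with a few lines of Python. This sketch assumes, from the column header, that a continuity correction of 0.5 was subtracted from each absolute difference before squaring; the function name is hypothetical and the inputs are the rounded values printed in the table, so the total comes out close to, but not exactly, the 10,251 shown.

```python
def corrected_chi_squared(observed, expected):
    """Chi-squared with 0.5 subtracted from each absolute difference before squaring."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Order: Taylor-only, Non-Taylor only, Both (values as printed in Table 2).
observed = [57, 105, 91]
expected = [0.83, 262.93, 1.24]

chi2 = corrected_chi_squared(observed, expected)
critical_p001_df2 = 13.815  # critical value for p < 0.001 with 2 degrees of freedom

print(round(chi2))
print(chi2 > critical_p001_df2)  # True: "no association" is rejected
```

Nearly all of the total comes from the two Taylor categories, whose expected counts under pure chance are close to one participant each.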
Meaning
There is some association between the Taylor surname and Y-DNA matches. There are more
matches with Taylor-surnamed men and fewer with non-Taylors than would be expected if the
variables were completely independent and this finding is statistically significant. The direction
and strength of the association will be explored below.
Aside: Due to the low frequencies of all surnames28, other DNA surname projects are likely to see
the same findings.
Question 2: How strong is the association and what is its direction?
Now that we’ve proved an association, the strength & direction of the association is to be
found. A positive direction is indicated by an excess of participants with Taylor matches over the
expected, random value.
We invented a tool to estimate overall strength of the association; we tried a series of probability
assumptions using chi-squared, goodness-of-fit trial calculations. We multiplied the ~0.3% probability
of a match with a Taylor (thus altering expected values) by factors of 1, 10, 20, 30, etc. to see --
with the observed actual values -- where Χ2 and its component parts reached their minima. In short,
we successively adjusted expectations to see where they most closely corresponded to observations.
Better correspondence is indicated by a smaller Χ2 and its contributing components.
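The trial-and-error search described above can be sketched as a simple scan over multipliers. This is an illustration under several assumptions and is not the author's spreadsheet: the base expectation is taken as 245 participants with matches times the 0.338% name frequency (which reproduces the 0.83 in Table 2), "Both" is 1.5 times Taylor-only, the remainder goes to non-Taylor only, the total is set to the sum of the observed counts, and no continuity correction is applied. Because of these choices the minimum need not fall exactly where the paper's graphs place it.

```python
def chi_squared(observed, expected):
    """Plain chi-squared goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts(multiplier, total, n_with_matches=245, base_p=0.00338, both_ratio=1.5):
    """Expected [Taylor-only, Non-Taylor only, Both] counts for one multiplier (assumed model)."""
    taylor_only = n_with_matches * base_p * multiplier
    both = both_ratio * taylor_only
    non_taylor = total - taylor_only - both
    return [taylor_only, non_taylor, both]


def best_multiplier(observed, multipliers):
    """Return (multiplier, chi-squared) for the multiplier that fits the observations best."""
    total = sum(observed)
    best = None
    for m in multipliers:
        expected = expected_counts(m, total)
        if min(expected) <= 0:
            continue  # skip multipliers that leave no room for a category
        value = chi_squared(observed, expected)
        if best is None or value < best[1]:
            best = (m, value)
    return best


observed = [57, 105, 91]  # Taylor-only, Non-Taylor only, Both (from Table 2)
multiplier, chi2_min = best_multiplier(observed, range(1, 121))
print(multiplier, round(chi2_min, 3))
```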

Figure 4: X2 contributions
The graph above shows the results of the trials. Note that for multiplier factors <20 some Χ2
values are off the scale. The bottom scale represents the multiplying factors used. At this scale,
we see a declining trend in all contributors as the factor increases. We can not tell where the
minima occur.

Figure 5: X2 contributions detail
This version of the graph focuses in on the area where X2 minima occur and adds the total X2
series. Total Χ2 reaches minimum at a multiplier of ~80. The Taylor-only category reaches minimum when expected
probability of a Taylor match is ~75 times the random p. The non-Taylor category reaches a
minimum at ~80. The Both category also reaches its minimum at ~80.
The Χ2 minima suggest that the odds of a random new member of the project matching others are
approximately 80 times greater than the random probability of 0.3%, as follows:
- No matches – 7.8% of all members;
- Taylor-only matches – ~22% of those with matches or ~21% of all members;
- Non-Taylor matches – ~37% of those with matches, ~34% of all members;
- Both Taylor and non-Taylor -- ~34% of those with matches, ~31% of all members
An Excel CHITEST indicated that the probability of these being the correct expected percentages
is ~0.93, a relatively high probability.
Association measurements:
Substituting the above percentages for expected values, we can estimate the association.
- Cramer’s V – V = √(0.1364/{263*1}) = 0.0227, a weak association.
- λ = 0.00913 and is interpreted as a weak association, a ~0.9% reduction in prediction error.
Summary of Phase 1:
An association between surname and Y-DNA matches does exist within the
Taylor Family Genes project. Matches with the Taylor surname are more
frequent than would be expected by chance.
The consensus of the measures of association (V=0.0227 and λ=0.00913) is that the association
between surname and Y-DNA matches is positive but weak for members of Taylor Family Genes.
The association is so weak that some experts would interpret the V and λ values as “little to
none”. “No association”, however, was ruled out.
Phase 2: By Match Type and Quality
Design:
The objective of Phase 2 was to test -- by match quality -- association between the Taylor
surname and Y-DNA matches within the project. We may see association become stronger as quality
increases.
A simple time-to-most-recent-common-ancestor calculator29 determined a rank order for quality of
the types of matches examined, from low to high:
2:25, 4:37, 1:25, 3:37, 2:37, 0:25, 1:37, 0:37. (Genetic distance is given first, before the
colon; then the number of markers compared; “2:25”, for example, means a genetic distance of 2
over 25 markers.)
The “expected value” for total matches was taken to be the sum of those actually found at the
respective match qualities, as we had no better basis for establishing expected match numbers
than to conduct participant-by-participant searches and record the number found. Expected values
for those which agree or disagree with the index persons’ surname is a portion of this total.
Our null hypothesis30 is that a moderate to strong association exists between surnames and Y-DNA
matches; but we need to state that quantitatively in order to test it.
- We will take a percentage of matches for a particular type or quality in which 50% or
more show agreement with the index person’s surname as indicating a strong association.
- A Cramer’s V (phi for 2x2 tables, symbol Φ) of 0.20 to 0.30 is considered moderate to
moderately strong; 0.30 to 0.40 is strong to very strong; and >0.40 is either “worrisomely
strong” or the two variables are measuring the same concept.31
- A V or Φ of greater than 0.2 will indicate a moderate to “worrisomely strong” association.
Our alternative hypothesis, HA, is that weak or no association exists between surnames
and Y-DNA matches.
- That <50% of matches will bear the same surname (i.e., “Agree”) as the index person for
the matches; equivalent to “>50% will not bear the same surname (i.e., “Disagree”) as the
index person for the matches”.
- Cramer’s V <0.2 will indicate weak association.
Having established our null hypothesis, we can apply a chi-squared test to the observed and
expected frequencies. The observed values will be the number of matches actually found in the
searches, categorized by whether they agree or disagree with the index person’s surname, yielding
a two-by-two table with degrees of freedom = 1.
By arranging the data by quality of match type and quality band, we have data amenable to
correlation; we quantified both variables. We can calculate a correlation coefficient between
match quality and the percent of match surnames agreeing with the index persons’.
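One way to compute such a coefficient is a rank correlation between the quality order of the 16 match types and their percent of surname agreement. The sketch below uses Spearman's rho, which is a choice made here for illustration; the paper does not say which coefficient was intended. The agree/disagree counts are those reported in the Phase 2 summary table, listed from worst to best quality, and the simple rho formula used assumes no tied values.

```python
# (match type, agree, disagree), ordered from worst to best match quality.
COUNTS = [
    ("2:25", 77, 12203), ("4:37", 32, 887), ("1:25", 137, 2748), ("7:67", 7, 564),
    ("6:67", 1, 345), ("3:37", 50, 414), ("5:67", 7, 208), ("2:37", 96, 229),
    ("4:67", 24, 114), ("0:25", 145, 387), ("3:67", 34, 56), ("1:37", 94, 133),
    ("2:67", 19, 45), ("1:67", 34, 21), ("0:37", 45, 62), ("0:67", 5, 8),
]


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation using the no-ties formula."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


quality_rank = list(range(1, len(COUNTS) + 1))
pct_agree = [100.0 * agree / (agree + disagree) for _, agree, disagree in COUNTS]
rho = spearman_rho(quality_rank, pct_agree)
print(round(rho, 2))
```

A strongly positive rho would indicate that surname agreement rises with match quality, even though agreement stays below one half for most match types.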
Quality Bands
Note: Redefine quality bands by TiP scores.
- Quality 1: 2:25 (GD=2 for 25 markers), 4:37, 1:25, 7:67
- Quality 2: 6:67, 3:37, 5:67, 2:37, 4:67
- Quality 3: 0:25, 3:67, 1:37, 2:67
- Quality 4: 0:37, 1:67, 0:67
Data collection took place 8 August to 18 September 2011. A temporary reference number was
assigned to the participants with at least 37 markers of results, to disguise identities from
public disclosure.
Data was collected on 264 participants, the entire qualifying population, though only 150 had
tested 67 markers. Data collection followed this form:
| ID | 2:25 Agr | 2:25 Dis | 1:25 Agr | 1:25 Dis | 0:25 Agr | 0:25 Dis | 4:37 Agr | 4:37 Dis | 3:37 Agr | 3:37 Dis | 2:37 Agr | 2:37 Dis | 1:37 Agr | 1:37 Dis | 0:37 Agr | 0:37 Dis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 2 | 1 | 1 | 5 | 5 | 1 | 0 | 0 | 2 | 4 | 1 | 1 | 0 | 0 | 0 | 0 |
| 46 | 0 | 205 | 0 | 55 | 0 | 4 | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 2 | 0 | 0 |

| ID | 7:67 Agr | 7:67 Dis | 6:67 Agr | 6:67 Dis | 5:67 Agr | 5:67 Dis | 4:67 Agr | 4:67 Dis | 3:67 Agr | 3:67 Dis | 2:67 Agr | 2:67 Dis | 1:67 Agr | 1:67 Dis | 0:67 Agr | 0:67 Dis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 46 | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ |

Table 3: Phase 2 data collection format
“Agr” means the surname on the match agreed with the index person’s name; “Dis” means the
surname on the match disagreed with the index person’s name. An underscore (“_”) indicates no
match is possible because the markers were not tested.
Phase 2 data:
The table and graphs below summarize the Phase 2 data. Detailed data is available upon request.
| By Match Type & Quality | 2:25 | 4:37 | 1:25 | 7:67 | 6:67 | 3:37 | 5:67 | 2:37 | 4:67 |
|---|---|---|---|---|---|---|---|---|---|
| Quality band | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| Agree w/Surname | 77 | 32 | 137 | 7 | 1 | 50 | 7 | 96 | 24 |
| Disagree | 12203 | 887 | 2748 | 564 | 345 | 414 | 208 | 229 | 114 |
| Total Matches | 12280 | 919 | 2885 | 571 | 346 | 464 | 215 | 325 | 138 |
| Participants w/ matches | 237 | 238 | 237 | 142 | 142 | 239 | 142 | 237 | 237 |
| Participants w/ no matches | 26 | 23 | 26 | 8 | 8 | 24 | 8 | 26 | 26 |

| By Match Type & Quality | 0:25 | 3:67 | 1:37 | 2:67 | 1:67 | 0:37 | 0:67 |
|---|---|---|---|---|---|---|---|
| Quality band | 3 | 3 | 3 | 3 | 4 | 4 | 4 |
| Agree w/Surname | 145 | 34 | 94 | 19 | 34 | 45 | 5 |
| Disagree | 387 | 56 | 133 | 45 | 21 | 62 | 8 |
| Total Matches | 532 | 90 | 227 | 64 | 55 | 107 | 13 |
| Participants w/ matches | 237 | 142 | 237 | 142 | 142 | 237 | 142 |
| Participants w/ no matches | 26 | 8 | 26 | 8 | 8 | 26 | 8 |

Table 4: Phase 2 data summary
Note: Of the total 263 participants, 26 (~10%) had no matches at 25 markers for any
surname. This is a higher fraction without matches than the 8% observed for all matches because
five (5) of the 26 had matches at 37 markers.
Phase 2 Analysis:
Overall, the association between surname and Y-DNA matches is weak. Only 4% of all DNA-match
surnames agreed with that of the index person – far lower than our null hypothesis that at least
half would agree. The graph to the right depicts the percentage of match surname agreement by
number of markers compared; 25-marker matches have the lowest percentage at 2%; 37-marker matches
are highest at 26%; 67-marker matches are next highest at 8%.

Figure 11: Surname agreement by number of markers compared
Chi-square values for an expected half of matches agreeing with the index persons’ surnames
are 7147, 1079, and 508 respectively and would be greater for an expectation of more than half.
With one degree of freedom, these translate to probabilities of approximately zero (0) for the
null hypothesis (of moderate to strong association) and support the alternative hypothesis
of weak association.
Meaning:
The overall surname/Y-DNA association is weak for 37-marker matches and almost completely
absent for 25- and 67-marker matches.
By match quality band:
We next reassembled the data by match quality band32,
obtaining
Match Type | 2:25 | 4:37 | 1:25 | 7:67 | 6:67 | 3:37 | 5:67 | 2:37 | 4:67 |
Quality | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
Agree | 77 | 32 | 137 | 7 | 1 | 50 | 7 | 96 | 24 |
Disagree | 12203 | 887 | 2748 | 564 | 345 | 414 | 208 | 229 | 114 |
Total | 12280 | 919 | 2885 | 571 | 346 | 464 | 215 | 325 | 138 |
Pct. Agree | 0.6% | 3.5% | 4.7% | 1.2% | 0.3% | 11% | 3% | 30% | 17% |
Pct. Disagree | 99.4% | 96.5% | 95.3% | 98.8% | 99.7% | 89% | 97% | 70% | 83% |

Match Type | 0:25 | 3:67 | 1:37 | 2:67 | 1:67 | 0:37 | 0:67 |
Quality | 3 | 3 | 3 | 3 | 4 | 4 | 4 |
Agree | 145 | 34 | 94 | 19 | 34 | 45 | 5 |
Disagree | 387 | 56 | 133 | 45 | 21 | 62 | 8 |
Total | 532 | 90 | 227 | 64 | 55 | 107 | 13 |
Pct. Agree | 27% | 38% | 41% | 30% | 62% | 42% | 38% |
Pct. Disagree | 73% | 62% | 59% | 70% | 38% | 58% | 62% |
Table 5: Data by quality band
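The percentage rows follow directly from the agree and disagree counts. A minimal sketch of that arithmetic, using a few columns copied from the table; the function name `pct_agree` is illustrative only:

```python
# Agree/disagree counts copied from Table 5 (a subset of match types).
counts = {
    "2:25": (77, 12203),
    "2:37": (96, 229),
    "0:25": (145, 387),
    "0:37": (45, 62),
}


def pct_agree(agree: int, disagree: int) -> float:
    """Percentage of matches whose surname agrees with the index person's."""
    total = agree + disagree
    return 100.0 * agree / total


for match_type, (agree, disagree) in counts.items():
    print(f"{match_type}: {pct_agree(agree, disagree):.1f}% agree")
```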
Applying a chi-squared test to our null hypothesis that one-half or more of matches will have
surname agreement:
Match Type | 2:25 | 4:37 | 1:25 | 7:67 | 6:67 | 3:37 | 5:67 | 2:37 | 4:67 |
Quality | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
Pct. Agree | 0.6% | 3.5% | 4.7% | 1.2% | 0.3% | 11% | 3% | 30% | 17% |
X2 = | 5987 | 398 | 1181 | 272 | 171 | 143 | 94 | 27 | 29 |
p <= | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |

Match Type | 0:25 | 3:67 | 1:37 | 2:67 | 1:67 | 0:37 | 0:67 |
Quality | 3 | 3 | 3 | 3 | 4 | 4 | 4 |
Pct. Agree | 27% | 38% | 41% | 30% | 62% | 42% | 38% |
X2 = | 55 | 2.7 | 3.4 | 5.3 | 1.5 | 1.4 | 0.3 |
p <= | <0.001 | 0.1 | 0.1 | 0.02 | 0.2 | 0.2 | 0.6 |
Table 6: X2 test for H0: Exp = Obs/2
We may reject the null hypothesis of a moderate to strong association between surname and
Y-DNA for all but the following types of matches, for which the test cannot exclude it:
- 3:67, 1:37 – with p<=0.1 – results this far from expectation would arise about 10% of the
time if the null hypothesis were true.
- 1:67, 0:37 – with p<=0.2 – such results would arise about 20% of the time under the null hypothesis.
- 0:67 – with X2=0.3, p<=0.6 – such results would arise about 60% of the time under the null hypothesis.
Meaning: For most types of matches, the data show the association to be weak to none. For a few
types of matches, the association may be moderate to strong.
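The X2 values in Table 6 can be reproduced from the Table 5 counts. The sketch below rests on an inference from the tabulated numbers, not on a stated formula: the values are consistent with a single-cell statistic, (O - E)^2 / E, where O is the number of agreeing matches and E is half the total. A conventional goodness-of-fit statistic would also add the "disagree" cell and so be twice as large. The function names are illustrative.

```python
import math


def single_cell_chi_square(agree: int, total: int) -> float:
    """(O - E)^2 / E for the 'agree' cell, with E = total / 2."""
    expected = total / 2
    return (agree - expected) ** 2 / expected


def p_value_1df(chi_square: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))


# (match type, agreeing matches, total matches) copied from Table 5.
for match_type, agree, total in [("2:25", 77, 12280), ("3:67", 34, 90), ("0:67", 5, 13)]:
    x2 = single_cell_chi_square(agree, total)
    print(f"{match_type}: X2 = {x2:.1f}, p = {p_value_1df(x2):.3f}")
```

The p-value here is the probability of a statistic at least this large if the null hypothesis were true; it is not the probability that the null hypothesis is true.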
Correlation between Match Quality and Surname Agreement:
A pattern emerged, as shown in Figure 12:

Figure 12: Name agreement by match type
As match quality improved, the percentage of matching persons whose surnames agreed with that
of the index person increased, reaching as high as 60% for 37-marker matches with genetic
distance = 0.
The pattern is seen more clearly here in Figure 13, with match types grouped into four ranked bands:

Figure 13: Name agreement by match quality band
Grouping into bands produces a strong correlation. The square of the correlation coefficient indicates the strength of the relationship between
match quality and surname agreement. Our correlation coefficients and their squares are
- By individual match type: r=0.732, r2 = 0.535;
- By quality band: r= 0.993, r2 = 0.987.
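The quality-band figure can be checked with a short calculation. This sketch assumes the correlation was taken between the band number (1 to 4) and the pooled percentage of agreeing matches in each band, and that bands 1 to 4 contain the first four, next five, next four and last three match types of Table 5; both points are inferences from the tables rather than stated procedure.

```python
import math

# (agreeing matches, total matches) pooled within each quality band from Table 5.
# Assumption: bands 1-4 contain the first 4, next 5, next 4 and last 3 match types.
BAND_COUNTS = {1: (253, 16655), 2: (178, 1488), 3: (292, 913), 4: (84, 175)}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


bands = sorted(BAND_COUNTS)
pct_agree = [100 * BAND_COUNTS[b][0] / BAND_COUNTS[b][1] for b in bands]
r = pearson_r(bands, pct_agree)
print(f"r = {r:.3f}, r2 = {r * r:.3f}")
```

With only four points, a correlation this high is easy to obtain from any steadily rising series, so it describes the trend across bands rather than the scatter among individual matches.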
Meaning:
Surname agreement on Y-DNA matches depends highly on the quality of the match; match quality
accounts for more than half the variance in surname agreement. The better the match, the
stronger the association between Y-DNA and surnames. However, this does not mean that a majority
of surnames will agree for the best matches; in only one type of match (0:37) did the majority of
surnames agree with the index persons’.
Furthermore, association between surname and Y-DNA may not be apparent at lower quality levels.
It is nonsensical to treat a 2:25 match in the same way as a 0:67.
Analysis by surname:
We analyzed the data by whether the participant’s surname was Taylor or another.
Group | Measure | Quality 1 | Quality 2 | Quality 3 | Quality 4 |
Taylor, n=229 | Total Matches | 13,899 | 1132 | 271 | 74 |
Taylor, n=229 | Pct. Agree | 1.4% | 12.7% | 35.1% | 54.8% |
Taylor, n=229 | X2 | 6563 | 315 | 34.5 | 0.63 |
Taylor, n=229 | p <= | ~0 | ~0 | ~0 | 0.7 |
Non-Taylor, n=34 | Total Matches | 879 | 356 | 223 | 40 |
Non-Taylor, n=34 | Pct. Agree | 4% | 10% | 9% | 25% |
Non-Taylor, n=34 | X2 | 367 | 116 | 73.5 | 5 |
Non-Taylor, n=34 | p <= | ~0 | ~0 | ~0 | 0.025 |
Table 7: Match agreement by surname and quality band

Figure 14: Name agreement by surname & quality band
We reject the null hypothesis as it pertains to the four quality bands and for most types of matches.
Exceptions, where a moderate to strong association cannot be ruled out:
- For Taylor
  - 3:67, X2 = 0.8, p<= 0.4;
  - 2:67, X2 = 2.3, p<= 0.1;
  - 1:67, X2 = 3.7, p<= 0.1;
  - 0:37, X2 = 0.1, p<= 0.8;
  - 0:67, X2 = 0.1, p<= 0.8.
- For other than Taylor
  - 3:67, X2 = 3.6, p<= 0.1;
  - 1:67, X2 = 1.1, p<= 0.3;
  - 0:37, X2 = 3.4, p<= 0.1;
  - 0:67, X2 = 0.5, p<= 0.5.
However, for the match types listed as exceptions, the numbers of matches are small and
the apparent association can be affected by anomalies such as ascertainment bias.
Meaning: We’ve shown that, overall, association is weak to none but have not shown it for
some specific types of matches.
Analysis by haplogroup:
We repeated the analysis by major haplogroup:
Haplogroup | Measure | Quality 1 | Quality 2 | Quality 3 | Quality 4 |
E, n=9 | Total Matches | 49 | 41 | 58 | 8 |
E, n=9 | Pct. Agree | 20.4% | 9.8% | 17.2% | 37.5% |
E, n=9 | X2 | 8.58 | 13.3 | 12.4 | 0.25 |
E, n=9 | p <= | 0.003 | <0.001 | <0.001 | 0.62 |
G, n=9 | Total Matches | 108 | 13 | 25 | 0 |
G, n=9 | Pct. Agree | 1.9% | 0.0% | 16.0% | NA33 |
G, n=9 | X2 | 50.1 | 6.5 | 5.8 | NA |
G, n=9 | p <= | <0.001 | 0.011 | 0.016 | NA |
I, n=46 | Total Matches | 2201 | 258 | 213 | 52 |
I, n=46 | Pct. Agree | 1.3% | 14.0% | 32.4% | 42.3% |
I, n=46 | X2 | 1043 | 67.0 | 13.2 | 0.615 |
I, n=46 | p <= | <0.001 | <0.001 | <0.001 | 0.43 |
J, n=4 | Total Matches | 60 | 20 | 13 | 0 |
J, n=4 | Pct. Agree | 0.0% | 0.0% | 0.0% | NA |
J, n=4 | X2 | 30 | 10 | 6.5 | NA |
J, n=4 | p <= | <0.001 | 0.002 | 0.011 | NA |
R1a, n=5 | Total Matches | 148 | 11 | 3 | 6 |
R1a, n=5 | Pct. Agree | 0.0% | 0.0% | 66.7% | 66.7% |
R1a, n=5 | X2 | 74 | 5.5 | 0.17 | 0.33 |
R1a, n=5 | p <= | <0.001 | 0.019 | 0.68 | 0.56 |
R1b, n=190 | Total Matches | 14,089 | 1145 | 601 | 109 |
R1b, n=190 | Pct. Agree | 1.5% | 12.1% | 34.4% | 50.5% |
R1b, n=190 | X2 | 6627 | 330 | 29.1 | 0.005 |
R1b, n=190 | p <= | <0.001 | <0.001 | <0.001 | 0.95 |
Table 8: Name agreement by haplogroup and quality band
We reject the null hypothesis as it pertains to the four quality bands and for most types of matches. Exceptions:
- E haplogroup (includes three sub-clades): Quality 4 (1:67, 0:37, 0:67) plus match
types 1:37 & 2:67, where X2 values do not reach a critical threshold and p>= 0.1;
- G haplogroup (includes four sub-clades): Match type 3:67, where X2 values do not reach a
critical threshold and p>= 0.1;
- I haplogroup (includes four sub-clades): Quality 4 plus 1:37, where X2 values do not
reach a critical threshold and p>= 0.1;
- J haplogroup (includes three sub-clades): Number of total matches is inadequate to draw
conclusions;
- R1a haplogroup (includes one sub-clade): Quality 3 and 4, where X2 values do not reach a
critical threshold and p>= 0.1; for 0:37 and 0:67 matches p is as high as 0.68, so the null
hypothesis cannot be rejected;
- R1b haplogroup (includes 14 sub-clades): Quality 4 plus 1:37, where X2 values do not reach a
critical threshold and p>= 0.1; for Quality 4 matches p = 0.95, so the null hypothesis
cannot be rejected.

Figure 15: Name agreement by haplogroup and quality band
Meaning: We’ve proven that, overall, association between surname and Y-DNA is weak to
very weak but have not proven it for some specific types of matches in some haplogroups. The
caveat about anomalies affecting higher match qualities, due to smaller numbers of matches,
remains.
Correlation:
We see a correlation between match quality and surname agreement for some haplogroups.
- E haplogroup: Relationship appears non-linear; this may be due to the group’s small size
of 9 participants. r = 0.645, r2 = 0.418
- G haplogroup: Non-linear; r=0.808, r2 = 0.653
- I haplogroup: Linear; r=0.994, r2 =0.988
- J haplogroup: No relationship; r and r2 cannot be calculated.
- R1a haplogroup: Non-linear; this may be due to the group’s small size of 5 participants.
r=0.894, r2=0.800
- R1b haplogroup: Linear; r=0.992, r2=0.984; this is a very strong correlation.
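These coefficients can be approximated from the Pct. Agree rows of Table 8. The sketch below assumes each coefficient is a Pearson correlation between quality band number (1 to 4) and percent agreement, with "NA" bands left out; that reading is an inference from the tabulated values. It also shows why no coefficient exists for haplogroup J, whose agreement is 0% in every band with matches. The names are illustrative.

```python
import math
from typing import List, Optional

# Pct. Agree by quality band (1-4) from Table 8; None marks "NA".
PCT_AGREE = {
    "E":   [20.4, 9.8, 17.2, 37.5],
    "G":   [1.9, 0.0, 16.0, None],
    "I":   [1.3, 14.0, 32.4, 42.3],
    "J":   [0.0, 0.0, 0.0, None],
    "R1a": [0.0, 0.0, 66.7, 66.7],
    "R1b": [1.5, 12.1, 34.4, 50.5],
}


def band_correlation(values: List[Optional[float]]) -> Optional[float]:
    """Pearson r between band number and agreement, or None if undefined."""
    pairs = [(band, v) for band, v in enumerate(values, start=1) if v is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # constant series (e.g. haplogroup J): r is undefined
    return sxy / math.sqrt(sxx * syy)


for haplogroup, values in PCT_AGREE.items():
    r = band_correlation(values)
    print(haplogroup, "undefined" if r is None else f"r = {r:.3f}")
```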
Analysis by number of matches
We also repeated the analysis relative to participants’ total number of matches at 25 markers.
The average number of matches per participant was 59.68 with a standard deviation of 139.6. Not
only was there a large variance, the distribution was non-normal.
We stratified the data by total number of 25-marker matches (chosen to avoid double-counting)
as follows:
- 26 participants (10%) had no matches at 25 markers. This group includes 1 each in haplogroups
E, G, J & R1a; 4 in I and 18 in R1b. No conclusions about name/Y-DNA association can be drawn
for this group.
- 118 (45%) had from 1 to 10 matches and are designated the Low-match group; they accounted
for 0.5% of all matches found. The group includes 7 in E, 5 in G, 16 in I, 1 in J, 2 in R1a
and 87 in R1b.
- 68 (26%) had from 11 to 60 matches and are designated the Medium-match group; they accounted
for 4% of all matches found. The group includes none in E, 1 each in G & R1a, 2 in J, 15 in I
and 45 in R1b.
- 30 (11%) had from 61 to 300 matches and are designated the High-match group; they accounted
for 19.5% of all matches found. The group includes none in E or J, 1 each in G & R1a, 9 in I
and 19 in R1b.
- 21 (8%) had more than 300 (up to 985) matches and are designated the Very High-match group;
they accounted for 75.9% of all matches found. The group includes none in E, G, J or R1a, 1 in I
and 20 in R1b.
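The stratification rule above amounts to a simple lookup on each participant's count of 25-marker matches. A minimal sketch, using the group boundaries given in the list; the function name and the sample counts are hypothetical:

```python
from collections import Counter


def match_group(n_matches: int) -> str:
    """Assign a participant to a group by number of 25-marker matches."""
    if n_matches < 0:
        raise ValueError("number of matches cannot be negative")
    if n_matches == 0:
        return "No matches"
    if n_matches <= 10:
        return "Low"
    if n_matches <= 60:
        return "Medium"
    if n_matches <= 300:
        return "High"
    return "Very High"


# Hypothetical 25-marker match counts for six participants.
sample_counts = [0, 3, 10, 11, 250, 985]
print(Counter(match_group(n) for n in sample_counts))
```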

Figure 16: 25-marker matches by number category
Figure 16 shows the average number of matches for these five categories and the numbers for
one standard deviation above and below the average.
Group | Measure | Quality 1 | Quality 2 | Quality 3 | Quality 4 |
No matches, n=26 | Total Matches | 0 | 0 | 0 | 0 |
No matches, n=26 | Pct. Agree | NA | NA | NA | NA |
Low matches (1-10), n=118 | Total Matches | 378 | 132 | 246 | 60 |
Low matches (1-10), n=118 | Per Participant | 3.20 | 1.12 | 2.08 | 0.51 |
Low matches (1-10), n=118 | Pct. Agree | 27.0% | 63.6% | 66.1% | 63.3% |
Medium (11-60), n=68 | Total Matches | 1853 | 407 | 315 | 69 |
Medium (11-60), n=68 | Per Participant | 27.2 | 5.99 | 4.63 | 1.01 |
Medium (11-60), n=68 | Pct. Agree | 5.5% | 17.7% | 28.6% | 37.7% |
High matches (61-300), n=30 | Total Matches | 3854 | 225 | 113 | 7 |
High matches (61-300), n=30 | Per Participant | 129 | 7.63 | 4.03 | 0.43 |
High matches (61-300), n=30 | Pct. Agree | 0.4% | 1.7% | 6.6% | 46.2% |
Very High (301+), n=21 | Total Matches | 10,552 | 718 | 232 | 33 |
Very High (301+), n=21 | Per Participant | 502 | 34.2 | 11.0 | 1.57 |
Very High (301+), n=21 | Pct. Agree | 0.3% | 2.5% | 13.8% | 42.4% |
Table 9: Name agreement by number of matches and Quality band

Figure 17: Name agreement by number of matches & quality band
Figure 17 depicts clear differences in name agreement relative to number of matches. The
Low-match group shows stronger association with surname and the Medium-match group shows strong
correlation between quality and surname agreement.
Here, collapsing the data into four quality bands disguises relationships within each
match-number group, so we include Figure 18, depicting agreement rates at specific match types.

Figure 18: Name agreement by number of matches and match type
We found these patterns within the data:
- No-match group: No pattern is possible.
- Low-match group: Match/name agreement is above 50% for 1:25, 3:37 and all matches
better than 2:37.
- Medium-match group: Match/name agreement correlates with match quality, but reaches 50% only
for 2:67 matches.
- High-match group: Match/name agreement is approximately zero until 2:37, after which it
rises rapidly with quality.
- Very High-match group: Match/name agreement is approximately zero until 0:37.
Association vs. Match Numbers:
This analysis provided a surprising finding: Surname agreement inversely correlates with
number of matches. As a participant’s number of matches increases, it becomes less likely
that the surnames of those matches will agree with his; the association between Y-DNA and
surname gets weaker.
For the four quality bands, the correlation coefficient between number of matches and surname
agreement ranges from -0.57 to -0.59, with r2 from 0.329 to 0.35. Combining all match types,
r=-0.66 and r2=0.44. This means that the number of matches accounts for one-third to
44% of the total variance in surname agreement.
The participants with high and very high match numbers, 21.5% of those with any matches,
account for 95% of all matches, and they have the least association with their own surnames.
Their data tend to “swamp” the data from those with moderate and low match numbers.
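The swamping effect can be illustrated with the Quality 1 figures from Table 9. This sketch compares a pooled agreement rate, in which every match counts equally, with a plain average of the four group rates, in which every group counts equally. The comparison is offered only as an illustration and is not a calculation reported in the study; the function names are hypothetical.

```python
# Quality 1 rows of Table 9: group -> (total matches, percent agreeing).
QUALITY_1 = {
    "Low": (378, 27.0),
    "Medium": (1853, 5.5),
    "High": (3854, 0.4),
    "Very High": (10552, 0.3),
}


def pooled_pct(groups) -> float:
    """Agreement rate over all matches, so large groups dominate."""
    total = sum(n for n, _ in groups.values())
    agreeing = sum(n * pct / 100 for n, pct in groups.values())
    return 100 * agreeing / total


def unweighted_pct(groups) -> float:
    """Simple average of the group rates, so each group counts once."""
    return sum(pct for _, pct in groups.values()) / len(groups)


print(f"pooled: {pooled_pct(QUALITY_1):.1f}%")
print(f"average of groups: {unweighted_pct(QUALITY_1):.1f}%")
```

The pooled rate sits close to the rates of the High and Very High groups because those groups supply most of the matches.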
We speculate that some participants share low-diversity haplotypes34 while most have haplotypes
with more diversity. Those with low-diversity haplotypes are less likely to find matches
whose surnames agree with theirs.
Summary of Phase 2:
We have proven false, in most instances, the null hypothesis of a moderate to strong
association between surname and Y-DNA. This means that we have proven the alternative hypothesis
of a weak to absent association except in these five situations:
- Matches of very high quality (1:67, 0:37, 0:67), whether with respect to surname,
haplogroup or number of matches;
- Haplogroup J, for which no conclusions were drawn;
- Haplogroup R1a for high (0:25, 3:67, 1:37, 2:67) and very high quality matches;
- Participants with no matches, for which no conclusions were drawn;
- Participants with 1 to 10 total 25-marker matches, for which surname agreement ranges from an average
of 27% at the lowest quality to more than 60% for higher qualities.
We found strong positive correlations between match quality and surname agreement. Generally,
as match quality improves, surname agreement rises. Association between surname and Y-DNA --
weak overall -- strengthens with match quality and may reach “strong” at the highest qualities.
We found inverse correlations between a participant’s number of matches and surname agreement;
the name/Y-DNA association is stronger for those with few matches than for those with many matches.
We did not perform a multivariate analysis, though one would possibly be informative as to
relationships between variables such as surname, haplogroup and number of matches.
The matter is not as simple as whether a strong association exists between Y-DNA and surname.
A good answer to the question must have nuances and complexities. Overall, the association is
weak to absent; but exceptions are found.
Surnames are categorical, not numeric; they can not correlate with other variables. They can,
however, be associated with variables such as Y-DNA similarity and the association can be
measured.
A positive association between surname and Y-DNA does exist, but is often grossly overstated.
It is weak within the Taylor Family Genes project and highly dependent on the quality of the
matches. It is also inversely dependent on the number of matches a participant has; the more
matches, the weaker the association.
Like gravitational force in physics, Y-DNA/surname association appears to diminish rapidly
with distance – here, genetic distance. This is the other side of the coin of Y-chromosome
stability; DNA has a longer time view than genealogy often does.
The association is also dependent on a participant’s total number of matches. There is a strong
inverse correlation between number of matches and surname agreement. The fewer matches a participant
finds, the stronger the name/Y-DNA association he will see; the more matches he finds, the weaker the
association. This phenomenon is worth further investigation.
It is less than 1/10 of one percent probable (p<0.001) that a majority of matches will agree
with the typical index person’s surname. As to an expectation that >=60% of matches will agree
with the index person’s surname, it is improbable except for very high quality matches.
- At lower match quality (i.e., ”quality band 1”) the DNA/surname association is so weak as to
be almost immeasurable. Matches of these qualities are more likely not to bear the index
person’s surname than to bear it.
- The association becomes stronger as the match quality increases (i.e., number of markers
compared increases and genetic distance decreases).
- This finding is not puzzling if one remembers that persons matching at higher match
qualities are more likely to be related in more recent times. The shorter time presents fewer
opportunities for surname to become disassociated from Y-DNA.
- The association becomes weaker as a participant’s total matches increase.
- Participants in DNA surname projects are advised not to restrict searches for matches to
their own surnames.
This study has implications for other surname projects, particularly those for surnames whose
origins are an occupation, location, color or other physical characteristic. As these tend to
be common surnames, they are borne by an overwhelming majority of the population.
Grateful appreciation is owed to Taylor Family Genes co-administrators Lalia Wilson and
George West, without whose advice this examination would have been the poorer. However, they
deserve no blame for any errors made; those are the author’s alone.
1. “Often believed” examples: (1) Sewell Y-DNA Surname Project, http://www.stonepillar.org/:
“Furthermore, there is a very high correlation between the Y-DNA and the surname in Western
societies.” (2) www.familytreedna.com/public/bigelow: “…the markers on a male's Y-DNA correlate
with his patrilineal lineage and surname.” (3) www.worldfamilies.net/what: “y-DNA correlates with
the surname, as both y-DNA and the surname are passed down from father to son in patriarchal
societies.” (4) Jobling, Mark A. at http://www.le.ac.uk/ge/maj4/surnames.html: “..we expect some
correlation between the two, but it is not clear how strong that correlation is likely to be..”
2. Mol Biol Evol (2009) 26 (5): 1093-1102. doi: 10.1093/molbev/msp022 First published online:
February 9, 2009 at http://mbe.oxfordjournals.org/content/26/5/1093.full.
3. McEvoy B, Bradley DG, “Y-chromosomes and the extent of patrilineal ancestry in Irish surnames”.
Hum Genet 2006;119:212-219.
4. Altenbernd, Chris W., “QUASI-MARITAL CHILDREN: THE COMMON LAW'S FAILURE IN PRIVETTE AND DANIEL
CALLS FOR STATUTORY REFORM”, Florida State University Law Review, 1999,
http://www.law.fsu.edu/journals/lawreview/issues/262/alte.html
5. Re: “staple practice” of surname inheritance/ Wikipedia:
http://en.wikipedia.org/wiki/Patrilineality
6. Re: “neither universal nor ancient”. Hartman, Jed:
http://www.kith.org/journals/jed/2004/10/08/2333.html.
7. “Some areas of the world do not use surnames, in some family names
precede given names, and in a few matriarchal family names are the practice”
See above.
8. US Bureau of the Census, http://www.census.gov/genealogy/names/dist.all.last (1990) and
http://www.census.gov/genealogy/www/data/2000surnames/Top1000.xls (2000).
9. Mol Biol Evol (2009) 26 (5): 1093-1102. doi: 10.1093/molbev/msp022 First published online:
February 9, 2009 at http://mbe.oxfordjournals.org/content/26/5/1093.full.
10. “British Surnames and Surname Profiles”, http://www.britishsurnames.co.uk/surnames/TAYLOR,
and US Census Bureau.
11. Taylor, Ralph E. at ~taylorydna/resources/explorations/size-vs-unmatched.htm.
12. (1) Walsh, Bruce at http://nitro.biosci.arizona.edu/ftdna/quick.html,
“a chromosome is a molecular clock that ticks randomly within a specified rate.”;
(2) Kerchner, Charles at http://www.kerchner.com/dnamutationrates.htm.
13. Including both project participants and non-participants.
14. Bearing the surname is not a participation requirement in this project.
Many of these have joined after discovering matches indicating the possibility of a non-parental
event.
15. NPE is the term in most general use. King and Jobling called these
“NPT” for non-patrilineal transmissions. Other terms include “IAP” for incorrectly assigned
paternity.
16. Mol Biol Evol (2009) 26 (5): 1093-1102. doi: 10.1093/molbev/ [1]
17. Cerda-Flores RM, Barton SA, Marty-Gonzalez LF, Rivas F, Chakraborty R (1999). "Estimation
of nonpaternity in the Mexican population of Nuevo Leon: A validation study
with blood group markers". Am J Physical Anthropol 109 (3): 281–293.
Thirty-two (32) legal fathers were excluded as biological fathers in a group of 396
children.
18. Sasse G, Müller H, Chakraborty R, Ott J (1994). "Estimating the frequency of
nonpaternity in Switzerland". Hum Hered 44 (6): 337–43. doi:10.1159/000154241. PMID 7860087
19. Ashton GC (1980). "Mismatches in genetic markers in a large
family study". Am J Hum Genet 32 (4): 601–13. PMID 6930820
20. Wikipedia, http://en.wikipedia.org/wiki/Association_(statistics):
“In statistics, an association is any relationship between two measured
quantities that renders them statistically dependent. The term ‘association’
refers broadly to any such relationship, whereas the narrower term
‘correlation’ refers to a linear relationship between two quantities.”
{This distinction can be found in many other sources, but is best stated
in the cited source.}
21. Op. cit.
22. The first digit of the match type represents genetic distance, the
second digit represents the number of markers compared. Thus, “2:37” means a genetic distance of
two across 37 markers.
23. Projects for other than surname can use the Phase 2 design -- whether
matches agree or disagree with the participant’s name.
24. Source: National Institute for Standards and Technology (NIST),
http://itl.nist.gov/div898/handbook/eda/section3/eda3674.htm
25. Source: http://www.medcalc.org/manual/chi-square-table.php
26. An arbitrary ratio was chosen; however, due to the low expected
frequencies, it does not affect the result.
27. Yates’ correction for small expected values has only a small effect.
28. Only Smith exceeds a frequency of one percent.
29. Rank order was determined by number of generations to most recent common
ancestor at a given probability.
30. A hypothesis to be disproved by data. Rejection of the null hypothesis
is to mean acceptance of the alternative hypothesis.
31. POL242 LAB MANUAL: EXERCISE 3A,
http://homes.chass.utoronto.ca/~josephf/pol242/LM-3A
32. Grouping matches into quality bands involved some double-counting
of matches. Matches at 67 markers also tend to appear at 37 and 25.
33. NA = No matches, either agreeing or disagreeing with surname -- division by zero.
34. Examples include those sharing the “Niall of the Nine Hostages”
haplotype. These participants tend to have very high numbers of matches and few agreeing with
their own surnames. Their matches tend to show a wide diversity of names. Niall of the Nine
Hostages is a legendary Irish character, a High King of Ireland of the 4th and 5th centuries.
-- End --