Technical Aspects of DNA Surname Project Administration
This page reviews the subject-specific functions of administering a genetic
genealogy (abbreviated GG) project for a surname. The other side of the are the
The technical aspects can be divided into two categories: genealogy and genetics.
Most of our attention will be on the genetics side because it is the aspect
least-understood by our members.
Note: The content of this section is a work in progress.
It will probably not be complete by the time the intended reader needs the information.
The field we work in, genetic genealogy, is complex and the state of our knowledge
is constantly changing -- hopefully advancing. Even experts are often
stumped by some of the things they see.
Do not think there is a comprehensive guide to the field. None exists
and, if one did, it would soon be obsolete.
Do not be so arrogant as to think anything is known to an absolute certainty. That is
a recipe for losing credibility. Prize uncertainty; it is your one stable
anchor. For more on this topic
see this page.
The purpose of genetics in "genetic genealogy" is to support (not take
over) the genealogy -- by which we mean traditional, documentary genealogy.
It is, therefore, important for a project admin to have a firm grasp of
genealogical methods, approaches and resources.
For a common and multi-origin surname, such as Taylor, the list of required
skills and knowledge can become quite long and varied.
Two organizations seek to improve quality in genealogy,
Board for Certification of Genealogists and
Society. Also see _
- How to proceed
Reversing the process assures frustration; you'd need to research many
lineages that have no relation to yourself. For example, my grandfather
(also a genealogist) tried tracing his Liles ancestors from past to
(his) present. After 40 years of work, among his many "possibles" was a line
two generations short of making the connection; he died before knowing it.
- Start with the known -- e.g., yourself, the present
- Work toward the unknown -- e.g., your ancestors, the past
- Take one step at a time from known to unknown.
Short-cuts can be disastrous. For example, many trees on the
Internet are wrong; they can lead you down false trails.
- Sources, information & evidence
Use the best evidence available. Conclusions by others are only clues and should be
- Sources may be of either of two types
- Original -- Original sources are the first recordings (documents,
pictures or objects) of an event.
- Derivative -- Derivative sources are copies of the original. As the process of
copying can introduce departures from the original (errors, image
- The information in the sources to support genealogical
statements should be sufficiently cited so that another could independently
evaluate your work; they come in two types
A record may be a primary source for some matters but a secondary
source for others.
Primary sources are better than secondary but aren't always available.
BTW, most published trees are poor sources.
- Primary records are made by a witness close in time to
- Secondary records are made by other than a witness or long after the
- Evidence is of two types
A record may provide direct evidence of some things and indirect
evidence of others.
- Direct evidence speaks directly to the question at hand
- Indirect evidence implies an answer but is not in itself definitive.
Generally, it takes more indirect evidence than direct to establish a
supposition as fact.
Only one piece of direct evidence may be necessary; several pieces of
indirect evidence may be required.
- Reasonably exhaustive search -- just because you haven't found contravening evidence doesn't
necessarily mean it doesn't exist.
- Logic: Every genealogical statement is a result of one or more
conclusions. The logic linking the pieces of evidence into a conclusion is as
important as the evidence itself.
Real genealogy, as opposed to "tree copying", is hard work. It
is, though, much more satisfying because it comes from honest work.
The more difficult a genealogical question is, the more useful the scientific method becomes. It consists of these steps.
- Have a question
- Do preliminary background research
- Formulate the question as a hypothesis which can be tested with evidence.
It should be a yes/no question, not open-ended.
- Determine the type of evidence needed to test the hypothesis
- Gather evidence. (This is where the "reasonably exhaustive search"
comes in.)
- Analyze the evidence and draw a conclusion.
- Write a "proof statement".
Genealogical Time Frame
Genealogy is restricted to the time when we can -- by credible evidence
-- identify specific ancestors by
names, dates, places &/or other characteristics. Verifiable facts are key.
If we can not identify a specific person, it's not genealogy. Nor, without
evidence, is it genealogy; the story is then myth.
Some stories are suspect on their faces. The evidence may degrade and
disappear over centuries; the standard for proof doesn't.
These identification and evidentiary requirements limit genealogy to the
period of written history and, specifically, official documentation of one's
ancestors. A pragmatic limit is universal surname (inherited family name)
adoption. This is not the same
time as initial surname use; surnames were first used by elites comprising a
small fraction of the total population. For most of us, most of our ancestors
There are so many different Taylor families, the project admin team doesn't
conduct research for the members -- except in unusual
cases when an admin or co-admin may volunteer to help. However, the team
encourages members and offers advice and tips
for them to pursue.
See some of the explorations conducted and tools developed
A TFG admin or co-admin will have the opportunity to see many
member-submitted genealogies. Some will be of excellent quality, thoroughly
researched and documented with appropriate source citations. Others will be
(to put it mildly) slipshod and contain obvious errors. Examples:
- Subsequent generation born before prior generation.
- Subsequent generation born before prior generation attains puberty.
- Birth at an impossible (or highly unlikely) time and place, e.g.,
Alabama before ~1800.
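Checks like these are mechanical enough to run automatically over a submitted tree. What follows is a minimal sketch in Python, not a project tool. The `Person` record, the minimum parental age of 13 and the one-entry place table are all assumptions made for illustration; the table simply echoes the Alabama example, and its "~1800" is approximate.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_PARENT_AGE = 13  # assumed cut-off, for illustration only
# Earliest plausible birth year by place; the single entry echoes the Alabama example.
EARLIEST_BIRTH_YEAR = {"Alabama": 1800}


@dataclass
class Person:
    name: str
    birth_year: int
    birth_place: Optional[str] = None
    parent: Optional["Person"] = None  # hypothetical link to one parent


def pedigree_warnings(person: Person) -> List[str]:
    """Collect plausibility warnings for one person in a submitted tree."""
    warnings = []
    parent = person.parent
    if parent is not None:
        gap = person.birth_year - parent.birth_year
        if gap <= 0:
            # Subsequent generation born before (or in the same year as) the prior one.
            warnings.append(f"{person.name} born no later than parent {parent.name}")
        elif gap < MIN_PARENT_AGE:
            # Parent would have been implausibly young.
            warnings.append(f"{person.name} born when parent {parent.name} was {gap}")
    earliest = EARLIEST_BIRTH_YEAR.get(person.birth_place)
    if earliest is not None and person.birth_year < earliest:
        # Birth at an unlikely time for the stated place.
        warnings.append(f"{person.name} born in {person.birth_place} before {earliest}")
    return warnings
```

A warning is only a prompt to look at the sources again; it does not prove the tree wrong.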
Among the common errors in Taylor genealogy is the "Zachary myth", that
the member's patriline is shared with Zachary Taylor, 12th US President.
The romantic notion is false much more often than true. (For the real Zachary family, see
Related to this is the "Rowland myth", that Zachary's patriline was that of
the Protestant martyr. (See the excellent work of
Nat Taylor and
Ann Blomquist.) And, also the "Taliaferro myth" of descent from a lieutenant
of William the Conqueror.
In line with not conducting research, the project neither censors nor vouches
for members' genealogies. However, we do try to assist members to correct
A natural human tendency is to look harder for evidence of what we already believe and weight it more heavily than evidence to the contrary.
"My mind is made up; don't confuse me with facts." We prefer confirmation of
what we "know" over suggestions that we're wrong.
It is usually, BTW, unproductive to try arguing anyone out of their
confirmation biases. The more the belief is challenged, the stronger it becomes.
Once a person has invested ego into a belief, they will protect and defend that
While we don't censor member-submitted trees, project administrators are
often asked their honest opinions of those trees. There are some signals
that a tree may not be reliable; when any appear, an evaluator is right to question the
- Famous ancestors (Zachary Taylor, Rowland Taylor, Taliaferro,
etc.) or royalty: It's tempting to claim a glorious heritage, hard to prove.
Extraordinary claims require extraordinary evidence; they cannot be taken at face value.
- Back to Adam: With few exceptions, it's not really possible for a
genealogy to be accurate more than a thousand years back and in most cases
half that time; the evidence does not exist. (See Genealogical Time Frame)
- Conflict with verifiable fact. For example, James Taylor (d.
1692, K&Q Co., VA) could not -- despite Mary Taylor Brewer's claim --
have been born in Cumbria's Pennington Castle; it was destroyed in the
12th century and never rebuilt. By James' birth, it was just a mound
surrounded by a ditch.
- Copying known errors: Certain genealogical errors have gained
notoriety (among the cognoscenti); when these errors are presented as fact,
it signals sloppy work.
Common surname, conflation
Another common error arises from the commonness of the Taylor surname. It
is among the 15 most-frequent names in the US and was previously among the
top 5. In England, it still is.
When this fact is combined with the tendency to choose forenames from a
limited list (Robert, James, William, etc.), it means that records with the
same forename and surname can not be presumed to refer to the same person.
There were, for example, at least four separate men named
Abraham Taylor in Maryland during the late 1600s. And, that was among
only 20,000 Marylanders!
Additional genealogical markers (occupation, wife or child's name, etc.)
must be found to establish identity.
Confusing one man with another, "squeezing" two or more into one identity,
is called "conflation" and it occurs often with novice Taylor genealogists.
Psychologically, conflation is a symptom of dementia.
An estimated 20% (or more) of Taylors have
iNPE in their paternal pedigrees. There will be times when it seems obvious
they have no matches with any Taylor and significant matches with another
surname. Be wary; these can present delicate situations. To avoid trouble:
- Let the member take the lead. Don't mention NPE until necessary. For
example, let them ask "Why am I brick-walled at my great-grandfather?"
- Don't press the issue. If the member takes umbrage, let it go until
- Don't say illegitimacy or adultery, unless buried in a list. These
are only two of many possible NPE explanations.
- Suggest a research strategy, such as looking for a Taylor family and
a family of the other surname living in close time/place proximity of
the brick-walled ancestor's birth.
In another type, eNPE, the member will match the Taylor surname but not
others. Let them join the project before discussing.
See the page especially devoted to genealogy.
Because we're a surname project, we're primarily focused on
Y-chromosome DNA. It is, as I write this, undergoing a revolution and
there's a risk that the guidance here will be hopelessly outdated within a
few years. Nonetheless, we'll cover the 2016 basics and some advanced
A movement is currently underway to establish standards for genetic
genealogy, with the focus mostly on genetics.
See this. (I have concerns about the proposed standards and, if they go
the way feared, may choose to ignore some.)
Continual learning will be essential in order to keep up with the field. Join ISOGG and read the forum posts. The ISOGG Wiki has much useful information. Go to conferences to rub elbows with other admins; valuable tips can be
Unfortunately, most of our peers (admins for multi-origin common
surname projects) are not especially participative. Maybe, they're too
busy with their projects to look outside.
Read the project website. It's been pitched (mostly) at an introductory level
while remaining true to the science. Many advanced (even arcane) topics have been explored in
the Resources section.
The GAP (Group Administrator Pages) are how many project administration tasks are accomplished.
Read here for a more detailed
There are at least three ways (genealogical reasons) for people to test DNA.
I describe them as:
- Blind testing (sometimes called "seek mode" or "fishing") is the most popular.
People test in hope of finding a match and, through that match, finding
ancestors presently unknown to them. (Our project success rate is
>50% for finding intra-project matches, less for the necessary
- Focused testing is for the purpose of confirming or refuting a
suspected relationship. Two (or more) people test at the same time in the
same way and
compare results against each other. This is the least-used of the three
modes. It is, though, the surest for achieving the testing goal.
- Investigative testing is for the purpose of learning more about
known relationships or about one's deep ancestry. For example, more markers & SNPs may be tested to see if
branches of a genetic family can be identified.
Investigative testing has grown more popular with increased member
sophistication and availability of new tests.
The above terms are my own invention. The subject is little-discussed.
In addition to why people test, there is the matter of whether the test
can fulfill its purpose.
- Inadequate: Within our project, we consider some tests inadequate to meet participants' needs. A ySTR test of less than
37 markers or a mtDNA test less than the Full Genome Sequence are
adequate only to disprove a supposed relationship; they are not
recommended for blind testing. (This is the short answer; a more
accurate one is complicated, involving
- Sufficient -- For most, a ySTR test of 37 markers is
sufficient to identify a specific Taylor family. However, 5% to 10% of
men will need more markers and, possibly, ySNP testing.
- Advanced -- After a participant answers a basic question
(e.g., "Which Taylor family is mine?") more questions may arise (e.g.,
"Which is my branch of the family?", "What is its origin?"). These
questions may require additional testing: More markers, fine-level SNPs,
Matching is based on the principle of haplotype similarity. When two or
more haplotypes are sufficiently similar, we conclude they derive from the
same ancestral source. We do not insist the haplotypes be identical; identity is
relatively uncommon after a few DNA transmissions.
Notice that whether the similar haplotypes come
from the same source is a conclusion. This may or may not be a fact.
Subsequent evidence may disprove a previous conclusion.
Matching and its related activity, grouping into genetic families, are among the most important
tasks of project administration. Attend to them assiduously; do not let them
We've developed some standards out of necessity for our own use.
See this page.
Vocabulary is a problem throughout the genetic genealogy community;
there's a lack of standardization, inhibiting communication.
Different words can mean the same thing and the same word can mean different things.
For the way I've used terms, see the
With all the technicalities and complexities, let's not lose sight of "A match exists when two
(or more) DNA haplotypes are sufficiently similar to indicate
a high probability that two or more individuals share a common ancestor within genealogical
time." This definition is, of course, relevant to DNA other than Y.
We've defined "genealogical time" as the past 24 generations. It seems fairly
well-accepted and it's as far back as most tools allow.
Further, 24 generations represents seven or more centuries, taking us back to
the approximate time of
universal surname adoption in England.
We've defined "high probability" as equal to or better than 80%; most TMRCA
we've found are either higher than 90% for 24 generations or much lower than
For yDNA, admins differ in approaches & techniques. I use these processes and rules of thumb:
- Genetic distance (GD) is the roughest cut -- equal to or better than 2:37, 3:67, or 5:111
is usually indicative of a common ancestor within genealogical time. (The number before the colon is steps of GD, after the colon is the number
of markers compared.)
- A big advantage of this approach is that GD is immediately available on members'
FTDNA match lists.
- In my experience, GD finds about 95% of true matches, missing only about 5%
(false negatives). As to false positives, read up on coincidental matches
and convergent evolution.
- The actual definition of GD is a moving target; it's more complex
than it seems and has changed over the years.
- FTDNA TiP (Time Predictor) is more precise than GD because it accounts
for varying mutation rates across markers.
- TiP (accessed through the Y-Genetic Distance page on the GAP) can
find the missing 5% (the "false negatives").
- In general, we take a TiP probability of 80% for 24 generations (no
"paper trail" adjustment) at resolutions of 37 or more markers to be
indicative of a common ancestor within genealogical time.
- But TiP probabilities (reported to hundredths of a percent) are not
necessarily as precise as implied and do not take factors such as
haplogroup conflicts into account.
- McDonald and McGee
have developed tools useful to the matching task.
Both these utilities should be taken with grains of salt.
- RCC: We also
evaluated a method called "Revised Correlation Coefficient"
(RCC) and came to the judgment that it adds little that the above tools lack. It
is mathematically simpler but we question the concept.
- Surname match: Whether surnames match is an uncertain guide to
interpreting a STR match.
On the other hand, a match between surnames (including variants of a
name) adds confidence that a DNA match reflects shared paternity. And, a
non-match of names introduces questions, some of which may be unanswerable.
- Up to 30% of people have NPE in their trees, often undocumented and
- Some areas (e.g., highland Scotland) adopted universal surnames late and
extra-surname matches may reflect patrilines dating to the 18th century.
- STR matches can be invalidated by conflicting SNPs (haplogroups &
- For example, say one party to a match is R-U106 and the other is R-P312.
These two subclades of R-M269 split about 4,000 years ago.
- Haplotype commonness vs. rarity: A
study in Spring 2015, I
think, implies that match standards can be a bit looser for rare
haplotypes than common ones. More of the FTDNA-reported common-haplotype
"close matches" -- especially at greater genetic distances -- are likely
to be coincidental.
- Conversely, use caution with common haplotypes. SNP testing may be
needed to identify non-valid matches.
- The above may be related to convergence. It may be that mutations
are "pushed" toward certain values.
When one looks at the distributions of markers' allele values by haplogroup,
they're very "tight" -- concentrated around specific values. They may differ
substantially from one haplogroup to another, but not within haplogroups
- Stability: Some mutations are to be expected
within the same patriline. See
the page on haplotype stability.
- Variance in mutation rates:
Published STR marker mutation rates are merely averages; there seems to be much variance
around those averages. Some families may have very stable haplotypes while others generate
mutations as though they earned bonuses.
For example, one Taylor genetic family has GD=0 for 67 markers for three members,
though the paper trails show the CMA must have been at least 7 generations
back. Such match quality is rarely seen with more than two generations
Scientists, too, disagree as to reliability of previously published
- Why 37? STR markers in the 26-37
panel (PP3) are the most volatile tested by FTDNA; it is here that most
haplotypes begin to show their distinctiveness from each other and allow an
interpretation with reasonable confidence. Lesser resolutions are more
subject to convergence.
- About 85% of TFG members have tested to 37; few potential matches
are eliminated. Only 50% to 67 and 15% to 111.
- To interpret a match, use the highest-resolution comparison available. The more markers
used, the greater the confidence and precision.
- Sow's ear to silk purse? Let's not forget that randomness
is inherent. Not everything we see can be explained.
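The numeric rules of thumb in this list can be written down compactly. The sketch below is only an illustration under stated assumptions. `simple_genetic_distance` is a plain step-wise sum, which is a deliberate simplification of how GD is actually computed. The haplogroup table holds only the two subclades named above and ignores every intermediate node of the real haplotree. All function and variable names are hypothetical.

```python
from typing import Dict, List, Optional

# Rule-of-thumb GD ceilings from the text: 2 at 37 markers, 3 at 67, 5 at 111.
GD_LIMITS = {37: 2, 67: 3, 111: 5}
TIP_THRESHOLD = 0.80  # TiP probability at 24 generations, per the text

# Simplified, illustrative parent links; real haplotrees have many more nodes.
PARENT = {"R-U106": "R-M269", "R-P312": "R-M269"}


def simple_genetic_distance(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Sum of absolute allele differences over the markers both men tested.

    A simplification: actual GD calculations treat some markers differently.
    """
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


def gd_suggests_match(gd: int, markers_compared: int) -> Optional[bool]:
    """Apply the GD ceiling for the highest standard resolution available.

    Returns None below 37 markers, where no ceiling is defined here.
    """
    usable = [n for n in GD_LIMITS if n <= markers_compared]
    if not usable:
        return None
    return gd <= GD_LIMITS[max(usable)]


def tip_suggests_match(prob_24_generations: float, markers_compared: int) -> bool:
    """80% or better at 24 generations, at 37 or more markers."""
    return markers_compared >= 37 and prob_24_generations >= TIP_THRESHOLD


def lineage(haplogroup: str) -> List[str]:
    """The haplogroup followed by its known ancestors in the toy table."""
    chain = [haplogroup]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def haplogroups_conflict(h1: str, h2: str) -> bool:
    """True when neither haplogroup is an ancestor of (or equal to) the other.

    Only as good as the PARENT table: groups missing from it look unrelated.
    """
    return h1 not in lineage(h2) and h2 not in lineage(h1)
```

For example, `gd_suggests_match(3, 67)` passes the rough cut, while `haplogroups_conflict("R-U106", "R-P312")` would still invalidate the match. None of this replaces judgment; the cautions above about common haplotypes, convergence and variable mutation rates still apply.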
"Genealogical significance" is a term to distinguish matches as to
relevance to a member's patriline. To deem a match significant signals a
member to pursue it further. By contrast, deeming it "not significant" signals the
match is of less priority for follow-up.
Grouping follows matching and is perhaps the most important task of a TFG admin;
it can be the
most complex. We've developed a separate page on it
TiP (Time Predictor) is a proprietary FTDNA TMRCA calculator.
See our separate page on it.
This technique -- only to follow successful groupings -- is for the purpose of
determining the ancestral haplotype for the family. (Read
Determining the ancestral haplotype simplifies grouping decisions. Potential
members can be compared against this one haplotype instead of each other's.
- At least one set of STR results from each known branch of the family;
- This presumes genealogies identify the separate branches.
- A minimum of two sets (three or more preferred);
- With only two sets of results, any difference in STR values will make it impossible to
determine a modal value.
- At least 37 markers for each branch, 67 are preferred. (111 would be
preferred but one is unlikely to have enough 111-marker results.)
A potential problem in triangulation is over-weighting one branch
relative to others. Ideally, each branch should be equally represented. In
actual practice, however, it's not always possible to attain equal
weighting; a modal haplotype for the group may be the next best alternative.
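A modal haplotype is simply the most common value at each marker. The sketch below is a minimal illustration, assuming each haplotype is a dictionary of marker name to repeat count (the names are hypothetical). A tie at a marker is reported as undetermined, which is what happens when only two sets of results disagree.

```python
from collections import Counter
from typing import Dict, List, Optional

Haplotype = Dict[str, int]


def modal_haplotype(haplotypes: List[Haplotype]) -> Dict[str, Optional[int]]:
    """Most common value per marker; None where the top value is tied."""
    if not haplotypes:
        return {}
    # Only markers present in every set of results can be compared.
    markers = set.intersection(*(set(h) for h in haplotypes))
    modal: Dict[str, Optional[int]] = {}
    for marker in sorted(markers):
        counts = Counter(h[marker] for h in haplotypes).most_common()
        tied = len(counts) > 1 and counts[0][1] == counts[1][1]
        modal[marker] = None if tied else counts[0][0]
    return modal
```

Passing one set of results per branch keeps the branches equally weighted. Passing every member's results yields the group modal instead, which leans toward the most-tested branch.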
After a genetic family meeting the requirements below has been determined, one may
proceed to analyze the genetic differences among its members and so infer
the family's branches. Let us emphasize:
This type of analysis uses only the differences between group members.
It is thus essential to ensure the analysis is applied only to a qualifying
Similarities contribute nothing to the analysis. Markers which show no
differences within the group may be ignored.
- At least one set of STR results from each known branch of the family;
- Branches may be identified by the analysis.
- A minimum of three sets (four or more preferred);
- At least 37 markers for each branch, 67 preferred. (111 would be
preferred but we are unlikely to have enough 111-marker results.)
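Because only differences matter, a useful first step is to discard every marker on which all members agree. The following is a minimal sketch, assuming results are held as a dictionary of member label to haplotype (marker name to repeat count). The function name and data layout are hypothetical and are not tied to any particular network tool.

```python
from typing import Dict

Haplotype = Dict[str, int]


def informative_markers(group: Dict[str, Haplotype]) -> Dict[str, Haplotype]:
    """Keep only the markers, tested by every member, whose values differ."""
    if not group:
        return {}
    # Markers present in every member's results.
    shared = set.intersection(*(set(h) for h in group.values()))
    # Of those, the markers showing at least two distinct values.
    varying = sorted(
        m for m in shared if len({h[m] for h in group.values()}) > 1
    )
    return {member: {m: h[m] for m in varying} for member, h in group.items()}
```

If the result is empty for every member, the group shows no differences at the tested resolution and there is nothing yet from which to infer branches.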
This author prefers to use Network software from Fluxus Engineering, but
other techniques are also valid. See our notes on using Fluxus.
At this writing, mtDNA is problematic for finding a common ancestor via
a living cousin. The best
TMRCA is 50% for 5 generations and 95% for 22 (exact match on full genome).
This seems insufficiently precise for expending much effort.
At this time, TFG groups mtDNA results only by haplogroup. No attempt is
made to identify specific maternal lineages.
Autosomal DNA, Family Finder
At this writing, the project regards this as a valuable means for individuals
to find cousins descending from indirect maternal and paternal ancestors. The operative
term is individual; it doesn't fit well into surname projects.
However, autosomal DNA can be a useful adjunct to Y-DNA. We don't inherit
just a Y chromosome from our direct paternal ancestors, but also pieces of the
other 22 chromosomes.
- Example: A man with an unknown father matched R1b-002, placing
him firmly in this genetic family. However, the exact relationship was
unknown until he also had an autosomal match with a man whose ancestral
patriline was known. The autosomal match identified the branch of the
family, making it possible to trace down to descendants living in close
proximity to the mother.
Glitches & Anomalies
Frustratingly, much admin time and effort seems to be taken up
by system glitches and things that don't work or work irrationally.
Family Tree DNA is a good company, honest & reliable; they are admirable in
many ways. But communication is not their strongest point. Nor are their
processes and procedures always "best practice" or straightforward. The admin is
often caught in the middle.
In the Spring of 2015, FTDNA made it more difficult for customers to contact
them. Telephone-answering hours were restricted and e-mails were required to be
submitted via an online "feedback form". The rationale for the former was to
allow more time for customer service staff to respond; for the latter, it was to
better classify & route inquiries. Of course, the limitations on contacting FTDNA led to
more inquiries to project admins.
That URL is https://www.familytreedna.com/contact.aspx#contactForm
(at least in December 2015; it may change). To use it, one must first log in to
the FTDNA site. Project admins (only) have a separate e-mail address they can
use, firstname.lastname@example.org; try to limit it to
You may also find that, when FTDNA does respond, the answers sometimes don't make sense
and the customer/member will turn to you. It's a delicate position; on the one
hand, you want to be helpful and honest. On the other hand, you don't want to
undermine FTDNA credibility. Guidelines:
- Refer complaints about FTDNA to FTDNA. Don't deprive them of the
- Be as factual as possible; check your facts.
- Own any advice offered; identify it as your opinion.
- Try to avoid generalizing; stick to the member's specific situation.
Update: In November 2015, FTDNA's new customer services manager sounded
like improvements would be made. More & higher-quality staff is being hired and
Much of your work will be done on the FTDNA website, those pages known
collectively as the "GAP" for Group Administrator Pages. They may not work
The FTDNA site uses a dynamic content management system, which differs greatly from
a static website like this one.
- In static sites, pages & content are created by authors and that same content
is delivered to all users in all sessions until
changed and uploaded anew to the site's server. The user (1) clicks on a
link and (2) the page at that location is delivered.
- In a dynamic content management system (CMS), pages
are created "on the fly" by scripts which are processed according to user
input and information stored in the server's databases. The user,
in essence, (1) completes an online form and clicks a submit button; the form
(2) invokes a
query script which (3) calls up information from the databases to (4) compile a
report which is then (5) displayed by a "presentation layer".
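To make the five steps concrete, here is a toy sketch in Python using an in-memory SQLite database. It is not a description of how the FTDNA site is built; the table, column names and sample rows are invented for illustration.

```python
import sqlite3
from html import escape


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with a hypothetical 'results' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (kit TEXT, surname TEXT, haplogroup TEXT)")
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [("K1", "Taylor", "R-M269"), ("K2", "Taylor", "I-M253")],
    )
    return conn


def build_page(conn: sqlite3.Connection, surname: str) -> str:
    """Steps 2-5: run the query, compile the rows, wrap them in HTML."""
    rows = conn.execute(
        "SELECT kit, haplogroup FROM results WHERE surname = ? ORDER BY kit",
        (surname,),  # the user's form input, passed as a query parameter
    ).fetchall()
    items = "".join(f"<li>{escape(kit)}: {escape(hg)}</li>" for kit, hg in rows)
    return f"<h1>{escape(surname)}</h1><ul>{items}</ul>"
```

Calling `build_page(conn, "Taylor")` before and after a new row is inserted returns different pages, although nobody has edited any HTML.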
A dynamic system is much more versatile; content can change when the
relevant database is updated; today's content will be different than
yesterday's with no authoring needed. And, it is more secure than
"client-side processing" because all processing occurs on the site's server,
not on the users' computers. (Users don't see the scripts.)
Considering the jobs that the FTDNA site must do, a static site is not an
option; a dynamic approach is
and maintenance of a dynamic system are complex and technically challenging because it's
harder to anticipate all possible conditions and interactions. Also, the
server load is much increased; with heavy traffic, this can lead to delays
and time-out errors.
While technical specifications are, of course, proprietary and not publicly disclosed, a
significant clue is that pages on the FTDNA site have the filename extension ".aspx",
indicating they use Microsoft's ASP.NET technology.
(aspx stands for "Active Server Page, Extended".)
Within the IT community,
there's a hefty debate between the rival ASP,
& PHP camps.
The major bone of contention with ASP is that it runs only on a Windows OS,
rather than the Linux OS (usually paired with the Apache web server) of most large servers.
It may be that the information quantity and complexity on the FTDNA site have
outgrown what a Windows operating system can handle.
I imagine you'll find your own.