Technical Aspects of DNA Surname Project Administration

This page reviews the subject-specific functions of administering a genetic genealogy (abbreviated GG) project for a surname. The other side of administration is the management functions.

The technical aspects can be divided into two categories: genealogy and genetics.

Most of our attention will be on the genetics side because it is the aspect least-understood by our members.

Note: The content of this section is a work in progress. It will probably not be complete by the time the intended reader needs the information.

Cautions

The field we work in, genetic genealogy, is complex and the state of our knowledge is constantly changing -- hopefully advancing. Even experts are often stumped by some of the things they see.

Do not think there is a comprehensive guide to the field. None exists and, if one did, it would soon be obsolete.

Do not be so arrogant as to think anything is known to an absolute certainty. That is a recipe for losing credibility. Prize uncertainty; it is your one stable anchor. For more on this topic, see this page.

Genealogy

The purpose of genetics in "genetic genealogy" is to support (not take over) the genealogy -- by which we mean traditional, documentary genealogy. It is, therefore, important for a project admin to have a firm grasp of genealogical methods, approaches and resources.

For a common and multi-origin surname, such as Taylor, the list of required skills and knowledge can become quite long and varied.

Principles

Two organizations seek to improve quality in genealogy: the Board for Certification of Genealogists and the National Genealogical Society. Also see _

  1. How to proceed
    1. Start with the known -- e.g., yourself, the present
    2. Work toward the unknown -- e.g., your ancestors, the past
    3. Take one step at a time from known to unknown.
    Reversing the process assures frustration; you'd need to research many lineages that have no relation to yourself. For example, my grandfather (also a genealogist) tried tracing his Liles ancestors from the past to (his) present. After 40 years of work, among his many "possibles" was a line two generations short of making the connection; he died before knowing it.
    Short-cuts can be disastrous. For example, many trees on the Internet are wrong; they can lead you down false trails.
     
  2. Sources, information & evidence
    1. Sources may be of either of two types
      • Original -- Original sources are the first recordings (documents, pictures or objects) of an event.
      • Derivative -- Derivative sources are copies of the original. The process of copying can introduce departures from the original (errors, image degradation, etc.), so a derivative is generally less reliable than its original.
         
    2. The information in the sources used to support genealogical statements should be cited well enough that another person could independently evaluate your work. Information comes in two types:
      1. Primary records are made by a witness close in time to the event
      2. Secondary records are made by other than a witness or long after the event
      A record may be a primary source for some matters but a secondary source for others. Primary sources are better than secondary but aren't always available. BTW, most published trees are poor sources.
       
    3. Evidence is of two types
      1. Direct evidence speaks directly to the question at hand
      2. Indirect evidence implies an answer but is not in itself definitive. Generally, it takes more indirect evidence than direct to establish a supposition as fact.
      A record may provide direct evidence of some things and indirect evidence of others.
      Only one piece of direct evidence may be necessary; several pieces of indirect evidence may be required.
    4. Reasonably exhaustive search -- just because you haven't found contravening evidence doesn't necessarily mean it doesn't exist.
       
    Use the best evidence available. Conclusions by others are only clues and should be independently evaluated.
     
  3. Logic: Every genealogical statement is a result of one or more conclusions. The logic linking the pieces of evidence into a conclusion is as important as the evidence itself.

Real genealogy, as opposed to "tree copying", is hard work. It is, though, much more satisfying because it comes from honest work.

Scientific Method

The more difficult a genealogical question is, the more useful the scientific method becomes. It consists of these steps.

  1. Have a question
  2. Do preliminary background research
  3. Formulate the question as a hypothesis which can be tested with evidence. It should be a yes/no question, not open-ended.
  4. Determine the type of evidence needed to test the hypothesis
  5. Gather evidence. (This is where the "reasonably exhaustive search" comes in.)
  6. Analyze the evidence and draw a conclusion.
  7. Write a "proof statement".

Genealogical Time Frame

Genealogy is restricted to the time when we can -- by credible evidence -- identify specific ancestors by names, dates, places &/or other characteristics. Verifiable facts are key. If we can not identify a specific person, it's not genealogy. Nor, without evidence, is it genealogy; the story is then myth.

Some stories are suspect on their faces. The evidence may degrade and disappear over centuries; the standard of proof doesn't.

These identification and evidentiary requirements limit genealogy to the period of written history and, specifically, official documentation of one's ancestors. A pragmatic limit is universal surname (inherited family name) adoption. This is not the same time as initial surname use; surnames were first used by elites comprising a small fraction of the total population. For most of us, most of our ancestors were commoners.

TFG Genealogy

There are so many different Taylor families, the project admin team doesn't conduct research for the members -- except in unusual cases when an admin or co-admin may volunteer to help. However, the team encourages members and offers advice and tips for them to pursue.

See some of the explorations conducted and tools developed here.

Evaluating Genealogies

A TFG admin or co-admin will have the opportunity to see many member-submitted genealogies. Some will be of excellent quality, thoroughly researched and documented with appropriate source citations. Others will be (to put it mildly) slipshod and contain obvious errors. Examples:

  1. Subsequent generation born before prior generation.
  2. Subsequent generation born before prior generation attains puberty.
  3. Birth at an impossible (or highly unlikely) time and place, e.g., Alabama before ~1800.
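
Checks like these are mechanical enough to automate. What follows is only a minimal Python sketch, not a project tool. The Person record, the minimum parental age of 13 and the EARLIEST_BIRTH table are all assumptions made for illustration, and the people in the usage lines are invented.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed threshold: youngest plausible age of a father at a child's birth.
MIN_PARENT_AGE = 13

# Assumed table of earliest plausible birth years by place (illustrative only).
EARLIEST_BIRTH = {"Alabama": 1800}


@dataclass
class Person:
    name: str
    birth_year: int
    birth_place: str = ""
    father: Optional["Person"] = None


def red_flags(person: Person) -> List[str]:
    """Return a list of plausibility problems found for one person."""
    flags = []
    if person.father is not None:
        gap = person.birth_year - person.father.birth_year
        if gap <= 0:
            flags.append("born before (or in the same year as) father")
        elif gap < MIN_PARENT_AGE:
            flags.append("father younger than %d at birth" % MIN_PARENT_AGE)
    earliest = EARLIEST_BIRTH.get(person.birth_place)
    if earliest is not None and person.birth_year < earliest:
        flags.append("birth in %s before %d" % (person.birth_place, earliest))
    return flags


# Invented example: a son recorded as born eight years after his father.
father = Person("John Example", 1750, "Virginia")
son = Person("James Example", 1758, "Alabama", father)
print(red_flags(son))
```

A flag here is a prompt to recheck the sources, not a verdict; the thresholds are rough and should be adjusted to the time and place in question.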

Among the common errors in Taylor genealogy is the "Zachary myth", that the member's patriline is shared with Zachary Taylor, 12th US President. The romantic notion is false much more often than true. (For the real Zachary family, see R1b-002.)

Related to this is the "Rowland myth", that Zachary's patriline was that of the Protestant martyr. (See the excellent work of Nat Taylor and Ann Blomquist.) And, also the "Taliaferro myth" of descent from a lieutenant of William the Conqueror.

In line with not conducting research, the project neither censors nor vouches for members' genealogies. However, we do try to assist members to correct errors.

Confirmation Bias

A natural human tendency is to look harder for evidence of what we already believe and weight it more heavily than evidence to the contrary. "My mind is made up; don't confuse me with facts." We prefer confirmation of what we "know" over suggestions that we're wrong.

It is usually, BTW, unproductive to try arguing anyone out of their confirmation biases. The more the belief is challenged, the stronger it becomes. Once a person has invested ego into a belief, they will protect and defend that investment.

Red Flags

While we don't censor member-submitted trees, project administrators are often asked their honest opinions of those trees. There are some signals that a tree may not be reliable; when any appear, an evaluator is right to question the entire structure:
 

  1. Famous ancestors (Zachary Taylor, Rowland Taylor, Taliaferro, etc.) or royalty: It's tempting to claim a glorious heritage, hard to prove. Extraordinary claims require extraordinary evidence; they can not be taken at face value.
  2. Back to Adam: With few exceptions, it's not really possible for a genealogy to be accurate more than a thousand years back and in most cases half that time; the evidence does not exist. (See Genealogical Time Frame)
  3. Conflict with verifiable fact. For example, James Taylor (d. 1692, K&Q Co., VA) could not -- despite Mary Taylor Brewer's claim -- have been born in Cumbria's Pennington Castle; it was destroyed in the 12th century and never rebuilt. By James' birth, it was just a mound surrounded by a ditch.
  4. Copying known errors: Certain genealogical errors have gained notoriety (among the cognoscenti); when these errors are presented as fact, it signals sloppy work.

Common surname, conflation

Another common error arises from the commonness of the Taylor surname. It is among the 15 most-frequent names in the US and was previously among the top 5. In England, it still is.

When this fact is combined with the tendency to choose forenames from a limited list (Robert, James, William, etc.), it means that records with the same forename and surname can not be presumed to refer to the same person.

There were, for example, at least four separate men named Abraham Taylor in Maryland during the late 1600s. And, that was among only 20,000 Marylanders!

Additional genealogical markers (occupation, wife or child's name, etc.) must be found to establish identity.

Confusing one man with another, "squeezing" two or more into one identity, is called "conflation" and it occurs often with novice Taylor genealogists. Psychologically, conflation is a symptom of dementia.

NPE

An estimated 20% (or more) of Taylors have an iNPE in their paternal pedigrees. There will be times when it seems obvious (to you); they have no matches with any Taylor and significant matches with another surname. Be wary; these can present delicate situations. To avoid trouble:

  1. Let the member take the lead. Don't mention NPE until necessary. For example, let them ask "Why am I brick-walled at my great-grandfather?"
  2. Don't press the issue. If the member takes umbrage, let it go until they're ready.
  3. Don't say illegitimacy or adultery, unless buried in a list. These are only two of many possible NPE explanations.
  4. Suggest a research strategy, such as looking for a Taylor family and a family of the other surname living in close time/place proximity of the brick-walled ancestor's birth.

In another type, eNPE, the member will match the Taylor surname but not others. Let them join the project before discussing.

More

See the page especially devoted to genealogy.

Genetics

Because we're a surname project, we're primarily focused on Y-chromosome DNA. It is, as I write this, undergoing a revolution and there's a risk that the guidance here will be hopelessly outdated within a few years. Nonetheless, we'll cover the 2016 basics and some advanced topics.

A movement is currently underway to establish standards for genetic genealogy, with the focus mostly on genetics. See this. (I have concerns about the proposed standards and, if they go the way feared, may choose to ignore some.)

Resources

Continual learning will be essential in order to keep up with the field. Join ISOGG and read the forum posts. The ISOGG Wiki has much useful information. Go to conferences to rub elbows with other admins; valuable tips can be gleaned.

Unfortunately, most of our peers (admins for multi-origin common surname projects) are not especially participative. Maybe, they're too busy with their projects to look outside.

Read the project website. It's been pitched (mostly) at an introductory level while remaining true to the science. Many advanced (even arcane) topics have been explored in the Resources section.

GAP

The GAP (Group Administrator Pages) are how many project administration tasks are accomplished. Read here for a more detailed discussion.

Testing Modes

There are at least three ways (genealogical reasons) for people to test DNA. I describe them as:

The above terms are my own invention. The subject is little-discussed.

Testing levels

In addition to why people test, there is the matter of whether the test can fulfill its purpose.

Matching

Matching is based on the principle of haplotype similarity. When two or more haplotypes are sufficiently similar, we conclude they derive from the same ancestral source. We do not insist the haplotypes be identical; identity is relatively uncommon after a few DNA transmissions.

Notice that whether the similar haplotypes come from the same source is a conclusion. This may or may not be a fact. Subsequent evidence may disprove a previous conclusion.
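
To make "sufficiently similar" concrete, here is a minimal Python sketch of one common similarity measure for STR haplotypes: a simple step-wise genetic distance, the sum of repeat-count differences over the markers two results share. This is a simplification offered for illustration. It is not FTDNA's method (vendors treat some markers differently), the marker values are invented, and the function name is my own.

```python
from typing import Dict

Haplotype = Dict[str, int]  # marker name -> repeat count


def genetic_distance(a: Haplotype, b: Haplotype) -> int:
    """Sum of absolute repeat-count differences over markers both have."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


# Invented values for illustration; not real members' results.
h1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
h2 = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}
print(genetic_distance(h1, h2))  # 2
```

A small distance supports the conclusion of a common source; it does not prove it. How small is small enough depends on how many markers were compared.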

Matching and its related activity, grouping into genetic families, are among the most important tasks of project administration. Attend to them assiduously; do not let them "snowball".

TFG Standards

We've developed some standards out of necessity for our own use.  See this page.

Terminology

Vocabulary is a problem throughout the genetic genealogy community; there's a lack of standardization, inhibiting communication. Different words can mean the same thing and the same word can mean different things. For the way I've used terms, see the Glossary page.

Match interpretation

With all the technicalities and complexities, let's not lose sight of "A match exists when two (or more) DNA haplotypes are sufficiently similar to indicate a high probability that two or more individuals share a common ancestor within genealogical time." This definition is, of course, relevant to DNA other than Y.

We've defined "genealogical time" as the past 24 generations. It seems fairly well-accepted and it's as far back as most tools allow.

Further, 24 generations represents seven or more centuries, taking us back to the approximate time of universal surname adoption in England.

We've defined "high probability" as equal to or better than 80%; most TMRCA probabilities we've found are either higher than 90% for 24 generations or much lower than 80%. 
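
The two definitions above reduce to simple arithmetic and a threshold test. The Python sketch below restates them; the 30 years per generation is an assumption chosen for illustration (not a project figure), and the function names are my own.

```python
GENEALOGICAL_GENERATIONS = 24  # "genealogical time" as defined above
HIGH_PROBABILITY = 0.80        # "high probability" as defined above
YEARS_PER_GENERATION = 30      # assumption for illustration only


def is_match(p_common_ancestor: float) -> bool:
    """True if the probability of a common ancestor within
    GENEALOGICAL_GENERATIONS meets the 80% threshold."""
    return p_common_ancestor >= HIGH_PROBABILITY


def generations_to_years(generations: int,
                         years_per_generation: int = YEARS_PER_GENERATION) -> int:
    """Convert a number of generations to an approximate span in years."""
    return generations * years_per_generation


print(generations_to_years(GENEALOGICAL_GENERATIONS))  # 720
print(is_match(0.92), is_match(0.45))  # True False
```

At 30 years per generation, 24 generations is 720 years, a little over seven centuries. A longer assumed generation stretches the span further.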

For yDNA, admins differ in approaches & techniques. I use these processes and rules of thumb:

Genealogical significance

"Genealogical significance" is a term to distinguish matches as to relevance to a member's patriline. To deem a match significant signals a member to pursue it further. By contrast, deeming it "not significant" signals the match is of less priority for follow-up.

Grouping

Grouping follows matching and is perhaps the most important task of a TFG admin; it can be the most complex. We've developed a separate page on it here.

TiP

TiP (Time Predictor) is a proprietary FTDNA TMRCA calculator. See our separate page on it.

Triangulation

This technique -- only to follow successful groupings -- is for the purpose of determining the ancestral haplotype for the family. (Read more here.)

Determining the ancestral haplotype simplifies grouping decisions. Potential members can be compared against this one haplotype instead of each others'.

Requirements are:

  1. At least one set of STR results from each known branch of the family;
  2. A minimum of two sets; three or more are preferred;
  3. At least 37 markers for each branch; 67 are preferred. (111 would be preferred but one is unlikely to have enough 111-marker results.)

A potential problem in triangulation is over-weighting one branch relative to others. Ideally, each branch should be equally represented. In actual practice, however, it's not always possible to attain equal weighting; a modal haplotype for the group may be the next best alternative.
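
For the fallback mentioned above, a modal haplotype is simply the most common value at each marker. A minimal Python sketch follows, assuming each result is a mapping of marker name to repeat count; the function name and the sample values are invented for illustration.

```python
from collections import Counter
from typing import Dict, List

Haplotype = Dict[str, int]  # marker name -> repeat count


def modal_haplotype(haplotypes: List[Haplotype]) -> Haplotype:
    """Most common value per marker, over markers present in every result.

    Ties are not resolved in any principled way; review tied markers by hand.
    """
    if not haplotypes:
        return {}
    shared = set(haplotypes[0]).intersection(*haplotypes[1:])
    return {
        m: Counter(h[m] for h in haplotypes).most_common(1)[0][0]
        for m in sorted(shared)
    }


# Invented values for illustration; not real members' results.
group = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
]
print(modal_haplotype(group))
```

Note that a modal is a head count: a branch with many tested members will dominate it, which is exactly the over-weighting problem described above.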

Branch analysis

After a genetic family meeting the requirements below has been determined, one may analyze the genetic differences among its members to infer the family's various branches. Let us emphasize:

This type of analysis uses only the differences between group members. It is thus essential to ensure the analysis is applied only to a qualifying group.

Similarities contribute nothing to the analysis. Markers which show no differences within the group may be ignored.
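
As an illustration of that point, the Python sketch below picks out only the markers that vary within a group, which are the only ones a branch analysis needs. The data layout (kit label mapped to marker values), the function name and the values are assumptions for illustration.

```python
from typing import Dict, List

Haplotype = Dict[str, int]  # marker name -> repeat count


def variable_markers(results: Dict[str, Haplotype]) -> List[str]:
    """Markers, shared by all members, whose values differ within the group."""
    members = list(results.values())
    if not members:
        return []
    shared = set(members[0]).intersection(*members[1:])
    return sorted(m for m in shared if len({h[m] for h in members}) > 1)


# Invented kit labels and values for illustration.
group = {
    "kit-A": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    "kit-B": {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    "kit-C": {"DYS393": 13, "DYS390": 24, "DYS19": 15},
}
print(variable_markers(group))  # ['DYS19', 'DYS390']
```

Here DYS393 is identical across the group and drops out; only the two differing markers would be carried into the analysis.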

Requirements are:

  1. At least one set of STR results from each known branch of the family;
  2. A minimum of three sets; four or more are preferred;
  3. At least 37 markers for each branch; 67 are preferred. (111 would be preferred but we are unlikely to have enough 111-marker results.)
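
Before starting an analysis, the requirements above can be checked mechanically. This Python sketch is one possible reading of them, not a project tool: it assumes results are organized by branch, counts a result's markers as the number of entries in its mapping, and treats "37 markers for each branch" as satisfied when at least one result in the branch has that many. All names are hypothetical.

```python
from typing import Dict, List

Haplotype = Dict[str, int]  # marker name -> repeat count


def requirement_problems(
    results_by_branch: Dict[str, List[Haplotype]],
    min_sets: int = 3,
    min_markers: int = 37,
) -> List[str]:
    """Return a list of unmet requirements; an empty list means all are met."""
    problems = []
    total = sum(len(sets) for sets in results_by_branch.values())
    if total < min_sets:
        problems.append("only %d result set(s); need at least %d" % (total, min_sets))
    for branch, sets in sorted(results_by_branch.items()):
        if not sets:
            problems.append("branch %s has no results" % branch)
        elif max(len(h) for h in sets) < min_markers:
            problems.append("branch %s has no result with %d+ markers" % (branch, min_markers))
    return problems
```

For triangulation, the same check applies with min_sets=2.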

This author prefers to use Network software from Fluxus Engineering, but other techniques are also valid. See our notes on using Fluxus.

mtDNA

At this writing, mtDNA is problematic for finding a common ancestor via a living cousin. The best TMRCA is 50% for 5 generations and 95% for 22 (exact match on full genome). This seems too imprecise to justify expending much effort.

mtDNA Grouping

At this time, TFG groups mtDNA results only by haplogroup. No attempt is made to identify specific maternal lineages.

Autosomal DNA, Family Finder

At this writing, the project regards this as a valuable means for individuals to find cousins descending from indirect maternal and paternal ancestors. The operative term is individual; it doesn't fit well into surname projects. Consider

However, autosomal DNA can be a useful adjunct to Y-DNA. We don't inherit just a Y chromosome from our direct paternal ancestors, but also pieces of the other 22 chromosomes.

Glitches & Anomalies

Frustratingly, much admin time and effort seems to be taken up by system glitches and things that don't work or work irrationally.

FTDNA

Family Tree DNA is a good company, honest & reliable; they are admirable in many ways. But communication is not their strongest point. Nor are their processes and procedures always "best practice" or straightforward. The admin is often caught in the middle.

Customer service

In the Spring of 2015, FTDNA made it more difficult for customers to contact them. Telephone-answering hours were restricted and e-mails were required to be submitted via an online "feedback form". The rationale for the former was to allow more time for customer service staff to respond; for the latter, it was to better classify & route inquiries. Of course, the limitations on contacting FTDNA led to more inquiries to project admins.

That URL is https://www.familytreedna.com/contact.aspx#contactForm (at least in December 2015; it may change). To use it, one must first log in to the FTDNA site. Project admins (only) have a separate e-mail address they can use, groups@ftdna.com; try to limit it to project-wide issues.

You may also find that, when FTDNA does respond, the answers sometimes don't make sense and the customer/member will turn to you. It's a delicate position; on the one hand, you want to be helpful and honest. On the other hand, you don't want to undermine FTDNA credibility. Guidelines:

  1. Refer complaints about FTDNA to FTDNA. Don't deprive them of the feedback.
  2. Be as factual as possible; check your facts.
  3. Own any advice offered; identify it as your opinion.
  4. Try to avoid generalizing; stick to the member's specific situation.

Update: In November 2015, FTDNA's new customer services manager sounded like improvements would be made. More & higher-quality staff is being hired and training emphasized.

FTDNA website

Much of your work will be done on the FTDNA website, those pages known collectively as the "GAP" for Group Administrator Pages. They may not work as intended.

The FTDNA site uses a dynamic content management system, which differs greatly from a static website like this one.

A dynamic system is much more versatile; content can change when the relevant database is updated; today's content will be different than yesterday's with no authoring needed. And, it is more secure than "client-side processing" because all processing occurs on the site's server, not on the users' computers. (Users don't see the scripts.) Considering the jobs that the FTDNA site must do, a static site is not an option; a dynamic approach is required.

But development and maintenance of a dynamic system are complex and technically challenging because it's harder to anticipate all possible conditions and interactions. Also, the server load is much increased; with heavy traffic, this can lead to delays and time-out errors.

While technical specifications are, of course, proprietary and not publicly disclosed, a significant clue is that pages on the FTDNA site have the filename extension ".aspx",  indicating they use Microsoft's ASP.NET technology. (aspx stands for "Active Server Page, Extended".)

Within the IT community, there's a hefty debate between the rival ASP, JSP, & PHP camps. The major bone of contention with ASP is that it runs only on a Windows OS, rather than the Linux OS (typically paired with the Apache web server) of most large servers.

It may be that the information quantity and complexity on the FTDNA site have outgrown what a Windows operating system can handle.

Other puzzles

I imagine you'll find your own.