Triangulation

Triangulation is a method for deriving additional meaning from Y-DNA matches beyond the probability of a common paternal ancestor within a particular time frame. The word was coined by Bill Hurst in 2004 and refined by Charles Kerchner.

Triangulation is an advanced method in genetic genealogy analysis. It is not always possible and may not always be fruitful.

Metaphor

The term comes from navigation, where bearings (directions) to, or distances from, three or more known locations are used to establish the estimated position of another object (such as a ship sailing in a bay).

Triangulation by bearing requires directions to reference points.
Triangulation by distance is the method employed by electronic systems such as GPS.

Notice the red triangles; the three bearing lines or distance arcs of position don't all cross at the same point. Navigators call it a "cocked hat" and it represents the area of uncertainty in the position fix. The target is most likely in or near the triangle, but we do not know exactly where.

Uncertainty exists in the relatively simple problem of locating a physical object. Uncertainty is not lessened when dealing with the randomness inherent in DNA mutations.

Triangulating with DNA


Triangulating with Y-DNA is similar, except that we're metaphorically on shore looking out to sea at the ship. We're trying to use known marker/allele values of descendants to:

  1. Estimate the common paternal ancestor's haplotype; and
  2. See if differences from that haplotype help identify specific branches of descent on the paternal tree.

We will use both qualitative (which markers differ) and quantitative (by how much) methods to accomplish the second part of the process.

Prerequisite

We must start with a group or cluster of results which meet specified criteria for a "match". We need a reasonable likelihood from the Y-DNA matches that all the members of the group do share a common male ancestor within genealogical time. Without confidence that an ancestral haplotype exists or once existed, there is no point in proceeding further.

Triangulation only works within genetic families!
Do not attempt unless the genetic connections have been established.

It is recommended that the group determination be made solely on the basis of DNA results. A group member who "fits" by documentary genealogy but lacks sufficient haplotype similarity with the rest of the group should be excluded from the triangulation analysis. If included, this member will represent an isolated branch.

Fewer markers needed

If we have met the prerequisite condition we will find that -- due to high similarity between the haplotypes -- only a few markers (of the many we used to establish the matches) will show any differences in values. This will allow us to focus on those few markers.

Those markers which show no differences in values add little to the triangulation analyses. They have done their job by establishing the genetic relationship and giving us a TMRCA. They will not distinguish one member of the group from another.

Borrowing from Bayes

Meeting the prerequisite for an adequate match between all haplotypes in the dataset allows us to apply Bayesian analysis to our group (cluster). A shared paternal ancestor at some not-too-distant point in the past becomes a "given", the a priori condition.

Group Size

The size of this group matters, as does the number of markers tested by each member. Fewer than three points of reference per marker tell us little. (Imagine removing one or two of the lines in the navigation diagrams.) As we increase the number of points of reference, the uncertainty space ("cocked hat") diminishes in size. (For more discussion, see "Doubt & Uncertainty".)

A further consideration for larger clusters is to adequately sample the individual branches, sometimes termed "sub-clusters". A minimum sample is three and more are preferable.

Sampling Procedure

Some suggest that known branches of a paternal line should be sought out for DNA testing. This fits perfectly with the navigation metaphor, in which bearings are taken on points whose positions are known exactly.

But this ideal may be a more rigorous sampling method than is feasible. When family history is the unknown, testing all branches is an impossibility. Often, we must take the results available to us and adjust from there.

Identifying the Ancestral Haplotype

Emily Aulicino: "If two DNA tests of the descendants of two sons of a common ancestor match, you know the haplotype of the common ancestor.
If there is a mutation (not a perfect match), then one must test the branches of other sons of the common ancestor to determine which branch has the haplotype for that common ancestor and where the mutations occurred in the branches which differ from that common ancestor’s haplotype. The idea is to have two or more lines of descent for each branch that differs from the other branches of the common ancestor."

An allele value is assigned to each marker to describe the estimated haplotype of the common paternal ancestor. These allele values will serve as the "benchmark" for comparing individual members of the group. Two cautions:

  1. This is only an estimation of the haplotype. The procedure is abstract and imperfect.
  2. This does not identify him. DNA does not tell us his name, dates or other characteristics. He may, in fact, have been undocumented.

Modal Values

The modes are usually the best measure of each marker's central tendency. Modes are less affected by variations within the data. For example, if a group of three (3) men all have DYS458=17, the mode will not be affected if a fourth man with DYS458=18 is added to the group.
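The modal calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `modal_haplotype`, the tie-breaking rule (smallest allele wins) and the DYS449 values are assumptions made for the example; only the DYS458 case comes from the text.

```python
from collections import Counter

def modal_haplotype(haplotypes):
    """Return {marker: modal allele} for a list of {marker: allele} dicts.

    A marker a man has not tested is skipped for him.
    Ties are broken by the smallest allele value (an arbitrary assumption).
    """
    markers = sorted({m for h in haplotypes for m in h})
    modal = {}
    for marker in markers:
        counts = Counter(h[marker] for h in haplotypes if marker in h)
        top = max(counts.values())
        modal[marker] = min(a for a, c in counts.items() if c == top)
    return modal

# Three men with DYS458=17 (DYS449 values are made up for illustration).
group = [
    {"DYS458": 17, "DYS449": 30},
    {"DYS458": 17, "DYS449": 30},
    {"DYS458": 17, "DYS449": 31},
]
before = modal_haplotype(group)
# Adding a fourth man with DYS458=18 leaves the mode at 17.
after = modal_haplotype(group + [{"DYS458": 18, "DYS449": 30}])
print(before, after)
```

The mean of DYS458 would shift from 17 to 17.25 when the fourth man is added, which is not a possible allele value; the mode stays at 17.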

The mode is more stable than other statistics. However, like the mean, it may give a biased estimate of the ancestral haplotype. If one branch dominates the group, reducing the weight of this branch may be valid.
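One possible way to reduce the weight of a dominant branch, offered only as a sketch, is to give each branch a single vote: take the mode within each branch first, then the mode of those branch values. The branch labels, the function names and the allele values below are hypothetical, and this "one vote per branch" rule is an assumption rather than an established procedure.

```python
from collections import Counter

def mode_of(values):
    """Most common value; ties go to the smallest value (arbitrary choice)."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)

def branch_weighted_mode(values_by_branch):
    """Each branch casts one vote: its own modal value for the marker."""
    return mode_of(mode_of(vals) for vals in values_by_branch.values())

# Hypothetical marker values: branch A is heavily tested and carries a mutation.
values_by_branch = {
    "A": [18, 18, 18, 18, 18],
    "B": [17],
    "C": [17],
}
all_values = [v for vals in values_by_branch.values() for v in vals]
simple = mode_of(all_values)                        # dominated by branch A
weighted = branch_weighted_mode(values_by_branch)   # one vote per branch
print(simple, weighted)
```

Here the simple mode follows the heavily sampled branch (18), while the branch-weighted mode points to 17. The weighted figure is only as good as the branch assignments that go into it.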

Number of Markers

It's best to have descriptions of the individual haplotypes that are as complete as possible. Completeness is, of course, determined by the number of markers with data; more is better.

While 37 markers may be adequate to establish that all the group members belong to the same genetic family, 37 may not be enough to show the branches (and twigs) within the family. Some experts recommend no fewer than 67 markers and some maintain that 111 are needed.

Missing marker data?

Some of your genetic family may not have tested all available markers; you have no data for them on those markers. And it may not be possible to get everyone to test 111 markers.

What to do? Techniques for dealing with this problem include:

Identifying Branches


Once the ancestral haplotype has been determined, each member is compared to it and should have only a few differences from this benchmark. Those differences are taken as representing mutations through the generations since the common ancestor.

Each observed mutation will have been passed down by an ancestor. When more than one member of the cluster shares a marker value, this may identify a branch. If it so proves, we may refer to this marker value as a "signature" for the branch. (Warning: This signature has no meaning outside the context of this particular matching cluster; the marker value may be common in other clusters.)
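The comparison just described can be sketched as follows: list each member's departures from the benchmark (which marker, and by how many steps), then keep the departures shared by two or more members as candidate signatures. The member labels, marker values and function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def differences(haplotype, ancestral):
    """Return {marker: step difference} where a man departs from the benchmark."""
    return {
        m: haplotype[m] - ancestral[m]
        for m in haplotype
        if m in ancestral and haplotype[m] != ancestral[m]
    }

def candidate_signatures(members, ancestral):
    """Map (marker, step) to the members sharing it, keeping only 2+ members."""
    shared = defaultdict(list)
    for name, hap in members.items():
        for marker, step in differences(hap, ancestral).items():
            shared[(marker, step)].append(name)
    return {key: sorted(names) for key, names in shared.items() if len(names) >= 2}

# Hypothetical benchmark and members (values are illustrative, not real results).
ancestral = {"DYS458": 17, "DYS449": 30, "CDYb": 38}
members = {
    "A": {"DYS458": 17, "DYS449": 30, "CDYb": 38},  # matches the benchmark
    "B": {"DYS458": 17, "DYS449": 30, "CDYb": 37},
    "C": {"DYS458": 17, "DYS449": 30, "CDYb": 37},
    "D": {"DYS458": 17, "DYS449": 31, "CDYb": 38},  # private mutation
}
signatures = candidate_signatures(members, ancestral)
print(signatures)
```

In this made-up cluster, B and C share a one-step loss at CDYb, which makes it a candidate branch signature; D's difference is seen in only one man and so cannot yet be called a branch.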

Triangulation diagram

Larger clusters may display multi-marker signatures, in which two or more members show similar departures from the ancestral haplotype at more than one marker.

Example:

Let us look at a matching group from Taylor Family Genes with 10 members, all of whom have tested at least 37 markers (5 have tested 67). It is a tightly matched group; all have a high probability of sharing the same common paternal ancestor within genealogically significant time. There are only 11 total mismatches in marker/allele values across all 520 marker results (10 x 37, plus 30 more for each of the 5 men tested to 67).

Looking at the 37 markers tested by all:

Above, we see the "trunk" haplotype represented and five (5) to seven (7) main branches from it, as well as possibly two subsidiary branches. Additional men matching this group may either augment these branches or show yet more branches.

How do the paper trails compare?
  1. Jack (on the trunk) traces his earliest known paternal ancestor (EKA) to a Robert Taylor (1688 Old Rappahannock, VA - 1758 Edgecombe, NC)
    • Joe (also on the trunk) has an EKA of Robert Taylor (1715 xx - 1807 NC), the son of Jack's EKA and also in Jack's line of descent.
    • Jim's EKA is also Robert Taylor (1715 xx - 1807 NC), suggesting the -1 @ CDYb represents a main branch.
    • Hal's EKA is Jesse Major Taylor (1798 VA - 1882 MO), possibly a grandson of Joe's EKA?
    • Dan's EKA is Thomas Alexander Taylor (b.1924 Polk Co, TX)
  2. Sam's EKA is John Winchester (on a 1768 tax list)
    • Tom's EKA is William D. Taylor (1806 ? -1870 NC)
    • Dave's EKA is __ Winchester (1790 Rowan Co., NC - ?).
  3. Verne's EKA is William Ballard Taylor (b.1840, Forsyth Co. GA)
  4. John has not provided his EKA.
  5. Lou's EKA is George Edward Taylor (ca1797/98 Brunswick Co., VA - 1841/49 Brunswick Co., VA)

Cladograms

A cladogram is simply a diagram of the genetic relationships within the group; its primary purpose is to help in visualizing the genetic branches. Here is an example for the group described above.

Fluxus Network Diagram

In this diagram, the multi-colored circle in the center represents the "trunk" and the arms represent branches.

Age of Branches

A key question in the triangulation analysis is "When did this branch diverge from the ancestral haplotype?"

Chain of transmission events

The diagram on the left shows how two descendants are connected to each other through a common ancestor (CMA) by chains of DNA transmission events (TE). The TE chains are not necessarily of equal lengths.

Each TE introduces the possibility of mutations, which may occur at any point in the chains. However, a particular mutation found in one chain but not the other may help to:

  1. Identify the particular branch, and
  2. Estimate the age of the branches' divergence from the ancestral haplotype.

Using mutation rates

While the prospective chance of any particular mutation happening within genealogical time is small, we know, retrospectively, that this mutation did happen and can use that fact to roughly estimate how many generations have elapsed between the ancestral haplotype and the current descendants.

The marker's average mutation rate -- usually given in mutations per generation -- is a guide to branch age. The inverse of this rate is the average number of generations per mutation. It is often treated as a rough midpoint, but under a simple constant-rate model the chance of at least one mutation within that many generations is about 63%; the even-odds (50%) point comes sooner, at roughly 0.69 times the inverse of the rate.

Non-mathematical Approaches

Mutation rates are affected by the father's age at his son's conception; the risk doubles from age 20 to age 58. (Source.) Some very rough guides:

Probability Approaches

At each TE in a particular chain, the mutation either happened or did not. How many generations are needed for us to have some level of confidence that the number includes the generation in which the mutation happened?
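One way to put numbers on this question, assuming a constant per-generation mutation rate p for a single marker on a single chain, is to ask for the smallest number of transmission events n at which the chance of at least one mutation, 1 - (1 - p)^n, reaches the chosen confidence level. The rate used below (0.004) is a hypothetical value for illustration, and the function names are not from any established package.

```python
import math

def prob_mutated_within(n, p):
    """Chance of at least one mutation in n transmission events at rate p."""
    return 1.0 - (1.0 - p) ** n

def generations_for_confidence(confidence, p):
    """Smallest n for which prob_mutated_within(n, p) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

p = 0.004  # hypothetical per-generation mutation rate for one marker
n50 = generations_for_confidence(0.50, p)
n95 = generations_for_confidence(0.95, p)
print(n50, n95)
print(round(prob_mutated_within(round(1 / p), p), 3))  # chance within 1/p generations
```

The intervals are wide for any single marker, which is consistent with the caution that these are rough guides and not dates.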

Comment: Mathematics still in progress.

Binomial -- We might call this a Bernoulli experiment: how many trials (TE; we'll call that number "n") does it take to get a decent probability of the number of mutations we see? It is similar to drawing the one red marble out of a jar with dozens to thousands of white marbles.

The general formula for the number of "successes" (call that "k") in n trials is
$$P(k, n) = \frac{n!}{k!\,(n-k)!}\; p^{k}\,(1-p)^{n-k}$$
where p is the probability (mutation rate) on each trial.
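The binomial formula can be evaluated directly, as in this short sketch. The values of n and p are hypothetical, and `binomial_pmf` is just an illustrative name; `math.comb` supplies the n!/[k!(n-k)!] term.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k mutations in n transmission events at rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10      # hypothetical number of transmission events in one chain
p = 0.004   # hypothetical per-generation mutation rate for one marker

none = binomial_pmf(0, n, p)       # no mutation on this marker
exactly_one = binomial_pmf(1, n, p)
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(none, exactly_one, total)
```

The probabilities over all possible k sum to 1, and the k = 0 case reduces to (1 - p)^n, the chance that the marker passes through all n transmission events unchanged.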

Conclusion

Triangulation does not eliminate the necessity of document-based genealogical research. It can, however, help more closely focus that research on specific times and places.

Triangulation methodology does not yield exact or certain results.


Revised: 27 Mar 2014