Triangulation is a method for deriving additional meaning from Y-DNA matches
beyond the probability of a common paternal ancestor within a particular
time frame. The word was coined by Bill Hurst in 2004 and refined by
Triangulation is an advanced method in genetic genealogy analysis. It is
not always possible and may not always be fruitful.
The term comes from navigation, where bearings (directions) to, or distances from, three or more known locations are used to establish the estimated position
of another object (such as a ship sailing in a bay).
Triangulation by bearing requires directions to reference points.
Triangulation by distance is the method employed by electronic systems such as GPS.
Notice the red triangles: the three bearing lines
or distance arcs of position don't all cross at the same point. Navigators call the resulting triangle a
"cocked hat"; it represents the area of uncertainty in the position fix.
We know the target is inside the triangle, but not exactly where within it.
Uncertainty exists even in the relatively simple problem of locating a physical object.
It is not lessened when we deal with the randomness inherent
in DNA mutations.
We must start with a group, or cluster, of results that meet specified criteria
for a "match". The Y-DNA matches must give a reasonable likelihood
that all the members of the group share a common male
ancestor within genealogical time. Without confidence that an ancestral haplotype exists or once existed,
there is no point in proceeding further.
Triangulation only works within genetic families!
Do not attempt it
unless the genetic connections have been established.
It is recommended that the group be determined solely on the basis of DNA
results. A group member who "fits" by documentary genealogy but does
not have sufficient haplotype similarity with the rest of the group should be
excluded from the triangulation analysis. If included, this member will
appear as an isolated branch.
Fewer markers needed
If we have met the prerequisite condition, we will find that -- due to the high similarity between the haplotypes --
only a few markers (of the many used to establish the matches) show any
differences in values. This allows us to focus on those few markers.
Those markers which show no differences in values add little to the
triangulation analyses. They have done their job by establishing the genetic
relationship and giving us a TMRCA. They will not distinguish one member of the
group from another.
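Focusing on "those few markers" amounts to keeping only the markers on which group members disagree. The following is a minimal sketch. It assumes each haplotype is a simple mapping from marker name to repeat count. The function name, member names and allele values are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Assumed representation: marker name -> allele value (repeat count).
Haplotype = Dict[str, int]


def informative_markers(group: Dict[str, Haplotype]) -> List[str]:
    """Return the markers whose values differ somewhere within the group.

    Markers with one value across every member helped establish the match,
    but they cannot distinguish one member from another.
    Members with no data on a marker are skipped for that marker.
    """
    markers = sorted({m for hap in group.values() for m in hap})
    informative = []
    for marker in markers:
        values = {hap[marker] for hap in group.values() if marker in hap}
        if len(values) > 1:
            informative.append(marker)
    return informative


# Illustrative values only, not real test results.
group = {
    "A": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS458": 17},
    "B": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS458": 18},
    "C": {"DYS393": 13, "DYS390": 24, "DYS19": 15, "DYS458": 17},
}
print(informative_markers(group))  # ['DYS19', 'DYS458']
```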
Borrowing from Bayes
Meeting the prerequisite for an adequate match between all haplotypes in
the dataset allows us to apply Bayesian analysis to our group (cluster). A
shared paternal ancestor at some not-too-distant point in the past becomes a "given",
the a priori condition.
The size of this group matters, as does the number of markers tested by each
member. Fewer than three points of reference per marker don't tell us much.
(Imagine removing one or two of the lines in the navigation diagrams.) As we
increase the number of points of reference, the uncertainty space ("cocked hat")
diminishes in size. (For more discussion, see "Doubt
A further consideration for larger clusters is adequate sampling of the individual branches,
sometimes termed "sub-clusters". A minimum sample is three, and
more are preferable.
suggest that known branches of a paternal line should be sought out for DNA testing.
This fits perfectly with the navigation metaphor, in which bearings are
taken on points whose positions are known exactly.
But this ideal may call for more rigorous sampling than can feasibly be done. When family history is the unknown, testing all branches is
an impossibility. Often, we must take the results available to us and adjust from
"If two DNA tests of the descendants of two sons of a common ancestor match,
you know the haplotype of the common ancestor.
If there is a mutation (not a perfect match), then one must test the
branches of other sons of the common ancestor to determine which branch has
the haplotype for that common ancestor and where the mutations occurred in
the branches which differ from that common ancestor’s haplotype. The idea is
to have two or more lines of descent for each branch that differs from the
other branches of the common ancestor."
An allele value is assigned to each marker to describe the estimated
haplotype of the common paternal ancestor. These allele values will serve as
the "benchmark" for comparing individual members of the group.
This is only an estimate of the haplotype, and the procedure is abstract:
it does not identify the ancestor himself. DNA does not tell us his name, dates or other characteristics.
He may, in fact, have been undocumented.
The mode is usually the best measure of each marker's
central tendency, because it is less affected by variation within the data.
For example, if a group of three (3) men all have DYS458=17, the mode will
not change if a fourth man with DYS458=18 is added to the group (the mean, by contrast, moves from 17 to 17.25).
The mode is more stable than other statistics. However, like the mean,
it may give a biased estimate of the ancestral haplotype. If one branch dominates
the group, reducing the weight of that branch may be valid.
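The following minimal sketch estimates the ancestral haplotype as the per-marker mode. It accepts an optional weight per member so that a dominant branch can be down-weighted. The weighting scheme and the tie-breaking rule (smallest value wins) are assumptions made for illustration, not a standard method. The usage lines reproduce the DYS458 example: adding a fourth man with 18 leaves the mode at 17 but moves the mean.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Optional

Haplotype = Dict[str, int]


def modal_haplotype(group: Dict[str, Haplotype],
                    weights: Optional[Dict[str, float]] = None) -> Haplotype:
    """Estimate the ancestral haplotype as the weighted mode of each marker.

    weights: hypothetical per-member weights (default 1.0 each). Giving the
    members of an over-represented branch weights below 1 reduces its pull.
    Ties are broken toward the smaller allele value, an arbitrary choice.
    """
    weights = weights or {}
    tallies: Dict[str, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for member, hap in group.items():
        w = weights.get(member, 1.0)
        for marker, value in hap.items():
            tallies[marker][value] += w
    # max() returns the first maximal item, so iterating over the values in
    # sorted order makes the smallest tied value win.
    return {marker: max(sorted(counts), key=lambda v: counts[v])
            for marker, counts in tallies.items()}


three = {"m1": {"DYS458": 17}, "m2": {"DYS458": 17}, "m3": {"DYS458": 17}}
four = {**three, "m4": {"DYS458": 18}}
print(modal_haplotype(three), modal_haplotype(four))  # {'DYS458': 17} {'DYS458': 17}
print(mean(h["DYS458"] for h in four.values()))       # 17.25
```

Down-weighting m1 to m3 to 0.2 each (a total weight of about 0.6 against m4's 1.0) flips the mode to 18. This shows how strongly any weighting choice steers the estimate.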
Number of Markers
It's best to have descriptions of the individual haplotypes that are as complete as possible.
Completeness is, of course, determined by the number of markers with data; more is better.
While 37 markers may be adequate to establish that all the group members
belong to the same genetic family, 37 may not be enough to show the branches
(and twigs) within the family. Some experts recommend no fewer than 67 markers and some maintain that
111 are needed.
Missing marker data?
Some of your genetic family may not have tested all available markers, so you
have no data for them on those markers. And it may not be possible to get
everyone to test 111 markers.
What can you do? Techniques for dealing with this problem include:
Exclude them from the analysis. This may significantly reduce your
Include them, and assume that -- on the missing markers -- they have no differences from the ancestral haplotype. This procedure is usually suspect, but it allows you to proceed when you
otherwise could not.
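The two techniques above can be sketched as follows. The function names, ancestral values and member data are hypothetical. The second technique has one notable property: values filled in from the ancestral haplotype can never create a mismatch, so they can only pull members toward the trunk. That is one reason to treat the technique with suspicion.

```python
from typing import Dict, List

Haplotype = Dict[str, int]


def exclude_incomplete(group: Dict[str, Haplotype],
                       markers: List[str]) -> Dict[str, Haplotype]:
    """Technique 1: drop members who lack data on any of the given markers."""
    return {name: hap for name, hap in group.items()
            if all(m in hap for m in markers)}


def fill_from_ancestral(group: Dict[str, Haplotype],
                        ancestral: Haplotype) -> Dict[str, Haplotype]:
    """Technique 2: assume no difference from the ancestral value wherever a
    member has no data (the member's own values take precedence)."""
    return {name: {**ancestral, **hap} for name, hap in group.items()}


# Illustrative values only.
ancestral = {"DYS19": 14, "DYS570": 17, "DYS534": 15}
group = {
    "P": {"DYS19": 14, "DYS570": 18, "DYS534": 16},  # tested all three
    "Q": {"DYS19": 14, "DYS570": 17},                # no DYS534 result
}
print(sorted(exclude_incomplete(group, list(ancestral))))  # ['P']
print(fill_from_ancestral(group, ancestral)["Q"])
# {'DYS19': 14, 'DYS570': 17, 'DYS534': 15}
```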
Once the ancestral haplotype has been determined, each member is compared
to it and should have only a few differences from this benchmark. Those
differences are taken as representing mutations through the generations
since the common ancestor.
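The following minimal sketch compares one member to the benchmark and reports each difference as a signed step count. It assumes a simple step difference is meaningful for every marker, which is a simplification. The function name and allele values are hypothetical.

```python
from typing import Dict

Haplotype = Dict[str, int]


def differences(member: Haplotype, ancestral: Haplotype) -> Dict[str, int]:
    """Signed steps (member minus ancestral) on markers where they differ.

    Markers the member has not tested are skipped rather than counted.
    """
    return {marker: member[marker] - value
            for marker, value in ancestral.items()
            if marker in member and member[marker] != value}


# Illustrative values only: a member who is -1 at one marker and +1 at another.
ancestral = {"DYS19": 14, "DYS464a": 15, "DYS570": 17}
member = {"DYS19": 14, "DYS464a": 14, "DYS570": 18}
print(differences(member, ancestral))  # {'DYS464a': -1, 'DYS570': 1}
```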
Each observed mutation will have been passed down from the ancestor in whom it arose. When more
than one member of the cluster shares a marker value that departs from the ancestral haplotype, this may identify a
branch. If it so proves, we may refer to this marker value as a "signature" for
(Warning: This signature has no meaning outside the context
of this particular matching cluster; the marker value may be common in other
Larger clusters may display multi-marker signatures, in which two or more
members show similar departures from the ancestral haplotype at more than one
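Finding candidate signatures amounts to grouping members by shared departures from the ancestral haplotype. This minimal sketch assumes the departures have already been computed as signed steps per marker. The member names, data and thresholds are hypothetical. Whether a shared departure really marks a branch still has to be judged against the paper trail, and against the possibility that the same mutation arose independently.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Deviation = Tuple[str, int]  # (marker, signed step away from the ancestral value)


def shared_departures(devs: Dict[str, Dict[str, int]],
                      min_members: int = 2) -> Dict[Deviation, List[str]]:
    """Candidate single-marker signatures: departures shared by several members."""
    carriers: Dict[Deviation, List[str]] = defaultdict(list)
    for member in sorted(devs):
        for marker, step in devs[member].items():
            carriers[(marker, step)].append(member)
    return {d: names for d, names in carriers.items() if len(names) >= min_members}


def multi_marker_pairs(devs: Dict[str, Dict[str, int]],
                       min_markers: int = 2) -> Dict[Tuple[str, str], List[Deviation]]:
    """Candidate multi-marker signatures: member pairs sharing several departures."""
    pairs = {}
    for a, b in combinations(sorted(devs), 2):
        common = set(devs[a].items()) & set(devs[b].items())
        if len(common) >= min_markers:
            pairs[(a, b)] = sorted(common)
    return pairs


# Hypothetical departures from the ancestral haplotype.
devs = {
    "M1": {"DYS570": 1, "DYS576": -1},
    "M2": {"DYS570": 1, "DYS576": -1},
    "M3": {"DYS570": 1},
    "M4": {},
}
print(shared_departures(devs))
# {('DYS570', 1): ['M1', 'M2', 'M3'], ('DYS576', -1): ['M1', 'M2']}
print(multi_marker_pairs(devs))
# {('M1', 'M2'): [('DYS570', 1), ('DYS576', -1)]}
```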
Let us look at a matching group from Taylor Family Genes with 10 members,
all of whom have tested at least 37 markers (5 have tested 67). It is a
tightly matched group; all have a high probability of sharing the same
common paternal ancestor within genealogically significant time. There are only 11 total mismatches in
marker/allele values across all 520 marker values tested.
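The figure of 520 follows from the testing levels, if the five who did not test 67 markers tested exactly 37 (which the total bears out). The mismatch rate is correspondingly low:

$$5 \times 67 + 5 \times 37 = 335 + 185 = 520, \qquad \frac{11}{520} \approx 0.021 \;(\text{about } 2.1\%)$$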
Looking at the 37 markers tested by all:
Dan, Hal, Jack &
Joe have no differences from the modal values of
the first 37 markers they tested. They may be considered as representing the ancestral
haplotype (the "trunk" of the tree) for those 37 markers.
Sam & Tom are +1 from the modal @ DYS534. Consider them as representing a branch
from the trunk.
In addition, Tom is -1 @ DYS537 -- a branch of a branch?
Verne is +1 @ DYS444 -- another branch.
Dave is -1 @ DYS19, a fourth main branch. (This is the only mismatch
in the 12-marker panel.)
John has two differences from the modals, -1 at DYS464a and +1 at
DYS570. These two mutations from the modal suggest that John represents
a subsidiary branch of a fifth main branch.
Lou is -1 @ CDYa. This marker is so volatile that many do not
consider it. If we did consider it, it would represent a sixth branch.
Jim is -1 @ CDYb. This marker, also, is often not considered. It
may be a seventh branch.
Above, we see the "trunk" haplotype represented and five (5) to seven (7) main
branches from it, as well as possibly two subsidiary branches. Additional
men matching this group may either augment these branches or show yet more
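The member-by-member reading above can be reproduced mechanically. This sketch encodes only the departures listed in the text and groups members by shared departures. Lou's CDYa and Jim's CDYb are included, with a switch to drop them as volatile. The text does not give absolute allele values, so only signed steps appear. The variable and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Departures from the modal values, as listed in the text (signed steps).
TAYLOR_DEVS: Dict[str, Dict[str, int]] = {
    "Dan": {}, "Hal": {}, "Jack": {}, "Joe": {},
    "Sam": {"DYS534": +1},
    "Tom": {"DYS534": +1, "DYS537": -1},
    "Verne": {"DYS444": +1},
    "Dave": {"DYS19": -1},
    "John": {"DYS464a": -1, "DYS570": +1},
    "Lou": {"CDYa": -1},
    "Jim": {"CDYb": -1},
}
VOLATILE = {"CDYa", "CDYb"}


def summarize(devs: Dict[str, Dict[str, int]], drop_volatile: bool):
    """Return (trunk members, departures shared by two or more members)."""
    kept = {name: {m: s for m, s in d.items()
                   if not (drop_volatile and m in VOLATILE)}
            for name, d in devs.items()}
    trunk = sorted(name for name, d in kept.items() if not d)
    carriers: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for name in sorted(kept):
        for marker, step in kept[name].items():
            carriers[(marker, step)].append(name)
    shared = {k: v for k, v in carriers.items() if len(v) > 1}
    return trunk, shared


print(summarize(TAYLOR_DEVS, drop_volatile=False))
# (['Dan', 'Hal', 'Jack', 'Joe'], {('DYS534', 1): ['Sam', 'Tom']})
print(summarize(TAYLOR_DEVS, drop_volatile=True)[0])
# ['Dan', 'Hal', 'Jack', 'Jim', 'Joe', 'Lou']
```

Sam and Tom's shared +1 at DYS534 is the only departure carried by more than one member. Every other departure is carried by a single man.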
How do the paper trails compare?
Jack (on the trunk) traces his earliest known paternal ancestor (EKA) to
a Robert Taylor (1688 Old Rappahannock, VA - 1758 Edgecombe, NC)
Joe (also on the trunk) has an EKA of Robert Taylor (1715 xx - 1807 NC), the son of Jack's EKA
and also in Jack's line of descent.
Jim's EKA is also Robert Taylor (1715 xx - 1807 NC), suggesting the
-1 @ CDYb represents a main branch.
Hal's EKA is Jesse Major Taylor (1798 VA - 1882 MO), possibly a grandson of Joe's EKA?
Dan's EKA is Thomas Alexander Taylor (b.1924 Polk Co, TX)
A key question in the triangulation analysis is "When did this branch
diverge from the ancestral haplotype?"
Chain of transmission events
The diagram on the left shows how two descendants are connected to each other
through a common ancestor (CMA) by chains of DNA transmission events (TE).
The TE chains are not necessarily of equal lengths.
Each TE introduces the possibility of mutations, which may occur at any
point in the chains.
A particular mutation, however, that is found in one chain but not the
other may help to:
Identify the particular branch, and
Estimate the age of the branch's divergence from the ancestral haplotype.
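Under the simplest model, each TE in a chain is an independent chance of mutation at a given marker, at the same rate. This sketch computes the chance of at least one mutation event along a chain. It also computes the chance that one chain carries a mutation while the other does not. The 0.004 rate and the chain lengths are illustrative assumptions. The model also ignores back-mutations that could hide an earlier change.

```python
def p_any_mutation(rate: float, transmissions: int) -> float:
    """Chance of at least one mutation event at one marker along a chain of
    `transmissions` TEs, each an independent trial with the same rate."""
    return 1.0 - (1.0 - rate) ** transmissions


def p_one_chain_only(rate: float, te_a: int, te_b: int) -> float:
    """Chance that chain A has at least one mutation event and chain B has none."""
    return p_any_mutation(rate, te_a) * (1.0 - p_any_mutation(rate, te_b))


# Hypothetical: 8 TEs from the common ancestor to one descendant, 10 to the other.
rate = 0.004
print(round(p_any_mutation(rate, 8), 4), round(p_any_mutation(rate, 10), 4))
print(round(p_one_chain_only(rate, 8, 10), 4))
```

Unequal chain lengths matter: the longer chain has more chances to mutate, so a mutation seen only in it is somewhat more expected than one seen only in the shorter chain.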
Using mutation rates
While the prospective chance of any particular mutation happening within genealogical
time is small, we know, retrospectively, that this mutation did happen and can use that fact to roughly estimate
how many generations have elapsed between the ancestral haplotype and the
The marker's average mutation rate -- usually given in mutations per
generation -- is a guide to branch age.
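The inverse of this fraction is the average number of generations per
mutation. It is not quite the 50% probability number, however: if every
generation carries the same independent chance of mutation, the chance that
the mutation happened within that many generations is about 63% (1 - 1/e).
The interval in which the mutation is equally likely to have happened or not
is shorter, about 0.69 (ln 2) times the generations-per-mutation figure.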
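If each generation is treated as an independent trial at a fixed rate, the waiting time to the first mutation follows a geometric distribution. Its mean is the inverse of the rate, but its 50% point comes earlier, near ln(2) divided by the rate. The sketch below shows both with a rate of 0.004, chosen only for illustration. The function names are hypothetical.

```python
import math


def p_within(rate: float, generations: int) -> float:
    """Chance the mutation occurred at least once within `generations`."""
    return 1.0 - (1.0 - rate) ** generations


def generations_for(rate: float, prob: float) -> int:
    """Smallest whole number of generations g with p_within(rate, g) >= prob."""
    g = math.ceil(math.log(1.0 - prob) / math.log(1.0 - rate))
    # Guard against floating-point error right at the boundary.
    while p_within(rate, g) < prob:
        g += 1
    while g > 0 and p_within(rate, g - 1) >= prob:
        g -= 1
    return g


rate = 0.004                           # illustrative rate, mutations per generation
mean_gens = 1.0 / rate                 # average generations per mutation
print(mean_gens)                       # 250.0
print(round(p_within(rate, 250), 3))   # 0.633, not 0.5
print(generations_for(rate, 0.5))      # 173, close to ln(2) / rate
```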
Mutation rates are affected by the father's age at the son's conception; the risk
doubles from age 20 to age 58.
Some very rough guides:
A generation in which the father is older than 45: mutations are 1.5 times as likely as average.
A generation in which sons' births are separated by decades.
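The rough multipliers above can be folded into the per-generation chance of mutation. This sketch multiplies the average rate by a factor for each generation, for example 1.5 for a father older than 45 as in the guide above. It then combines the generations as independent trials. The 0.004 base rate and the make-up of the chain are illustrative assumptions, and the function name is hypothetical.

```python
from math import prod
from typing import Sequence


def p_any_mutation_adjusted(base_rate: float, multipliers: Sequence[float]) -> float:
    """Chance of at least one mutation along a chain in which generation i
    mutates with probability base_rate * multipliers[i]."""
    return 1.0 - prod(1.0 - base_rate * m for m in multipliers)


base_rate = 0.004                      # illustrative average rate
uniform = [1.0] * 10                   # ten average generations
older_fathers = [1.0] * 8 + [1.5] * 2  # two generations with fathers over 45
print(p_any_mutation_adjusted(base_rate, uniform))
print(p_any_mutation_adjusted(base_rate, older_fathers))
```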
At each TE in a particular chain, the mutation either happened or did not. How many generations are
needed for us to have some level of confidence that the number includes the generation in which the
Comment: Mathematics still in progress.
Binomial -- Each TE can be treated as a Bernoulli trial: how many trials
(TEs; we'll call that number "n") does it take to get a decent
probability of seeing the number of mutations we observe? It is similar to drawing the one red marble out of
a jar with dozens to thousands of white marbles, returning the marble after each draw.
The general formula for the probability of exactly k "successes" (mutations) in n trials is

$$P(k,n) = \frac{n!}{k!\,(n-k)!}\, p^{k}\,(1-p)^{n-k}$$

where p is the probability (mutation rate) on each trial.
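The formula translates directly into code. It can also answer the earlier question in a limited form: what is the smallest number of TEs n for which at least k mutations reach a chosen probability? This is a minimal sketch that assumes independent TEs at a constant rate. The 0.004 rate and the function names are illustrative. For k = 1 the answer agrees with the simpler 1 - (1 - p)^n calculation, since P(0, n) = (1 - p)^n.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k, n): probability of exactly k mutations in n independent TEs."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def min_transmissions(p: float, k: int, confidence: float, n_max: int = 10_000) -> int:
    """Smallest n with P(at least k mutations in n TEs) >= confidence."""
    for n in range(k, n_max + 1):
        p_at_least_k = 1.0 - sum(binomial_pmf(j, n, p) for j in range(k))
        if p_at_least_k >= confidence:
            return n
    raise ValueError("confidence not reached within n_max transmissions")


rate = 0.004  # illustrative per-TE mutation rate
print(binomial_pmf(1, 20, rate))          # exactly one mutation in 20 TEs
print(min_transmissions(rate, 1, 0.5))    # 173
print(min_transmissions(rate, 1, 0.95))
```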