The TiP (Time Predictor) Tool

This page is about the Family Tree DNA (FTDNA) TiP (Time Predictor)
tool. TiP is available to the company's customers and project
administrators, but its inner workings are proprietary to FTDNA
and little documentation is provided.

We have made no attempt to "reverse engineer" TiP or to peer inside its
"black box". We have, though, examined its inputs and outputs and try to
explain them here.

What is TiP?

TiP is an online algorithm for calculating the time to the most recent common
ancestor (TMRCA) for ySTR matches. We
believe it is the most reliable of the TMRCA calculators available. In the
set of tools that FTDNA provides for genetic genealogists, we feel it is one
of the most useful and valuable.

Presumably, TiP runs as an ASP.NET (.aspx) page on the FTDNA servers.

TMRCA calculators, including TiP, have two purposes:

To assess the quality of a match; the higher the probability for a
given number of generations, the better the match and

To predict the probable time (in generations) of a MRCA.

TiP will calculate cumulative probabilities out to 24 generations. We feel this
limit is adequate because it's roughly
consistent with universal surname adoption.

We distinguish between first use of surnames (~1000 AD in much
of Europe) and universality of surnames
for all people, nobles and commoners alike. Surname universality seems to have
occurred
between 1350 and 1400 in England. Before 1350, few English commoners had surnames; by 1400
all did.

How is TiP different?

Unlike other TMRCA calculators,
TiP uses
specific mutation rates of ySTR markers. (Other TMRCA calculators use
average mutation rates for marker sets or no mutation rates.) A step of
genetic difference on a "fast" marker is given less weight by TiP than a
step on a "slow" marker.

TiP results

TiP calculations result in reports (in the more detailed form) like
this:

In comparing Y-DNA67 markers, which show 3 mismatches, the probability that (XXXXXX) __ Taylor and (YYYYYY) __ Taylor shared a common ancestor within the last...
...1 generation is 4.82%.
...2 generations is 15.93%.
...3 generations is 30.34%.
...4 generations is 45.10%.
...5 generations is 58.40%.
...6 generations is 69.44%.
...7 generations is 78.11%.
...8 generations is 84.63%.
...9 generations is 89.39%.
..10 generations is 92.79%.
..11 generations is 95.15%.
..12 generations is 96.78%.
..13 generations is 97.88%.
..14 generations is 98.61%.
..15 generations is 99.10%.
..16 generations is 99.42%.
..17 generations is 99.63%.
..18 generations is 99.76%.
..19 generations is 99.85%.
..20 generations is 99.90%.
..21 generations is 99.94%.
..22 generations is 99.96%.
..23 generations is 99.98%.
..24 generations is 99.99%.
* Assuming (XXXXXX) __ Taylor and (YYYYYY) __ Taylor do not share a common ancestor in the last 1 generation.

The default form of the report gives probabilities only for 4, 8, 12, 16, 20 & 24 generations.

Comment: Follow-up research by this pair of men revealed that the actual MRCA
(Abraham Taylor, 1685-1751) lived eight generations ago,
for which TiP calculated a cumulative probability of ~85%.

Caveat: The four significant digits (two decimal places in each percentage)
are misleading; in small-number cases (e.g., a single pair) probabilities
should not be trusted to this precision.

The default interval between generations is 4. Probabilities can also be
calculated in generation intervals of 2 and 1 for a finer-grained picture.
Though it's an extra step to recalculate, we recommend single-generation
intervals.

Figure 1:
Four typical cumulative p score curves are shown
for generations numbering up to 24.

The legend for Figure 1 labels the four curves by their 24-generation cumulative probabilities.
These are roughly related to genetic distance, but depend more on the mutation rates of the markers on
which the differences occur.

Some characteristics

Probabilities can approach 100% but never reach, let alone exceed,
100%.

Because these are cumulative probabilities, the curves will always have a positive slope;
the probability for 5 generations will be greater than for 4, etc.

The exact-match curve is a special case; it will have a
"hockey-stick" shape, steep on the left and quickly flattening.

Other types of matches will have an inclined S shape. The slopes
will be initially small, then steepen, then flatten out.

As match quality declines, slopes will flatten and final (24-gen)
probabilities decrease.

How do I invoke TiP?

TiP calculations may be performed in any of three ways:

From a member's match list: Click on the orange icon to the right
of the matching name.

From the Y-DNA Genetic Distance report: After selecting the member
to match against, click the TiP report link by another member's name.

From the GAP Home page: Select Y-DNA TiP, then select (from pick
lists) a pair of men to compare.

Initial reports will be in 4-generation intervals. We recommend
then selecting the "every generation" option and recalculating to
get the detailed form shown above.

Paper trail adjustment

It is possible to apply a Bayesian adjustment to the TiP calculation; its effect is to
shift the probability curve to the right. For example, if you know with certainty
that the MRCA
must be at least five (5) generations in the past, you can enter that number to eliminate the
most recent five generations.

However, the calculation cutoff will still be 24 generations; TiP will not
calculate beyond 24 generations.
We therefore recommend applying this adjustment later, while working with the TiP scores.
Simply add 5 to each generation: 1 becomes 6, 2 becomes 7, etc.
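FTDNA does not document how its adjustment is computed. As an illustration only, one standard way to express "no common ancestor within the first k generations" is to condition the cumulative curve: P(T <= g | T > k) = (F(g) - F(k)) / (1 - F(k)). The sketch below applies that formula to the 67-marker report shown earlier. The function name condition_on_minimum is ours, and the assumption that TiP behaves this way is unverified.

```python
# Cumulative TiP probabilities (percent) for generations 1..24,
# copied from the 67-marker report shown earlier on this page.
CUMULATIVE = [
    4.82, 15.93, 30.34, 45.10, 58.40, 69.44, 78.11, 84.63,
    89.39, 92.79, 95.15, 96.78, 97.88, 98.61, 99.10, 99.42,
    99.63, 99.76, 99.85, 99.90, 99.94, 99.96, 99.98, 99.99,
]


def condition_on_minimum(cumulative, known_min_gens):
    """Rescale a cumulative curve (percent), given that no common ancestor
    lived within the first `known_min_gens` generations.

    ASSUMPTION: ordinary conditional probability, not FTDNA's documented
    method: P(T <= g | T > k) = (F(g) - F(k)) / (1 - F(k)).
    """
    if not 0 <= known_min_gens < len(cumulative):
        raise ValueError("known_min_gens must be within the reported range")
    base = cumulative[known_min_gens - 1] if known_min_gens > 0 else 0.0
    remaining = 100.0 - base
    adjusted = []
    for generation, value in enumerate(cumulative, start=1):
        if generation <= known_min_gens:
            adjusted.append(0.0)  # ruled out by the paper trail
        else:
            adjusted.append(round(100.0 * (value - base) / remaining, 2))
    return adjusted


adjusted = condition_on_minimum(CUMULATIVE, 5)
```

Under this assumption, ruling out the first five generations lowers the 8-generation figure from about 85% to about 63%. That is a different result from simply relabeling the generations, so treat either approach as a rough guide.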

Does the number of markers compared matter?

Yes and no. One should always use the highest available resolution for
TiP calculations instead of lower resolutions. More markers yield greater
confidence that the calculated probabilities reflect reality and they add
somewhat to the precision of the probabilities.

Further, resolutions of 12 and 25 markers (for complex, technical
reasons) are not adequate to yield sufficient confidence in the
TiP-calculated probabilities. We do not recommend using resolutions lower than 37
markers.

However, the probability curves generated at different resolutions do not differ significantly in shape or other characteristics.

How reliable is TiP?

We believe it is very reliable, based on a fairly thorough examination of
the algorithm's results.

In summer 2012, we did a study in which we compared the cumulative
37-marker, 24-generation TiP
scores for all 238 R1b project members who had at that time tested at least 37 markers. This resulted in a matrix of 57,120 pairs of scores in this form:

TiP Scores: 37 markers, 24 generations

ID     A      B      C      D
A      __     99%    5%     1%
B      99%    __     0.5%   2%
C      5%     0.5%   __     20%
D      1%     2%     20%    __

Figure 2: Entire distribution
Figure 3: Right tail

We found that these scores discriminated well between genealogically
significant and insignificant matches, with a sharp delineation between
high scores and low. More than 99% of scores were below
p=80%; only 6% were above p=30%. The distinction can be seen in these graphs:

Figure 2 shows the entire frequency distribution for all pairs of
scores; notice that the frequency distribution peaks at p<=1% and has a long right tail. Figure 3 expands the view of the
tail for p>=30%.

At the time of the study, there was an insufficient number of members with resolutions >37 for full assessment.
However, limited sampling suggested frequency distributions would be similar for 67 & 111 markers.

We concluded that TiP is at least as reliable as any other TMRCA
calculator based solely on ySTR markers.

Since our study, TiP has undergone another revision, in Spring 2016. This latest revision has us questioning its reliability and discriminatory capability.

The Mathematical Model

TiP is based on a family of probability distributions called "gamma distributions". The number and type of mismatches are
modeled by assigning two parameters which govern the shape and scale of the
probability curves.
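To give a feel for how two parameters can produce the curve shapes described under "Some characteristics", here is a minimal sketch of a gamma cumulative distribution function using only the Python standard library. The parameter values are hypothetical, picked purely for illustration; we do not know what values TiP assigns or how it derives them from the mismatches.

```python
import math


def gamma_cdf(x, shape, scale):
    """Cumulative distribution function of a gamma distribution.

    Computed as the regularized lower incomplete gamma function
    P(shape, x / scale) using its power series expansion.
    """
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    # Series: sum over n of z^n / (shape * (shape + 1) * ... * (shape + n))
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


# HYPOTHETICAL parameter pairs, chosen only to show the two curve families.
EXAMPLE_CURVES = {
    "shape=1.0, scale=2.0": (1.0, 2.0),  # steep at first, then flattening
    "shape=3.0, scale=1.7": (3.0, 1.7),  # slow start, steepens, flattens
}

if __name__ == "__main__":
    for label, (shape, scale) in EXAMPLE_CURVES.items():
        curve = [gamma_cdf(g, shape, scale) for g in range(1, 25)]
        print(label, " ".join(f"{100 * p:.1f}%" for p in curve[:8]))
```

With a shape parameter of 1 or less the curve rises steeply from the start and then flattens; with a shape parameter above 1 it takes the inclined S form. In both cases the curve increases with each generation and approaches, but never reaches, 100%.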

Limitations

TiP, as with all probability estimates, only approximates the "truth".
In the small numbers with which we typically deal,
the approximations are not exact. Do not read the probabilities as being
as precise as four significant digits imply.

The foundations of TiP (marker mutation rate data) are less robust than
we'd prefer. There are variances (possibly large) in mutation rates between
and within patrilines. While TiP can give us most likely estimates (MLEs), it
cannot tell us how much confidence to put in an MLE or when and how it should
be adjusted.

TiP does not account for haplotype convergence, AKA "coincidental matches".
With more and better ySNP testing since 2013, we've observed that some
matches (perhaps 5% or fewer) are between men of different R1b subclades; these pairs
of men cannot have an MRCA within thousands of years. Therefore, the matches
are not due to shared paternal ancestry but to something we call
"convergence". (Presently, we have insufficient evidence as to
whether, or to what extent, convergent matches are found in
haplogroups other than R1b.)

We recommend ySNP testing as a validity check on ySTR matches.
Should the testing show that a pair of men cannot share a common
ancestor within genealogical time, the match is not valid.

What can I do with TiP?

The first use of TiP is to evaluate match quality. A TiP score of 95% at
X generations indicates a better match than one of 70% at the same number of
generations.

TiP can help focus research (by which we mean traditional, documentary
genealogy) to identify the MRCA. Such research requires substantial "bets"
of time, effort and sometimes money to investigate -- often intensively --
particular times and places. It helps to know which bets are most likely to
pay off in positive findings.

Example

Taking the report from above, we can calculate the probability that the MRCA lived in a particular generation, as follows:

Figure 4: Example of Cumulative Probability

Figure 5: Example of per-gen Probability

Generations   Cumulative Probability   Per Gen   Per 3 Gen
1             4.82%                    4.82%     4.82%
2             15.93%                   11.11%    15.93%
3             30.34%                   14.41%    30.34%
4             45.10%                   14.76%    40.28%
5             58.40%                   13.30%    42.47%
6             69.44%                   11.04%    39.10%
7             78.11%                   8.67%     33.01%
8             84.63%                   6.52%     26.23%
9             89.39%                   4.76%     19.95%
10            92.79%                   3.40%     14.68%
11            95.15%                   2.36%     10.52%
12            96.78%                   1.63%     7.39%
13            97.88%                   1.10%     5.09%
14            98.61%                   0.73%     3.46%
15            99.10%                   0.49%     2.32%
16            99.42%                   0.32%     1.54%
17            99.63%                   0.21%     1.02%
18            99.76%                   0.13%     0.66%
19            99.85%                   0.09%     0.43%
20            99.90%                   0.05%     0.27%
21            99.94%                   0.04%     0.18%
22            99.96%                   0.02%     0.11%
23            99.98%                   0.02%     0.08%
24            99.99%                   0.01%     0.05%

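What we've done above is to use the cumulative probabilities to establish
a most likely estimate (MLE) for the generation of the MRCA. For each
generation, we subtract the cumulative probability of the next lower
generation from its own cumulative probability; the result is the
probability that the MRCA lived in that particular generation. The
"Per 3 Gen" figure is the same difference taken over a window of three
generations. These numbers are only a rough guide.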
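The subtraction can be reproduced with a few lines of Python. This is a minimal sketch using the cumulative figures from the report above; the function name window_probability is ours, not FTDNA's.

```python
# Cumulative TiP probabilities (percent) for generations 1..24,
# copied from the 67-marker report shown earlier on this page.
CUMULATIVE = [
    4.82, 15.93, 30.34, 45.10, 58.40, 69.44, 78.11, 84.63,
    89.39, 92.79, 95.15, 96.78, 97.88, 98.61, 99.10, 99.42,
    99.63, 99.76, 99.85, 99.90, 99.94, 99.96, 99.98, 99.99,
]


def window_probability(cumulative, window=1):
    """Probability (percent) that the MRCA lived within the `window`
    generations ending at each generation g, i.e. F(g) - F(g - window),
    taking F as 0 before generation 1."""
    padded = [0.0] * window + list(cumulative)
    return [round(padded[i + window] - padded[i], 2)
            for i in range(len(cumulative))]


per_gen = window_probability(CUMULATIVE, 1)
per_3_gen = window_probability(CUMULATIVE, 3)

# Generation with the largest single-generation probability.
most_likely_gen = per_gen.index(max(per_gen)) + 1
```

Here most_likely_gen comes out as 4, the generation whose single-generation probability (14.76%) is the largest in the table.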

You may notice that TiP gives a most likely estimate of 4 generations for the MRCA
but the actual number is 8. Such a difference is fairly typical and points up the need to take
these numbers with a grain of salt. The difference suggests that this
patriline has experienced fewer than the average (expected) number of
mutations.

TiP History

The TiP algorithm has not remained static over the years. We know of at least three editions,
each producing somewhat different results, depending on the differences
between the pair of haplotypes compared. We've named them TiP1 (before
December 2012), TiP2 (Dec. 2012 to Feb. 2013) and TiP3 (since Feb. 2013);
TiP3 is, we think, still the current version.

The change from TiP1 to TiP2 was intended to better handle mutations in
multi-copy markers and null values. (These changes were not reflected in FTDNA's counting
of genetic distance until Summer 2016.) However, anomalous results were seen
in some instances, caused by bugs. The change from TiP2 to TiP3 was to fix
the bugs.