Estimating a Surname's Patrilines

This page describes a method for estimating the number of paternal lineages (patrilines) for a surname from the data in a DNA surname project. It assumes that the actual number cannot be determined by other means.


The method can only be used if the following minimum requirements are met.

  1. Penetration

    The project must have reached a minimum Y-DNA penetration, measured as the number of Y-DNA tests per 100,000 males with the surname.
  2. Groups, genetic families

    The number of genetic families identified in the project must be known. Attentive project administrators should already have this figure.
  3. Singletons

    The number of unmatched "singleton" members must be known.
  4. NPE Rate

    An overall average rate must be estimated for non-paternal events (NPEs, also called "not the parent expected" events or surname discontinuities). This rate is used to adjust the singleton count, because at least some singletons will not reflect true patrilines of the surname.
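The penetration figure in requirement 1 is simple arithmetic, and the counts for requirements 2 and 3 can be tallied from a project's grouping. The sketch below assumes, hypothetically, that every tested member has already been placed in a match cluster and that only the cluster sizes are at hand; a cluster of two or more members is counted as a group and a cluster of one as a singleton. The function names are illustrative and not part of any project tool.

```python
from typing import Sequence, Tuple


def penetration(tested_males: int, surname_males: int) -> float:
    """Y-DNA tests per 100,000 males bearing the surname."""
    if surname_males <= 0:
        raise ValueError("surname_males must be positive")
    # Multiply first so whole-number inputs divide as exactly as possible.
    return tested_males * 100_000 / surname_males


def count_groups_and_singletons(cluster_sizes: Sequence[int]) -> Tuple[int, int]:
    """Return (G, S) from hypothetical match-cluster sizes.

    Assumption: a cluster of 2+ matching members is one genetic family
    (group); a cluster of exactly 1 is an unmatched singleton.
    """
    groups = sum(1 for size in cluster_sizes if size >= 2)
    singletons = sum(1 for size in cluster_sizes if size == 1)
    return groups, singletons


if __name__ == "__main__":
    # Hypothetical figures: 150 tested out of 60,000 surname bearers.
    print(penetration(150, 60_000))
    print(count_groups_and_singletons([5, 1, 3, 1, 1, 2]))
```

With these made-up inputs the project has a penetration of 250 tests per 100,000 males, three groups and three singletons.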


Each group (genetic family found) represents one patriline.

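To estimate the number of patrilines, add the number of groups to the adjusted singleton count. The formula is

P = G + S*(1-NPE), where

P = estimated number of patrilines
G = number of groups (genetic families)
S = number of singletons
NPE = average non-paternal event rate, expressed as a fraction between 0 and 1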
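The formula translates directly into code. This is a minimal sketch, assuming the NPE rate is supplied as a fraction (0.25 rather than 25%); the numbers in the usage line are hypothetical and do not come from any real project.

```python
def estimate_patrilines(groups: int, singletons: int, npe_rate: float) -> float:
    """P = G + S * (1 - NPE).

    Each group counts as one patriline; singletons are discounted by the
    share assumed to come from non-paternal events.
    """
    if groups < 0 or singletons < 0:
        raise ValueError("counts must be non-negative")
    if not 0.0 <= npe_rate <= 1.0:
        raise ValueError("npe_rate must be a fraction between 0 and 1")
    return groups + singletons * (1.0 - npe_rate)


if __name__ == "__main__":
    # Hypothetical project: 12 groups, 40 singletons, assumed NPE rate of 25%.
    print(estimate_patrilines(12, 40, 0.25))
```

Here 40 singletons discounted by 25% contribute 30 patrilines, giving an estimate of 42.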

If the estimate is tracked over time, it may fluctuate widely, especially for projects with low penetration. A moving average over several periods smooths out these fluctuations and shows trends more clearly.
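A trailing simple moving average is one common way to do this smoothing. The page does not specify the kind of average or the window length, so the three-period window and the series of estimates below are hypothetical.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing simple moving average, one value per complete window.

    values: patriline estimates (P) recorded once per period, oldest first.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


if __name__ == "__main__":
    # Hypothetical P estimates for five consecutive periods.
    estimates = [42, 51, 45, 57, 48]
    print(moving_average(estimates, 3))
```

The raw estimates swing between 42 and 57, while the three-period average moves only between 46 and 51. A longer window smooths more but reacts more slowly to real changes.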

Note that in the graph above, the moving average of patrilines tends to run parallel to the match rate (though on a different scale) until the match rate reaches 50%.