Relationships of S & G
This page is about possible relationships between S (number of singletons)
and G (number of groups), both variables being components of F (found
lines). This is a further exploration, to see what relationships there might
be.
Caveats
Penetration
James Irvine (focusing on YDNA surname projects) defines a project's
penetration as number of participants relative to world population and
assigns the designator P to it.
Penetration for most projects is very low, on the order of 1 to 100
participants per 100,000 bearing the surname. Low penetration may keep any
relationship effects from appearing in actual project data.
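To give a feel for the scale, the range quoted above can be converted into the fraction P. This is a minimal Python sketch; the function name penetration and the sample figures are illustrative assumptions, not project data.

```python
def penetration(participants, bearers):
    """Return penetration P: participants as a fraction of surname bearers."""
    if bearers <= 0:
        raise ValueError("bearers must be positive")
    return participants / bearers

# The range quoted in the text: 1 to 100 participants per 100,000 bearers.
low = penetration(1, 100_000)
high = penetration(100, 100_000)
print(low, high)
```

Both values are far below 1, which is why the sampled population can be treated as effectively unlimited at present project sizes.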
Actual Data
Actual data, at the moment, consists of essentially one project. The author knows of no other projects with
data for comparison. It is, of course, foolhardy to generalize from a sample of
one.
Other projects are invited to contribute data for comparison.
Singletons
It is sometimes assumed (unconsciously?) that singletons are artifacts of
either NPEs or insufficient sampling ("penetration"). Let me propose, as an alternate view,
the "Last Mohican" perspective that some project participants may represent
the last surviving male of their patrilines. If true, singletons are essential for a
project to have completely surveyed its surname's YDNA.
Update Nov. 2016: The above hypothesis holds true only for
project members with direct paternal ancestry of the surname. It does
not hold for those who claim maternal relationship to the surname.
Symbols & Notation
Review:
F = G + S
Where F = found lines (an integer), G = number of groups (not total membership
of groups, also an integer), S = number of singletons (an integer).
Definitions: A group consists of two or more participants with
matching YDNA, according to the project's match rules. A singleton is a
participant with no matches found within the project.
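The definitions above can be sketched in Python as a simple count. This is only an illustration: it assumes each participant has already been assigned a match label by the project's match rules (so matching is treated as all-or-nothing), and the function name count_lines and the sample labels are made up.

```python
from collections import Counter

def count_lines(labels):
    """Return (G, S, F) for a list of per-participant match labels.

    Participants sharing a label are treated as matching one another.
    """
    sizes = Counter(labels).values()
    G = sum(1 for size in sizes if size >= 2)  # groups: two or more matching participants
    S = sum(1 for size in sizes if size == 1)  # singletons: no match within the project
    return G, S, G + S                         # F = G + S

# Hypothetical project of 7 participants; the labels are invented.
labels = ["a", "a", "a", "b", "b", "c", "d"]
G, S, F = count_lines(labels)
print(G, S, F)  # 2 2 4
```

Note that G counts groups, not group members: the three "a" participants contribute 1 to G.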
New Symbols
 N = Total project participants (an integer)
 P = Project penetration relative to world's total target population,
a fraction, 0 <= P <= 1.
 Δ = finite change in a variable; ΔS is a finite change in S. By
definitions of N, S & G, this is an integer.
 ΔS/ΔN = rate of (finite) change in S with respect to change in N.
 d = an infinitesimal change in a variable, dS is a tiny (not
necessarily integer) change in S.
 dS/dN = "instantaneous rate of change in S with respect
to change in N".
 This is, in analytic geometry, the slope of a function's curve;
the shape of the curve can be obtained by integration, e.g.,
∫(dS/dN)dN
Limits of G & S
S & G are
inversely related. For equal project size, if G (through some magical
process) is to increase, S must decrease. And if (through another
magical process) S is to increase, G must decrease.
To remove the "magical process", we might imagine
different projects with the same target population, but different
samplings.
Let’s call
the project size N, for number of participants; a change in project size is
ΔN; and average Group size is GS. There are three hypothetical scenarios
illustrating the limits of S & G.

The “no
one matches anyone” scenario: All participants are singletons;
there are no groups.
S will have an absolute maximum (upper
limit) of N, in which case G=0.
Lim(S, G→0) = N

The
“everyone matches someone” scenario: There are no singletons; all
participants belong to groups.
G will have an upper limit of N/2
because the minimum size group is 2.
Lim(G, S→0) = N/2.
(This is a case in which GS=2.)

In the real world, group sizes are >=2; average group size is GS>2.
 Diana Gale Mathieson has said (paraphrased) that a project’s DNA survey phase
can’t be considered complete until S=0 & minimum group size >=3. (I
question the statement’s truth and its basis.)
 The “everyone matches everyone” scenario: There is one group, to
which all participants belong; there are no singletons. This may apply to
single-source (usually, rare) surnames.
The lower limit of G = 1; lower limit of S = 0.
 Upper & lower limits summary
Upper limits              Lower limits
Lim(S, G→0) = N           Lim(S, 1<=G<=N/2) = 0
Lim(G, S→0) = N/2         Lim(G, 0<=S<=N) = 0
In the real world, N is not fixed but, mostly & hopefully, increases over time. For ΔN, with A the number of ancestral lines,
 Most probable ΔS ≈ (1 - F/A)*ΔN, dS/dN → (1 - F/A) and
 Most probable ΔG ≈ F/A*(ΔN/GS), dG/dN → F/A/GS
 Because F=G+S, we have G & S on both sides of the derivatives. This problem could be
sorted out through algebraic manipulation.
 Most probable ΔS ≈ [1 - (G+S)/A]*ΔN = ΔN - (G/A)*ΔN - (S/A)*ΔN, so ΔS + (S/A)*ΔN ≈ ΔN*(1 - G/A)
 Most probable ΔG ≈ [(G+S)/A]*(ΔN/GS)
 However, describing curve slopes in probabilistic
terms doesn't warrant further refinement; “most probable” is too often a crapshoot.
At the extremes of F/A, (0 <= F/A <= 1):
 At F/A=0, ΔS/ΔN = (1 - F/A) = 1, ΔG/ΔN = F/A = 0.
(This would seem to be an impossible case;
it supposes a project with no participants, i.e., S=0, G=0.)
 A more
realistic limit is at N=1; S=1, G=0, F/A=1/A. ΔS/ΔN ≈ (1 - 1/A), ΔG/ΔN ≈ 1/A.
 Assuming A=10, dG/dN ≈ 1/10 ≈ 10%, dS/dN ≈ (1 - 1/10) ≈ 90%
 Assuming A=100, dG/dN ≈ 1/100 ≈ 1%, dS/dN ≈ (1 - 1/100) ≈ 99%.
 At F/A=1, ΔS/ΔN = (1 - F/A) = 0, ΔG/ΔN = F/A = 1.
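The worked percentages above can be reproduced with a few lines of Python. This is a sketch under the same approximations used in this list, dS/dN ≈ 1 - F/A and dG/dN ≈ F/A (group size is ignored here, as it is in the list); the function name most_probable_rates is an assumption for illustration.

```python
def most_probable_rates(F, A):
    """Return (dS/dN, dG/dN) using dS/dN ~ 1 - F/A and dG/dN ~ F/A."""
    if A <= 0 or not 0 <= F <= A:
        raise ValueError("need A > 0 and 0 <= F <= A")
    found_fraction = F / A
    return 1 - found_fraction, found_fraction

# The N=1 case from the text: one found line (F=1) out of A ancestral lines.
for A in (10, 100):
    dS, dG = most_probable_rates(F=1, A=A)
    print(f"A={A}: dS/dN ~ {dS:.0%}, dG/dN ~ {dG:.0%}")
```

The two rates always sum to 1 under these approximations, so a new participant is counted either as a probable new singleton or as a probable match.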
At low values of N:
 At N=1, G=0, S=1,
 N=2, 0<=G<=1, 0<=S<=2
 N=3, 0<=G<=1, 0<=S<=3
 N=4, 0<=G<=2, 0<=S<=4
 N=5, 0<=G<=2, 0<=S<=5

 N=6, 0<=G<=3, 0<=S<=6
 N=7, 0<=G<=3, 0<=S<=7
 N=8, 0<=G<=4, 0<=S<=8
 N=9, 0<=G<=4, 0<=S<=9
 N=10, 0<=G<=5, 0<=S<=10

G_{max} follows this law: for even N, G_{max} = N/2, and for odd N, G_{max}
= (N-1)/2.
And, S_{max} = N.
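The two rules can be checked against the list of small N values with a short Python sketch. Integer division covers both the even and the odd case; the function names g_max and s_max are illustrative assumptions.

```python
def g_max(N):
    """Largest possible number of groups: every group has exactly 2 members."""
    return N // 2  # N/2 for even N, (N-1)/2 for odd N

def s_max(N):
    """Largest possible number of singletons: nobody matches anybody."""
    return N

for N in range(1, 11):
    print(f"N={N}: 0<=G<={g_max(N)}, 0<=S<={s_max(N)}")
```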
Rates of change between S & G
From the definition of a group as >=2:
 Minimum ΔS/ΔG = 1 (Singleton joins a prior
group.)
Projects with differing target populations
For projects of different target populations, is there any reason why there should be any S-G
relationships between one project and another? Should Adair YDNA behave
the same as Baker or Cruse?
High penetration levels
Until now, we have assumed low penetration levels which fit with present
experience, that is P<<1, in a range of approximately
10^{-5} < P < 10^{-3}.
Within this range, the population being sampled is statistically "infinite"
relative to sample size.
As N → W (for world population), i.e., P → 1, we might
expect G & S to behave differently than at present low penetration
levels. This supposes participation in the thousands for rare surnames, in the
hundreds of thousands or millions for common surnames.
At P=1 or P≈1, what is the relationship between N, S & G?
 As N → W & P → 1, the ratio F/A → 1.
 F (=G+S) will have reached its maximum (upper limit), i.e., F=F_{max}
&
ΔF/ΔN=0, for the reason that the population to be sampled
is exhausted.
 G will have reached its maximum; G=G_{max}
& ΔG/ΔN=0, for the
same reason.
 S will consist only (or, at P<1, primarily) of those
whose lines have no other surviving descendants.
S will have reached its maximum, S=S_{max} & dS/dN → 0.
At 0.5 <= P <= 0.9, what is the relationship between N, S & G?
This is a sampling-without-replacement problem in which the sample is large
in relation to the population being sampled. Binomial probabilities apply as an
approximation; strictly, sampling without replacement is hypergeometric.
Imagine a bucket of balls of different colors: The number originally in the
bucket is W, the number of colors is A (number of ancestral lines), and the
number of balls already picked (selected & examined) is N. What are the chances
that a ball picked at random will match one or more of the balls previously
picked?
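The bucket picture can be explored with a small simulation. The Python sketch below draws every ball without replacement and records G and S after each pick. The line sizes are invented for illustration (they are not project data), and the function name simulate_bucket and the fixed seed are assumptions.

```python
import random
from collections import Counter

def simulate_bucket(line_sizes, seed=0):
    """Draw every ball from the bucket without replacement.

    line_sizes[i] is the number of balls of color i (members of line i).
    Returns a list of (N, G, S) tuples, one per ball picked.
    """
    bucket = [color for color, size in enumerate(line_sizes) for _ in range(size)]
    random.Random(seed).shuffle(bucket)  # a random order of picking
    picked = Counter()
    history = []
    for n, color in enumerate(bucket, start=1):
        picked[color] += 1
        G = sum(1 for count in picked.values() if count >= 2)
        S = sum(1 for count in picked.values() if count == 1)
        history.append((n, G, S))
    return history

# Hypothetical bucket: A = 6 colors, W = 20 balls; two colors have a single ball.
history = simulate_bucket([8, 5, 3, 2, 1, 1])
for n, G, S in history[::5] + [history[-1]]:
    print(f"N={n}: G={G}, S={S}")
```

Whatever the order of picking, once the bucket is empty (N = W) the only singletons left are the colors that had a single ball to begin with, which is the "Last Mohican" case.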
With success defined as matching one of the existing groups or singletons
(thus creating a new group),
Pr(k=x), for exactly "k" successes (matches) in "n"
trials (new participants):
Pr(k) = n!/[k!*(n-k)!] * (F/A)^k * (1 - F/A)^(n-k)
Let k = ΔN = 1.
As n-k → 0, (n-k)! → 1
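The binomial formula, Pr(k) = n!/[k!*(n-k)!] * p^k * (1 - p)^(n-k) with p = F/A, can be evaluated directly in Python. This is a minimal sketch; the values F = 4 and A = 10 are hypothetical, and the function name binomial_pr is an assumption.

```python
from math import comb

def binomial_pr(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical values: F = 4 found lines out of A = 10 ancestral lines.
F, A = 4, 10
p_match = binomial_pr(k=1, n=1, p=F / A)  # the one new participant matches
p_new = binomial_pr(k=0, n=1, p=F / A)    # the one new participant is a new singleton
print(p_match, p_new)
```

For a single new participant (n = k = 1) the factorial terms all equal 1 and the formula reduces to F/A.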