Measuring Haplotype Rarity
Comparison of Measurement Systems
Comparing the measurement methods we see these distributions:
They appear very different in Figures 10-12 (repetitions of 4, 6, 7 & 8). Are they as different as they look?
Do the systems agree with each other in placing haplotypes on the spectrum of commonness/rarity? Scatter plots give a partial answer.
Scatter Plots for 67 Markers

Figure 13 
Figure 14 
Figure 15 
Scatter Plots for 37 Markers  
Figure 16 
Figure 17 
Figure 18 
There is considerable "noise" in the data but the scatter plots show three different patterns:
We also attempted to assess the comparability of the three systems using Pearson’s correlation. These results were obtained:
Table 10 Correlation of System Scores

             12 mkrs             25 mkrs             37 mkrs             67 mkrs
n=           4940                4203                3824                1953
             Wheaton   Ratio     Wheaton   Ratio     Wheaton   Ratio     Wheaton   Ratio
Wheaton      *         0.35      *         0.42      *         0.4395    *         0.582
Ratio        0.35      *         0.417     *         0.44      *         0.582     *
Deviation    0.52      0.36      0.517     0.3       0.254     0.0797    0.2469    0.031
Significance ^{[37]}
Wheaton      *         <0.0001   *         <0.0001   *         <0.0001   *         <0.0001
Ratio        <0.0001   *         <0.0001   *         <0.0001   *         <0.0001   *
Deviation    <0.0001   <0.0001   <0.0001   <0.0001   <0.0001   <0.0001   <0.0001   0.0854
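As an aid to reading Table 10, the following minimal Python sketch shows two standard ways of interpreting a Pearson coefficient: the share of variance it accounts for (r squared) and an approximate significance level. The function names are illustrative and not from the original study. The significance calculation is an assumption: it uses a one-tailed test with a large-sample normal approximation, since the source does not state which test was applied.

```python
import math
from statistics import NormalDist


def variance_explained(r):
    """Share of variance in one score accounted for by the other (r squared)."""
    return r * r


def approx_one_tailed_p(r, n):
    """Approximate one-tailed p-value for a Pearson r computed from n pairs.

    ASSUMPTION: the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) is treated
    as standard normal, which is reasonable only for large n such as here.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return 1.0 - NormalDist().cdf(t)


# Coefficients taken from the 67-marker columns of Table 10 (n = 1953).
print(variance_explained(0.582))          # Wheaton vs. Ratio
print(approx_one_tailed_p(0.031, 1953))   # Deviation vs. Ratio
print(approx_one_tailed_p(0.2469, 1953))  # Deviation vs. Wheaton
```

Under these assumptions the largest coefficient in the table accounts for roughly a third of the variance, and the Deviation vs. Ratio value at 67 markers comes out close to the 0.0854 reported in the table.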
Correlation coefficients are generally small, exceeding 0.5 in only three instances. They can account for only about one-third (or less) of the variances. However, nearly all are statistically significant (see the significance rows of Table 10).
With Taylor data only^{[38]}, we also ranked haplotypes by their scores in each system, lowest scores first; tied scores were assigned the same rank and subsequent ranks adjusted to account for the ties.
This produced larger correlation coefficients than the above correlation of actual scores. We concluded:
Again with Taylor data only, we investigated whether the measurement systems were consistent across marker sets: did haplotypes score the same at different numbers of markers? It was, of course, possible to look at only the 50% who had results for all marker sets.
At first glance, scores seemed inconsistent across the marker sets. However, correlation coefficients were mostly high, ranging from a low of 0.179 to a high of 0.998. The weakest significance observed was p=0.0123^{[39]} (Wheaton: 25 vs. 67).
Table 11 Comparability Across Marker Sets

N = 196      Wheaton             Ratio               Deviation
             12 mkrs   37 mkrs   12 mkrs   37 mkrs   12 mkrs   37 mkrs
12 markers   *         0.665     *         0.680     *         0.755
25 markers   0.731     0.931     0.703     0.984     0.998     0.695
37 markers   0.665     *         0.680     *         0.755     *
67 markers   0.233     0.265     0.285     0.341     0.695     0.923
Significance
12 markers   *         =0.001    *         <0.0001   *         <0.0001
25 markers   <0.0001   <0.0001   <0.0001   <0.0001   <0.0001   <0.0001
37 markers   <0.0001   *         <0.0001   *         <0.0001   <0.0001
67 markers   =0.001    =0.0002   <0.0001   <0.0001   <0.0001   *
Haplotypes were ranked by scores for each marker set in each of the three methods, and the rankings were then compared. Correlation coefficients for rankings ranged from 0.181 to 0.9384, and the weakest significance was p=0.0113:
Table 12 Cross-comparability of Rankings

n = 196      Wheaton             Ratio               Deviation
             12 mkrs   37 mkrs   12 mkrs   37 mkrs   12 mkrs   37 mkrs
12 markers   *         0.588     *         0.662     *         0.211
25 markers   0.685     0.890     0.759     0.938     0.776     0.210
37 markers   0.588     *         0.662     *         0.181     0.249
67 markers   0.214     0.316     0.285     0.324     0.211     *
Significance
12 markers   *         <0.0001   *         <0.0001   *         =0.011
25 markers   <0.0001   <0.0001   <0.0001   <0.0001   <0.0001   =0.001
37 markers   <0.0001   *         <0.0001   *         =0.011    *
67 markers   =0.0027   <0.0001   <0.0001   <0.0001   =0.003    =0.0004
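The ranking step behind the rank correlations can be sketched in Python as follows. This is only an illustration under stated assumptions: the text says tied scores share a rank and later ranks are adjusted, which is read here as standard competition ranking (1, 2, 2, 4); the study may instead have used averaged ranks. The function names and the five sample scores are hypothetical and are not Taylor data.

```python
import math


def competition_ranks(scores):
    """Rank scores lowest-first; ties share a rank and the next rank skips.

    ASSUMPTION: "subsequent ranks adjusted" is read as competition ranking,
    so scores 5, 7, 7, 9 receive ranks 1, 2, 2, 4.
    """
    first_position = {}
    for position, value in enumerate(sorted(scores), start=1):
        # Keep only the first (lowest) position at which each value appears.
        first_position.setdefault(value, position)
    return [first_position[s] for s in scores]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five haplotypes under two systems (illustrative only).
system_a = [0.12, 0.40, 0.40, 0.75, 0.90]
system_b = [3.0, 9.0, 4.0, 8.0, 20.0]

ranks_a = competition_ranks(system_a)
ranks_b = competition_ranks(system_b)
rank_r = pearson(ranks_a, ranks_b)  # correlation of ranks, not of raw scores
print(ranks_a, ranks_b, rank_r)
```

Correlating ranks rather than raw scores reduces the influence of extreme values, which is one plausible reason rank coefficients can differ from the score coefficients.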
Comment:
While the correlations are significant, at least some of the correlation is due to the inclusion of smaller marker sets within larger ones; the 67-marker set includes the 37, as 37 includes 25, etc. It is specifically not suggested that scores for markers 1-12 are related to scores for markers 13-25, 26-37 or 38-67. We suspect that scores for markers 1-12 will not correlate with 13-25, etc.
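The effect of nested marker sets can be illustrated with a toy simulation in Python. It rests on a strong assumption that is not taken from the source: each marker contributes an independent random amount, and a haplotype's score is simply the sum of those contributions. None of the three scoring systems is claimed to work this way; the sketch only shows that a subset correlates with the set containing it even when the markers themselves are unrelated.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_haplotypes = 5000     # hypothetical sample size

# ASSUMPTION: independent per-marker contributions, score = plain sum.
per_marker = [[rng.gauss(0.0, 1.0) for _ in range(67)]
              for _ in range(n_haplotypes)]

score_1_12 = [sum(row[:12]) for row in per_marker]     # markers 1-12
score_13_25 = [sum(row[12:25]) for row in per_marker]  # markers 13-25
score_1_67 = [sum(row) for row in per_marker]          # all 67 markers

r_nested = pearson(score_1_12, score_1_67)     # subset vs. containing set
r_disjoint = pearson(score_1_12, score_13_25)  # non-overlapping sets
print(r_nested, r_disjoint)
```

In this toy model the nested correlation is expected to be near sqrt(12/67), about 0.42, while the disjoint correlation is expected to be near zero.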
• Correlation between Wheaton and Ratio scores is positive and significant at all resolution levels. The probability that the observed correlations are due to chance is less than 1/100 of one percent.
• Correlation of Deviation scores with Wheaton scores is positive and significant at all resolution levels. The probability that the observed correlations are due to chance is less than 1/100 of one percent.
• Correlation of Deviation scores with Ratio scores is weaker, though positive. It is less significant at 67 markers, where the probability that the observed correlation is due to chance is less than 1 in 10.