
Measuring Haplotype Rarity

 

Comparison of Measurement Systems

Comparing the measurement methods, we see these distributions:


Figure 10: Wheaton

Figure 10a: WApM

Figure 11: Ratio

Figure 12: Deviation

The distributions appear very different in Figures 10-12 (repetitions of Figures 4, 6, 7 & 8). Are they as different as they look?

Do the systems place haplotypes consistently with one another on the spectrum of commonness to rarity? Scatter plots give a partial answer.

Scatter Plots for 67 Markers

Figure 13

Figure 14

Figure 15

Scatter Plots for 37 Markers

Figure 16

Figure 17

Figure 18

There is considerable "noise" in the data but the scatter plots show three different patterns:

Correlation

We also attempted to assess the comparability of the three systems using Pearson’s correlation. These results were obtained:

Table 10
Correlation of System Scores

             12 mkrs           25 mkrs           37 mkrs           67 mkrs
n =          4940              4203              3824              1953
             Wheaton  Ratio    Wheaton  Ratio    Wheaton  Ratio    Wheaton  Ratio
Wheaton      *        0.35     *        0.42     *        0.4395   *        0.582
Ratio        0.35     *        0.417    *        0.44     *        0.582    *
Deviation    0.52     0.36     0.517    0.3      0.254    0.0797   0.2469   0.031

Significance [37]
             12 mkrs           25 mkrs           37 mkrs           67 mkrs
             Wheaton  Ratio    Wheaton  Ratio    Wheaton  Ratio    Wheaton  Ratio
Wheaton      *        <0.0001  *        <0.0001  *        <0.0001  *        <0.0001
Ratio        <0.0001  *        <0.0001  *        <0.0001  *        <0.0001  *
Deviation    <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  0.0854
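
As a rough illustration of how coefficients and significance levels like those in Table 10 can be computed, here is a minimal Python sketch. The function names and the toy scores are hypothetical, and the p-value uses a large-sample normal approximation with a one-tailed test; the study does not state which test or software was used, so that choice is an assumption. It does give roughly 0.085 for r = 0.031 with n = 1953, close to the 0.0854 reported for Deviation vs. Ratio at 67 markers.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def approx_one_tailed_p(r, n):
    """Approximate one-tailed p-value for a positive r.

    Assumption: n is large, so the t statistic is treated as standard normal.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def variance_explained(r):
    """Share of variance in one score accounted for by the other (r squared)."""
    return r * r


# Made-up scores for five haplotypes, for illustration only.
wheaton = [1.0, 2.0, 3.0, 4.0, 5.0]
ratio = [2.1, 3.9, 6.2, 8.1, 9.8]
print(round(pearson_r(wheaton, ratio), 3))

# Largest coefficient in Table 10 (Wheaton vs. Ratio, 67 markers).
print(round(variance_explained(0.582), 3))          # 0.339
# Weakest coefficient in Table 10 (Deviation vs. Ratio, 67 markers, n = 1953).
print(round(approx_one_tailed_p(0.031, 1953), 3))   # about 0.085
```

Squaring a coefficient gives the share of variance it accounts for, which is why even r = 0.582 explains only about a third of the variance.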

Meaning:

Correlation coefficients are generally small, exceeding 0.5 in only three instances. Even the largest (0.582) accounts for only about one-third of the variance (r² ≈ 0.34), and the others account for less. However,

Rank Correlation

Using Taylor data only[38], we also ranked haplotypes by score within each system, lowest score first; tied scores were assigned the same rank, and subsequent ranks were adjusted to account for the ties.
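
The ranking step can be sketched as follows. This is only one reading of the description: it assumes "competition" ranking (1, 2, 2, 4), where tied scores share the lowest applicable rank and the next rank skips ahead. Averaged ranks, as in Spearman's method, would be another reasonable reading. All function names are hypothetical.

```python
import math


def competition_ranks(scores):
    """Rank scores lowest-first; ties share a rank and later ranks skip ahead."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, idx in enumerate(order):
        previous = order[position - 1] if position > 0 else None
        if previous is not None and scores[idx] == scores[previous]:
            ranks[idx] = ranks[previous]      # tie: reuse the earlier rank
        else:
            ranks[idx] = position + 1         # 1-based rank, adjusted for ties
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation(scores_a, scores_b):
    """Correlate the rankings produced by two systems for the same haplotypes."""
    return pearson_r(competition_ranks(scores_a), competition_ranks(scores_b))


print(competition_ranks([0.3, 0.1, 0.3, 0.7]))   # [2, 1, 2, 4]
```

Because ranks depend only on order, two systems that sort haplotypes the same way correlate perfectly on rank even when their raw scores are on very different scales.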

This produced larger correlation coefficients than the correlation of actual scores above. We concluded:

Cross-comparability across marker sets

Again with Taylor data only, we investigated whether the measurement systems were consistent across marker sets: did haplotypes score the same at different numbers of markers? It was, of course, possible to look only at the 50% who had results for all marker sets.

At first glance, scores seemed inconsistent across the marker sets. However, correlation coefficients were mostly high, ranging from a low of 0.179 to a high of 0.998. The weakest significance observed was p=0.0123[39] (Wheaton: 25 vs. 67).
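
A comparison of this kind can be sketched in Python as below, under stated assumptions: each haplotype is a record holding one score per marker set (or None where no result exists), only complete records are kept, and every pair of marker sets is then correlated. The record layout, names, and values are all hypothetical.

```python
import math
from itertools import combinations

MARKER_SETS = (12, 25, 37, 67)


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def complete_cases(records):
    """Keep only haplotypes that have a score at every marker set."""
    return [r for r in records if all(r.get(m) is not None for m in MARKER_SETS)]


def cross_set_correlations(records):
    """Correlate one system's scores between every pair of marker sets."""
    rows = complete_cases(records)
    return {
        (a, b): pearson_r([r[a] for r in rows], [r[b] for r in rows])
        for a, b in combinations(MARKER_SETS, 2)
    }


# Made-up scores from a single system, for illustration only.
records = [
    {"id": "H1", 12: 0.10, 25: 0.20, 37: 0.35, 67: 0.50},
    {"id": "H2", 12: 0.40, 25: 0.45, 37: 0.30, 67: 0.20},
    {"id": "H3", 12: 0.25, 25: 0.30, 37: 0.50, 67: None},   # dropped: no 67-marker result
    {"id": "H4", 12: 0.60, 25: 0.70, 37: 0.65, 67: 0.90},
    {"id": "H5", 12: 0.05, 25: None, 37: None, 67: None},   # dropped: 12 markers only
]

for pair, r in cross_set_correlations(records).items():
    print(pair, round(r, 3))
```

Restricting to complete cases keeps every pairwise comparison on the same set of haplotypes, at the cost of discarding those tested at fewer markers.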

Table 11
Comparability Across Marker Sets
N = 196

             Wheaton           Ratio             Deviation
             12 mkrs  37 mkrs  12 mkrs  37 mkrs  12 mkrs  37 mkrs
12 markers   *        0.665    *        0.68     *        0.755
25 markers   0.731    0.931    0.703    0.984    0.998    0.695
37 markers   0.665    *        0.680    *        0.755    *
67 markers   0.233    0.265    0.285    0.341    0.695    0.923

Significance
             Wheaton           Ratio             Deviation
             12 mkrs  37 mkrs  12 mkrs  37 mkrs  12 mkrs  37 mkrs
12 markers   *        0.001    *        <0.0001  *        <0.0001
25 markers   <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  <0.0001
37 markers   <0.0001  *        <0.0001  *        <0.0001  <0.0001
67 markers   0.001    0.0002   <0.0001  <0.0001  <0.0001  *

Rank Correlation

Haplotypes were ranked by score for each marker set in each of the three methods, and the rankings were then compared. Correlation coefficients for the rankings ranged from 0.181 to 0.9384, and the weakest significance was p=0.0113.

Table 12
Cross-comparability of Rankings
n = 196

             Wheaton           Ratio             Deviation
             12 mkrs  37 mkrs  12 mkrs  37 mkrs  12 mkrs  37 mkrs
12 markers   *        0.588    *        0.662    *        0.211
25 markers   0.685    0.890    0.759    0.938    0.776    0.210
37 markers   0.588    *        0.662    *        0.181    0.249
67 markers   0.214    0.316    0.285    0.324    0.211    *

Significance
             Wheaton           Ratio             Deviation
             12 mkrs  37 mkrs  12 mkrs  37 mkrs  12 mkrs  37 mkrs
12 markers   *        <0.0001  *        <0.0001  *        0.011
25 markers   <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  0.001
37 markers   <0.0001  *        <0.0001  *        0.011    *
67 markers   0.0027   <0.0001  <0.0001  <0.0001  0.003    0.0004

Comment:

While the correlations are significant, at least some of the correlation is due to the inclusion of smaller marker sets within larger ones: the 67-marker set includes the 37-marker set, which in turn includes the 25-marker set, and so on. It is specifically not suggested that scores for markers 1-12 are related to scores for markers 13-25, 26-37 or 38-67. We suspect that scores for markers 1-12 will not correlate with those for markers 13-25, etc.
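
The effect of nesting can be illustrated with a simplified model. Assume, purely for illustration, that a score is the sum of independent per-marker contributions with equal variance; the Wheaton, Ratio and Deviation scores are not necessarily additive in this way. Under that assumption, the score over the first a markers and the score over the first b markers (a < b) correlate at sqrt(a/b) even though no marker is related to any other. The Python sketch below (hypothetical names, simulated data) checks this numerically.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def nested_overlap_r(a, b):
    """Expected r between sums over the first a and first b markers (a <= b),
    assuming independent, equal-variance per-marker contributions."""
    return math.sqrt(a / b)


def simulate_nested_r(a, b, n_haplotypes=5000, seed=1):
    """Simulate additive scores and correlate the a-marker and b-marker totals."""
    rng = random.Random(seed)
    small, large = [], []
    for _ in range(n_haplotypes):
        contributions = [rng.gauss(0.0, 1.0) for _ in range(b)]
        small.append(sum(contributions[:a]))   # score on the nested subset
        large.append(sum(contributions))       # score on the full set
    return pearson_r(small, large)


for a, b in [(12, 25), (12, 37), (37, 67)]:
    print(a, b, round(nested_overlap_r(a, b), 3), round(simulate_nested_r(a, b), 3))
```

In this toy model the shared markers alone produce a sizeable correlation, which is the caution the Comment raises about reading too much into the cross-set coefficients.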

Meanings:

•  Correlation between Wheaton and Ratio scores is positive and significant at all resolution levels. The probability of observing correlations this large by chance alone is less than 1/100 of one percent (p < 0.0001).

•  Correlation of Deviation scores with Wheaton scores is positive and significant at all resolution levels. The probability of observing correlations this large by chance alone is less than 1/100 of one percent (p < 0.0001).

•  Correlation of Deviation scores with Ratio scores is weaker, though positive. It is less significant at 67 markers, where the probability of observing the correlation by chance alone is just under 1 in 10 (p = 0.0854).


