|
Genetic reference populations are defined as panels of related mouse strains with
fixed genotypes. Population structure is determined by the breeding history of these
populations, in particular by the generations of outcrossing, the randomization of genotype
segregation, and progenitor diversity. Linkage disequilibrium (LD) is a measure of
statistical dependence between genetic markers. It depends on the recombination
frequency, the genealogy of the populations, natural selection and other factors.
Using large publicly available SNP sets, we applied graph analysis to compare the
structure of multiple populations and sub-populations of mice. Linkage disequilibrium
was evaluated by three metrics: correlation coefficients, mutual information coefficients
and p-values for Lewontin’s D’. A high-pass filter was applied to these metrics
to construct an unweighted graph of genotype associations. Maximal complete subgraphs
(cliques) were extracted at several thresholds and the resulting graphs were analysed
for number of cliques, clique size and chromosomal representation by clique members.
These analytic approaches provide a quantitative comparison of populations and can
be used to optimize genetic equidistance of sub-populations. Results indicate that
the genotype structure of standard inbred strains, which have had longer periods
of outcrossing, consists of smaller blocks of linked loci than recombinant inbreds.
Moreover, it appears that the non-random breeding history of standard inbreds has
resulted in the infiltration of non-syntenic linkage at high LD thresholds. These
results are consistently observed across all three LD metrics. Long-range linkage
disequilibrium and the presence of LD blocks in mouse inbred strains can confound
SNP haplotype association analysis, despite the large size of the existing
standard inbred strain set. Large syntenic LD blocks in the BXD recombinant inbred
strains, though relatively uncorrelated with other genome regions, limit the power
and precision of this population for genetic analysis. These same limitations apply
to other correlation-based methods, including systems genetic analysis of high-throughput
phenotypes such as gene expression. The 8-way Collaborative Cross is designed to
have both smaller LD blocks than existing recombinant inbred (RI) panels and less
long-range (non-syntenic) association of genotypes.
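
The analysis pipeline summarized above (pairwise LD metrics, a high-pass filter to form an unweighted graph, maximal clique extraction, and chromosomal representation per clique) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the 0/1 genotype coding for homozygous strains, the toy markers `m1` to `m5`, the function names, and the thresholds are hypothetical and are not taken from the study. The sketch computes D' itself rather than a p-value for it, and it uses a plain Bron-Kerbosch search with pivoting, which may differ from the clique algorithm the authors used.

```python
from itertools import combinations
from math import log2, sqrt

# Hypothetical toy data: one 0/1 allele call per strain for each marker.
GENOTYPES = {
    "m1": [0, 0, 0, 1, 1, 1],
    "m2": [0, 0, 0, 1, 1, 1],
    "m3": [0, 0, 1, 1, 1, 1],
    "m4": [0, 1, 0, 1, 0, 1],
    "m5": [1, 1, 1, 0, 0, 0],
}
CHROM = {"m1": "chr1", "m2": "chr1", "m3": "chr1", "m4": "chr2", "m5": "chr7"}

def pearson_r(x, y):
    """Correlation coefficient between two markers (0.0 if one is monomorphic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)

def mutual_information(x, y):
    """Mutual information between two markers, in bits."""
    n = len(x)
    joint = {}
    for pair in zip(x, y):
        joint[pair] = joint.get(pair, 0) + 1
    px = {a: x.count(a) / n for a in set(x)}
    py = {b: y.count(b) / n for b in set(y)}
    return sum((c / n) * log2((c / n) / (px[a] * py[b]))
               for (a, b), c in joint.items())

def d_prime(x, y):
    """Absolute value of Lewontin's D' for two biallelic markers coded 0/1."""
    n = len(x)
    pa, pb = sum(x) / n, sum(y) / n
    pab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    d = pab - pa * pb
    if d >= 0:
        dmax = min(pa * (1 - pb), (1 - pa) * pb)
    else:
        dmax = min(pa * pb, (1 - pa) * (1 - pb))
    return 0.0 if dmax == 0 else abs(d) / dmax

def ld_graph(genotypes, metric, threshold):
    """High-pass filter: keep an unweighted edge when |metric| >= threshold."""
    adj = {m: set() for m in genotypes}
    for a, b in combinations(genotypes, 2):
        if abs(metric(genotypes[a], genotypes[b])) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return sorted(cliques)

def summarize(cliques, chrom):
    """(size, chromosomes represented) for each clique with at least 2 markers."""
    return sorted((len(c), sorted({chrom[m] for m in c}))
                  for c in cliques if len(c) >= 2)

if __name__ == "__main__":
    for threshold in (0.9, 0.7):
        graph = ld_graph(GENOTYPES, pearson_r, threshold)
        print(threshold, summarize(maximal_cliques(graph), CHROM))
```

In this toy panel, a clique whose members map to more than one chromosome stands in for the non-syntenic association discussed above, and lowering the threshold enlarges the cliques. Swapping `pearson_r` for `mutual_information` or `d_prime` in `ld_graph` changes the metric without changing the rest of the pipeline. The metrics need not agree: for `m1` and `m3`, D' is 1 while |r| is about 0.71.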
|