Coding SNPs, evolution and disease phenotype: genome-wide bioinformatics predictions and experimental functional studies

Many diseases, both Mendelian and complex, have been associated with single-nucleotide polymorphisms (SNPs) that lead to an amino acid substitution in the encoded protein (nonsynonymous SNPs, or nsSNPs). Because of ascertainment bias, nsSNPs are likely overrepresented among known disease-associated variants and may not be the dominant cause of human disease. Nevertheless, nsSNPs provide an excellent testing ground for using evolutionary analysis to predict the functional effects of genetic variation, because computational methods for inferring selective pressure on protein-coding sequences are well established.

We have applied models of both negative and positive selection. One signature of negative selection is that, in groups of related protein sequences, many positions in the protein are "conserved"; for instance, every serine protease must retain its catalytic serine residue. To quantify this negative selection, we developed a "substitution position-specific evolutionary conservation" (subPSEC) score, a statistical measure of the likelihood that a particular amino acid substitution is deleterious. We then analyzed a large number of nsSNPs from several data sets: "normal" variation, Mendelian disease-associated mutations, and complex disease-associated variants. We find that while Mendelian disease-associated nsSNPs tend to occur at highly conserved positions in proteins, complex disease nsSNPs do not. In contrast, using a method for estimating positive selection (Ka/Ks ratios between human and mouse orthologs), we show that complex disease nsSNPs tend to occur in genes under recent positive selection (or relaxed negative selection).
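To make the two selection measures concrete, here is a minimal Python sketch. Both functions are hypothetical illustrations, not the authors' implementation. `substitution_score` is a simplified log-ratio score in the spirit of subPSEC: it estimates residue probabilities from raw counts in a single alignment column, using an assumed pseudocount of 1, whereas the published subPSEC score is derived from PANTHER family models and should not be read as defined by this code. `ka_ks` applies the Jukes-Cantor correction to pre-counted nonsynonymous and synonymous differences and sites (Nei-Gojobori style); counting those sites from a human-mouse ortholog alignment is assumed to happen upstream.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probabilities(column, pseudocount=1.0):
    """Estimate residue probabilities for one alignment column.

    Gaps and non-standard characters are ignored; the (assumed) pseudocount
    gives residues never seen in the column a small non-zero probability.
    """
    counts = Counter(res for res in column if res in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def substitution_score(column, wild_type, variant, pseudocount=1.0):
    """Hypothetical subPSEC-like score: -|ln(P_wild_type / P_variant)|.

    0 means both residues are equally probable at this position; more
    negative values mean one residue is much rarer than the other there,
    i.e. the substitution is more likely to be deleterious.
    """
    p = column_probabilities(column, pseudocount)
    return -abs(math.log(p[wild_type] / p[variant]))


def jukes_cantor(p):
    """Jukes-Cantor distance: corrects a proportion of differing sites
    for multiple substitutions at the same site."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ka/Ks from pre-counted differences and sites (Nei-Gojobori style).

    Ratios well below 1 indicate negative (purifying) selection; ratios
    near 1 suggest relaxed constraint, and ratios above 1 suggest
    positive selection.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        raise ValueError("no synonymous differences; Ka/Ks is undefined")
    return ka / ks


if __name__ == "__main__":
    # Toy alignment column: nine sequences carry serine, one threonine.
    column = "S" * 9 + "T"
    print(substitution_score(column, "S", "A"))
    print(substitution_score(column, "S", "T"))
    # Toy counts for illustration only, not real human-mouse data.
    print(ka_ks(nonsyn_diffs=10, nonsyn_sites=300, syn_diffs=10, syn_sites=100))
```

In the toy column, a serine-to-alanine change scores about -2.30 (that is, -ln 10), more negative than serine-to-threonine (about -1.61, or -ln 5), because alanine was never observed at that position while threonine was seen once.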

ABCA1 has been implicated in both Mendelian and complex disease. Mutations in the ABCA1 gene cause Tangier disease, an autosomal recessive disorder associated with near-complete absence of HDL-cholesterol (HDL-C). Heterozygous carriers have reduced HDL-C and an increased risk of coronary artery disease, and common genetic variation in ABCA1 is associated with changes in HDL-C levels and disease risk. We used the subPSEC score to predict the functional effect of specific mutations in the ABCA1 gene and, in collaboration with Dr. M.R. Hayden and colleagues at UBC, followed up these predictions with functional tests of cell lines expressing the mutant genes. The subPSEC score appears to be a robust predictor of basal biochemical function in ABCA1. ABCA1 has also been a target of recent positive selection, consistent with its association with complex disease.

-------------------------------------------------------------------------------

Biosketch
Paul D. Thomas, Ph.D., is Sr. Director of the Computational Biology research group at Applied Biosystems.
The group focuses on evolutionary analysis of proteins and protein families, especially with respect to
conservation and divergence of protein function and biological pathways. The PANTHER system, for
large-scale classification of protein function and inference of functional amino acids, is freely accessible
to the biological research community at http://panther.appliedbiosystems.com. Previously, Dr. Thomas
directed the Protein Informatics group at Celera Genomics (Applied Biosystems’ sister company),
where he led the functional analysis of the repertoire of human genes (Venter et al., Science 291:1304, 2001).
Prior to joining Celera, Dr. Thomas led the research group at Molecular Applications Group, after holding
a research position in bio- and chemi-informatics at SmithKline Beecham Pharmaceuticals. Dr. Thomas
earned his Ph.D. in biophysics with Ken Dill at UCSF, studying computational structural biology.