Scalability in Managing Phenotypic Data

Bruce Whitehead
Computer Science
Univ. of Tennessee Space Institute

bwhitehe@utsi.edu

When existing tools for searching genotypic and phenotypic data work well, the raw search power of the tools interacts constantly with the human expert's biological understanding, heuristic reasoning, pattern recognition, experience, and judgment. As data on RNA expression and the resultant proteome accumulate, investigators will become increasingly interested in:

· networks of gene regulation responsible for the patterns of RNA expression in various tissues,

· networks of biochemical pathways determined by proteome expression, and

· cause-and-effect relationships, which unfold over phenotypic developmental time to relate networks of regulatory genes to RNA co-expression and to proteomic pathways.

The volume and complexity of these relationships challenge the scalability of the process by which biomedical investigators interact with computational resources to solve problems and make discoveries.

To address this challenge, biomedical investigators need to automate the rote parts of their interaction with the set of genotypic and phenotypic databases required for each problem. A database management system (DBMS) is proposed to provide a rigorous and measurable environment for developing, testing, and launching similarity-based search, retrieval, and pattern-matching tools. An open-source DBMS with extensible data types and operators can provide a scalable framework with transaction integrity to serve as a back end for such search and pattern-analysis tools.
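
As a rough illustration of what an extensible operator for similarity-based search might look like, the sketch below registers a user-defined similarity function with a DBMS and uses it directly in a query. It is only a minimal sketch under stated assumptions: SQLite (through Python's standard sqlite3 module) stands in for whatever DBMS would actually be chosen, Pearson correlation stands in for the similarity measure, and the table name expression, the function names, and the gene names and profile values are all hypothetical.

```python
import json
import math
import sqlite3


def pearson(a_json, b_json):
    """Pearson correlation of two expression profiles stored as JSON arrays.

    Returns None (SQL NULL) when the correlation is undefined.
    """
    a, b = json.loads(a_json), json.loads(b_json)
    n = len(a)
    if n == 0 or n != len(b):
        return None
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


def build_demo_db():
    """Create an in-memory database with a few made-up expression profiles."""
    conn = sqlite3.connect(":memory:")
    # Extend the DBMS with a new two-argument SQL function (the "operator").
    conn.create_function("pearson", 2, pearson)
    conn.execute("CREATE TABLE expression (gene TEXT PRIMARY KEY, profile TEXT)")
    demo_profiles = {  # hypothetical genes and values, for illustration only
        "geneA": [1, 2, 3, 4],
        "geneB": [2, 4, 6, 8],
        "geneC": [4, 3, 2, 1],
        "geneD": [1, 3, 2, 4],
    }
    # The connection as context manager wraps the inserts in one transaction:
    # either every row is committed or, on error, none is.
    with conn:
        conn.executemany(
            "INSERT INTO expression (gene, profile) VALUES (?, ?)",
            [(g, json.dumps(p)) for g, p in demo_profiles.items()],
        )
    return conn


def most_similar(conn, query_gene, limit=3):
    """Return up to `limit` (gene, correlation) pairs most similar to query_gene."""
    row = conn.execute(
        "SELECT profile FROM expression WHERE gene = ?", (query_gene,)
    ).fetchone()
    if row is None:
        return []
    return conn.execute(
        "SELECT gene, r FROM ("
        "  SELECT gene, pearson(profile, ?) AS r FROM expression WHERE gene != ?"
        ") WHERE r IS NOT NULL ORDER BY r DESC LIMIT ?",
        (row[0], query_gene, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for gene, r in most_similar(conn, "geneA", limit=2):
        print(gene, round(r, 3))
```

The point of the sketch is that the ranking logic lives inside the query itself, so the rote step of comparing one profile against every stored profile is delegated to the database rather than repeated by hand. A production system would presumably use native data types and indexed operators rather than JSON text and a full table scan.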