TERMINUS: Telomeric End Reads Mining In Unassembled Sequences

Eukaryotic chromosomes terminate in specialized structures called telomeres, which consist of a simple, tandemly repeated DNA sequence. Telomeres have vital roles in cellular metabolism. Recent studies suggest that telomere-proximal genomic regions (subtelomeres) may harbor genes that facilitate niche adaptation. Comparative genomics of subtelomeric regions holds great promise for elucidating telomere functions. Although telomere repeats are often abundant among raw, unassembled sequence traces, they are usually highly under-represented in “whole” genome assemblies. As a result, it is difficult to locate subtelomeres among the hundreds or even thousands of contigs in a draft assembly.

We developed TERMINUS, a set of tools for mapping telomeres onto the draft sequences of whole-genome shotgun sequencing projects. It mines raw sequence reads (trace archive) for telomeric reads and their paired reads, assembles them into contigs representing individual chromosome ends, and BLASTs the resulting consensus sequences against the genome assembly to identify telomere-proximal genomic contigs. Finally, it estimates the sizes of telomeric gaps and identifies clones for gap closure. TERMINUS is implemented as a set of Perl scripts that require two sets of inputs: the NCBI Trace Archive files for a given genome project, and ancillary genome assembly information. Results are output as spreadsheets containing information that facilitates manual validation.
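
The read-mining step described above can be illustrated with a minimal Python sketch. This is not TERMINUS code (TERMINUS itself is a set of Perl scripts), and it covers only the first step: selecting telomeric reads and their paired reads. The repeat motif TTAGGG, the threshold of three tandem copies, the function names, and the toy reads and mate table are all assumptions chosen for illustration.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_telomeric_reads(reads, motif="TTAGGG", min_copies=3):
    """Return names of reads with >= min_copies tandem copies of the motif.

    Both strands are checked, because a read may cover either strand
    of the chromosome end. The motif and threshold are assumed values.
    """
    patterns = [
        re.compile("(?:%s){%d,}" % (m, min_copies))
        for m in (motif, reverse_complement(motif))
    ]
    return [
        name for name, seq in reads.items()
        if any(p.search(seq.upper()) for p in patterns)
    ]

def add_mates(telomeric, mate_of):
    """Extend a list of read names with their paired reads, where known."""
    selected = set(telomeric)
    selected.update(mate_of[name] for name in telomeric if name in mate_of)
    return sorted(selected)

# Hypothetical toy data: read name -> sequence, and read name -> mate name.
reads = {
    "r1.f": "ACGTACGT" + "TTAGGG" * 4,   # four tandem copies, forward strand
    "r1.r": "GATTACAGATTACA",            # mate of r1.f, no repeats
    "r2.f": "CCCTAA" * 3 + "GGCATG",     # three copies, reverse strand
    "r3.f": "ACGTTTAGGGACGT",            # a single copy, below the threshold
}
mate_of = {"r1.f": "r1.r", "r1.r": "r1.f"}

hits = find_telomeric_reads(reads)
print(hits)                      # telomeric reads
print(add_mates(hits, mate_of))  # telomeric reads plus their mates
```

Requiring several tandem copies, rather than a single motif occurrence, keeps short chance matches such as the one in `r3.f` out of the candidate set. In this sketch the mate `r1.r` is retained even though it carries no repeat, since paired reads are the ones that can extend into telomere-proximal sequence.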

-------------------------------------------------------------------------------

Weixi Li (1), Cathryn J. Rehmeyer (2), Chuck Staben (1) and Mark L. Farman (*,2)
(1) Department of Biological Sciences and (2) Department of Plant Pathology,
University of Kentucky, Lexington, KY 40546