Reranking candidate gene models with cross-species comparison for improved gene prediction

Penn collection
Departmental Papers (CIS)
Subject
HIDDEN MARKOV MODEL
SEQUENCE ALIGNMENT
HUMAN GENOME
ANNOTATION
MOUSE
RECOGNITION
DROSOPHILA
JIGSAW
Abstract

Background: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features.

Results: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+.

Conclusion: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.
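
The reranking step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` fields, the linear scoring rule, and the hand-picked `WEIGHTS` are all assumptions made for illustration, and the abstract does not specify how the features are actually combined. The sketch only shows the general idea that a candidate with a slightly lower gene-finder score can move to the top when it agrees well with a putative ortholog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One of the K best gene models for a locus (hypothetical structure)."""
    name: str
    base_score: float           # score from the underlying gene finder; higher is better
    cds_similarity: float       # 0..1 coding-sequence similarity to a putative ortholog
    splice_agreement: float     # 0..1 fraction of splice sites conserved with the ortholog
    signal_peptide_match: bool  # True if signal peptide presence agrees with the ortholog


# Hypothetical, hand-picked weights; chosen only to make the example concrete.
WEIGHTS = {"base": 1.0, "cds": 2.0, "splice": 1.5, "signal": 0.5}


def global_score(c: Candidate, w: dict = WEIGHTS) -> float:
    """Combine the gene-finder score with cross-species features (assumed linear rule)."""
    return (
        w["base"] * c.base_score
        + w["cds"] * c.cds_similarity
        + w["splice"] * c.splice_agreement
        + w["signal"] * float(c.signal_peptide_match)
    )


def rerank(candidates: list, w: dict = WEIGHTS) -> list:
    """Return the candidates sorted from best to worst global score."""
    return sorted(candidates, key=lambda c: global_score(c, w), reverse=True)


# Toy K = 3 candidate list for a single locus (made-up numbers).
candidates = [
    Candidate("model_1", -10.0, 0.40, 0.50, False),  # top-ranked by the gene finder
    Candidate("model_2", -10.5, 0.95, 1.00, True),   # slightly lower base score, strong ortholog support
    Candidate("model_3", -12.0, 0.60, 0.50, False),
]

reranked = rerank(candidates)
best = reranked[0]
print(best.name)
```

With these toy numbers, `model_2` overtakes `model_1` because its comparative features outweigh the small gap in base score. Any other ranked source of candidates or additional feature could be substituted by extending `Candidate` and `WEIGHTS`.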

Publication date
2008-10-14
Comments
This article (as defined below) is provided under the terms of this BioMed Central Open Access License. The Work is protected by copyright and/or other applicable law. Any use of the work other than as authorized under this license is prohibited. By exercising any rights to the Work provided here, you accept and agree to be bound by the terms of this license. The Licensor grants you the rights contained here in consideration of your acceptance of such terms and conditions. Reprinted from: Qian Liu, Koby Crammer, Fernando CN Pereira and David S Roos. Reranking candidate gene models with cross-species comparison for improved gene prediction. BMC Bioinformatics 2008, 9:433. doi:10.1186/1471-2105-9-433