Maximum Entropy Methods for Biological Sequence Modeling

Buehler, Eugen C; Ungar, Lyle H

Files

u14.pdf (140.3 KB)

Penn collection

Departmental Papers (CIS)

Subject

maximum entropy
amino acids
sequence analysis

Permalink

https://repository.upenn.edu/handle/20.500.14332/6163

Author

Buehler, Eugen C

Ungar, Lyle H

Abstract

Many of the same modeling methods used for natural languages, specifically Markov models and HMMs, have also been applied to biological sequence analysis. In recent years, natural language models have been improved by maximum entropy methods, which allow information from the entire history of a sequence to be considered. This contrasts with Markov models, the standard for most biological sequence models, whose predictions are generally based on some fixed number of previous emissions. To test the utility of maximum entropy modeling for biological sequence analysis, we used these methods to model amino acid sequences. Our results show that there is significant long-distance information in amino acid sequences and suggest that maximum entropy techniques may be beneficial for a range of biological sequence analysis problems.

Date of presentation

2001-08-26

Conference name

Workshop on Data Mining in Bioinformatics 2001 (BIOKDD 2001)

Conference dates

2023-05-16T22:28:22.000

Comments

Presented at the Workshop on Data Mining in Bioinformatics 2001 (BIOKDD 2001).

Collection

Presentations