Statistics Papers

Document Type

Journal Article

Date of this Version

2010

Publication Source

Journal of Computational Biology

Volume

17

Issue

12

Start Page

1697

Last Page

1709

DOI

10.1089/cmb.2008.0233

Abstract

Monte Carlo evaluation of p-values for pattern-counting test statistics is especially attractive when no asymptotic theory is available, or when the search sequence or the word pattern is too short for an asymptotic formula to be accurate. The drawback of applying Monte Carlo simulation directly is its inefficiency when p-values are small, which is precisely the situation of interest. In this paper, we provide a general importance sampling algorithm for efficient Monte Carlo evaluation of small p-values of pattern-counting test statistics and apply it to word patterns of biological interest, in particular palindromes and inverted repeats, patterns arising from position-specific weight matrices, and co-occurrences of pairs of motifs. We also show that our importance sampling technique satisfies a log-efficiency criterion.
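To illustrate why direct simulation struggles with small p-values and how importance sampling helps, the following is a minimal Python sketch. It is not the algorithm of the paper: it assumes the simplest possible "pattern" (a single letter occurring with probability `p` in an i.i.d. sequence of length `n`), so that the count is binomial and an exact tail probability is available for comparison. The sampling probability `q`, the function names, and all parameter values are assumptions made for this illustration only.

```python
import math
import random


def exact_tail(n, p, c):
    """Exact P(count >= c) when the count is Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def direct_mc_tail(n, p, c, trials, rng):
    """Direct Monte Carlo: simulate under the null model and count exceedances."""
    hits = 0
    for _ in range(trials):
        count = sum(1 for _ in range(n) if rng.random() < p)
        if count >= c:
            hits += 1
    return hits / trials


def importance_sampling_tail(n, p, q, c, trials, rng):
    """Importance sampling: simulate with letter probability q (assumed q > p),
    then reweight each exceedance by the likelihood ratio null / sampling."""
    log_ratio_hit = math.log(p / q)                # per position showing the letter
    log_ratio_miss = math.log((1 - p) / (1 - q))   # per position not showing it
    total = 0.0
    for _ in range(trials):
        count = sum(1 for _ in range(n) if rng.random() < q)
        if count >= c:
            total += math.exp(count * log_ratio_hit + (n - count) * log_ratio_miss)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, c = 100, 0.25, 50   # illustrative values; the tail is far from the mean n*p
    q = c / n                 # tilt the sampler so the rare event becomes typical
    print("exact              :", exact_tail(n, p, c))
    print("direct Monte Carlo :", direct_mc_tail(n, p, c, 10_000, rng))
    print("importance sampling:", importance_sampling_tail(n, p, q, c, 10_000, rng))
```

With a tail probability this small, a direct simulation of the same size will usually record no exceedances at all, whereas the reweighted estimate concentrates its samples where the event occurs. Choosing `q = c / n` is one common tilting heuristic for this toy case; the word patterns treated in the paper require a more elaborate sampling scheme.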

Copyright/Permission Statement

Final publication is available from Mary Ann Liebert, Inc., publishers http://dx.doi.org/10.1089/cmb.2008.0233.

Comments

At the time of publication, author Nancy R. Zhang was affiliated with Stanford University. Currently, she is a faculty member in the Statistics Department at the University of Pennsylvania.

Keywords

importance sampling, biological sequence analysis, motif analysis

Date Posted: 27 November 2017

This document has been peer reviewed.