Departmental Papers (CIS)

Date of this Version

January 2004

Document Type

Conference Paper


Postprint version. Copyright ACM, 2004. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms 2004 (SODA 2004), pages 910-918.
Publisher URL:


We are given a collection of m random subsequences (traces) of a string t of length n where each trace is obtained by deleting each bit in the string with probability q. Our goal is to exactly reconstruct the string t from these observed traces. We initiate here a study of deletion rates for which we can successfully reconstruct the original string using a small number of samples. We investigate a simple reconstruction algorithm called Bitwise Majority Alignment that uses majority voting (with suitable shifts) to determine each bit of the original string. We show that for random strings t, we can reconstruct the original string (w.h.p.) for q = O(1/ log n) using only O(log n) samples. For arbitrary strings t, we show that a simple modification of Bitwise Majority Alignment reconstructs a string that has identical structure to the original string (w.h.p.) for q = O(1/n1/2+ε) using O(1) samples. In this case, using O(n log n) samples, we can reconstruct the original string exactly. Our setting can be viewed as the study of an idealized biological evolutionary process where the only possible mutations are random deletions. Our goal is to understand at what mutation rates, a small number of observed samples can be correctly aligned to reconstruct the parent string. In the process of establishing these results, we show that Bitwise Majority Alignment has an interesting selfcorrecting property whereby local distortions in the traces do not generate errors in the reconstruction and eventually get corrected.



Date Posted: 11 September 2005