Multiband statistical learning for f0 estimation in speech
Penn collection
Degree type
Discipline
Subject
statistical learning
Funder
Grant number
License
Copyright date
Distributor
Related resources
Author
Contributor
Abstract
We investigate a simple algorithm that combines multiband processing and least squares fits to estimate f0 contours in speech. The algorithm is untraditional in several respects: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably, in real time, without the need for postprocessing to produce smooth contours. We show that a baseline implementation of the algorithm, though already quite accurate, is significantly improved by incorporating a model of statistical learning into its final stages. Model parameters are estimated from training data to minimize the likelihood of gross errors in f0 as well as errors in classifying voiced versus unvoiced speech. Experimental results on several databases confirm the benefits of statistical learning.