
Departmental Papers (CIS)
Title
Real time voice processing with audiovisual feedback: toward autonomous agents with perfect pitch
Date of this Version
December 2002
Document Type
Conference Paper
Recommended Citation
Lawrence K. Saul, Daniel D. Lee, Charles L. Isbell, and Yann LeCun, "Real time voice processing with audiovisual feedback: toward autonomous agents with perfect pitch", December 2002.
Abstract
We have implemented a real time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm we use for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably over a four octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit. The pitch tracker is used in two real time multimedia applications: a voice-to-MIDI player that synthesizes electronic music from vocalized melodies, and an audiovisual Karaoke machine with multimodal feedback. Both applications run on a laptop and display the user’s pitch scrolling across the screen as he or she sings into the computer.
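The two ideas named in the abstract, a purposeful nonlinearity applied to the waveform and the error signal of a least squares fit, can be illustrated with a short sketch. The code below is not the authors' algorithm; it is a minimal frame-based Python illustration under assumed details (squaring as the nonlinearity, a log-spaced scan of candidate fundamentals over a four octave range, and a sinusoidal least-squares fit whose residual scores each candidate), whereas the paper's front end updates its estimate incrementally, sample by sample.

import numpy as np

def estimate_pitch(frame, sample_rate, f_min=80.0, f_max=1280.0):
    """Illustrative pitch estimate for one frame of voiced speech.

    Sketches two ideas from the abstract: a pointwise nonlinearity on the
    waveform, and a least-squares sinusoidal fit whose residual error is
    used to score candidate fundamental frequencies. All specific choices
    here (squaring, grid search, frame-based processing) are assumptions
    for illustration, not the paper's method.
    """
    # Pointwise nonlinearity: squaring the zero-mean signal reinforces
    # energy at the fundamental frequency.
    x = frame - np.mean(frame)
    x = x * x

    t = np.arange(len(x)) / sample_rate
    best_f, best_err = None, np.inf

    # Scan candidate fundamentals on a log-spaced grid spanning four octaves.
    for f in np.geomspace(f_min, f_max, 200):
        # Least-squares fit of a sinusoid at frequency f: x(t) ~ a*cos + b*sin + c.
        basis = np.column_stack([np.cos(2 * np.pi * f * t),
                                 np.sin(2 * np.pi * f * t),
                                 np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
        err = np.sum((x - basis @ coef) ** 2)
        if err < best_err:
            best_f, best_err = f, err
    return best_f

As a usage example, calling estimate_pitch on a 30 ms frame sampled at 16 kHz returns the candidate frequency whose sinusoidal fit leaves the smallest residual; the residual itself plays the role of the error signal that the abstract describes.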
Date Posted: 18 October 2004
This document has been peer reviewed.
Comments
Advances in Neural Information Processing Systems 15 (NIPS 2002), pages 1205-1212. Publisher URL: http://books.nips.cc/nips15.html