Real time voice processing with audiovisual feedback: toward autonomous agents with perfect pitch

Saul, Lawrence K; Lee, Daniel D; Isbell, Charles L; LeCun, Yann

Files

0-matlab.txt (4.42 KB)

realtimevoice.pdf (575.49 KB)

Penn collection

Departmental Papers (CIS)

Abstract

We have implemented a real time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm we use for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably over a four octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit. The pitch tracker is used in two real time multimedia applications: a voice-to-MIDI player that synthesizes electronic music from vocalized melodies, and an audiovisual Karaoke machine with multimodal feedback. Both applications run on a laptop and display the user’s pitch scrolling across the screen as he or she sings into the computer.
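The abstract names two ingredients, a least squares fit and a voice-to-MIDI mapping, without giving equations. The sketch below illustrates both under stated assumptions and is not the authors' implementation. It uses the standard identity that a pure sinusoid satisfies x[t-1] + x[t+1] = 2 cos(ω) x[t], fits the single coefficient by least squares over a batch of samples, and treats the normalized residual as a rough indicator of how sinusoidal the input is. The function names `estimate_frequency` and `hz_to_midi` are hypothetical, the A4 = 440 Hz tuning reference is an assumption, and the nonlinearity, any filtering, and the sample-by-sample updating mentioned in the abstract are omitted.

```python
import math


def estimate_frequency(samples, sample_rate):
    """Batch least-squares fit of x[t] ~ a * (x[t-1] + x[t+1]) / 2.

    For a pure sinusoid the best-fit coefficient is a = 1 / cos(omega),
    so the frequency follows from omega = acos(1 / a).
    Returns (frequency_hz or None, normalized_residual_error).
    Illustrative only: not the paper's incremental, real-time formulation.
    """
    num = 0.0     # sum of x[t] * avg[t]
    den = 0.0     # sum of avg[t]^2
    energy = 0.0  # sum of x[t]^2
    for t in range(1, len(samples) - 1):
        avg = 0.5 * (samples[t - 1] + samples[t + 1])
        num += samples[t] * avg
        den += avg * avg
        energy += samples[t] * samples[t]

    if den == 0.0 or energy == 0.0:
        return None, 1.0  # silent or degenerate input

    a = num / den
    # Residual of the fit at the optimal a, normalized by signal energy.
    error = max(0.0, energy - num * num / den) / energy

    if abs(a) < 1.0:
        return None, error  # no real frequency satisfies cos(omega) = 1 / a

    omega = math.acos(1.0 / a)  # radians per sample
    return omega * sample_rate / (2.0 * math.pi), error


def hz_to_midi(frequency_hz, reference_hz=440.0):
    """Nearest MIDI note number, assuming A4 (note 69) = reference_hz."""
    return round(69 + 12 * math.log2(frequency_hz / reference_hz))


# Example: a synthetic 440 Hz tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / rate) for n in range(400)]
freq, err = estimate_frequency(tone, rate)
note = hz_to_midi(freq)
```

On a clean synthetic tone the residual is close to zero and the estimate lands on the tone's frequency without any FFT or peak picking. Real speech is not a pure sinusoid, which is presumably where the nonlinearity described in the abstract comes in; this sketch makes no attempt to model that step.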

Date of presentation

2002-12-09

Conference name

Neural Information Processing Systems (NIPS 2002)

Conference dates

2023-05-16T21:38:04.000

Comments

Advances in Neural Information Processing Systems 15 (NIPS 2002), pages 1205-1212. Publisher URL: http://books.nips.cc/nips15.html

Collection

Presentations