Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization

Penn collection
Departmental Papers (CIS)
Abstract

An auditory "scene", composed of overlapping acoustic sources, can be viewed as a complex object whose constituent parts are the individual sources. Pitch is known to be an important cue for auditory scene analysis. In this paper, with the goal of building agents that operate in human environments, we describe a real-time system to identify the presence of one or more voices and compute their pitch. The signal processing in the front end is based on instantaneous frequency estimation, a method for tracking the partials of voiced speech, while the pattern-matching in the back end is based on nonnegative matrix factorization, an unsupervised algorithm for learning the parts of complex objects. While supporting a framework to analyze complicated auditory scenes, our system maintains real-time operability and state-of-the-art performance in clean speech.

Date of presentation
2004-12-13
Conference name
Neural Information Processing Systems (NIPS)
Conference dates
13-18 December 2004
Conference location
Vancouver, Canada
Comments
Copyright MIT Press. Postprint version. Published in Advances in Neural Information Processing Systems 17, pages 1233-1240. Proceedings of the 18th annual Neural Information Processing Systems (NIPS) conference, held in Vancouver, Canada, 13-18 December 2004.