Departmental Papers (ESE)


The objective of voice conversion methods is to modify the speech characteristics of a particular speaker in such manner, as to sound like speech by a different target speaker. Current voice conversion algorithms are based on deriving a conversion function by estimating its parameters through a corpus that contains the same utterances spoken by both speakers. Such a corpus, usually referred to as a parallel corpus, has the disadvantage that many times it is difficult or even impossible to collect. Here, we propose a voice conversion method that does not require a parallel corpus for training, i.e. the spoken utterances by the two speakers need not be the same, by employing speaker adaptation techniques to adapt to a particular pair of source and target speakers, the derived conversion parameters from a different pair of speakers. We show that adaptation reduces the error obtained when simply applying the conversion parameters of one pair of speakers to another by a factor that can reach 30% in many cases, and with performance comparable with the ideal case when a parallel corpus is available.

Document Type

Conference Paper

Date of this Version

May 2004


Copyright 2004 IEEE. Reprinted from Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing 2004 (ICASSP 2004) Volume 1, pages I-1 - I-4.
Publisher URL:

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to By choosing to view this document, you agree to all provisions of the copyright laws protecting it.


voice conversion, gaussian mixture model, text-to-speech synthesis, speaker adaptation



Date Posted: 19 November 2004

This document has been peer reviewed.