University of Pennsylvania Working Papers in Linguistics


The value of working with natural language corpora lies in the ability to collect large volumes of empirical data with which to test research hypotheses. The challenge in generating these data is to identify linguistic units as data points quickly, accurately, and with some degree of objectivity. This paper describes how to adapt the Penn Phonetics Lab Forced Aligner for use with a corpus of Québec French and how to extract meaningful data from the alignment results. The results of adapting the aligner for use with this corpus are encouraging. Two illustrations demonstrate how to use these empirical data profitably to evaluate several hypotheses concerning how syllable position relates to, and affects, allophonic variation of /R/. The literature review indicates that, along with sociolinguistic variables such as age and, to a lesser extent, social class, gender, and education, the most commonly cited factors potentially influencing /R/ allophony are syllabic position and, to a lesser extent, phonetic environment. Two observations help to motivate the current research question. First, most recent sociolinguistic studies conclude that the frequency of occurrence of the apical trill is rapidly decreasing and that, in the corpus used for the current study, [r] is no longer the dominant variant, so posterior variants should be expected. Second, in addition to the loss of the apical trill, a uvular approximant is now noted as a frequently occurring allophone of /R/, most commonly in intervocalic position. The results presented here do not support the hypothesis that allophonic variation is related to, or affected by, syllable position. Approximants and trills were equally likely to occur in either onsets or codas when intervocalic, and only slightly more likely to occur in complex onsets when post-consonantal.
The results do support the hypothesis that approximants and trills are more sonorous than fricatives, as measured by the amount of energy in their first formant and by their centre of gravity. Approximants and trills had significantly higher first-formant energy and significantly lower centre of gravity than fricatives.