The flow of speech consists not only of consonants and vowels, but also of pitch, loudness, and rhythmic properties which bind spoken language together. These prosodic properties vary substantially across languages, and the question of how they are structured and constrained motivates my research in phonetics and my linguistic fieldwork and documentation on indigenous languages of Southern Mexico. In these languages, prosodic properties, notably tone, are used in subtle ways to distinguish word meaning and grammatical structure. For instance, in the Itunyoso Triqui language, linguistic tone, glottalization, and length are used to distinguish and inflect words (see Table 1).

Table 1: Example words in Itunyoso Triqui (DiCanio 2008, 2012a, 2016). Numbers indicate pitch on a five-note scale (‘5’ is high and ‘1’ is low). Combinations of numbers indicate pitch movement.
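The numeric convention in the caption can be made concrete with a small sketch. The function name `describe_tone` and the tone strings in the usage lines are illustrative assumptions; they are not entries from Table 1 and are not taken from any of the cited analyses.

```python
def describe_tone(tone: str) -> str:
    """Classify a tone written as digits on a 1 (low) to 5 (high) scale.

    A single digit is a level tone; a sequence of digits is read as
    pitch movement from one level to the next.
    """
    levels = [int(ch) for ch in tone]  # raises ValueError on non-digits
    if not levels or any(not 1 <= lv <= 5 for lv in levels):
        raise ValueError(f"not a valid tone string: {tone!r}")

    # Differences between consecutive pitch levels.
    steps = [b - a for a, b in zip(levels, levels[1:])]
    if all(s == 0 for s in steps):
        return "level"      # also covers a single digit (no steps)
    if all(s > 0 for s in steps):
        return "rising"
    if all(s < 0 for s in steps):
        return "falling"
    return "complex"        # mixed movement, e.g. rise then fall


# Hypothetical tone strings, for illustration only.
for t in ["3", "13", "32", "354"]:
    print(t, describe_tone(t))
```

Under this reading, a string such as "13" denotes movement from the lowest level to mid, while "3" denotes a steady mid pitch.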

The use of pitch and glottalization in languages like these is linguistically important. Spoken language requires both that speakers exert careful control in speech production to distinguish words and that listeners be attuned to subtleties. Yet, speakers in most languages of the world do not carefully control pitch to the extent required in Itunyoso Triqui. This means that such languages are of theoretical interest as well as of interest for understanding human diversity. My research examines how speakers of Triqui and other languages produce, perceive, and carefully control pitch in speech. In doing this work, I examine both (i) how low-level acoustic and articulatory detail in speech production can shape the phonological representation of words and (ii) how discourse context and informativity can influence the phonetic signal.

Since 2004, I have conducted fieldwork and language documentation with three previously undescribed indigenous language communities in Southern Mexico. In addition to these languages being the focus of scientific inquiry, my fieldwork research has resulted in descriptive linguistic publications, an archived, annotated corpus of spoken language texts, and, for one of the communities, an online dictionary. In a complementary vein, the creation and elaboration of spoken language corpora for minority/indigenous communities in Mexico has led to research exploring how computational tools can be used to improve access to existing archival recordings. Below, I describe my past and current research in all these areas alongside future plans.

I. Research on word prosody

All human languages use prosody to convey meaning. In languages like English, stress distinguishes words like récord (the noun) vs. recórd (the verb). Languages may also distinguish words using lexical tone as the Itunyoso Triqui data in Table 1 illustrates. Here, the acoustic cue for distinguishing words is not the relative difference in pitch across syllables, as in stress, but pitch level/slope within a syllable. About 42% of the world's languages are tonal (Maddieson 2013). This includes many well-studied languages like Mandarin Chinese or Serbo-Croatian as well as many lesser-known languages throughout Africa, Asia, and the Americas. Languages also can distinguish words using voice quality, length, and other features. The combination/interaction of these prosodic properties is both empirically and descriptively valuable (because it has not been studied) and theoretically pertinent to long-standing debates within phonetic science.

My early work explored two important questions in phonetic science. First, how do languages with prosodically complex systems coordinate different prosodic contrasts? For instance, if it is necessary to produce both a particular tone and a particular voice quality, how do speakers do both at the same time, especially given that the latter influences pitch? Doing phonetic fieldwork in rural Thailand on Chong (Austroasiatic) and in rural Mexico on Itunyoso Triqui (Otomanguean), I addressed this question and discovered interesting aspects of the timing relation between tone and voice quality (DiCanio 2009, 2012b).

Second, what acoustic cues do listeners of complex tone languages use to perceive their language's contrasts? And which cues are more important? I addressed these questions in a series of speech perception experiments in the field with native Triqui listeners and in a lab with native French listeners (DiCanio 2012c, 2014).

After spending several years examining contrastive prosodic properties on words, I became interested in examining how different discourse contexts influence tone. In languages like English, a combination of pitch and lengthening is used to distinguish pragmatic meaning in interactive contexts like "A: Who has arrived? B: JOHN arrived." vs. "A: I even saw John! B: JOHN arrived?" If you are a speaker of a complex tone language, you might not be able to adjust pitch like an English speaker does to indicate these meanings. How speakers of tonal languages express these meanings remains an open question. Under an NSF grant, I investigated both how discourse interacts with tone and how computational tools can be used to extract acoustic information from endangered language corpora in two complex tone languages, Itunyoso Triqui and Yoloxóchitl Mixtec. In both languages there is less freedom for speakers to use pitch to indicate finality, uncertainty, or other meanings. In a series of experiments, I found that Mixtec speakers raise their overall pitch range when they wish to emphasize key words in an utterance (DiCanio et al. 2018, 2021), but speakers of Itunyoso Triqui do not (DiCanio & Hatcher, submitted).

The findings are relevant to theories of speech intonation in two ways. First, researchers have assumed that pitch range was not linguistically meaningful (Ladd, 2008), but the Mixtec data suggest that it is. Second, the Triqui data suggest that pitch does not have to be used for indicating finality or emphasis at all. This suggests that intonation itself may not be universal in human language, a novel finding that further demonstrates the importance of research on indigenous languages for our understanding of the universal properties of human speech.

My future research on prosody and discourse involves both the UB phonetics laboratory and phonetic fieldwork. In the phonetics laboratory we are currently investigating how speakers of tonal and non-tonal languages (English, Korean, Mandarin Chinese) highlight information in speech using articulographic and acoustic methods. These methods are novel since they involve looking at parameters one cannot see from the acoustic signal alone (such as speed of articulator movement), and this provides a deeper picture of how words are emphasized for pragmatic purposes across languages. I am also pursuing fieldwork examining how sensitive native Triqui speakers are to pitch level when asked to imitate different tones. Along these lines, I intend to submit a grant proposal investigating cross-linguistic speech perception in rural Mexico. What ties these future goals together is a general interest in how language use and context can interact in the phonetic signal.

II. Linguistic fieldwork and language documentation

My phonetic fieldwork has focused mostly on Otomanguean languages in Southern Mexico. Though their complex phonetic properties have remained one of my research interests, these languages also have intricate grammatical processes, deep cultural histories, and unique perspectives that enrich our knowledge of human language. In-depth linguistic fieldwork is not only theoretically driven, but also has humanistic value. My early research involved descriptive and instrumental fieldwork on Itunyoso Triqui, a previously undescribed and undocumented language (DiCanio, 2008, 2010, 2012a). This laid the groundwork both for later in-depth language documentation and for an expansion of my research into other distantly related Otomanguean languages.


The Otomanguean language family is one of the oldest and most diverse indigenous language families in the Americas, and these languages possess some of the most complex tonal systems found in the world (DiCanio & Bennett, 2020). In my NSF DEL/RI grant, my Triqui collaborators and I recorded and transcribed 27 hours of oral narratives, folklore, histories, and cultural practices, comprising 290 distinct recordings from more than thirty Triqui community members. Starting in 2015, I taught three Triqui collaborators literacy in their language using a previously developed orthography (DiCanio & Cruz Martínez 2010). The Triqui corpus has been archived at the Archive of Indigenous Languages of Latin America (DiCanio, 2019). An intense focus on the details of text transcription here led to publications on the grammatical aspects of tone in the language (DiCanio 2016, 2020a, 2020b; DiCanio et al. 2020) and to the creation of an online bilingual dictionary with approximately 3,600 entries (DiCanio 2020b). In a recent publication, collaborators and I examine how the computational methods that I have devised can be used to examine under-studied aspects of consonant phonetics in Yoloxóchitl Mixtec (DiCanio et al. 2021). In another, co-authors and I survey and review the best practices for phonetic documentation across three collections of phonetic publication types (Whalen, DiCanio, and Dockum, 2020). I am currently working on creating an improved online portal for the Triqui dictionary. In the future, I plan to examine more pragmatic aspects of Triqui grammar, via fieldwork and corpus analysis, and to investigate grammatical aspects of tone in Ixcatec, an additional language on which I have done fieldwork through collaboration on an NSF grant.

III. Corpus and computational methods in phonetics

One of the emerging areas of research in linguistics is corpus phonetics. It involves mining spoken language corpora for acoustic data that can be used to answer theoretical linguistic questions pertaining to word use, speech production, speech style, and ongoing language change. The primary motivation for this research is that it allows linguists to examine unscripted, naturalistic language data. While corpus phonetics has often been limited to well-studied languages, my own research focuses on how linguists are able to make use of spoken language corpora of endangered languages. In earlier work, we found that automatic speech segmentation software (forced alignment) can work reasonably well to segment endangered language recordings (DiCanio et al. 2013). Using these methods, we examined how stylistic differences influenced the production of vowel sounds in Yoloxóchitl Mixtec (DiCanio et al. 2015) and in Arapaho, an endangered Algonquian language spoken in Wyoming (DiCanio & Whalen 2015). We found that a more casual speech style involved more vowel reduction and contraction of the vowel space for speakers. These findings add to the growing literature demonstrating that speakers adjust their speech production in nuanced, phonetically predictable ways in different contexts. The methods researched here also serve the broader goal of ensuring that human language technologies work well for all languages.
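To illustrate what "contraction of the vowel space" can mean quantitatively, the sketch below computes the area of the polygon spanned by mean vowel formants (F2 on one axis, F1 on the other) with the shoelace formula. This is one common way to summarize vowel space size and is offered only as an assumption about how such a comparison might be set up; it is not necessarily the measure used in the cited studies. The formant values, the function names, and the 0.8 shrink factor are all hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon given as ordered (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def shrink_toward_centroid(points, factor):
    """Move every vertex toward the centroid, mimicking vowel centralization."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in points]


# Hypothetical mean (F2, F1) values in Hz for three corner vowels /i a u/.
careful = [(2300, 300), (1400, 800), (800, 350)]
# Hypothetical casual-style vowels: each one 20% closer to the centroid.
casual = shrink_toward_centroid(careful, 0.8)

area_careful = polygon_area(careful)
area_casual = polygon_area(casual)
print(f"careful: {area_careful:.0f} Hz^2")
print(f"casual:  {area_casual:.0f} Hz^2")
print(f"ratio:   {area_casual / area_careful:.2f}")
```

Because area scales with the square of a uniform shrink factor, vowels that sit 20% closer to the centroid span only about 64% of the original area, which is why area is a sensitive summary of reduction. With real data, the vertices would be per-speaker, per-style formant means extracted from the force-aligned corpus.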


Exploring corpus phonetics with endangered language corpora that I have annotated or collected, I have begun to delve deeper into questions regarding language variation and change. I have examined variation in consonant production in a Yoloxóchitl Mixtec corpus and explored how it can be modelled using deep neural networks (DiCanio et al. 2022). Recently, I have also explored variation in grammatical tone in my Itunyoso Triqui corpus and found that when tone carries particularly important grammatical information, it may lead to the reduction of speech segments (DiCanio, 2022). This finding suggests that, even when speaking quickly, speakers evaluate the importance of low-level phonetic information. With all of the annotations of substantial corpora complete, a future project will examine the interplay of prosody with speech reduction.