Grant: $639,694 - National Science Foundation - Jul. 14, 2009
Award Description: Despite large acoustic differences in the speech of various talkers, humans are generally able to understand each other very quickly. The mechanisms by which we map the great acoustic variability we encounter onto a small set of phonemes have been the subject of research for more than half a century. This 'speaker normalization' problem has been approached in a number of ways, but it is generally framed as a means of equating the formant frequencies of a particular speaker with a reference set of formants. In the proposed project, we identify a novel and innovative approach to speaker normalization. Recent work by the project's personnel has shown that subglottal resonances (SGRs) form a set of acoustic boundaries in the frequency space of an individual's speech, thereby defining frequency bands within which formants may vary and yet retain the same phonemic vowel quality. Rather than normalize formant frequencies, which are known to vary significantly even for a given vowel produced by a given speaker, we propose to test a theory of speaker normalization in which the SGRs themselves are normalized, thereby normalizing the frequency bands within which vowel classes are defined. Our interdisciplinary team of researchers (with expertise in linguistic phonetics, speech production and perception, and automatic speech recognition (ASR)) is uniquely qualified to pursue this project. The proposed project is a transformative one: it will help us better understand and model human speech production, especially aspects pertaining to the subglottal system. Additionally, the work will inform central issues in human speech perception, especially in the area of speaker normalization. From a theoretical perspective, the proposed studies will advance our understanding of human speech perception and production and will constrain models of these two processes by highlighting the need to account for SGRs.
From an applied perspective, the work will lead to improved performance of speaker normalization and identification systems, especially when the training and test data are mismatched (for example, when acoustic models are trained on adult speech but used to recognize children's speech, in quiet and/or in noise). The resulting robust systems should thus be able to deal effectively with speech variability, a major challenge for ASR systems. Broader Impact. The interdisciplinary collaboration across Engineering, Linguistics, Speech & Hearing, and Psychology will create a multidisciplinary learning environment for the participating faculty, research scientist, and graduate and undergraduate students, with the broader impact of enhanced training in speech production and perception modeling and in algorithm development. We will encourage participation of undergraduate students. We intend to publish the results of our work extensively in high-quality journals and to present them at relevant conferences. The proposed work will produce a set of databases and tools that will be disseminated to the research and education community. The project can have a profound impact on improving noise-robust ASR for adults, children, and English Language Learners, and on providing the first evidence-based approach to incorporating normalization algorithms into speech processors for sensory aids (e.g., hearing aids and cochlear implants). The results should also generalize to languages beyond Spanish and English. The proposed work can further be applied to speaker identification, which is important in both commercial and military applications.
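The SGR-based normalization idea described above can be sketched in a few lines of code. This is an illustrative sketch only, not the project's actual algorithm: the reference SGR values, the single linear warp factor, and the front/back decision rule are all hypothetical assumptions chosen to show the concept of aligning a speaker's SGRs to reference SGRs so that formants fall into reference-defined vowel bands (e.g., F2 below Sg2 suggesting a back vowel, F2 above Sg2 a front vowel).

```python
# Illustrative sketch of SGR-based speaker normalization.
# All numeric values (reference SGRs, example speaker SGRs) are hypothetical.

REF_SGRS = (600.0, 1400.0, 2200.0)  # hypothetical reference Sg1, Sg2, Sg3 (Hz)

def sgr_warp_factor(speaker_sgrs, ref_sgrs=REF_SGRS):
    """Single linear warp factor aligning a speaker's SGRs to the reference set."""
    ratios = [ref / spk for ref, spk in zip(ref_sgrs, speaker_sgrs)]
    return sum(ratios) / len(ratios)  # average ratio as a simple warp estimate

def normalize_formants(formants_hz, speaker_sgrs):
    """Scale measured formants by the SGR warp so band boundaries line up."""
    alpha = sgr_warp_factor(speaker_sgrs)
    return [alpha * f for f in formants_hz]

def vowel_backness(f2_hz, speaker_sgrs):
    """Classify vowel backness by comparing normalized F2 to the reference Sg2."""
    f2_norm = normalize_formants([f2_hz], speaker_sgrs)[0]
    return "front" if f2_norm > REF_SGRS[1] else "back"

# Example: a speaker with uniformly lower SGRs than the reference.
speaker = (540.0, 1260.0, 1980.0)
print(vowel_backness(1200.0, speaker))  # F2 normalizes below reference Sg2
print(vowel_backness(1400.0, speaker))  # F2 normalizes above reference Sg2
```

The point of the sketch is the change of target: the warp is estimated from the SGRs, not from the formants themselves, so the vowel-class boundaries (rather than individual formant values) are what get normalized across speakers.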
Project Description: Statement of Work We will 1) collect a speech database that includes simultaneous recordings of subglottal acoustics, 2) analyze the data to drive the development of an evidence-based theory of SGR-based speaker normalization, 3) develop and implement speaker normalization and identification algorithms, and 4) test the normalization procedures on both ASR and human speech perception tasks in quiet and in noise. Because we are awaiting final human-subjects approval, work on the award has not yet started; it will begin soon. Since this award is collaborative with Washington University, we will hold a kick-off meeting at UCLA on Monday, 10/19, where we will plan the scope and timeline of each task.
Jobs Summary: No Jobs (Total jobs reported: 0)
Project Status: Not Started
This award's data was last updated on Jul. 14, 2009.