Speaker-invariant and rhythm-sensitive representation of spoken words
Description
It is well known that human speech recognition (HSR) is much more robust than automatic speech recognition (ASR) [1], [2]. Given that HSR is extremely robust to large acoustic variability, it is reasonable to assume that humans are able to extract invariant patterns underlying input utterances [3]. Recent work in developmental psychology found that infants are very sensitive to distributional properties in the sounds of a language [4], [5]. Following this finding, the first author proposed a speaker-independent, or invariant, speech representation of each utterance, formed by using distributional properties in the sounds of that utterance [6], [7], [8]. This representation is called speech structure and was tested in isolated word recognition experiments [7], [8]. This paper introduces another kind of sensitivity into speech structure, namely sensitivity to language rhythm. Sonority-based syllable nucleus detection is implemented, and local, syllable-based structures are extracted in addition to the conventional global, holistic structures. Isolated word recognition experiments show that recognition performance improves with rhythm-sensitive and local speech structures.
Journal
- 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 1-9, 2013-10-01
IEEE