Formant Frequency Extraction by Inverse Filtering and Moment Calculation and Its Evaluation Using Synthetic Speech Sound

Bibliographic information

Alternative titles
  • 逆フィルタとモーメント計算によるホルマント周波数抽出と合成音による評価
  • ギャクフィルタ ト モーメント ケイサン ニ ヨル ホルマント シュウハスウ チュウシュツ ト ゴウセイオン ニ ヨル ヒョウカ

Abstract

A new method of formant frequency extraction that exploits characteristic features of vowel-type spectra is proposed and implemented as a FORTRAN program. The method is evaluated experimentally using synthetic speech sounds that simulate various troublesome conditions encountered in formant frequency extraction from natural speech. Inverse filtering is performed in the spectral domain so that only a simple resonance spectrum of a single formant remains from the input spectrum; as a schematic example, H^+_2 in Fig. 4(b) is what remains from P in Fig. 3(b). The formant frequency is then calculated as the first-order moment. Repeating these two processes, as shown in Fig. 7, gives fairly accurate formant frequencies. Extraction is carried out on five Japanese vowels uttered by five adult male speakers and on the non-nasal voiced portions of continuous speech by two male announcers; some of the results are shown in Table 4 and Fig. 8. Several factors that may cause serious difficulty in formant frequency extraction are discussed. The factors arising from the source characteristics are the source harmonic structure, zeros of the source spectral envelope, and gross differences in the shape of the source spectral envelope. The factors arising from the transfer characteristics are rapid formant transitions and formant contiguity. In this paper, four excitation waveforms and six source fundamental frequencies (100-200 Hz) are used in the synthesis, combined with the formant frequency pattern of Fig. 9. Three of the excitation waveforms are triangular, as shown in Fig. 1, with K = 0.5, 0.7 and 1.0; the remaining one is impulse-type. The error distribution of the formant frequencies extracted from these synthetic sounds is shown in Fig. 11.
The extraction results are examined in relation to the factors described above, and the following conclusions are reached: (1) Under many troublesome conditions the proposed method provides fairly good accuracy, and extraction errors do not exceed half the source fundamental frequency in most cases. (2) The extraction program is relatively simple. The average extraction time is about 0.23 sec for each 10 ms short-time spectrum on the general-purpose computer NEAC 2200/500 (add., 5.2 μsec), which is remarkably fast compared with usual methods. (3) The results of experiments with synthetic sounds generated under various excitation conditions and with natural sounds uttered by many speakers suggest that this method is reliably applicable to various speech sounds.
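
The abstract gives no formulas, so the following is only a minimal sketch of one possible reading of the two repeated steps (spectral-domain inverse filtering, then a first-order moment), not the paper's FORTRAN program. Everything specific here is an assumption made for illustration: the two-pole resonance model, the known bandwidths, the 300 Hz moment window, the initial guesses, and the names resonance_power, first_moment and extract_formants. The test spectrum is a smooth product of three resonances without any source harmonic structure.

```python
def resonance_power(f, center, bandwidth):
    """Power response of one two-pole resonance, normalised to 1 at f = 0.
    (Assumed model; the paper's resonance spectrum may differ.)"""
    a2 = (bandwidth / 2.0) ** 2
    num = (center ** 2 + a2) ** 2
    den = ((f - center) ** 2 + a2) * ((f + center) ** 2 + a2)
    return num / den


def first_moment(freqs, power, lo, hi):
    """First-order moment (centroid) of `power` over lo <= f <= hi."""
    num = 0.0
    den = 0.0
    for f, p in zip(freqs, power):
        if lo <= f <= hi:
            num += f * p
            den += p
    return num / den


def extract_formants(freqs, power, initial, bandwidths,
                     half_width=300.0, n_iter=5):
    """Alternate inverse filtering and moment calculation.

    For each formant k, the power spectrum is divided by the resonance
    spectra of all other formants (at their current estimates), so that
    roughly one resonance is left; its centroid inside a window around
    the current estimate becomes the new estimate for formant k.
    """
    est = list(initial)
    for _ in range(n_iter):
        for k in range(len(est)):
            residual = []
            for f, p in zip(freqs, power):
                for j in range(len(est)):
                    if j != k:
                        p /= resonance_power(f, est[j], bandwidths[j])
                residual.append(p)
            est[k] = first_moment(freqs, residual,
                                  est[k] - half_width, est[k] + half_width)
    return est


# Hypothetical test case: three resonances on a 5 Hz grid from 0 to 4000 Hz.
TRUE_FORMANTS = [500.0, 1500.0, 2500.0]
BANDWIDTHS = [60.0, 90.0, 120.0]
freqs = [5.0 * i for i in range(801)]
spectrum = []
for f in freqs:
    p = 1.0
    for c, b in zip(TRUE_FORMANTS, BANDWIDTHS):
        p *= resonance_power(f, c, b)
    spectrum.append(p)

# Deliberately poor starting values, refined by the repeated two-step process.
estimates = extract_formants(freqs, spectrum, [600.0, 1300.0, 2700.0], BANDWIDTHS)
print([round(e) for e in estimates])
```

The moment is taken over a finite window because the centroid of a resonance with long tails depends on where the window sits; re-centring the window on each new estimate is what lets the repetition pull the estimate toward the peak.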

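Published in

  • Journal of the Acoustical Society of Japan (日本音響学会誌)

    Journal of the Acoustical Society of Japan 26 (5), 211-221, 1970

    Acoustical Society of Japan (一般社団法人 日本音響学会)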
