Performance comparison of neural network architectures for speaker‐independent phoneme recognition

Description

Abstract

We applied several types of time‐delay neural networks (TDNNs), generally used for speaker‐dependent and multispeaker speech recognition, to speaker‐independent speech recognition and compared their performance. Six or 12 speakers were used to train each network, and recognition experiments for voiced stops /b, d, g/ were performed in open speaker mode. The best recognition rates were 91.3 percent and 93.6 percent, using six and 12 training speakers, respectively. We found that constructing modular networks, such as modular TDNN with each network corresponding to a speaker, is effective in terms of decreasing the number of training iterations needed, showing slightly better performance than with a single TDNN with a comparable network capacity. This is because the modular networks make use of limited capacity effectively. On the other hand, a single TDNN with an increased number of hidden units showed a recognition rate comparable to that of the modular TDNN.
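For readers unfamiliar with the architecture, the following is a minimal sketch of a TDNN forward pass and of a modular arrangement with one sub-network per training speaker. It is illustrative only and is not the authors' implementation: the layer sizes, the random weights, the function names (`time_delay_layer`, `tdnn_forward`, `modular_forward`), and in particular the combination of modules by averaging their outputs are all assumptions, since the abstract does not state how the per-speaker networks are combined.

```python
import numpy as np


def time_delay_layer(x, w, b):
    """Apply one time-delay layer.

    x: (T, F) sequence of T frames with F features each.
    w: (D, F, H) weights spanning D consecutive frames, H output units.
    b: (H,) bias.
    Returns an array of shape (T - D + 1, H).
    """
    T, _ = x.shape
    D, _, H = w.shape
    out = np.empty((T - D + 1, H))
    for t in range(T - D + 1):
        window = x[t:t + D]  # D consecutive frames, shape (D, F)
        # The same weights are reused at every time shift.
        out[t] = np.tanh(np.tensordot(window, w, axes=([0, 1], [0, 1])) + b)
    return out


def tdnn_forward(x, params):
    """Stack time-delay layers, sum over time, and normalize to class scores."""
    h = x
    for w, b in params:
        h = time_delay_layer(h, w, b)
    scores = h.sum(axis=0)  # integrate evidence over the time axis
    e = np.exp(scores - scores.max())
    return e / e.sum()


def modular_forward(x, modules):
    """Assumed combination rule: average the outputs of the per-speaker TDNNs."""
    return np.mean([tdnn_forward(x, p) for p in modules], axis=0)


def init_params(rng, sizes=((3, 16, 8), (5, 8, 3))):
    """Hypothetical sizes: 16 input features, 8 hidden units, 3 classes (/b, d, g/)."""
    return [(rng.normal(scale=0.1, size=s), np.zeros(s[2])) for s in sizes]


rng = np.random.default_rng(0)
x = rng.normal(size=(15, 16))                    # 15 frames of a hypothetical token
modules = [init_params(rng) for _ in range(6)]   # one untrained module per training speaker
probs = modular_forward(x, modules)              # scores for /b/, /d/, /g/
```

Because the weights here are random and untrained, `probs` only demonstrates the data flow; it says nothing about the recognition rates reported in the abstract.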
