Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

Document Type: Original Article


Department of Electrical Engineering, Qaemshahr Branch, Islamic Azad University, Qaemshahr, Iran


This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, the auditory cortical outputs were clustered, and the attributes of the resulting speech clusters were extracted as secondary features. Because the shape and position of speech clusters change over time, the clusters were tracked temporally and the tracking parameters were included in the secondary features. A new architecture was proposed for phoneme classification, in which a combining classifier uses both tracked and energy-based features. Cluster-based spectro-temporal feature vectors were used to classify several subsets of phonemes from the TIMIT database. The results show that tracked spectro-temporal features improved the phoneme classification rate. On voiced plosives, the classification rate rose to 78.9%, a relative improvement of 3.3% over non-tracked spectro-temporal feature vectors. The classification rate also improved on the other phoneme subsets.
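The idea of tracking clusters over time can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or the tracking parameters, so this example assumes plain k-means as a stand-in, keeps cluster identity across frames by seeding each frame's clustering with the previous frame's centroids, and takes the centroid displacement as one possible tracking parameter. The function names `kmeans` and `track_clusters` and the initialisation from the first k points are hypothetical.

```python
import numpy as np

def kmeans(points, init_centroids, n_iter=20):
    """Plain k-means (assumed stand-in); returns centroids of shape (k, d)."""
    centroids = init_centroids.astype(float).copy()
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def track_clusters(frames, k=2, n_iter=20):
    """Cluster each frame, seeding from the previous frame's centroids.

    frames: sequence of (n_points, d) arrays, one per time window.
    Returns (centroids, displacements) with shapes (T, k, d) and (T-1, k, d).
    """
    centroids = []
    seed = frames[0][:k]  # hypothetical initialisation: first k points
    for points in frames:
        # Seeding with the previous centroids keeps cluster j in frame t
        # matched to cluster j in frame t-1.
        seed = kmeans(points, seed, n_iter)
        centroids.append(seed)
    centroids = np.stack(centroids)
    # Frame-to-frame centroid movement, one candidate tracking feature.
    displacements = np.diff(centroids, axis=0)
    return centroids, displacements
```

In a setting like the one described, each frame would hold points from the spectro-temporal feature space for one time window, and the displacements could be appended to the per-cluster attributes to form a tracked feature vector.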

