
Text

Syllable-based Speech Recognition System Using Pitch Detection on Time–Frequency Domain Feature Extraction



This research presents the segmentation of single-syllable sounds for speech recognition using an artificial neural network. The network combines key features from speech signals in the time and frequency domains. Speech signals are first divided into frames using the short-time energy waveform. Pitch markers are then extracted from the frames and used as reference points to split them into sections. The sections are further analyzed using window searching to identify positions, amplitudes, local minimum and maximum values, and maximum slope values, which serve as the time-domain key features. In the frequency domain, Mel-frequency cepstral coefficients (MFCCs) are used as additional key features. The two types of key features are combined for speech recognition using the artificial neural network. The study also compares recognition performance when the time-domain and frequency-domain features are fed into the network combined and separately. The results show that a network with two input layers (one for Mel-frequency cepstral coefficients, one for time-domain features) and the same hidden layers yields the highest recognition accuracy: 96.97%, and 88.43% on blind tests.
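As a rough illustration of the time-domain steps named in the abstract, the sketch below computes a short-time energy waveform, uses it to locate the active part of a signal, and extracts simple extremum and slope features from one section. It is not the authors' implementation. The function names, the frame length, hop size, and threshold ratio are all assumptions chosen for illustration, and "maximum slope" is taken here to mean the largest absolute sample-to-sample difference, which the abstract does not specify.

```python
import numpy as np


def short_time_energy(signal, frame_len=400, hop=160):
    """Sum of squared samples in each sliding frame (assumed sizes)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([
        np.sum(signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])


def active_segments(energy, threshold_ratio=0.1):
    """Return (start, end) frame indices, end exclusive, where the energy
    exceeds an assumed fraction of its peak value."""
    active = energy > threshold_ratio * energy.max()
    # Pad with False so that runs touching either edge are still detected.
    padded = np.concatenate(([False], active, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[::2], changes[1::2])]


def section_features(section):
    """Position and amplitude of the maximum and minimum of one section,
    plus the largest absolute slope between neighbouring samples."""
    section = np.asarray(section, dtype=float)
    i_max = int(np.argmax(section))
    i_min = int(np.argmin(section))
    return {
        "max_pos": i_max,
        "max_amp": float(section[i_max]),
        "min_pos": i_min,
        "min_amp": float(section[i_min]),
        "max_slope": float(np.max(np.abs(np.diff(section)))),
    }
```

In a full pipeline of the kind the abstract describes, sections would be delimited by pitch markers rather than passed in directly, and the resulting feature vectors would be fed to the network alongside the MFCCs; those stages are omitted here.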


Availability

No copy data


Detail Information

Series Title
-
Call Number
-
Publisher
International Journal of Computing and Digital Systems : Bahrain
Collation
006
Language
English
ISBN/ISSN
2210-142X
Classification
NONE
Content Type
-
Media Type
-
Carrier Type
-
Edition
-
Subject(s)
Specific Detail Info
-
Statement of Responsibility

Other Information

Accreditation
Scopus Q3

Other version/related

No other version available

