CNN Modeling for Speech-Based Emotion Detection in Indonesian (Pemodelan CNN Untuk Deteksi Emosi Berbasis Speech Bahasa Indonesia)

Yulistia Khoirotul Aini; Tri Budi Santoso; Titon Dutono

doi:10.35143/jkt.v7i1.4623

Submitted: 7 April 2021

Accepted: 2 June 2021

Published: 2 June 2021

Full text: PDF (Indonesian)

Abstract

Human-computer interaction requires the ability to recognize, interpret, and respond to emotions expressed in speech. To date, very little research has addressed speech emotion recognition (SER) for Indonesian, largely because Indonesian-language corpora for SER are limited. In this study, an SER system was built using a dataset taken from an Indonesian TV series. The system classifies utterances into four emotion classes: angry, happy, neutral, and sad. It is implemented with deep learning, specifically a convolutional neural network (CNN). The input is built from combinations of three features: Mel-frequency cepstral coefficients (MFCC), fundamental frequency (F0), and root-mean-square energy (RMSE). In the experiments, the best result for the Indonesian SER system, an accuracy of 85%, was obtained with MFCC + fundamental frequency as input, while the lowest accuracy, 72%, was obtained with MFCC + RMSE. This preliminary study is intended to give SER researchers an overview of how to select speech-signal features as model input and to make it easier to plan the next steps of their research.
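The abstract names the three features but not how they are computed or combined. As an illustration only, the sketch below shows one way per-frame MFCC, F0, and RMS-energy features could be extracted and stacked into a single 2-D feature map that a CNN can take as a one-channel image. Every specific choice here is an assumption, not the authors' configuration: 16 kHz audio, 25 ms frames with a 10 ms hop, 13 MFCCs from 26 mel filters, a simple autocorrelation pitch estimator searching 60 to 400 Hz, and per-feature z-normalization. The function names (`frame_signal`, `mfcc`, `f0_autocorr`, `rms_energy`, `feature_matrix`) are hypothetical. Only NumPy is used so the example is self-contained; in practice a library such as librosa (`librosa.feature.mfcc`, `librosa.feature.rms`) would usually replace the hand-written extractors.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one per row), zero-padding the tail."""
    n_frames = 1 + max(0, int(np.ceil((len(x) - frame_len) / hop)))
    pad = (n_frames - 1) * hop + frame_len - len(x)
    x = np.pad(x, (0, max(0, pad)))
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(frames, sr, n_fft=512, n_filters=26, n_mfcc=13):
    """MFCCs per frame: Hamming window -> power spectrum -> mel energies -> log -> DCT."""
    power = np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), n=n_fft)) ** 2
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.log(mel_energy + 1e-10), n_mfcc)


def rms_energy(frames):
    """Root-mean-square energy of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))


def f0_autocorr(frames, sr, fmin=60.0, fmax=400.0):
    """Crude F0 per frame from the autocorrelation peak; 0.0 marks silent frames."""
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), frames.shape[1] - 1)
    f0 = np.zeros(len(frames))
    for i, fr in enumerate(frames):
        fr = fr - fr.mean()
        ac = np.correlate(fr, fr, mode="full")[len(fr) - 1:]  # non-negative lags only
        if ac[0] <= 0:
            continue  # silent frame: leave F0 at 0.0
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
        f0[i] = sr / lag
    return f0


def feature_matrix(x, sr, combo=("mfcc", "f0"), frame_len=400, hop=160):
    """Stack the selected per-frame features into a (1, n_features, n_frames) CNN input."""
    frames = frame_signal(x, frame_len, hop)
    parts = []
    if "mfcc" in combo:
        parts.append(mfcc(frames, sr))
    if "f0" in combo:
        parts.append(f0_autocorr(frames, sr)[:, None])
    if "rmse" in combo:
        parts.append(rms_energy(frames)[:, None])
    feats = np.hstack(parts)  # (n_frames, n_features)
    # z-normalize each feature so F0 in Hz does not dwarf the MFCCs
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
    return feats.T[None, :, :]


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = 0.5 * np.sin(2 * np.pi * 200 * t)  # 1 s synthetic 200 Hz tone
    print(feature_matrix(x, sr, combo=("mfcc", "f0")).shape)    # (1, 14, 99)
    print(feature_matrix(x, sr, combo=("mfcc", "rmse")).shape)  # (1, 14, 99)
```

In this sketch every feature combination shares the same frame axis, so switching `combo` between `("mfcc", "f0")` and `("mfcc", "rmse")` changes only the height of the feature map, which lets one CNN architecture be compared across feature sets. The per-row normalization matters because raw F0 values are in hertz and would otherwise dominate the log-mel-based MFCCs.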

Keywords

Speech Emotion Recognition (SER); CNN; Deep Learning

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Copyright info for authors

1. Authors hold the copyright in any process, procedure, or article described in the work and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

2. Authors retain publishing rights to reuse all or part of the work in other works, but may not grant third-party requests to reprint or republish the work.

3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) as it can lead to productive exchanges, as well as earlier and greater citation of published work.

How to Cite

Aini, Y. K., Santoso, T. B., & Dutono, T. (2021). Pemodelan CNN Untuk Deteksi Emosi Berbasis Speech Bahasa Indonesia. Jurnal Komputer Terapan, 7(1), 143–152. https://doi.org/10.35143/jkt.v7i1.4623

Similar Articles

Aplikasi Pembelajaran Hiragana Bahasa Jepang Berbasis Android Menggunakan Speech Recognition

CNN Modelling Untuk Deteksi Wajah Berbasis Gender Menggunakan Python

Application of Deep Learning on Types of Oil Palm Plant Diseases Using the Convolutional Neural Network Algorithm

Learning Impact Role Playing Game Edukasi Terhadap Motivasi Belajar Sejarah Siswa

Artificial Intelligence Automatic Speech Recognition (ASR) untuk pencarian potongan ayat Al-Qu’ran

ENHANCING STUDENTS’ LEARNING OUTCOMES IN CLASSICAL CRYPTOGRAPHY THROUGH INTERACTIVE 3D SIMULATIONS

DeepSun: Klasifikasi Fase Cahaya Matahari Berdasarkan Warna Menggunakan CNN

Sistem Pengenalan Karakter pada Plat Kendaraan Bermotor Menggunakan Profile Projection dan Algoritma Korelasi

Optimasi Model CNN untuk Identifikasi Jenis Bunga Berdasarkan spektrum Warna

PENGEMBANGAN APLIKASI MOBILE UNTUK DETEKSI CACAT BIJI KOPI ROBUSTA BERDASARKAN STANDAR NASIONAL INDONESIA
