The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S.R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2024
Driver Speech Detection in Real Driving Scenario Mrinmoy Bhattacharjee, Shikha Baghel, S. R. Mahadeva Prasanna Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2023
The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2023. In multilingual societies, social conversations often involve code-mixed speech. Current speech technology may not be well equipped to extract information from multilingual, multi-speaker conversations. The DISPLACE challenge entails a first-of-its-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed speech. The challenge attempts to highlight outstanding issues in speaker diarization (SD) in multilingual settings with code-mixing. Further, language diarization (LD) in multi-speaker settings also introduces new challenges, where the system has to disambiguate speaker switches from code switches. For this challenge, a natural multilingual, multi-speaker conversational dataset is distributed for development and evaluation purposes. The systems are evaluated on single-channel far-field recordings. We also release a baseline system and report the highlights of the system submissions.
Under-resourced dialect identification in Ao using source information Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S. R. Mahadeva Prasanna Journal of the Acoustical Society of America, 2022. This paper reports the findings of an automatic dialect identification (DID) task conducted on Ao speech data using source features. Considering that Ao is a tone language, the gammatonegram of the linear prediction residual is proposed as a feature for DID in this study. As Ao is an under-resourced language, data augmentation was carried out to increase the size of the speech corpus. The results showed that data augmentation improved DID by 14%. A perception test conducted on Ao speakers showed better DID by the subjects when utterance duration was 3 s. Accordingly, automatic DID was conducted on utterances of various durations. A baseline DID system with the S_lms feature attained an average F1-score of 53.84% on 3 s long utterances. Inclusion of source features, S_ilpr and [Formula: see text], improved the F1-score to 60.69%. In a final system, with a combination of S_ilpr, [Formula: see text], S_lms, and Mel frequency cepstral coefficient features, the F1-score increased to 61.46%.
Analyzing RMFCC Feature for Dialect Identification in Ao, an Under-Resourced Language Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S. R. M. Prasanna 2022 National Conference on Communications NCC 2022, 2022. Ao is a language spoken in Nagaland in the North-East of India. It is a low-resource tone language in the Tibeto-Burman language family. It has three tones, namely, high, mid and low, and three distinct dialects, viz. Chungli, Mongsen and Changki. This paper presents automatic dialect identification in Ao using an excitation source feature. The objective of a dialect identification system is to identify a speech variety within a language. The goal of this study is to determine whether excitation source features such as the Residual Mel Frequency Cepstral Coefficient (RMFCC) can be exploited to discriminate the three dialects of Ao automatically. In addition, vocal tract system features, namely Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Cepstral (SDC) coefficients, are used as the baseline methods. The RMFCC features are obtained from the Linear Prediction (LP) residual signal, while MFCC features are derived from the smooth spectrum of the speech signal. SDC coefficients are explored to provide additional temporal information. This work is evaluated on trisyllabic words uttered by 36 speakers for the three dialects of Ao. A Gaussian Mixture Model (GMM) based classifier is used for classification. The system yields a higher dialect identification accuracy when all three features are combined.
Overlapped speech detection using phase features Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha Journal of the Acoustical Society of America, 2021. Simultaneous speech of multiple speakers is known as overlapped speech, which causes problems for speech recognition and speaker diarization systems. The present work uses previously less utilized signal phase information in the task of overlapped speech detection. In this context, Instantaneous Frequency Cosine Coefficient (IFCC) and Modified Group Delay Cepstral Coefficient (MGDCC) features are explored. IFCC captures the time-varying phase characteristics, while MGDCC represents the frequency-varying information of the phase spectrum. A Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM)-based classifier is used for the classification. The present work uses synthetically generated overlapped speech from the GRID corpus. The proposed method is benchmarked against three baseline approaches that use magnitude spectrum features. It is observed that the combination of IFCC and MGDCC features with the CNN-LSTM classifier provides better performance than the baselines. The combination of phase features with magnitude-based MFCC features provides the best performance, indicating the importance of complementary information. The present study also investigates the effect of segment duration, gender, and number of simultaneous speakers on the overlapped speech detection system. Finally, the proposed method is also evaluated on real overlapped data from the AMI corpus.
Effect of high-energy voiced speech segments and speaker gender on shouted speech detection Shikha Baghel, S. R. M. Prasanna, Prithwijit Guha 2021 National Conference on Communications NCC 2021, 2021. Shouted speech detection is an essential preprocessing task in many conventional speech processing systems. Shouted speech has mostly been studied in terms of the characterization of vocal tract and excitation source features. Previous works have also established the significance of voiced segments in shouted speech detection. This work posits that a significant emphasis is given to a portion of the voiced segments during shouted speech production. These emphasized voiced regions have significant energy. This work analyzes the effect of high-energy voiced segments on shouted speech detection. Moreover, fundamental frequency is a crucial characteristic of both shouted speech and speaker gender. The authors believe that gender has a significant effect on shouted speech detection. Therefore, the present work also studies the impact of gender on the current task. The classification between normal and shouted speech is performed using a DNN-based classifier. A statistical significance test of the features extracted from high-energy voiced segments is also performed. The results support the claim that high-energy voiced segments carry highly discriminating information. Additionally, classification results of gender experiments show that gender has a notable effect on shouted speech detection.
Excitation source feature based dialect identification in Ao - A low resource language Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S.R. Mahadeva Prasanna Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2021. Ao is an under-resourced Tibeto-Burman tonal language spoken in Nagaland, India. There are three distinct dialects of the language, namely, Chungli, Mongsen and Changki. The objective of dialect identification is to distinguish one dialect from another within the same language family. The goal of this study is to ascertain the potential of excitation source features for automatic dialect identification in Ao. In this direction, the Integrated Linear Prediction Residual (ILPR), an approximate representation of the source signal, is explored. The log Mel spectrogram of the ILPR signal (S_Ext) is used to exploit the time-frequency characteristics of the excitation source. This work proposes an attention-based CNN-BiGRU architecture for automatic dialect identification tasks. Additionally, the log Mel spectrogram (S_VT), extracted from the pre-emphasized speech signal, is used as a baseline method. S_VT contains the vocal-tract characteristics of the speech signal. A significant performance improvement of nearly 6% in accuracy is observed when the excitation source feature (S_Ext) is combined with the vocal tract representation (S_VT). To analyse the effect of segment duration, dialect identification performance is reported for three different durations, viz., 1 s, 3 s and 6 s. The effect of gender on the dialect identification task for Ao is also studied in this work.
Shouted and Normal Speech Classification Using 1D CNN Shikha Baghel, Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2019
The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments SB Kalluri, P Singh, PR Chowdhuri, A Kulkarni, S Baghel, P Hegde, ... arXiv preprint arXiv:2406.09494, 2024. Citations: 11
Summary of the DISPLACE Challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments S Baghel, S Ramoji, S Jain, PR Chowdhuri, P Singh, D Vijayasenan, ... Speech Communication, 2024. Citations: 15
Driver Speech Detection in Real Driving Scenario M Bhattacharjee, S Baghel, SRM Prasanna International Conference on Speech and Computer, 189-199, 2023. Citations: 1
The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments SG Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil ... INTERSPEECH-2023, 3562-3566, 2023. Citations: 11
DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments S Baghel, S Ramoji, P Singh, S Jain, PR Chowdhuri, K Kulkarni, S Padhi, ... INTERSPEECH at https://www.isca-speech.org/archive/interspeech_2023 …, 2023. Citations: 6
Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations S Baghel, SRM Prasanna, P Guha International Conference on Speech and Computer, 33-43, 2022
Under-resourced dialect identification in Ao using source information M Tzudir, S Baghel, P Sarmah, SRM Prasanna The Journal of the Acoustical Society of America 152 (3), 1755-1766, 2022. Citations: 7
Analyzing RMFCC Feature for Dialect Identification in Ao, an Under-Resourced Language M Tzudir, S Baghel, P Sarmah, SRM Prasanna 2022 National Conference on Communications (NCC), 308-313, 2022. Citations: 16
Shouted, Overlapped and Competitive Speech Detection in Indian Television News Debates S Baghel 2022
Overlapped speech detection using phase features S Baghel, SRM Prasanna, P Guha The Journal of the Acoustical Society of America 150 (4), 2770-2781, 2021. Citations: 3
Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection S Baghel, SRM Prasanna, P Guha 2021 National Conference on Communications (NCC), 1-6, 2021. Citations: 1
Excitation Source Feature Based Dialect Identification in Ao-A Low Resource Language. M Tzudir, S Baghel, P Sarmah, SRM Prasanna Interspeech, 1524-1528, 2021. Citations: 9
Automatic Detection of Shouted Speech Segments in Indian News Debates S Baghel, M Bhattacharjee, SRM Prasanna, P Guha Proc. Interspeech 2021, 4179-4183, 2021. Citations: 6
Overlapped/Non-Overlapped Speech Transition Point Detection Using Bag-of-Audio-Words S Baghel, SRM Prasanna, P Guha 2020 International Conference on Signal Processing and Communications (SPCOM …, 2020
Exploration of excitation source information for shouted and normal speech classification S Baghel, SRM Prasanna, P Guha The Journal of the Acoustical Society of America 147 (2), 1250-1261, 2020. Citations: 17
Analysis of Excitation Source Characteristics for Shouted and Normal Speech Classification S Baghel, SRM Prasanna, P Guha 2020 National Conference on Communications (NCC), 1-6, 2020. Citations: 2
Shouted and normal speech classification using 1D CNN S Baghel, M Bhattacharjee, SRM Prasanna, P Guha Pattern Recognition and Machine Intelligence: 8th International Conference …, 2019. Citations: 11
Excitation Source Feature for Discriminating Shouted and Normal Speech S Baghel, SRM Prasanna, P Guha 2018 International Conference on Signal Processing and Communications (SPCOM …, 2018. Citations: 7
Classification of multi speaker shouted speech and single speaker normal speech S Baghel, SRM Prasanna, P Guha TENCON 2017-2017 IEEE Region 10 Conference, 2388-2392, 2017. Citations: 8
Shouted/normal speech classification using speech-specific features S Baghel, BK Khonglah, SRM Prasanna, P Guha 2016 IEEE Region 10 Conference (TENCON), 1655-1659, 2016. Citations: 8
MOST CITED SCHOLAR PUBLICATIONS
Exploration of excitation source information for shouted and normal speech classification S Baghel, SRM Prasanna, P Guha The Journal of the Acoustical Society of America 147 (2), 1250-1261, 2020. Citations: 17
Analyzing RMFCC Feature for Dialect Identification in Ao, an Under-Resourced Language M Tzudir, S Baghel, P Sarmah, SRM Prasanna 2022 National Conference on Communications (NCC), 308-313, 2022. Citations: 16
Summary of the DISPLACE Challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments S Baghel, S Ramoji, S Jain, PR Chowdhuri, P Singh, D Vijayasenan, ... Speech Communication, 2024. Citations: 15
The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments SB Kalluri, P Singh, PR Chowdhuri, A Kulkarni, S Baghel, P Hegde, ... arXiv preprint arXiv:2406.09494, 2024. Citations: 11
The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments SG Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil ... INTERSPEECH-2023, 3562-3566, 2023. Citations: 11
Shouted and normal speech classification using 1D CNN S Baghel, M Bhattacharjee, SRM Prasanna, P Guha Pattern Recognition and Machine Intelligence: 8th International Conference …, 2019. Citations: 11
Excitation Source Feature Based Dialect Identification in Ao-A Low Resource Language. M Tzudir, S Baghel, P Sarmah, SRM Prasanna Interspeech, 1524-1528, 2021. Citations: 9
Classification of multi speaker shouted speech and single speaker normal speech S Baghel, SRM Prasanna, P Guha TENCON 2017-2017 IEEE Region 10 Conference, 2388-2392, 2017. Citations: 8
Shouted/normal speech classification using speech-specific features S Baghel, BK Khonglah, SRM Prasanna, P Guha 2016 IEEE Region 10 Conference (TENCON), 1655-1659, 2016. Citations: 8
Under-resourced dialect identification in Ao using source information M Tzudir, S Baghel, P Sarmah, SRM Prasanna The Journal of the Acoustical Society of America 152 (3), 1755-1766, 2022. Citations: 7
Excitation Source Feature for Discriminating Shouted and Normal Speech S Baghel, SRM Prasanna, P Guha 2018 International Conference on Signal Processing and Communications (SPCOM …, 2018. Citations: 7
DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments S Baghel, S Ramoji, P Singh, S Jain, PR Chowdhuri, K Kulkarni, S Padhi, ... INTERSPEECH at https://www.isca-speech.org/archive/interspeech_2023 …, 2023. Citations: 6
Automatic Detection of Shouted Speech Segments in Indian News Debates S Baghel, M Bhattacharjee, SRM Prasanna, P Guha Proc. Interspeech 2021, 4179-4183, 2021. Citations: 6
Overlapped speech detection using phase features S Baghel, SRM Prasanna, P Guha The Journal of the Acoustical Society of America 150 (4), 2770-2781, 2021. Citations: 3
Analysis of Excitation Source Characteristics for Shouted and Normal Speech Classification S Baghel, SRM Prasanna, P Guha 2020 National Conference on Communications (NCC), 1-6, 2020. Citations: 2
Driver Speech Detection in Real Driving Scenario M Bhattacharjee, S Baghel, SRM Prasanna International Conference on Speech and Computer, 189-199, 2023. Citations: 1
Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection S Baghel, SRM Prasanna, P Guha 2021 National Conference on Communications (NCC), 1-6, 2021. Citations: 1
Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations S Baghel, SRM Prasanna, P Guha International Conference on Speech and Computer, 33-43, 2022
Shouted, Overlapped and Competitive Speech Detection in Indian Television News Debates S Baghel 2022
Overlapped/Non-Overlapped Speech Transition Point Detection Using Bag-of-Audio-Words S Baghel, SRM Prasanna, P Guha 2020 International Conference on Signal Processing and Communications (SPCOM …, 2020