Shikha Baghel

Scopus Publications

Summary of the DISPLACE challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments
Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy
Speech Communication, 2024
The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S.R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy
Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2024
Driver Speech Detection in Real Driving Scenario
Mrinmoy Bhattacharjee, Shikha Baghel, S. R. Mahadeva Prasanna
Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2023
The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy
Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2023
In multilingual societies, social conversations often involve code-mixed speech. The current speech technology may not be well equipped to extract information from multi-lingual multi-speaker conversations. The DISPLACE challenge entails a first-of-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed speech. The challenge attempts to highlight outstanding issues in speaker diarization (SD) in multilingual settings with code-mixing. Further, language diarization (LD) in multi-speaker settings also introduces new challenges, where the system has to disambiguate speaker switches with code switches. For this challenge, a natural multilingual, multi-speaker conversational dataset is distributed for development and evaluation purposes. The systems are evaluated on single-channel far-field recordings. We also release a baseline system and report the highlights of the system submissions.
Under-resourced dialect identification in Ao using source information
Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S. R. Mahadeva Prasanna
Journal of the Acoustical Society of America, 2022
This paper reports the findings of an automatic dialect identification (DID) task conducted on Ao speech data using source features. Considering that Ao is a tone language, in this study for DID, the gammatonegram of the linear prediction residual is proposed as a feature. As Ao is an under-resourced language, data augmentation was carried out to increase the size of the speech corpus. The results showed that data augmentation improved DID by 14%. A perception test conducted on Ao speakers showed better DID by the subjects when utterance duration was 3 s. Accordingly, automatic DID was conducted on utterances of various duration. A baseline DID system with the Slms feature attained an average F1-score of 53.84% in a 3 s long utterance. Inclusion of source features, Silpr and [Formula: see text], improved the F1-score to 60.69%. In a final system, with a combination of Silpr, [Formula: see text], Slms, and Mel frequency cepstral coefficient features, the F1-score increased to 61.46%.
Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations
Shikha Baghel, S. R. M. Prasanna, Prithwijit Guha
Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2022
Analyzing RMFCC Feature for Dialect Identification in Ao, an Under-Resourced Language
Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S. R. M. Prasanna
2022 National Conference on Communications NCC 2022, 2022
Ao is a language spoken in Nagaland in the North-East of India. It is a low-resource tone language under the Tibeto-Burman language family. It consists of three tones, namely, high, mid and low. It has three distinct dialects of the language viz. Chungli, Mongsen and Changki. This paper presents an automatic dialect identification in Ao using the excitation source feature. The objective of a dialect identification system is to identify a speech variety within a language. The goal of this study is to determine if the excitation source features such as Residual Mel Frequency Cepstral Coefficient (RMFCC) can be exploited to discriminate the three dialects in Ao automatically. In addition, vocal tract system features, namely Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Cepstral (SDC) coefficients, are used as the baseline methods. The RMFCC features are obtained from the Linear Prediction (LP) residual signal, while MFCC features are derived from the smooth spectrum of the speech signal. SDC coefficients are explored to provide additional temporal information. This work is evaluated on trisyllabic words uttered by 36 speakers for the three dialects of Ao. A Gaussian Mixture Model (GMM) based classifier is used for classification. The performance of the system yields a better dialect identification accuracy rate when all three features are combined.
Overlapped speech detection using phase features
Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
Journal of the Acoustical Society of America, 2021
Simultaneous speech of multiple speakers is known as overlapped speech, which causes problems for speech recognition and speaker diarization systems. The present work uses previously less utilized signal phase information in the task of overlapped speech detection. In this context, Instantaneous Frequency Cosine Coefficient (IFCC) and Modified Group Delay Cepstral Coefficient (MGDCC) features are explored. IFCC captures the time-varying phase characteristics, while MGDCC represents the frequency-varying information of the phase spectrum. A Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM)-based classifier is used for the classification. The present work uses synthetically generated overlapped speech from the GRID corpus. The proposed method is benchmarked against three baseline approaches that use magnitude spectrum features. It is observed that the combination of IFCC and MGDCC features with CNN-LSTM classifier provides better performance than the baselines. The combination of phase features with magnitude-based MFCC feature provides the best performance, indicating the importance of complementary information. The present study also investigates the effect of segment duration, genders, and number of simultaneous speakers on the overlapped speech detection system. Finally, the proposed method is also evaluated on real overlapped data from the AMI corpus.
Effect of high-energy voiced speech segments and speaker gender on shouted speech detection
Shikha Baghel, S. R. M. Prasanna, Prithwijit Guha
2021 National Conference on Communications NCC 2021, 2021
Shouted speech detection is an essential preprocessing task in many conventional speech processing systems. Mostly, shouted speech has been studied in terms of the characterization of vocal tract and excitation source features. Previous works have also established the significance of voiced segments in shouted speech detection. This work posits that a significant emphasis is given to a portion of the voiced segments during shouted speech production. These emphasized voiced regions have significant energy. This work analyzes the effect of high-energy voiced segments on shouted speech detection. Moreover, fundamental frequency is a crucial characteristic of both shouted speech and speaker gender. The authors believe that gender has a significant effect on shouted speech detection. Therefore, the present work also studies the impact of gender on the current task. The classification between normal and shouted speech is performed using a DNN-based classifier. A statistical significance test of the features extracted from high-energy voiced segments is also performed. The results support the claim that high-energy voiced segments carry highly discriminating information. Additionally, the classification results of the gender experiments show that gender has a notable effect on shouted speech detection.
Excitation source feature based dialect identification in Ao - A low resource language
Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S.R. Mahadeva Prasanna
Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2021
Ao is an under-resourced Tibeto-Burman tonal language spoken in Nagaland, India. There are three distinct dialects of the language, namely, Chungli, Mongsen and Changki. The objective of dialect identification is to distinguish one dialect from another within the same language. The goal of this study is to ascertain the potential of excitation source features for automatic dialect identification in Ao. In this direction, the Integrated Linear Prediction Residual (ILPR), an approximate representation of the source signal, is explored. The log Mel spectrogram of the ILPR signal (SExt) is used to exploit the time-frequency characteristics of the excitation source. This work proposes an attention-based CNN-BiGRU architecture for automatic dialect identification tasks. Additionally, the log Mel spectrogram (SVT), extracted from the pre-emphasized speech signal, is used as a baseline method. SVT contains the vocal-tract characteristics of the speech signal. A significant performance improvement of nearly 6% in accuracy is observed when the excitation source feature (SExt) is combined with the vocal tract representation (SVT). To analyse the effect of segment duration, dialect identification performance is reported for three different durations, viz., 1 sec, 3 sec and 6 sec. The effect of gender on the dialect identification task for Ao is also studied in this work.
Automatic detection of shouted speech segments in Indian news debates
Shikha Baghel, Mrinmoy Bhattacharjee, S.R. Mahadeva Prasanna, Prithwijit Guha
Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2021
Overlapped/Non-Overlapped Speech Transition Point Detection Using Bag-of-Audio-Words
Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
SPCOM 2020 International Conference on Signal Processing and Communications, 2020
Exploration of excitation source information for shouted and normal speech classification
Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
Journal of the Acoustical Society of America, 2020
Analysis of excitation source characteristics for shouted and normal speech classification
Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
26th National Conference on Communications NCC 2020, 2020
Shouted and Normal Speech Classification Using 1D CNN
Shikha Baghel, Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha
Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2019
Excitation source feature for discriminating shouted and normal speech
Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
SPCOM 2018 12th International Conference on Signal Processing and Communications, 2018
Classification of multi speaker shouted speech and single speaker normal speech
Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
IEEE Region 10 Annual International Conference Proceedings TENCON, 2017
Shouted/normal speech classification using speech-specific features
Shikha Baghel, Banriskhem K. Khonglah, S.R. Mahadeva Prasanna, Prithwijit Guha
IEEE Region 10 Annual International Conference Proceedings TENCON, 2017

RECENT SCHOLAR PUBLICATIONS

The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
SB Kalluri, P Singh, PR Chowdhuri, A Kulkarni, S Baghel, P Hegde, ...
arXiv preprint arXiv:2406.09494 , 2024
2024
Citations: 11
Summary of the DISPLACE Challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments
S Baghel, S Ramoji, S Jain, PR Chowdhuri, P Singh, D Vijayasenan, ...
Speech Communication , 2024
2024
Citations: 15
Driver Speech Detection in Real Driving Scenario
M Bhattacharjee, S Baghel, SRM Prasanna
International Conference on Speech and Computer, 189-199 , 2023
2023
Citations: 1
The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
SG Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil ...
INTERSPEECH-2023, 3562-3566 , 2023
2023
Citations: 11
DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
S Baghel, S Ramoji, P Singh, S Jain, PR Chowdhuri, K Kulkarni, S Padhi, ...
INTERSPEECH at https://www.isca-speech.org/archive/interspeech_2023 … , 2023
2023
Citations: 6
Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations
S Baghel, SRM Prasanna, P Guha
International Conference on Speech and Computer, 33-43 , 2022
2022
Under-resourced dialect identification in Ao using source information
M Tzudir, S Baghel, P Sarmah, SRM Prasanna
The Journal of the Acoustical Society of America 152 (3), 1755-1766 , 2022
2022
Citations: 7
Analyzing RMFCC Feature for Dialect Identification in Ao, an Under-Resourced Language
M Tzudir, S Baghel, P Sarmah, SRM Prasanna
2022 National Conference on Communications (NCC), 308-313 , 2022
2022
Citations: 16
Shouted, Overlapped and Competitive Speech Detection in Indian Television News Debates
S Baghel
2022
Overlapped speech detection using phase features
S Baghel, SRM Prasanna, P Guha
The Journal of the Acoustical Society of America 150 (4), 2770-2781 , 2021
2021
Citations: 3
Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection
S Baghel, SRM Prasanna, P Guha
2021 National Conference on Communications (NCC), 1-6 , 2021
2021
Citations: 1
Excitation Source Feature Based Dialect Identification in Ao-A Low Resource Language.
M Tzudir, S Baghel, P Sarmah, SRM Prasanna
Interspeech, 1524-1528 , 2021
2021
Citations: 9
Automatic Detection of Shouted Speech Segments in Indian News Debates
S Baghel, M Bhattacharjee, SRM Prasanna, P Guha
Proc. Interspeech 2021, 4179-4183 , 2021
2021
Citations: 6
Overlapped/Non-Overlapped Speech Transition Point Detection Using Bag-of-Audio-Words
S Baghel, SRM Prasanna, P Guha
2020 International Conference on Signal Processing and Communications (SPCOM … , 2020
2020
Exploration of excitation source information for shouted and normal speech classification
S Baghel, SRM Prasanna, P Guha
The Journal of the Acoustical Society of America 147 (2), 1250-1261 , 2020
2020
Citations: 17
Analysis of Excitation Source Characteristics for Shouted and Normal Speech Classification
S Baghel, SRM Prasanna, P Guha
2020 National Conference on Communications (NCC), 1-6 , 2020
2020
Citations: 2
Shouted and normal speech classification using 1D CNN
S Baghel, M Bhattacharjee, SRM Prasanna, P Guha
Pattern Recognition and Machine Intelligence: 8th International Conference … , 2019
2019
Citations: 11
Excitation Source Feature for Discriminating Shouted and Normal Speech
S Baghel, SRM Prasanna, P Guha
2018 International Conference on Signal Processing and Communications (SPCOM … , 2018
2018
Citations: 7
Classification of multi speaker shouted speech and single speaker normal speech
S Baghel, SRM Prasanna, P Guha
TENCON 2017-2017 IEEE Region 10 Conference, 2388-2392 , 2017
2017
Citations: 8
Shouted/normal speech classification using speech-specific features
S Baghel, BK Khonglah, SRM Prasanna, P Guha
2016 IEEE Region 10 Conference (TENCON), 1655-1659 , 2016
2016
Citations: 8

MOST CITED SCHOLAR PUBLICATIONS

Exploration of excitation source information for shouted and normal speech classification
S Baghel, SRM Prasanna, P Guha
The Journal of the Acoustical Society of America 147 (2), 1250-1261 , 2020
2020
Citations: 17
Analyzing RMFCC Feature for Dialect Identification in Ao, an Under-Resourced Language
M Tzudir, S Baghel, P Sarmah, SRM Prasanna
2022 National Conference on Communications (NCC), 308-313 , 2022
2022
Citations: 16
Summary of the DISPLACE Challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments
S Baghel, S Ramoji, S Jain, PR Chowdhuri, P Singh, D Vijayasenan, ...
Speech Communication , 2024
2024
Citations: 15
The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
SB Kalluri, P Singh, PR Chowdhuri, A Kulkarni, S Baghel, P Hegde, ...
arXiv preprint arXiv:2406.09494 , 2024
2024
Citations: 11
The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
SG Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil ...
INTERSPEECH-2023, 3562-3566 , 2023
2023
Citations: 11
Shouted and normal speech classification using 1D CNN
S Baghel, M Bhattacharjee, SRM Prasanna, P Guha
Pattern Recognition and Machine Intelligence: 8th International Conference … , 2019
2019
Citations: 11
Excitation Source Feature Based Dialect Identification in Ao-A Low Resource Language.
M Tzudir, S Baghel, P Sarmah, SRM Prasanna
Interspeech, 1524-1528 , 2021
2021
Citations: 9
Classification of multi speaker shouted speech and single speaker normal speech
S Baghel, SRM Prasanna, P Guha
TENCON 2017-2017 IEEE Region 10 Conference, 2388-2392 , 2017
2017
Citations: 8
Shouted/normal speech classification using speech-specific features
S Baghel, BK Khonglah, SRM Prasanna, P Guha
2016 IEEE Region 10 Conference (TENCON), 1655-1659 , 2016
2016
Citations: 8
Under-resourced dialect identification in Ao using source information
M Tzudir, S Baghel, P Sarmah, SRM Prasanna
The Journal of the Acoustical Society of America 152 (3), 1755-1766 , 2022
2022
Citations: 7
Excitation Source Feature for Discriminating Shouted and Normal Speech
S Baghel, SRM Prasanna, P Guha
2018 International Conference on Signal Processing and Communications (SPCOM … , 2018
2018
Citations: 7
DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
S Baghel, S Ramoji, P Singh, S Jain, PR Chowdhuri, K Kulkarni, S Padhi, ...
INTERSPEECH at https://www.isca-speech.org/archive/interspeech_2023 … , 2023
2023
Citations: 6
Automatic Detection of Shouted Speech Segments in Indian News Debates
S Baghel, M Bhattacharjee, SRM Prasanna, P Guha
Proc. Interspeech 2021, 4179-4183 , 2021
2021
Citations: 6
Overlapped speech detection using phase features
S Baghel, SRM Prasanna, P Guha
The Journal of the Acoustical Society of America 150 (4), 2770-2781 , 2021
2021
Citations: 3
Analysis of Excitation Source Characteristics for Shouted and Normal Speech Classification
S Baghel, SRM Prasanna, P Guha
2020 National Conference on Communications (NCC), 1-6 , 2020
2020
Citations: 2
Driver Speech Detection in Real Driving Scenario
M Bhattacharjee, S Baghel, SRM Prasanna
International Conference on Speech and Computer, 189-199 , 2023
2023
Citations: 1
Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection
S Baghel, SRM Prasanna, P Guha
2021 National Conference on Communications (NCC), 1-6 , 2021
2021
Citations: 1
Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations
S Baghel, SRM Prasanna, P Guha
International Conference on Speech and Computer, 33-43 , 2022
2022
Shouted, Overlapped and Competitive Speech Detection in Indian Television News Debates
S Baghel
2022
Overlapped/Non-Overlapped Speech Transition Point Detection Using Bag-of-Audio-Words
S Baghel, SRM Prasanna, P Guha
2020 International Conference on Signal Processing and Communications (SPCOM … , 2020
2020
