Information Retrieval, Recommender Systems, Text Mining, Data Science
Scopus Publications (106)
Evaluating Semantic and Perceptual Alignment in Multilingual Story Visualization. Krishna Tewari, Sharma Nandini Surendra, Divya Sharma, Sukomal Pal. International Conference on Intelligent User Interfaces, Proceedings IUI, 2026.
Story visualization requires generating sequential images that maintain character consistency, narrative relevance, and cultural alignment. While generative models show promise, evaluating their outputs still relies heavily on human judgment, which highlights a clear need for reliable, automatic evaluation metrics. This paper focuses on evaluating multilingual story illustrations by comparing generated images against reference frames. We employ DreamSim for perceptual similarity and introduce foreground-only evaluation to reduce background bias. Additionally, we propose a hybrid metric that combines Jaccard similarity with DreamSim. Our experiments on English and Hindi story datasets show that DreamSim provides a strong perceptual baseline; however, it struggles to capture semantic relevance in stylized illustrations. After analyzing precision, recall, accuracy, and F1-scores, we find that perceptual metrics often fail to reflect narrative depth in culturally specific styles. Our work highlights the open challenges in this field and advocates for narrative-aware, multimodal evaluation frameworks. We present these findings to solicit feedback on our design choices and to encourage discussion on robust assessment methods for diverse story visualization.
Overview of the CMIR Track at FIRE 2025: Code-Mixed Information Retrieval from Social Media Data. Supriya Chanda, Krishna Tewari, Sukomal Pal. FIRE 2025: Proceedings of the 17th Annual Meeting of the Forum for Information Retrieval Evaluation, 2026.
The increasing use of multilingual and code-mixed communication on social media presents unique challenges for Information Retrieval (IR), especially in low-resource languages such as Bengali. To foster research in this direction, we organized the CMIR-2025 shared task, the second edition of the Code-Mixed Information Retrieval challenge. Building on the initial CMIR-2024 task, where only Roman-script Bengali was considered, this year’s edition introduces a more realistic and complex setting by retaining Bengali words in their native script. The resulting dataset contains mixed-script Bengali–English text, requiring participating systems to retrieve relevant comments for a given query from social media discussions. Eight teams participated, submitting a total of 26 runs using lexical models, neural rankers, and fusion-based approaches. Evaluation using MAP, nDCG, P@5, and P@10 reveals that fusion and hybrid retrieval systems consistently outperform standalone models, indicating the importance of combining lexical and semantic signals for handling noisy code-mixed data. This paper presents the dataset, task design, evaluation results, and key insights that highlight open challenges and future research directions for mixed-script IR.
Overview of the Shared Task on Multilingual Story Illustration: Bridging Cultures through AI Artistry (MUSIA). Krishna Tewari, Anshita Malviya, Supriya Chanda, Arjun Mukherjee, Sukomal Pal. FIRE 2025: Proceedings of the 17th Annual Meeting of the Forum for Information Retrieval Evaluation, 2026.
The Multilingual Story Illustration Shared Task (MUSIA), conducted as part of FIRE-2025, investigates the challenge of generating culturally grounded visual narratives for short stories written in Hindi and English. As multimodal AI becomes increasingly relevant in education and creative content generation, this task evaluates how effectively current systems can interpret multilingual text and produce coherent, culturally faithful image sequences. A total of 8 teams registered for the shared task; 5 teams submitted system runs, and 4 teams ultimately submitted overview papers. Their approaches spanned narrative segmentation, translation, summarization, prompt engineering, and diffusion-based image generation using state-of-the-art models. Human evaluation measured system performance along three dimensions: relevance, consistency, and visual quality. Results showed that pipelines integrating large language models for story-aware prompt construction, combined with diffusion models, achieved the strongest performance, particularly in visual quality and cultural alignment. However, several systems struggled with maintaining character identity, stylistic continuity, and fine-grained adherence to story details. This paper presents the dataset, task formulation, methodological strategies adopted by participating teams, and comparative results, offering a foundation for advancing research in multilingual, culturally sensitive story visualization.
Advancing Hindi Text Summarization: Named Entity Recognition and Content Augmentation Strategies. Saumay Gupta, Sukomal Pal. ACM Transactions on Asian and Low-Resource Language Information Processing, 2025.
We explore advancements in Hindi text summarization, a critical area of natural language processing that helps manage information overload. Despite a growing corpus of Hindi data, there is a significant gap in practical summarization tools because of the language's intricate linguistic features and its limited resources compared with English. Previous work focused on extractive methods, but recent shifts toward abstractive approaches promise more natural and coherent summaries by understanding and paraphrasing content. Our research introduces two novel methodologies, Named Entity Aware-Abstractive Text Summarization (NEA-ATS) and Query-Driven Content Augmentation for Summarization (QDCAS), aimed at enhancing the accuracy and richness of Hindi summaries. NEA-ATS integrates Named Entity Recognition to prioritize crucial information, improving the language model's attention to critical details; however, it occasionally disrupts the text's context, leading to only marginal gains in summary quality. Meanwhile, QDCAS addresses extrinsic hallucinations, which are common in state-of-the-art models, by augmenting source documents with relevant content obtained through focused web crawling (a technique for selectively gathering topic-specific web pages), thereby broadening contextual understanding and refining outputs. Empirical results demonstrate the effectiveness of QDCAS, showing marginal improvements in ROUGE and BERTScore over traditional language models. This work advances Hindi text summarization and explores content-rich strategies that could potentially be extended to other languages and domains.
Overview of the shared task on code-mixed information retrieval from social media data. Supriya Chanda, Sukomal Pal. ACM International Conference Proceeding Series, 2025.
The rise of multilingual communication on social media platforms such as Facebook, Twitter, and WhatsApp presents a compelling challenge for information retrieval in code-mixed contexts within natural language processing. This paper provides an overview of the Code-Mixed Information Retrieval Shared Task, which is part of the FIRE-2024 conference. The main focus of the task was to retrieve relevant code-mixed documents from a corpus of Bengali-English comments for a given set of code-mixed queries. Six teams showed interest in participating in the shared task, and two teams submitted runs. This article describes the models used by the participating teams and their performance, evaluated using Mean Average Precision (MAP), a widely used metric for information retrieval tasks.
A case study on decompounding in Indian language IR. Siba Sankar Sahu, Sukomal Pal. Natural Language Processing, 2025.
Decompounding is an essential preprocessing step in text-processing tasks such as machine translation, speech recognition, and information retrieval (IR). Here, its role in IR is explored from five viewpoints. (A) Does word decompounding affect Indian language IR? If so, to what extent? (B) Can corpus-based decompounding models be used in Indian language IR? If so, how? (C) Can machine learning and deep learning-based decompounding models be applied in Indian language IR? If so, how? (D) Among the different decompounding models (corpus-based, hybrid machine learning-based, and deep learning-based), which is most effective for IR? (E) Among the different IR models, which is most effective? This study proposes corpus-based, hybrid machine learning-based, and deep learning-based decompounding models for Indian languages (Marathi, Hindi, and Sanskrit) and evaluates the effectiveness of each model from an IR perspective only. The decompounding models are observed to improve IR effectiveness, with the deep learning-based models outperforming the corpus-based and hybrid machine learning-based models in Indian language IR. Among the deep learning-based models, the Bi-LSTM-A model performs best, improving mean average precision (MAP) by 28.02% in Marathi. Similarly, the Bi-RNN-A model improves MAP by 18.18% and 6.1% in Hindi and Sanskrit, respectively. Among the retrieval models, In_expC2 outperforms the others in Marathi and Hindi, and BB2 outperforms the others in Sanskrit.
Overview of the shared task on code-mixed information retrieval from social media data. CEUR Workshop Proceedings, 2025.
Advancing Vision and Language in GI Diagnosis: Florence2 for Question Answering and Stable Diffusion for Image Synthesis. CEUR Workshop Proceedings, 2025.
IReL, IIT(BHU) at MEDIQA-MAGIC 2025: Tackling Multimodal Dermatology with CLIPSeg-Based Segmentation and BERT-Swin Question Answering. CEUR Workshop Proceedings, 2025.
Arcturus at CheckThat! 2025: DeBERTa-v3-Base for Multilingual Subjectivity Detection in News Articles. CEUR Workshop Proceedings, 2025.
Findings of the Code-Mixed Information Retrieval from Social Media Data (CMIR) Shared Task at FIRE 2025. CEUR Workshop Proceedings, 2025.
SAViOR: Sentiment, Sarcasm, Abuse, and Vulgarity in Online Realities (Memes). CEUR Workshop Proceedings, 2025.
Sentiment Analysis and Homophobia detection of Code-Mixed Dravidian Languages leveraging pre-trained model and word-level language tag. CEUR Workshop Proceedings, 2022.
Coarse and Fine-Grained Conversational Hate Speech and Offensive Content Identification in Code-Mixed Languages using Fine-Tuned Multilingual Embedding. CEUR Workshop Proceedings, 2022.
Extractive Text Summarization using Meta-heuristic Approach. CEUR Workshop Proceedings, 2022.
Is Meta Embedding better than pre-trained word embedding to perform Sentiment Analysis for Dravidian Languages in Code-Mixed Text? CEUR Workshop Proceedings, 2021.
Fine-tuning Pre-Trained Transformer based model for Hate Speech and Offensive Content Identification in English, Indo-Aryan and Code-Mixed (English-Hindi) languages. CEUR Workshop Proceedings, 2021.
MetaGen: An academic Meta-review Generation system. Chaitanya Bhatia, Tribikram Pradhan, Sukomal Pal. SIGIR 2020: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020.
IRLab@IITBHU@Dravidian-CodeMix-FIRE2020: Sentiment analysis for Dravidian languages in code-mixed text. CEUR Workshop Proceedings, 2020.
Sentiment analysis on multilingual code mixing text using BERT-BASE: participation of IRLab@IIT(BHU) in Dravidian-CodeMix and HASOC tasks of FIRE2020. CEUR Workshop Proceedings, 2020.
An Indian Language Social Media Collection for Hate and Offensive Speech. LREC 2020 Workshop, Language Resources and Evaluation Conference: Resources and Techniques for User and Author Profiling in Abusive Language (ResT-UP 2020), Proceedings, 2020.
IIT BHU at FIRE 2018 IRMiDis Track - Obtaining factual tweets during natural disasters. CEUR Workshop Proceedings, 2018.
IIT-BHU in TREC 2018 Incidents Stream Track. 27th Text Retrieval Conference, TREC 2018 Proceedings, 2018.
Rule based event extraction system from newswires and social media text in Indian languages (EventXtract-IL) for English and Hindi data. CEUR Workshop Proceedings, 2018.
IIT(BHU)@IECSIL-FIRE-2018: Language independent automatic framework for entity extraction in Indian languages. CEUR Workshop Proceedings, 2018.
Design of a meta search system for legal domain. Ambedkar Kanapala, Sukomal Pal, Rajendra Pamula. 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS 2017), 2017.
Microblog retrieval for disaster relief: How to create ground truths? CEUR Workshop Proceedings, 2017.
IIT BHU at FIRE 2017 IRMiDis Track - Fully automatic approaches to information retrieval. CEUR Workshop Proceedings, 2017.
IIT BHU at FIRE 2016 microblog track: A semi-automatic microblog retrieval system. CEUR Workshop Proceedings, 2016.
IR-IITBHU at TREC 2016 Open Search Track: Retrieving documents using Divergence From Randomness model in Terrier. 25th Text Retrieval Conference, TREC 2016 Proceedings, 2016.
Passage retrieval for tweet contextualization at INEX 2012. CEUR Workshop Proceedings, 2012.
DCU and ISI@INEX 2010: Adhoc and data-centric tracks. Debasis Ganguly, Johannes Leveling, Gareth J. F. Jones, Sauparna Palchowdhury, Sukomal Pal, Mandar Mitra. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2011.
Using negative information in search. Sauparna Palchowdhury, Sukomal Pal, Mandar Mitra. Proceedings of the 2nd International Conference on Emerging Applications of Information Technology (EAIT 2011), 2011.
The FIRE 2008 evaluation exercise. Prasenjit Majumder, Mandar Mitra, Dipasree Pal, Ayan Bandyopadhyay, Samaresh Maiti, Sukomal Pal, Deboshree Modak, Sucharita Sanyal. ACM Transactions on Asian Language Information Processing, 2010.
Indian Statistical Institute at INEX 2008 adhoc track. Sukomal Pal, Mandar Mitra, Debasis Ganguly, Samaresh Maiti, Ayan Bandyopadhyay, Aparajita Sen, Sukanya Mitra. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009.
Text collections for FIRE. Prasenjit Majumder, Mandar Mitra, Dipasree Pal, Ayan Bandyopadhyay, Samaresh Maiti, Sukanya Mitra, Aparajita Sen, Sukomal Pal. ACM SIGIR 2008: 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings, 2008.