Datasets, Models and NLP Techniques for Legal Contracts—A Survey Kapil Vuthoo, Sonia Khetarpaul, L. Venkata Subramaniam Expert Systems, 2026 The development of computational models for legal reasoning has been a prominent research area for decades. Recently, however, there has been significant progress in enhancing the comprehension of legal contracts through advanced Natural Language Processing (NLP) techniques, particularly transformer‐based models. NLP plays a crucial role in identifying and analysing various types of legal contracts and extracting critical clauses from them. While rule‐based approaches were traditionally dominant, modern deep learning and transformer models are increasingly utilized. These models enable the learning of complex rules that are often difficult for humans to articulate using symbolic or rule‐based systems. Furthermore, ongoing research is exploring neuro‐symbolic models that aim to integrate the strengths of both symbolic and neural approaches. This survey paper identifies gaps in clause relationship linkage and neuro‐symbolic approaches. This survey reviews the techniques and datasets employed in NLP for legal contract analysis, summarizing recent advancements in this field. It emphasizes the evolution of NLP since the introduction of transformer architectures such as GPT‐4, Llama, BERT, XLNet, Gemini and other variants frequently used to address a range of NLP problems. Additionally, it provides an overview of state‐of‐the‐art research that has achieved notable performance in tasks such as clause extraction, document classification, risk assessment, legal question answering and more.
AI-enhanced bilingual banking assistant Manasbir Singh Bhatia, Sonia Khetarpaul Scientific Reports, 2025 Financial literacy and access to banking services are still major challenges, especially in multilingual regions. Many people who do not speak English fluently face difficulties using traditional online banking systems. To help solve this problem, we developed a bilingual banking assistant that works in English, Hindi, and Hinglish (a mix of Hindi and English). The proposed assistant processes natural language banking queries and provides help with product details and financial advice, and it protects user privacy by not storing client conversations. Our solution uses a Mixtral AI model for language understanding, automatically detects English, Hindi and mixed-language input, and uses Google Translate to provide real-time translations. It also supports voice input and output, which makes it easier for users to interact using speech. The assistant can handle queries related to account services, fixed deposits, credit cards, fund transfers and many other banking-related topics. We conducted usability tests with 50 participants across 100 conversations to evaluate the effectiveness of the system. The development of this banking assistant shows how conversational interfaces can potentially improve banking accessibility across language barriers, promoting greater financial inclusion and customer satisfaction in a privacy-conscious manner.
Predicting Epidemic Outbreak Using Climatic Factors Dolly Sharma, Sonia Khetarpaul, Shashwat Tiwari, Lakshman Aakash, Aryan Gupta Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2024
Analyzing the Efficacy of Large Language Models: A Comparative Study Sonia Khetarpaul, Dolly Sharma, Shreya Sinha, Aryan Nagpal, Aarush Narang Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2024
Location-Based Ideal Site Selection using Clustering Sonia Khetarpaul, Dolly Sharma, Saurabh Mishra, Shambhavi Sud, Pranav Soni, Madhav Agarwal Proceedings of Inc4 2024: 2024 IEEE International Conference on Contemporary Computing and Communications, 2024
Performance Prediction of Songs on Online Music Platforms Dolly Sharma, Sonia Khetarpaul, S Mohit Kumar, Ambreesh Parthasarathy, Sparsh Agarwalla Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2022
A Real Time Analysis of Offensive Texts to Prevent Cyberbullying Sonia Khetarpaul, Dolly Sharma, Mayuri Gupta, Vaibhav Gautam Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2021
Promoting fairness in LLMs: detection and mitigation of gender bias T Sachdeva, M Singhal, S Khetarpaul Knowledge and Information Systems 68 (1), 107, 2026
Datasets, Models and NLP Techniques for Legal Contracts—A Survey K Vuthoo, S Khetarpaul, L Venkata Subramaniam Expert Systems 43 (6), e70267, 2026
Investigating Cultural Bias: A Comparative Study of Large Language Models S Srivastava, P Singh, S Khetarpaul, U Sharma 11th International Conference on Pattern Recognition and Machine …, 2026
A Comparative Study of AI-Driven Ontology Enrichment for Environmental Sustainability S Khetarpaul, H Baazaoui-Zghal, S Gupta, V Bist, D Bhatnagar 2025 IEEE International Conference on Big Data (BigData), 2899-2907, 2026
SiteSense: Optimizing Restaurant Site Selection Using a Comprehensive Data-Driven Framework A Swaroop, K Prasad, S Khetarpaul International Conference on Big Data Analytics, 171-187, 2026
IndLaw-QA: Fine-Tuned LLMs with RAG for Indian Legal QA A Badoni, DA Singh, K Vuthoo, S Singh, S Khetarpaul, LV Subramaniam International Conference on Big Data Analytics, 35-48, 2026
LegalNexus: Hierarchy-Aware Legal Case Retrieval Using Hyperbolic Graph Learning and Multi-Agent Refinement A Mishra, K Vuthoo, S Khetarpaul, LV Subramaniam Available at SSRN 6678264, 2026
A Safety-Aware Intelligent Route Planning Framework Using Real-Time Hazard Detection and Traffic Prediction R Prasad, S Khetarpaul, S Rankavat Available at SSRN 6168896, 2026 Citations: 1
Traffic-Aware Optimal Taxi Placement Using Graph Neural Network-Based Reinforcement Learning S Khetarpaul, PY Sharan 19th IEEE International Conference on Advanced Networks and …, 2025
AI-enhanced bilingual banking assistant MS Bhatia, S Khetarpaul Scientific Reports 15 (1), 37526, 2025 Citations: 2
AASE: AI-Driven Automated Answer Script Evaluation MA Sura, M Rai, S Khetarpaul, S Mishra Authorea Preprints, 2025 Citations: 1
Efficient Clause Identification in Contracts Using NLP and Web-Sourced Data K Vuthoo, S Khetarpaul, S Mishra International Conference on Data Analytics & Management, 531-541, 2025
LawMate: Leveraging Domain-Specific LLMs for the Indian Legal Ecosystem A Pattnayak, A Ramkumar, S Khetarpaul, K Vuthoo Asian Conference on Intelligent Information and Database Systems, 188-201, 2025 Citations: 42
Deciphering Email Threads: A Comparative Study of Large Language Models S Khetarpaul, D Sharma, A Barakoti, A Sharma, A Jain 2025 International Conference on Innovation in Computing and Engineering …, 2025 Citations: 2
Identifying and recommending taxi hotspots in spatio-temporal space S Mishra, S Khetarpaul GeoInformatica 29 (1), 93-113, 2025 Citations: 1
Combining GraphSAGE and Label Propagation for Node Classification in Graphs D Sharma, S Khetarpaul, C Verma, P Jain International Conference on Information Integration and Web Intelligence …, 2024
CLOR-QA: Cross-Lingual Open-Retrieval Question Answering Model with Dynamic Database Integration S Khetarpaul, VV Patil, MA Sura, S Sharma, P Singh International Conference on Information Integration and Web Intelligence, 19-31, 2024
ElectionBot for Voters: Bridging the Gap Between Data and Decision-Making with LLM S Khetarpaul, K Chathley, M Rai 2024 IEEE Region 10 Symposium (TENSYMP), 1-6, 2024
Analyzing the Efficacy of Large Language Models: A Comparative Study S Khetarpaul, D Sharma, S Sinha, A Nagpal, A Narang International Conference on Database and Expert Systems Applications, 215-221, 2024
Lecture video summarization using deep learning S Khetarpaul, L Jain, K Goyal, PV Tej Asian Conference on Intelligent Information and Database Systems, 94-105, 2024 Citations: 2
MOST CITED SCHOLAR PUBLICATIONS
Mining GPS data to determine interesting locations S Khetarpaul, R Chauhan, SK Gupta, LV Subramaniam, U Nambiar Proceedings of the 8th International Workshop on Information Integration on …, 2011 Citations: 67
LawMate: Leveraging Domain-Specific LLMs for the Indian Legal Ecosystem A Pattnayak, A Ramkumar, S Khetarpaul, K Vuthoo Asian Conference on Intelligent Information and Database Systems, 188-201, 2025 Citations: 42
SHEG: summarization and headline generation of news articles using deep learning RK Singh, S Khetarpaul, R Gorantla, SG Allada Neural Computing and Applications 33 (8), 3251-3265, 2021 Citations: 38
Bus arrival time prediction using a modified amalgamation of fuzzy clustering and neural network on spatio-temporal data S Khetarpaul, SK Gupta, S Malhotra, LV Subramaniam Australasian Database Conference, 142-154, 2015 Citations: 24
Object and currency detection with audio feedback for visually impaired KKSN Reddy, C Yashwanth, PATV Sai, S Khetarpaul 2020 IEEE Region 10 Symposium (TENSYMP), 1152-1155, 2020 Citations: 14
Spatiotemporal social (STS) data model: correlating social networks and spatiotemporal data S Khetarpaul, SK Gupta, LV Subramaniam Social Network Analysis and Mining 6 (1), 81, 2016 Citations: 8
Analyzing travel patterns for scheduling in a dynamic environment S Khetarpaul, SK Gupta, LV Subramaniam International Conference on Availability, Reliability, and Security, 304-318, 2013 Citations: 7
Mining GPS traces to recommend common meeting points S Khetarpaul, SK Gupta, LV Subramaniam, U Nambiar Proceedings of the 16th International Database Engineering & Applications …, 2012 Citations: 7
A Real Time Analysis of Offensive Texts to Prevent Cyberbullying S Khetarpaul, D Sharma, M Gupta, V Gautam Australasian Database Conference, 152-165, 2021 Citations: 6
Symptoms-disease detecting conversation agent using knowledge graphs I Ananta, S Khetarpaul, D Sharma Proceedings of the 2024 Australasian Computer Science Week, 98-107, 2024 Citations: 4
Real-Time Detection and Visualization of Traffic Conditions by Mining Twitter Data S Khetarpaul, D Sharma, JI Jose, M Saragur Australasian Database Conference, 141-152, 2022 Citations: 4
Mining location based social networks to understand the citizen’s check-in patterns S Khetarpaul Computing 103 (12), 2967-2993, 2021 Citations: 4
Location-based ideal site selection using clustering S Khetarpaul, D Sharma, S Mishra, S Sud, P Soni, M Agarwal 2024 IEEE International Conference on Contemporary Computing and …, 2024 Citations: 3
Performance prediction of songs on online music platforms D Sharma, S Khetarpaul, S Mohit Kumar, A Parthasarathy, S Agarwalla Australasian Database Conference, 209-216, 2022 Citations: 3
Optimal placement of taxis in a city using dominating set problem S Mishra, S Khetarpaul Australasian Database Conference, 111-124, 2021 Citations: 3
AI-enhanced bilingual banking assistant MS Bhatia, S Khetarpaul Scientific Reports 15 (1), 37526, 2025 Citations: 2
Deciphering Email Threads: A Comparative Study of Large Language Models S Khetarpaul, D Sharma, A Barakoti, A Sharma, A Jain 2025 International Conference on Innovation in Computing and Engineering …, 2025 Citations: 2
Lecture video summarization using deep learning S Khetarpaul, L Jain, K Goyal, PV Tej Asian Conference on Intelligent Information and Database Systems, 94-105, 2024 Citations: 2
Predicting epidemic outbreak using climatic factors D Sharma, S Khetarpaul, S Tiwari, L Aakash, A Gupta Asian Conference on Intelligent Information and Database Systems, 264-275, 2024 Citations: 2
Mining optimal meeting points for moving users in spatio-temporal space S Khetarpaul, SK Gupta, LV Subramaniam Social Network Analysis and Mining 8 (1), 50, 2018 Citations: 2