Hypernymy Relation in NLP: Tasks, Approaches, Resources, and Future Directions—A Systematic Literature Review
Randah Alharbi, Husni Al-Muhtaseb, Tarek Helmy
IEEE Access, 2025
Hypernymy is a semantic relation between two terms in which a more specific term entails a more general term; that is, the meaning of the more specific term is encompassed by the meaning of the more general term. This relation is crucial for many natural language processing (NLP) tasks, including textual entailment, search query expansion, and machine translation. This systematic literature review investigates the hypernymy relation in the context of NLP, focusing on hypernymy-related tasks, employed approaches, available resources, and future research directions. The reviewed studies were extracted from five predefined databases and cover the period from 2018 to March 2023. The review process identified 75 primary studies, which were analyzed to extract the targeted tasks, languages, approaches, representations, and datasets; the synthesized data were then used to address the review questions. The review identifies the main hypernymy-related tasks, including hypernymy extraction, detection, directionality, graded lexical entailment, and discovery. It summarizes the evaluation measures employed, including accuracy, F1, Mean Average Precision (MAP), and Spearman's correlation coefficient. English is the most studied language, although multilingual coverage is steadily growing. Several benchmark datasets are presented for each task, along with their statistics, characteristics, and construction techniques. The representation techniques are also summarized, ranging from Word2Vec, FastText, and GloVe to hypernymy-specific representations. Finally, research gaps are discussed and potential future directions are outlined.
This review consolidates scattered findings and provides a practical map of tasks, resources, and techniques for researchers building hypernymy-aware NLP systems.
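Two of the evaluation measures named above can be made concrete. The Python sketch below computes Mean Average Precision over ranked hypernym candidates (as used for hypernymy discovery) and Spearman's correlation between system scores and human ratings (as used for graded lexical entailment). It is a minimal illustration under stated assumptions, not the review's or any shared task's official scorer: the function names are hypothetical, AP is normalized by the full gold set with no rank cutoff, and tied values in Spearman's rho receive average ranks. Published benchmarks may use different cutoffs or conventions.

```python
from statistics import mean


def average_precision(ranked, gold):
    """AP of one ranked candidate list against a set of gold hypernyms."""
    if not gold:
        return 0.0
    hits, total = 0, 0.0
    for k, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            hits += 1
            total += hits / k  # precision at each rank where a gold item appears
    # Assumption: divide by len(gold) with no cutoff, so unretrieved gold items count as misses.
    return total / len(gold)


def mean_average_precision(runs):
    """runs: iterable of (ranked_candidates, gold_set) pairs, one per query term."""
    return mean(average_precision(ranked, gold) for ranked, gold in runs)


def _average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = _average_ranks(xs), _average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman's rho is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

For example, with two queries whose ranked candidates are `["animal", "pet", "mammal"]` (gold `{"animal", "mammal"}`) and `["fruit", "food"]` (gold `{"food"}`), the APs are 5/6 and 1/2, so MAP is 2/3.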
BPTI: Bilingual Printed Text Images Dataset for Recognition Purposes
Mohammed Yahia, Husni Al-Muhtaseb
International Arab Journal of Information Technology, 2023
Datasets of text images are important for optical text recognition systems, where they can be used to improve performance and recognition rates. In this work, we present a bilingual dataset of Arabic/English text images to address the lack of available bilingual text databases. The dataset consists of 97,812 text images in two groups: scanned page images and digitized line images. Images in both groups are produced in 10 fonts and four font sizes, and are prepared or scanned at four DPI resolutions. The dataset preparation process includes text collection, text editing, image construction, and image processing. The dataset can be used for optical text recognition, optical font recognition, language identification, and segmentation. Several text recognition and language identification experiments were conducted on images from the dataset using a Hidden Markov Model (HMM) classifier. For the digitized images, the best recognition correctness and the best accuracy were both 99.01%, and Tahoma was the font with the highest recognition rates. For the scanned images, Tahoma also performed best, with 97.86% correctness and 97.73% accuracy. In the language identification experiments, Tahoma achieved a word-level language identification rate of 99.98%.
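The abstract reports both recognition correctness and accuracy. In HTK-style scoring (HResults), correctness is (N - D - S)/N and accuracy is (N - D - S - I)/N, where N is the number of reference symbols and S, D, and I are the substitutions, deletions, and insertions found by a minimum-edit alignment; accuracy therefore equals correctness only when there are no insertions. The sketch below assumes the paper follows these HTK definitions, which the abstract does not state. The function names are hypothetical, and unit edit costs are used, which may not match HResults' own alignment weights, so the error-type split can differ on ties.

```python
def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) from a minimum-edit alignment.

    Unit costs are assumed for every error type; ties are broken by comparing
    (errors, S, D, I) tuples, so the result is one valid minimum-cost split,
    not necessarily the one HResults would report.
    """
    n, m = len(ref), len(hyp)
    # best[i][j] = (total_errors, S, D, I) for aligning ref[:i] with hyp[:j]
    best = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        best[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = best[i - 1][j - 1]
            match = (e, s, d, ins) if ref[i - 1] == hyp[j - 1] else (e + 1, s + 1, d, ins)
            e, s, d, ins = best[i - 1][j]
            deletion = (e + 1, s, d + 1, ins)
            e, s, d, ins = best[i][j - 1]
            insertion = (e + 1, s, d, ins + 1)
            best[i][j] = min(match, deletion, insertion)
    _, s, d, ins = best[n][m]
    return s, d, ins


def correctness_and_accuracy(ref, hyp):
    """HTK-style %Correct = (N - D - S) / N and %Accuracy = (N - D - S - I) / N."""
    s, d, ins = edit_counts(ref, hyp)
    n = len(ref)
    hits = n - d - s
    return 100.0 * hits / n, 100.0 * (hits - ins) / n
```

For instance, the reference "abcd" recognized as "abxcd" scores 100% correctness but 75% accuracy, because the extra character is an insertion; recognized as "abd", it scores 75% on both.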
Arabic Keyphrase Extraction: Enhancing Deep Learning Models with Pre-trained Contextual Embedding and External Features
Randah Alharbi, Husni Al-Muhtasab
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP 2022), 2022
Keyphrase extraction is essential to many information retrieval (IR) and natural language processing (NLP) tasks, such as summarization and indexing. This study investigates deep learning approaches to Arabic keyphrase extraction. We frame the problem as sequence classification and build a Bi-LSTM model that classifies each token in a sequence as either part of a keyphrase or outside it. We extracted word embeddings from two pre-trained models, Word2Vec and BERT. We also investigated the effect on performance of combining linguistic, positional, and statistical features with the word embeddings. Our best-performing model achieved an F1-score of 0.45 on the ArabicKPE dataset when combining linguistic and positional features with BERT embeddings.
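The token-level formulation above implies a decoding step: after the Bi-LSTM labels each token as inside or outside a keyphrase, contiguous inside tokens are grouped into phrases and compared with the gold keyphrases. The sketch below shows one such post-processing and scoring path under stated assumptions: the labels `K`/`O`, the function names, and exact-match phrase-level F1 are illustrative choices, not the paper's documented protocol. With only two labels, two adjacent keyphrases merge into a single span; BIO-style tagging is a common way to avoid that.

```python
def keyphrases_from_labels(tokens, labels):
    """Join each maximal run of 'K' (keyphrase) labels into one phrase string."""
    phrases, current = [], []
    for token, label in zip(tokens, labels):
        if label == "K":
            current.append(token)
        elif current:  # an 'O' label closes the open phrase
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def phrase_f1(predicted, gold):
    """Exact-match F1 between predicted and gold keyphrase sets."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, the tokens `["deep", "learning", "improves", "keyphrase", "extraction"]` labeled `K K O K K` decode to "deep learning" and "keyphrase extraction"; against a gold set of "keyphrase extraction" and "bi-lstm", precision and recall are both 0.5, giving an F1 of 0.5.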
Arabic Phonemes Transcription Using Learning Vector Quantization: 'Towards the Development of Fast Quranic Text Transcription'
Khalid M.O. Nahar, Wasfi G. Al-Khatib, Moustafa Elshafei, Husni Al-Muhtaseb, Mansour M. Alghamdi
Proceedings of the 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences (NOORIC 2013), 2015
In this paper, we investigate the use of Learning Vector Quantization (LVQ) for phoneme transcription in Arabic speech recognition systems. We used an Arabic speech corpus of TV news clips. We then employed feature vectors that embed the correlation information of neighboring frames between adjacent phonemes, replacing the traditional triphone models. Next, we generated phoneme codebooks using the K-means splitting algorithm and trained them with the LVQ algorithm. Using the trained LVQ codebooks to transcribe the phonemes of utterances in an open-vocabulary test corpus, we obtained a phoneme recognition rate of 72% without any added phoneme bigrams or HMM models. If improved, these results could serve Holy Quran text transcription without phoneme bigrams (a phoneme language model). This would increase the speed of Quranic speech-to-text transcription and lay the groundwork for a high-speed system for the automatic recognition and translation of Quranic sounds.
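The codebook pipeline described above (K-means splitting, then LVQ training) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: it assumes the standard LBG form of K-means splitting (start from the mean, split every codeword by plus and minus `eps`, refine with K-means) and the basic LVQ1 update rule, which pulls the nearest prototype toward a training vector of the same phoneme class and pushes it away otherwise. The function names, learning-rate schedule, and parameters are hypothetical; real inputs would be per-frame acoustic feature vectors.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points):
    return [sum(p[k] for p in points) / len(points) for k in range(len(points[0]))]


def lbg_codebook(points, size, eps=0.01, iters=10):
    """K-means splitting (LBG): grow the codebook 1 -> 2 -> 4 ... until it has
    at least `size` codewords (assumed to be a power of two), running K-means
    after every split."""
    codebook = [_centroid(points)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c + s for c in w] for w in codebook for s in (eps, -eps)]
        for _ in range(iters):
            clusters = [[] for _ in codebook]
            for p in points:
                nearest = min(range(len(codebook)), key=lambda k: _dist2(p, codebook[k]))
                clusters[nearest].append(p)
            # Keep the old codeword if its cluster ends up empty.
            codebook = [_centroid(c) if c else w for c, w in zip(clusters, codebook)]
    return codebook


def train_lvq1(samples, prototypes, lr=0.05, epochs=20, seed=0):
    """LVQ1. samples: list of (vector, label); prototypes: list of [vector, label]
    whose vectors are updated in place."""
    rng = random.Random(seed)
    data = list(samples)
    for epoch in range(epochs):
        rng.shuffle(data)
        alpha = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, label in data:
            w, w_label = min(prototypes, key=lambda p: _dist2(x, p[0]))
            sign = 1.0 if w_label == label else -1.0  # attract if same class, else repel
            for k in range(len(w)):
                w[k] += sign * alpha * (x[k] - w[k])
    return prototypes


def classify(x, prototypes):
    """Label of the nearest prototype."""
    return min(prototypes, key=lambda p: _dist2(x, p[0]))[1]
```

In this sketch, one would call `lbg_codebook` on each phoneme's training vectors, label every codeword with that phoneme, pool the labeled codewords, and refine them with `train_lvq1`; `classify` then maps a frame to the label of its nearest codeword.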
Arabic phonemes transcription using data driven approach
International Arab Journal of Information Technology, 2015
Data-driven Arabic phoneme recognition using varying number of HMM states
K. M. O. Nahar, W. G. Al-Khatib, M. Elshafei, H. Al-Muhtaseb, M. M. Alghamdi
2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA 2013), 2013
Continuous Arabic speech recognition appears in many real-life applications. Its speed, accuracy, and further improvement depend heavily on the accuracy of the language's phoneme set. The main goal of this research is to recognize and transcribe Arabic phonemes using a data-driven approach. We built a data-driven phoneme recognizer using the HTK toolkit. Different numbers of Gaussian mixtures and different numbers of HMM states were used to model the Arabic phonemes in order to find the best configuration. The corpus consists of about 4,000 files representing 5 hours of recorded Modern Standard Arabic TV news. The maximum phoneme recognition accuracy reached was 56.79%. This result is encouraging and indicates the viability of our approach compared with using a fixed number of HMM states.
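To illustrate what varying the number of HMM states means, the sketch below builds a left-to-right (Bakis) transition matrix for a chosen number of states and scores an observation sequence with the forward algorithm in log space. It is a simplification under stated assumptions: one 1-D Gaussian per state instead of Gaussian mixtures over multidimensional features, a fixed self-loop probability, and no non-emitting entry/exit states of the kind HTK prototypes use. The function names and parameters are hypothetical. In a data-driven setup, models with different state counts would be trained and the configuration with the best recognition accuracy kept.

```python
import math


def left_to_right_transitions(n_states, stay=0.6):
    """Bakis topology: each state either loops on itself or advances one state."""
    trans = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            trans[i][i], trans[i][i + 1] = stay, 1.0 - stay
        else:
            trans[i][i] = 1.0  # last state only self-loops (no explicit exit state here)
    return trans


def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(obs, trans, means, variances):
    """log P(obs | model) via the forward algorithm, assuming the model starts
    in state 0 and must end in the last state."""
    n = len(trans)
    log_t = [[math.log(p) if p > 0 else -math.inf for p in row] for row in trans]
    alpha = [-math.inf] * n
    alpha[0] = gaussian_logpdf(obs[0], means[0], variances[0])
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_t[i][j] for i in range(n)])
            + gaussian_logpdf(x, means[j], variances[j])
            for j in range(n)
        ]
    return alpha[-1]
```

Because this topology has no skip transitions, an N-state model assigns a log-likelihood of minus infinity to any sequence shorter than N frames, which is one practical constraint when giving short phonemes more states.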
Statistical analysis of Arabic phonemes used in Arabic speech recognition
Khalid M. O. Nahar, Mustafa Elshafei, Wasfi G. Al-Khatib, Husni Al-Muhtaseb, Mansour M. Alghamdi
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012
Rescoring N-Best Hypotheses for Arabic Speech Recognition: A Syntax-Mining Approach
AMTA 2012: Proceedings of the 4th Workshop on Computational Approaches to Arabic Script-based Languages, 2012
KHABEER: An object-oriented Arabic expert system shell
Arabian Journal for Science and Engineering, 1997
RECENT SCHOLAR PUBLICATIONS
ASRD: Development and Validation of a Large-Scale Arabic Semantic Relation Dataset R Alharbi, T Helmy, A Al-Saghyir, S Aglan, A Alosaimy, H Al-Muhtaseb 2025
Hypernymy Relation in NLP: Tasks, Approaches, Resources, and Future Directions—A Systematic Literature Review R Alharbi, H Al-Muhtaseb, T Helmy IEEE Access 13, 206272-206310, 2025 Citations: 1
BPTI: Bilingual Printed Text Images Dataset for Recognition Purposes M Yahia, H Al-Muhtaseb The International Arab Journal of Information Technology 20 (4), 2023
Arabic keyphrase extraction: Enhancing deep learning models with pre-trained contextual embedding and external features R Alharbi, H Al-Muhtasab Proceedings of the Seventh Arabic Natural Language Processing Workshop …, 2022 Citations: 4
Sport-fanaticism lexicons for sentiment analysis in Arabic social text M Alqmase, H Al-Muhtaseb Social Network Analysis and Mining 12 (1), 56, 2022 Citations: 5
BPTI: Bilingual Printed Text Images Dataset for Recognition Purposes M Yahia, H Al-Muhtaseb Social Science Research Network (SSRN). https://papers.ssrn.com/sol3/papers …, 2022
Sports-fanaticism formalism for sentiment analysis in Arabic text M Alqmase, H Al-Muhtaseb, H Rabaan Social Network Analysis and Mining 11 (1), 52, 2021 Citations: 26
Recognition of Printed Arabic-English Text MHN Yahia PQDT-Global, 2018
Arabic Dataset for Automatic Keyphrase Extraction M Al Logmani, H Al Muhtaseb Seventh International Conference on Computer Science and Information …, 2017 Citations: 3
Arabic phonemes recognition using hybrid LVQ/HMM model for continuous speech recognition KMO Nahar, M Abu Shquier, WG Al-Khatib, H Al-Muhtaseb, M Elshafei International Journal of Speech Technology 19 (3), 495-508, 2016 Citations: 28
An Arabic corpus to assist in the automatic extraction of key-phrases (in Arabic) مكنز عربي للمساعدة في الاستنباط الآلي للعبارات المفتاحية M Al Logmani, H Al-Muhtaseb The 5th international conference on Arabic language, Dubai. المؤتمر الدولي …, 2016
Modeling the phenomenon of changing word pronunciation resulting from intonation judgements (in Arabic) نمذجة ظاهرة تغير نطق الكلمات الناتج عن أحكام التجويد M Amro, W Al-Khatib, M Elshafei, H Al-Muhtaseb The 5th international conference on Arabic language, Dubai. المؤتمر الدولي …, 2016
Post-processing optimization for Arabic optical character recognition (in Arabic) تحسين مرحلة "بعد المعالجة" في نظام التعرف الضوئي الآلي على الكتابة العربية H Al-Muhtaseb, H Luqman The 5th international conference on Arabic language, Dubai. المؤتمر الدولي …, 2016
Automatic vocalization of Arabic text YMS Khraishi PQDT-Global, 2016 Citations: 1
Towards A Minimal Phonetic Set for Quran Recitation HA Al-Muhtaseb, SA Bellegdi International Journal on Islamic Applications in Computer Science And …, 2016 Citations: 1
Automatic rule based phonetic transcription and syllabification for quranic text SA Bellegdi, HA Al-Muhtaseb International Journal on Islamic Applications in Computer Science And …, 2015 Citations: 6
Arabic Phonemes Transcription using Data Driven Approach. K Nahar, H Al-Muhtaseb, W Al-Khatib, M Elshafei, M Alghamdi International Arab Journal of Information Technology (IAJIT) 12 (3), 2015 Citations: 19
System and method for decoding speech DEM Abuzeina, M Elshafei, H Al-Muhtaseb, WG Al-Khatib US Patent App. 13/597,162, 2014 Citations: 41
Arabic Phonemes Transcription Using Learning Vector Quantization: "Towards the Development of Fast Quranic Text Transcription" KMO Nahar, WG Al-Khatib, M Elshafei, H Al-Muhtaseb, MM Alghamdi 2013 Taibah University International Conference on Advances in Information …, 2013 Citations: 2
Method of generating a transliteration font S Awaida, H Al-Muhtaseb US Patent 8,438,008, 2013 Citations: 16
MOST CITED SCHOLAR PUBLICATIONS
Offline handwritten Arabic cursive text recognition using Hidden Markov Models and re-ranking JH AlKhateeb, J Ren, J Jiang, H Al-Muhtaseb Pattern Recognition Letters 32 (8), 1081-1088, 2011 Citations: 175
Recognition of off-line printed Arabic text using Hidden Markov Models HA Al-Muhtaseb, SA Mahmoud, RS Qahwaji Signal Processing 88 (12), 2902-2912, 2008 Citations: 132
Statistical methods for automatic diacritization of Arabic text M Elshafei, H Al-Muhtaseb, M Alghamdi The Saudi 18th National Computer Conference, Riyadh 18, 301-306, 2006 Citations: 102
Techniques for high quality Arabic speech synthesis M Elshafei, H Al-Muhtaseb, M Al-Ghamdi Information Sciences 140 (3-4), 255-267, 2002 Citations: 81
Arabic broadcast news transcription system M Alghamdi, M Elshafei, H Al-Muhtaseb International Journal of Speech Technology 10, 183-195, 2009 Citations: 60
Generation of Arabic phonetic dictionaries for speech recognition M Ali, M Elshafei, M Al-Ghamdi, H Al-Muhtaseb, A Al-Najjar 2008 International Conference on Innovations in Information Technology, 59-63, 2008 Citations: 42
System and method for decoding speech DEM Abuzeina, M Elshafei, H Al-Muhtaseb, WG Al-Khatib US Patent App. 13/597,162, 2014 Citations: 41
Machine generation of Arabic diacritical marks MA Elshafei, 2006 Citations: 40
Arabic phonetic dictionaries for speech recognition M Ali, M Elshafei, M Al-Ghamdi, H Al-Muhtaseb Journal of Information Technology Research (JITR) 2 (4), 67-80, 2009 Citations: 37
Cross-word Arabic pronunciation variation modeling for speech recognition D AbuZeina, W Al-Khatib, M Elshafei, H Al-Muhtaseb International Journal of Speech Technology 14 (3), 227-236, 2011 Citations: 34
Automatic Arabic text image optical character recognition method HA Al-Muhtaseb, SA Mahmoud, R Qahwaji US Patent 8,150,160, 2012 Citations: 31
Statistical analysis of Arabic phonemes used in Arabic speech recognition KMO Nahar, M Elshafei, WG Al-Khatib, H Al-Muhtaseb, MM Alghamdi Neural Information Processing: 19th International Conference, ICONIP 2012 …, 2012 Citations: 29
Arabic phonemes recognition using hybrid LVQ/HMM model for continuous speech recognition KMO Nahar, M Abu Shquier, WG Al-Khatib, H Al-Muhtaseb, M Elshafei International Journal of Speech Technology 19 (3), 495-508, 2016 Citations: 28
Sports-fanaticism formalism for sentiment analysis in Arabic text M Alqmase, H Al-Muhtaseb, H Rabaan Social Network Analysis and Mining 11 (1), 52, 2021 Citations: 26
Some Differences Between Arabic and English: A Step Towards an Arabic Upper Model H Al-Muhtaseb, C Mellish The 6th International Conference on Multilingual Computing, Cambridge, UK, 1998 Citations: 22
Techniques for high quality Arabic speech synthesis H Al-Muhtaseb, M Elshafei, M Al-Ghamdi Information Sciences 140, 255-267, 2002 Citations: 21
Speaker-independent natural Arabic speech recognition system M Elshafei, H Al-Muhtaseb, M Al-Ghamdi The International Conference on Intelligent Systems, 2008 Citations: 20
Arabic Phonemes Transcription using Data Driven Approach. K Nahar, H Al-Muhtaseb, W Al-Khatib, M Elshafei, M Alghamdi International Arab Journal of Information Technology (IAJIT) 12 (3), 2015 Citations: 19
Within-word pronunciation variation modeling for Arabic ASRs: a direct data-driven approach D AbuZeina, W Al-Khatib, M Elshafei, H Al-Muhtaseb International Journal of Speech Technology 15 (2), 65-75, 2012 Citations: 18
Toward enhanced Arabic speech recognition using part of speech tagging D AbuZeina, W Al-Khatib, M Elshafei, H Al-Muhtaseb International Journal of Speech Technology 14 (4), 419-426, 2011 Citations: 18