Tleubayeva Arailym

Scopus Publications

Enhancing Question Answering for Low-Resource Languages: The Case of Kazakh Language
Arailym Tleubayeva, Zhansaya Makhambetova, Aigerim Mansurova, Adai Shomanov
Proceedings of the 18th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2025), 2025
The Kazakh language, with its agglutinative morphology and scarce annotated data, presents challenges for accurate Question Answering (QA). This study proposes a three-stage framework to improve multilingual QA performance in low-resource settings through model adaptation, semantic representation evaluation, and retrieval optimization. First, multilingual QA models (mT5, XLM-R, mDeBERTa-v3, AYA, Kaz-RoBERTa, and KazakhBERTmulti) were fine-tuned using SentencePiece tokenization and adapter-based training. Second, five embedding models were benchmarked via Domain and QuestionType classification, with Snowflake-Arctic achieving the highest accuracy (0.87) and BGE-M3 demonstrating strong robustness. Finally, a memory-aware retrieval mechanism was implemented by integrating Safe Memory clustering with FAISS indexing, improving contextual recall (R@10 = 0.984) without loss of precision. Results show that adapted multilingual transformers consistently outperform native Kazakh models, while enhanced retrieval strengthens grounding and cross-domain robustness. The framework provides a reproducible pathway toward scalable and linguistically inclusive QA systems for low-resource languages.
A Systematic Evaluation of Large Language Models and Retrieval-Augmented Generation for the Task of Kazakh Question Answering
Aigerim Mansurova, Arailym Tleubayeva, Aliya Nugumanova, Adai Shomanov, Sadi Evren Seker
Information (Switzerland), 2025
This paper presents a systematic evaluation of large language models (LLMs) and retrieval-augmented generation (RAG) approaches for question answering (QA) in the low-resource Kazakh language. We assess the performance of existing proprietary (GPT-4o, Gemini 2.5-flash) and open-source Kazakh-oriented models (KazLLM-8B, Sherkala-8B, Irbis-7B) across closed-book and RAG settings. Within a three-stage evaluation framework, we benchmark retriever quality, examine LLM abilities such as knowledge-gap detection, external truth integration, and context grounding, and measure gains from realistic end-to-end RAG pipelines. Our results show a clear pattern: proprietary models lead in closed-book QA, but RAG narrows the gap substantially. Under the Ideal RAG setting, KazLLM-8B improves from its closed-book baseline of 0.427 to reach answer correctness of 0.867, closely matching GPT-4o’s score of 0.869. In the end-to-end RAG setup, KazLLM-8B paired with the Snowflake retriever achieved answer correctness of up to 0.754, surpassing GPT-4o’s best score of 0.632. Despite these improvements, RAG outcomes show an inconsistency: high retrieval metrics do not guarantee high QA system accuracy. The findings highlight the importance of retrievers and context grounding strategies in enabling open-source Kazakh models to deliver competitive QA performance in a low-resource setting.
Text Similarity Detection in Agglutinative Languages: A Case Study of Kazakh Using Hybrid N-Gram and Semantic Models
Svitlana Biloshchytska, Arailym Tleubayeva, Oleksandr Kuchanskyi, Andrii Biloshchytskyi, Yurii Andrashko, Sapar Toxanov, Aidos Mukhatayev, Saltanat Sharipova
Applied Sciences (Switzerland), 2025
This study presents an advanced hybrid approach for detecting near-duplicate texts in the Kazakh language, addressing the specific challenges posed by its agglutinative morphology. The proposed method combines statistical and semantic techniques, including N-gram analysis, TF-IDF, LSH, LSA, and LDA, and is benchmarked against the bert-base-multilingual-cased model. Experiments were conducted on the purpose-built Arailym-aitu/KazakhTextDuplicates corpus, which contains over 25,000 manually modified text fragments using typical techniques, such as paraphrasing, word order changes, synonym substitution, and morphological transformations. The results show that the hybrid model achieves a precision of 1.00, a recall of 0.73, and an F1-score of 0.84, significantly outperforming traditional N-gram and TF-IDF approaches and demonstrating comparable accuracy to the BERT model while requiring substantially lower computational resources. The hybrid model proved highly effective in detecting various types of near-duplicate texts, including paraphrased and structurally modified content, making it suitable for practical applications in academic integrity verification, plagiarism detection, and intelligent text analysis. Moreover, this study highlights the potential of lightweight hybrid architectures as a practical alternative to large transformer-based models, particularly for languages with limited annotated corpora and linguistic resources. It lays the foundation for future research in cross-lingual duplicate detection and deep model adaptation for the Kazakh language.
Development and Evaluation of a Small Kazakh Language Corpus to Improve the Efficiency of Multilingual NLP Systems in Low-Resource Environments
Arailym Tleubayeva, Sultan Aubakirov, Aisultan Tabuldin, Aday Shomanov
SIST 2025: 2025 IEEE 5th International Conference on Smart Information Systems and Technologies, Conference Proceedings, 2025
This study tackles NLP challenges in low-resource settings by developing the Small Kazakh Language Corpus—a high-quality, annotated collection of Kazakh texts sourced from news, scientific publications, and Wikipedia. The corpus was used to fine-tune two models for masked language modeling: the multilingual XLM-RoBERTa-base and the Kazakh-specific nur-dev/roberta-kaz-large. Fine-tuning notably improved XLM-RoBERTa-base's accuracy from 48.84% to 68.85% and F1-score from 41.96% to 68.62%, while nur-dev/roberta-kaz-large showed more modest gains. These findings demonstrate the critical role of language-specific resources in enhancing multilingual NLP systems and provide a solid foundation for further research in applications such as machine translation, sentiment analysis, and question answering.
Multilingual QA-RAG: Evaluating LLMs' Contradiction Handling in English and Kazakh
Arailym Tleubayeva, Aigerim Mansurova, Sultan Aubakirov, Aisultan Tabuldin, Adai Shomanov, Zhansaya Makhambetova
Proceedings of the 29th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2025 Summer), 2025
This study investigates how large language models (LLMs) respond to factual contradictions under varying prompt styles in both English and Kazakh, a low-resource language. We evaluate the performance of DeepSeek-R1-Distill-Qwen-7B and GPT-4o-mini on a dataset of 100 question-context pairs with deliberately introduced factual distortions, using Retrieval-Augmented Generation (RAG) and zero-shot approaches across strict, standard, and weak prompt types. Results show that DeepSeek performs better in English, particularly in zero-shot settings, and that it also exhibits higher variability and sensitivity to prompt strength. GPT-4o-mini demonstrates more accurate performance across Kazakh datasets. Qualitative analysis reveals distinct model behaviors, such as blind acceptance of falsehoods, expression of uncertainty, and ethical refusals. The findings highlight the challenges LLMs face in contradiction detection and emphasize the need for robust solutions to improve factual reliability.
Effective detection of breast pathology using machine learning methods
Ainur Orazayeva, Jamalbek Tussupov, Gulmira Shangytbayeva, Assem Galymova, Ulzhalgas Zhunissova, Aliya Tergeussizova, Arailym Tleubayeva, Zhanat Kenzhebayeva
International Journal of Electrical and Computer Engineering, 2024
This work is devoted to the research and development of methods for effectively identifying breast pathologies using modern machine learning technologies, such as You Only Look Once (YOLOv8) and Faster Region-based Convolutional Neural Network (Faster R-CNN). The paper presents an analysis of existing approaches to the diagnosis of breast diseases and an assessment of their effectiveness. YOLOv8 and Faster R-CNN architectures are then applied to create pathology detection models for mammography images. The work analyzes and classifies the identified breast pathologies at six levels, taking into account different degrees of severity and characteristics of the diseases. This approach allows for more accurate determination of disease progression and provides additional data for more individualized treatment planning. Classification results at various levels can improve the quality of medical decisions and provide more accurate information to doctors, which in turn improves the overall efficiency of diagnosis and treatment of breast diseases. Experimental results demonstrate high accuracy and speed of image processing, providing fast and reliable detection of potential breast pathologies. The data obtained confirm the effectiveness of machine learning algorithms in the field of medical diagnostics, opening prospects for the further development of automated systems for detecting breast diseases in order to improve early diagnosis and treatment efficiency.
Systemic approach to optimizing natural language processing technologies in Astana IT University's admissions process
Arailym Tleubayeva, Alina Mitroshina, Alpar Arman, Arystan Shokan, Aigul Shaikhanova
SIST 2024: 2024 IEEE 4th International Conference on Smart Information Systems and Technologies, Proceedings, 2024
The implementation of artificial intelligence (AI) has transformative potential for various sectors, including higher education. This study focuses on the effective development and accuracy of an AI system for streamlining university admissions. The focal point is the AITU Admissions Advisor, an AI solution crafted to navigate the complexities of the admissions process. The study examines the problem of operational inefficiencies and inaccuracies that plague traditional admissions methods, and it positions the AI system as a remedy by offering automation and intelligent decision-making capabilities. The essence of the findings derives from a methodical evaluation of the AITU system against conventional practices, revealing its enhanced efficiency and precision in handling admissions procedures. Distinguishing features of these results include the system's adept use of natural language processing (NLP), sophisticated machine learning models, and a dynamic feedback system that collectively elevate its performance metrics. These technological strides underscore the system's reliability and responsiveness to the nuanced needs of applicants and administrators alike. The paper concludes that for practical implementation, seamless integration with existing university infrastructures, thorough staff training, and continuous system monitoring are imperative. This study provides a blueprint for the application of AI in higher education, showcasing a system that not only meets but anticipates the demands of modern university admissions.
Machine Learning Expert System for Recognizing Emotions in Text 'Umai Cloud Services'
Arailym Tleubayeva, Aigul Shaikhanova, Baurzhan Ospan, Ayan Sultan, Mariyam Abu, Nurbakyt Darmenkyzy
SIST 2023: 2023 IEEE International Conference on Smart Information Systems and Technologies, Proceedings, 2023
This research focuses on recognizing 28 emotions in text using the RoBERTa model, a state-of-the-art pre-trained language model that has achieved outstanding results in various natural language processing tasks. The study explores the effectiveness of the RoBERTa model for emotion recognition and compares it with other approaches, such as CNNs and RNNs. In addition, the research investigates the problem of toxicity detection, which involves identifying and flagging potentially harmful or offensive language in a given text. Various techniques for toxicity detection are considered, including supervised learning and deep learning methods. The study also explores the process of extracting key phrases and words from a text using machine learning algorithms. This involves applying NLP techniques such as part-of-speech tagging, named entity recognition, and text summarization. All of these methods are implemented and tested using a cloud service provided by Umai Cloud Services, a Kazakh startup company that offers machine learning and artificial intelligence solutions. The results of the study demonstrate the effectiveness of the RoBERTa model for emotion recognition and show promising results for toxicity detection and text summarization.

RECENT SCHOLAR PUBLICATIONS

Enhancing Question Answering for Low-Resource Languages: The Case of Kazakh Language
A Tleubayeva, Z Makhambetova, A Mansurova, A Shomanov
Proceedings of the 18th IEEE/ACM International Conference on Utility and … , 2025
2025
DETECTING DUPLICATES IN KAZAKH TEXTS: A COMPARISON OF TF-IDF, WORD AND SENTENCE EMBEDDINGS
ABNAB Nugumanova
МЕЖДУНАРОДНЫЙ ЖУРНАЛ ИНФОРМАЦИОННЫХ И КОММУНИКАЦИОННЫХ ТЕХНОЛОГИЙ 6 (4 … , 2025
2025
A Systematic Evaluation of Large Language Models and Retrieval-Augmented Generation for the Task of Kazakh Question Answering
A Mansurova, A Tleubayeva, A Nugumanova, A Shomanov, SE Seker
Information 16 (11), 943 , 2025
2025
Citations: 3
COMPARATIVE ANALYSIS OF EMBEDDING MODELS FOR MATCHING QUESTIONS AND CONTEXTS IN THE KAZAKH LANGUAGE
ZK K. Mazhitova, A. Tleubayeva, S. Mukhammediya, A. Tanirbergenova, A ...
Vestnik KazUTB 3 (28), 13-23 , 2025
2025
Multilingual QA-RAG: Evaluating LLMs' Contradiction Handling in English and Kazakh
A Tleubayeva, A Mansurova, S Aubakirov, A Tabuldin, A Shomanov, ...
2025 IEEE/ACIS 29th International Conference on Software Engineering … , 2025
2025
Citations: 1
Text similarity detection in agglutinative languages: A case study of Kazakh using hybrid n-gram and semantic models
S Biloshchytska, A Tleubayeva, O Kuchanskyi, A Biloshchytskyi, ...
Applied Sciences 15 (12), 6707 , 2025
2025
Citations: 7
Development and Evaluation of a Small Kazakh Language Corpus to Improve the Efficiency of Multilingual NLP Systems in Low-Resource Environments
A Tleubayeva, S Aubakirov, A Tabuldin, A Shomanov
2025 IEEE 5th International Conference on Smart Information Systems and … , 2025
2025
Citations: 2
Protege ontology in computer science
KM Maksutova, RS Niyazova, AK Shaikhanova, AO Tleubayeva
Вестник Национальной инженерной академии Республики Казахстан 4 (94), 112-123 , 2024
2024
Enhancing fingerprint recognition systems: Comparative analysis of biometric authentication algorithms and techniques for improved accuracy and reliability
T Meiramkhanov, A Tleubayeva
arXiv preprint arXiv:2412.14404 , 2024
2024
Citations: 14
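Интеграция искусственного интеллекта для обнаружения респираторных заболеваний в программно-аппаратный комплекс «Диагностика на дому» [Integration of artificial intelligence for detecting respiratory diseases into the "Home Diagnostics" hardware-software complex]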
А Шайханова, И Поз, Э Кусембаева, С Даулеткалиулы, А Тлеубаева
Вестник КазАТК 135 (6), 272-282 , 2024
2024
Citations: 3
Comparative analysis of multilingual QA models and their adaptation to the Kazakh language
A Tleubayeva, A Shomanov
Scientific Journal of Astana IT University, 89-97 , 2024
2024
Citations: 8
Systemic approach to optimizing natural language processing technologies in Astana IT University's admissions process
A Tleubayeva, A Mitroshina, A Arman, A Shokan, S Aigul
2024 IEEE 4th International Conference on Smart Information Systems and … , 2024
2024
Citations: 2
Effective detection of breast pathology using machine learning methods
A Orazayeva, J Tussupov, G Shangytbayeva, A Galymova, ...
International Journal of Electrical and Computer Engineering (IJECE) 14 (5 … , 2024
2024
Citations: 4
INNOVATIVE ARCHITECTURAL SOLUTIONS AND INTERDISCIPLINARY IMPLEMENTATION OF THE BULT CLOUD PLATFORM FOR WEB APPLICATION ORCHESTRATION
AK Shaikhanova, ZA Bermukhambetov, VV Kim, AO Tleubayeva
Вестник Университета Шакарима. Серия технические науки, 40-48 , 2024
2024
Machine learning expert system for recognizing emotions in text “Umai Cloud Services”
A Tleubayeva, A Shaikhanova, B Ospan, A Sultan, M Abu, N Darmenkyzy
2023 IEEE International Conference on Smart Information Systems and … , 2023
2023
Citations: 2
Удаленная диагностика – польза для узкоспециализированных врачей [Remote diagnostics: a benefit for narrowly specialized physicians]
АК Шайханова, И Поз, ЭА Кусембаева, АО Тлеубаева
Вестник Университета Шакарима. Серия технические науки, 5-13 , 2023
2023
A model of an autonomous smart lighting system using sensors
A Tleubayeva, A Maidanov, A Kantayeva
Scientific Journal of Astana IT University, 34-44 , 2022
2022
Citations: 2
Практика преподавания курса «Робототехника» в образовательной среде LEGO Education [Practice of teaching the "Robotics" course in the LEGO Education learning environment]
СМВ Тулегулов А. Д., Ешпанов В. С., Тлеубаева А. О., Серикбай А. Т., Ержуман ...
https://phsreda.com/ru/article/97068/discussion_platformhttps://phsreda.com … , 2020
2020
Citations: 1
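Математикалық модельдеу әдістерімен екі механикалық дененің соқтығысу ықтималдығын есептеу [Calculating the collision probability of two mechanical bodies using mathematical modeling methods]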
ТАО ДЖУМАМУХАМБЕТОВ Н.Г., ТУЛЕГУЛОВ А.Д., НУРГАЛИЕВА Р.М.
Журнал «Промышленный транспорт Казахстана». 3 (68), 87-92 , 2020
2020
Освоение практических цифровых навыков в сфере информационной безопасности [Mastering practical digital skills in the field of information security]
АД Тулегулов, ВС Ешпанов, АО Тлеубаева, СМ Меирбекулы, ...
Рецензенты: Жданова Светлана Николаевна, д-р пед. наук, 69 , 2020
2020

MOST CITED SCHOLAR PUBLICATIONS

Enhancing fingerprint recognition systems: Comparative analysis of biometric authentication algorithms and techniques for improved accuracy and reliability
T Meiramkhanov, A Tleubayeva
arXiv preprint arXiv:2412.14404 , 2024
2024
Citations: 14
Comparative analysis of multilingual QA models and their adaptation to the Kazakh language
A Tleubayeva, A Shomanov
Scientific Journal of Astana IT University, 89-97 , 2024
2024
Citations: 8
Text similarity detection in agglutinative languages: A case study of Kazakh using hybrid n-gram and semantic models
S Biloshchytska, A Tleubayeva, O Kuchanskyi, A Biloshchytskyi, ...
Applied Sciences 15 (12), 6707 , 2025
2025
Citations: 7
Effective detection of breast pathology using machine learning methods
A Orazayeva, J Tussupov, G Shangytbayeva, A Galymova, ...
International Journal of Electrical and Computer Engineering (IJECE) 14 (5 … , 2024
2024
Citations: 4
A Systematic Evaluation of Large Language Models and Retrieval-Augmented Generation for the Task of Kazakh Question Answering
A Mansurova, A Tleubayeva, A Nugumanova, A Shomanov, SE Seker
Information 16 (11), 943 , 2025
2025
Citations: 3
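Интеграция искусственного интеллекта для обнаружения респираторных заболеваний в программно-аппаратный комплекс «Диагностика на дому» [Integration of artificial intelligence for detecting respiratory diseases into the "Home Diagnostics" hardware-software complex]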
А Шайханова, И Поз, Э Кусембаева, С Даулеткалиулы, А Тлеубаева
Вестник КазАТК 135 (6), 272-282 , 2024
2024
Citations: 3
Development and Evaluation of a Small Kazakh Language Corpus to Improve the Efficiency of Multilingual NLP Systems in Low-Resource Environments
A Tleubayeva, S Aubakirov, A Tabuldin, A Shomanov
2025 IEEE 5th International Conference on Smart Information Systems and … , 2025
2025
Citations: 2
Systemic approach to optimizing natural language processing technologies in Astana IT University's admissions process
A Tleubayeva, A Mitroshina, A Arman, A Shokan, S Aigul
2024 IEEE 4th International Conference on Smart Information Systems and … , 2024
2024
Citations: 2
Machine learning expert system for recognizing emotions in text “Umai Cloud Services”
A Tleubayeva, A Shaikhanova, B Ospan, A Sultan, M Abu, N Darmenkyzy
2023 IEEE International Conference on Smart Information Systems and … , 2023
2023
Citations: 2
A model of an autonomous smart lighting system using sensors
A Tleubayeva, A Maidanov, A Kantayeva
Scientific Journal of Astana IT University, 34-44 , 2022
2022
Citations: 2
Multilingual QA-RAG: Evaluating LLMs' Contradiction Handling in English and Kazakh
A Tleubayeva, A Mansurova, S Aubakirov, A Tabuldin, A Shomanov, ...
2025 IEEE/ACIS 29th International Conference on Software Engineering … , 2025
2025
Citations: 1
Практика преподавания курса «Робототехника» в образовательной среде LEGO Education [Practice of teaching the "Robotics" course in the LEGO Education learning environment]
СМВ Тулегулов А. Д., Ешпанов В. С., Тлеубаева А. О., Серикбай А. Т., Ержуман ...
https://phsreda.com/ru/article/97068/discussion_platformhttps://phsreda.com … , 2020
2020
Citations: 1
Enhancing Question Answering for Low-Resource Languages: The Case of Kazakh Language
A Tleubayeva, Z Makhambetova, A Mansurova, A Shomanov
Proceedings of the 18th IEEE/ACM International Conference on Utility and … , 2025
2025
DETECTING DUPLICATES IN KAZAKH TEXTS: A COMPARISON OF TF-IDF, WORD AND SENTENCE EMBEDDINGS
ABNAB Nugumanova
МЕЖДУНАРОДНЫЙ ЖУРНАЛ ИНФОРМАЦИОННЫХ И КОММУНИКАЦИОННЫХ ТЕХНОЛОГИЙ 6 (4 … , 2025
2025
COMPARATIVE ANALYSIS OF EMBEDDING MODELS FOR MATCHING QUESTIONS AND CONTEXTS IN THE KAZAKH LANGUAGE
ZK K. Mazhitova, A. Tleubayeva, S. Mukhammediya, A. Tanirbergenova, A ...
Vestnik KazUTB 3 (28), 13-23 , 2025
2025
Protege ontology in computer science
KM Maksutova, RS Niyazova, AK Shaikhanova, AO Tleubayeva
Вестник Национальной инженерной академии Республики Казахстан 4 (94), 112-123 , 2024
2024
INNOVATIVE ARCHITECTURAL SOLUTIONS AND INTERDISCIPLINARY IMPLEMENTATION OF THE BULT CLOUD PLATFORM FOR WEB APPLICATION ORCHESTRATION
AK Shaikhanova, ZA Bermukhambetov, VV Kim, AO Tleubayeva
Вестник Университета Шакарима. Серия технические науки, 40-48 , 2024
2024
Удаленная диагностика – польза для узкоспециализированных врачей [Remote diagnostics: a benefit for narrowly specialized physicians]
АК Шайханова, И Поз, ЭА Кусембаева, АО Тлеубаева
Вестник Университета Шакарима. Серия технические науки, 5-13 , 2023
2023
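Математикалық модельдеу әдістерімен екі механикалық дененің соқтығысу ықтималдығын есептеу [Calculating the collision probability of two mechanical bodies using mathematical modeling methods]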
ТАО ДЖУМАМУХАМБЕТОВ Н.Г., ТУЛЕГУЛОВ А.Д., НУРГАЛИЕВА Р.М.
Журнал «Промышленный транспорт Казахстана». 3 (68), 87-92 , 2020
2020
Освоение практических цифровых навыков в сфере информационной безопасности [Mastering practical digital skills in the field of information security]
АД Тулегулов, ВС Ешпанов, АО Тлеубаева, СМ Меирбекулы, ...
Рецензенты: Жданова Светлана Николаевна, д-р пед. наук, 69 , 2020
2020
