Agricultural and Biological Sciences, Artificial Intelligence, Biochemistry, Genetics and Molecular Biology, Plant Science
27
Scopus Publications
Trait Association for Flowering Time in Lentil from Global Multi-Environment Data Using GWAS and Machine Learning Shriprabha R. Upadhyaya, Hawlader A. Al-Mamun, Monica F. Danilevicz, Shameela Mohamedikbal, Mohammed Bennamoun, Jacqueline Batley, Kirstin E. Bett, David Edwards Plants, 2026 Flowering time is an important developmental stage in plants, influenced by multiple genes and environmental factors. Understanding its genetic basis and interaction with the environment facilitates the development of improved varieties adapted to different environments. Conventional Genome-Wide Association Studies (GWAS) have been widely used to associate genetic markers with heritable traits, but they do not inherently capture interactions among single nucleotide polymorphisms (SNPs) or between SNPs and the environment. Machine Learning (ML) approaches can model these interactions and improve trait prediction even in the presence of noise and missing data. In this study, multi-environment lentil (Lens culinaris Medik.) data were analysed using GWAS and two widely used ML models, Random Forest and XGBoost, to identify genetic markers associated with flowering time. Model interpretability was enhanced using Explainable AI (XAI) techniques, including SHapley Additive exPlanations. GWAS identified eight significant loci across chromosomes one, two, five and seven, with the most significant SNP located at Chr2_530433205, while ML approaches identified nine markers on chromosomes one, two, three, five and seven, with the most significant SNP at Chr7_523220088. The majority of the identified markers were linked to candidate genes for flowering, while ML also identified potential epistasis. These findings highlight ML as a powerful complementary tool to GWAS for trait association.
Application of machine learning and genomics for orphan crop improvement Tessa R. MacNish, Monica F. Danilevicz, Philipp E. Bayer, Mitchell S. Bestry, David Edwards Nature Communications, 2025 Orphan crops are important sources of nutrition in developing regions and many are tolerant to biotic and abiotic stressors; however, modern crop improvement technologies have not been widely applied to orphan crops due to the lack of resources available. There are orphan crop representatives across major crop types and the conservation of genes between these related species can be used in crop improvement. Machine learning (ML) has emerged as a promising tool for crop improvement. Transferring knowledge from major crops to orphan crops and using machine learning to improve accuracy and efficiency can be used to improve orphan crops. Here, the authors review transferring knowledge from major crops to orphan crops and using machine learning to improve accuracy and efficiency of orphan crop breeding.
Understanding plant phenotypes in crop breeding through explainable AI Monica F. Danilevicz, Shriprabha R. Upadhyaya, Jacqueline Batley, Mohammed Bennamoun, Philipp E. Bayer, David Edwards Plant Biotechnology Journal, 2025 Summary: Machine learning use in plant phenotyping has grown exponentially. These algorithms empowered the use of image data to measure plant traits rapidly and to predict the effect of genetic and environmental conditions on plant phenotype. However, the lack of interpretability in machine learning models has limited their usefulness in gaining insights into the underlying biological processes that drive plant phenotypes. Explainable AI (XAI) emerges to help understand the ‘why’ behind machine learning model predictions and allow researchers to investigate the most influential features that lead to prediction, classification or segmentation results. Understanding the mechanisms behind model prediction is also central to sanity‐checking models, increasing model reliability and identifying dataset biases that may limit the model's applicability across different conditions. This review introduces the concept of XAI and presents current algorithms, emphasizing their suitability for different data types or machine learning algorithms. The use of XAI to leverage trait information is highlighted, showcasing how recent studies employed model explanations to recognize the features that impact plant phenotype. Overall, this review presents a framework for using XAI to gain insights into intricate biological processes driving plant phenotypes, underscoring the significance of transparency and interpretability in machine learning.
Plant disease epidemiology in the age of artificial intelligence and machine learning Ting Xiang Neik, Aria Dolatabadian, Monica F. Danilevicz, Shriprabha R. Upadhyaya, Fangning Zhang, Jacqueline Batley, David Edwards Agriculture Communications, 2025 Crop diseases pose a major threat to global food security, causing substantial yield losses and economic damage each year. Plant disease epidemiology studies the dynamics of plant-pathogen interactions and their impact on disease outcomes, considering environmental influences at a population level. While recent advances in artificial intelligence (AI) and machine learning (ML) have introduced innovative tools for disease prediction and management, most applications have focused on plant disease detection, classification and prediction using imaging technologies and sensor-based data. However, their use in plant disease epidemiology, particularly in understanding host-pathogen interactions and the ecology and evolution of the pathosystems remains limited due to the complexity of multi-scale interactions. In this review, we first propose an updated ‘disease pyramid’ plant disease epidemiology model, incorporating ecological and evolutionary components into the traditional ‘disease triangle’ model. Following this, we discuss current ML applications in plant disease epidemiology, further highlighting both challenges and opportunities. We offer insights into potential input datasets that could significantly enhance the predictability and accuracy of ML models, while also outlining future directions for this rapidly evolving field. The aim of this review is to draw the reader’s attention to the knowledge gap in the application of ML in plant disease epidemiology and showcase the vast potential for expanding the scope of more in-depth and comprehensive research in this field in the future.
Exploring genomic feature selection: A comparative analysis of GWAS and machine learning algorithms in a large-scale soybean dataset Hawlader A. Al‐Mamun, Monica F. Danilevicz, Jacob I. Marsh, Cedric Gondro, David Edwards Plant Genome, 2025 The surge in high‐throughput technologies has empowered the acquisition of vast genomic datasets, prompting the search for genetic markers and biomarkers relevant to complex traits. However, grappling with the inherent complexities of high dimensionality and sparsity within these datasets poses formidable hurdles. The immense number of features and their potential redundancy demand efficient strategies for extracting pertinent information and identifying significant markers. Feature selection is important in large genomic data as it helps in enhancing interpretability and computational efficiency. This study focuses on addressing these challenges through a comprehensive investigation into genomic feature selection methodologies, employing a rich soybean (Glycine max L. Merr.) dataset comprising 966 lines with over 5.5 million single nucleotide polymorphisms. Emphasizing the “small n large p” dilemma prevalent in contemporary genomic studies, we compared the efficacy of traditional genome‐wide association studies (GWAS) with two prominent machine learning tools, random forest and extreme gradient boosting, in pinpointing predictive features. Utilizing the expansive soybean dataset, we assessed the performance of these methodologies in selecting features that optimize predictive modeling for various phenotypes. By constructing predictive models based on the selected features, we ascertain the comparative prediction accuracies, thereby illuminating the strengths and limitations of these feature selection methodologies in the realm of genomic data analysis.
Global genotype by environment prediction competition reveals that diverse modeling strategies can deliver satisfactory maize yield estimates Jacob D Washburn, José Ignacio Varela, Alencar Xavier, Qiuyue Chen, David Ertl, Joseph L Gage, James B Holland, Dayane Cristina Lima, Maria Cinta Romay, Marco Lopez-Cruz, Gustavo de los Campos, Wesley Barber, Cristiano Zimmer, Ignacio Trucillo Silva, Fabiani Rocha, Renaud Rincent, Baber Ali, Haixiao Hu, Daniel E Runcie, Kirill Gusev, Andrei Slabodkin, Phillip Bax, Julie Aubert, Hugo Gangloff, Tristan Mary-Huard, Theodore Vanrenterghem, Carles Quesada-Traver, Steven Yates, Daniel Ariza-Suárez, Argeo Ulrich, Michele Wyler, Daniel R Kick, Emily S Bellis, Jason L Causey, Emilio Soriano Chavez, Yixing Wang, Ved Piyush, Gayara D Fernando, Robert K Hu, Rachit Kumar, Annan J Timon, Rasika Venkatesh, Kenia Segura Abá, Huan Chen, Thilanka Ranaweera, Shin-Han Shiu, Peiran Wang, Max J Gordon, B Kirtley Amos, Sebastiano Busato, Daniel Perondi, Abhishek Gogna, Dennis Psaroudakis, Chun-Peng James Chen, Hawlader A Al-Mamun, Monica F Danilevicz, Shriprabha R Upadhyaya, David Edwards, Natalia de Leon Genetics, 2025 Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023, the first open-to-the-public Genomes to Fields initiative Genotype by Environment prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements, and field management notes gathered by the project over 9 years. The competition attracted registrants from around the world with representation from academic, government, industry, and nonprofit institutions as well as unaffiliated. 
These participants came from diverse disciplines, including plant science, animal science, breeding, statistics, computational biology, and others. Some participants had no formal genetics or plant-related training, and some were just beginning their graduate education. The teams applied varied methods and strategies, providing a wealth of modeling knowledge based on a common dataset. The winner's strategy involved 2 models combining machine learning and traditional breeding tools: 1 model emphasized environment using features extracted by random forest, ridge regression, and least squares, and 1 focused on genetics. Other high-performing teams’ methods included quantitative genetics, machine learning/deep learning, mechanistic models, and model ensembles. The dataset factors used, such as genetics, weather, and management data, were also diverse, demonstrating that no single model or strategy is far superior to all others within the context of this competition.
Image-based crop disease detection using machine learning Aria Dolatabadian, Ting Xiang Neik, Monica F. Danilevicz, Shriprabha R. Upadhyaya, Jacqueline Batley, David Edwards Plant Pathology, 2025 Crop disease detection is important due to its significant impact on agricultural productivity and global food security. Traditional disease detection methods often rely on labour‐intensive field surveys and manual inspection, which are time‐consuming and prone to human error. In recent years, the advent of imaging technologies coupled with machine learning (ML) algorithms has offered a promising solution to this problem, enabling rapid and accurate identification of crop diseases. Previous studies have demonstrated the potential of image‐based techniques in detecting various crop diseases, showcasing their ability to capture subtle visual cues indicative of pathogen infection or physiological stress. However, the field is rapidly evolving, with advancements in sensor technology, data analytics and artificial intelligence (AI) algorithms continually expanding the capabilities of these systems. This review paper consolidates the existing literature on image‐based crop disease detection using ML, providing a comprehensive overview of cutting‐edge techniques and methodologies. Synthesizing findings from diverse studies offers insights into the effectiveness of different imaging platforms, contextual data integration and the applicability of ML algorithms across various crop types and environmental conditions. The importance of this review lies in its ability to bridge the gap between research and practice, offering valuable guidance to researchers and agricultural practitioners.
Genomics-based plant disease resistance prediction using machine learning Shriprabha R. Upadhyaya, Monica F. Danilevicz, Aria Dolatabadian, Ting Xiang Neik, Fangning Zhang, Hawlader A. Al‐Mamun, Mohammed Bennamoun, Jacqueline Batley, David Edwards Plant Pathology, 2024 Plant disease outbreaks continuously challenge food security and sustainability. Traditional chemical methods used to treat diseases have environmental and health concerns, raising the need to enhance inherent plant disease resistance mechanisms. Traits, including disease resistance, can be linked to specific loci in the genome and identifying these markers facilitates targeted breeding approaches. Several methods, including genome‐wide association studies and genomic selection, have been used to identify important markers and select varieties with desirable traits. However, these traditional approaches may not fully capture the non‐linear characteristics of the effect of genomic variation on traits. Machine learning, known for its data‐mining abilities, offers an opportunity to enhance the accuracy of the existing trait association approaches. It has found applications in predicting various agronomic traits across several species. However, its use in disease resistance prediction remains limited. This review highlights the potential of machine learning as a complementary tool for predicting the genetic loci contributing to pathogen resistance. We provide an overview of traditional trait prediction methods, summarize machine‐learning applications, and address the challenges and opportunities associated with machine learning‐based crop disease resistance prediction.
Local haplotyping reveals insights into the genetic control of flowering time variation in wild and domesticated soybean Shameela Mohamedikbal, Hawlader A. Al‐Mamun, Jacob I. Marsh, Shriprabha Upadhyaya, Monica F. Danilevicz, Henry T. Nguyen, Babu Valliyodan, Adam Mahan, Jacqueline Batley, David Edwards Plant Genome, 2024 The timing of flowering in soybean [Glycine max (L.) Merr.], a key legume crop, is influenced by many factors, including daylight length or photoperiodic sensitivity, that affect crop yield, productivity, and geographical adaptation. Despite its importance, a comprehensive understanding of the local linkage landscape and allelic diversity within regions of the genome influencing flowering and contributing to phenotypic variation in subpopulations has been limited. This study addresses these gaps by conducting an in‐depth trait association and linkage analysis coupled with local haplotyping using advanced bioinformatics tools, including crosshap, to characterize genomic variation using a pangenome dataset representing 915 domesticated and wild‐type individuals. The association analysis identified eight significant loci on seven chromosomes. Moving beyond traditional association analysis, local haplotyping of targeted regions on chromosomes 6 and 20 identified distinct haplotype structures, variation patterns, and genomic candidates influencing flowering in subpopulations. These results suggest the action of a network of genomic candidates influencing flowering time and an untapped reservoir of genomic variation for this trait in wild germplasm. Notably, GlymaLee.20G147200 on chromosome 20 was identified as a candidate gene that may cause delayed flowering in soybean, potentially through histone modifications of floral repressor loci as seen in Arabidopsis thaliana (L.) Heynh.
These findings support future functional validation of haplotype‐based alleles for marker‐assisted breeding and genomic selection to enhance latitude adaptability of soybean without compromising yield.
DNABERT-based explainable lncRNA identification in plant genome assemblies Monica F. Danilevicz, Mitchell Gill, Cassandria G. Tay Fernandez, Jakob Petereit, Shriprabha R. Upadhyaya, Jacqueline Batley, Mohammed Bennamoun, David Edwards, Philipp E. Bayer Computational and Structural Biotechnology Journal, 2023
Pangenomes as a Resource to Accelerate Breeding of Under-Utilised Crop Species Cassandria Geraldine Tay Fernandez, Benjamin John Nestor, Monica Furaste Danilevicz, Mitchell Gill, Jakob Petereit, Philipp Emanuel Bayer, Patrick Michael Finnegan, Jacqueline Batley, David Edwards International Journal of Molecular Sciences, 2022
Expanding Gene-Editing Potential in Crop Improvement with Pangenomes Cassandria G. Tay Fernandez, Benjamin J. Nestor, Monica F. Danilevicz, Jacob I. Marsh, Jakob Petereit, Philipp E. Bayer, Jacqueline Batley, David Edwards International Journal of Molecular Sciences, 2022