Prompt Engineering in Healthcare Rajvardhan Patil, Thomas F. Heston, Vijay Bhuse Electronics (Switzerland), 2024 The rapid advancements in artificial intelligence, particularly generative AI and large language models, have unlocked new possibilities for revolutionizing healthcare delivery. However, harnessing the full potential of these technologies requires effective prompt engineering: designing and optimizing input prompts to guide AI systems toward generating clinically relevant and accurate outputs. Despite the importance of prompt engineering, medical education has yet to fully incorporate comprehensive training on this critical skill, leading to a knowledge gap among medical clinicians. This article addresses this educational gap by providing an overview of generative AI prompt engineering, its potential applications in primary care medicine, and best practices for its effective implementation. The role of well-crafted prompts in eliciting accurate, relevant, and valuable responses from AI models is discussed, emphasizing the need for prompts grounded in medical knowledge and aligned with evidence-based guidelines. The article explores various applications of prompt engineering in primary care, including enhancing patient–provider communication, streamlining clinical documentation, supporting medical education, and facilitating personalized care and shared decision-making. Incorporating domain-specific knowledge, engaging in iterative refinement and validation of prompts, and addressing ethical considerations and potential biases are highlighted. Embracing prompt engineering as a core competency in medical education will be crucial for successfully adopting and implementing AI technologies in primary care, ultimately leading to improved patient outcomes and enhanced healthcare delivery.
Redefining Significance: Robustness and Percent Fragility Indices in Biomedical Research Thomas F. Heston Stats, 2024 The p-value has long been the standard for statistical significance in scientific research, but this binary approach often fails to consider the nuances of statistical power and the potential for large sample sizes to show statistical significance despite trivial treatment effects. Including a statistical fragility assessment can help overcome these limitations. One common fragility metric is the fragility index, which assesses statistical fragility by incrementally altering the outcome data in the intervention group until the statistical significance flips. The robustness index takes a different approach by maintaining the integrity of the underlying data distribution while examining changes in the p-value as the sample size changes. The percent fragility index is another useful alternative that is more precise than the fragility index and is more uniformly applied to both the intervention and control groups. Incorporating these fragility metrics into routine statistical procedures could address the reproducibility crisis and increase research efficacy. Using these fragility indices can be seen as a step toward a more mature phase of statistical reasoning, where significance is a multi-faceted and contextually informed judgment.
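To make the fragility index concrete, here is a minimal sketch for a two-arm trial with a binary outcome. It assumes the conventional setup, which the abstract above does not specify: a two-sided Fisher exact test, α = 0.05, and nonevents converted to events one at a time in the arm with fewer events (the abstract describes altering the intervention group; swap the arm choice if that is the convention you follow). The function names are hypothetical, and the robustness and percent fragility indices are not implemented here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance treats floating-point near-ties as ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_a: int, n_a: int, events_b: int, n_b: int,
                    alpha: float = 0.05):
    """Count nonevent-to-event switches, in the arm with fewer events, needed
    to raise the Fisher p-value to alpha or above (hypothetical helper).

    Returns None if the result is not significant to begin with.
    """
    def p_value(ea: int, eb: int) -> float:
        return fisher_exact_two_sided(ea, n_a - ea, eb, n_b - eb)

    if p_value(events_a, events_b) >= alpha:
        return None  # fragility index is undefined for a nonsignificant result

    ea, eb = events_a, events_b
    modify_a = events_a <= events_b  # assumption: alter the arm with fewer events
    flips = 0
    while p_value(ea, eb) < alpha:
        if modify_a:
            if ea == n_a:
                return None  # no nonevents left to convert
            ea += 1
        else:
            if eb == n_b:
                return None
            eb += 1
        flips += 1
    return flips


# Usage with made-up counts: 1/20 events in one arm versus 10/20 in the other.
print(fragility_index(1, 20, 10, 20))
```

Because the p-value is recomputed after each single switch, the returned count is the smallest number of changed outcomes that makes the result lose significance under these assumptions.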
ChatGPT provides inconsistent risk-stratification of patients with atraumatic chest pain Thomas F. Heston, Lawrence M. Lewis Plos One, 2024 Background ChatGPT-4 is a large language model with promising healthcare applications. However, its ability to analyze complex clinical data and provide consistent results is poorly understood. This study evaluated ChatGPT-4's risk stratification of simulated patients with acute nontraumatic chest pain against validated tools. Methods Three datasets of simulated case studies were created: one based on the TIMI score variables, another on HEART score variables, and a third comprising 44 randomized variables related to nontraumatic chest pain presentations. ChatGPT-4 independently scored each dataset five times. Its risk scores were compared to calculated TIMI and HEART scores. A model trained on 44 clinical variables was evaluated for consistency. Results ChatGPT-4 showed a high correlation with TIMI and HEART scores (r = 0.898 and 0.928, respectively), but the distribution of individual risk assessments was broad. ChatGPT-4 gave a different risk 45–48% of the time for a fixed TIMI or HEART score. On the 44-variable model, a majority of the five ChatGPT-4 models agreed on a diagnosis category only 56% of the time, and risk scores were poorly correlated (r = 0.605). Conclusion While ChatGPT-4 correlates closely with established risk stratification tools regarding mean scores, its inconsistency when presented with identical patient data on separate occasions raises concerns about its reliability. The findings suggest that while large language models like ChatGPT-4 hold promise for healthcare applications, further refinement and customization are necessary, particularly in the clinical risk assessment of atraumatic chest pain patients.
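The consistency measures reported in this abstract (majority agreement across five runs, discordance from a reference score, and Pearson correlation) can be illustrated with a short sketch. The helper names, the 3-of-5 quorum used to define a "majority", and the toy inputs are assumptions for illustration; they are not the study's code or data.

```python
from collections import Counter
from math import sqrt


def majority_agreement_rate(runs_per_case, quorum=3):
    """Fraction of cases in which at least `quorum` runs chose the same category."""
    agreed = 0
    for labels in runs_per_case:
        top_count = Counter(labels).most_common(1)[0][1]
        if top_count >= quorum:
            agreed += 1
    return agreed / len(runs_per_case)


def discordance_rate(model_scores, reference_scores):
    """Fraction of assessments where the model's score differs from the reference."""
    mismatches = sum(m != r for m, r in zip(model_scores, reference_scores))
    return mismatches / len(model_scores)


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical category labels from five runs on two simulated cases.
cases = [
    ["low", "low", "low", "moderate", "low"],        # 4 of 5 agree: majority
    ["low", "moderate", "high", "moderate", "low"],  # at most 2 agree: no majority
]
print(majority_agreement_rate(cases))  # → 0.5
```

Counting agreement per case, rather than averaging scores, is what exposes run-to-run inconsistency that a high correlation of mean scores can hide.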
Prompt Engineering in Medical Education Thomas Heston, Charya Khun International Medical Education, 2023 Artificial intelligence-powered generative language models (GLMs), such as ChatGPT, Perplexity AI, and Google Bard, have the potential to provide personalized learning, unlimited practice opportunities, and interactive engagement 24/7, with immediate feedback. However, to fully utilize GLMs, properly formulated instructions are essential. Prompt engineering is a systematic approach to effectively communicating with GLMs to achieve the desired results. Well-crafted prompts yield good responses from the GLM, while poorly constructed prompts will lead to unsatisfactory responses. Besides the challenges of prompt engineering, significant concerns are associated with using GLMs in medical education, including ensuring accuracy, mitigating bias, maintaining privacy, and avoiding excessive reliance on technology. Future directions involve developing more sophisticated prompt engineering techniques, integrating GLMs with other technologies, creating personalized learning pathways, and researching the effectiveness of GLMs in medical education.
Blockchain Audit Trails Resolve the Electronic Health Record Traceability Problem Created by Generative AI TF Heston Internet Medical Journal 1 (1), e20337356, 2026
Specification Perturbation Measures Fragility, Not Robustness: Preserving the p–fr–nb Distinction TF Heston Internet Medical Journal 1 (1), e20149369, 2026
Statistical Fragility of Saline Nasal Irrigation for Rhinosinusitis Is Incomplete Without Robustness Assessment TF Heston Internet Medical Journal 1 (1), e20074314, 2026
Nonattainability of the Fragility Index TF Heston Cureus 18 (5), 2026
Reverse Fragility in Cochrane Meta-Analyses with P Values 0.05 to 0.20 Requires a Robustness Dimension TF Heston Internet Medical Journal 1 (1), e19741629, 2026
Reverse Continuous Fragility Index Misleads in Biceps Tenotomy Versus Tenodesis Trials: The Case for a Continuous Fragility Score and Quotient TF Heston Internet Medical Journal 1 (1), e19720537, 2026
Guideline evidence audits require robustness beyond the fragility index TF Heston Internet Medical Journal 1 (1), 2026
AI and the Soul of Medicine TF Heston Internet Medical Journal 1 (1), e1954401, 2026
Accountable clinical AI requires more than accuracy TF Heston Internet Medical Journal 1 (1), e19519377, 2026 Citations: 1
Beyond fragility: what the fragility index cannot measure TF Heston Internet Medical Journal 1 (1), e19465222, 2026
Bidirectional fragility is a step forward but not far enough: the case for a global fragility index TF Heston Internet Medical Journal 1 (1), e19464166, 2026 Citations: 2
The Continuous Fragility Quotient as a Model-Free Assessment for Continuous Outcomes TF Heston Internet Medical Journal 1 (1), e19445024, 2026
Mathematical incompleteness of the fragility index TF Heston 2026 Citations: 1
Fragility of Assumptions TF Heston Orthopaedic Journal of Sports Medicine 14 (2), 23259671251409150, 2026
Moving Beyond Sorry: The Acknowledge-Repair-Prevent (ARP) Framework for Colleague Apologies in Medicine TF Heston Cureus 18 (1), 2026
The Hermeneutics of Prompt Engineering: Interpretation, Meaning-Making, and Decision Quality in Large Language Model Interactions M White, L Giray, G Marvin, N Knoth, J White, TF Heston 2026
Neutrality Boundary Robustness for Meta-Analyses TF Heston 2026
Evidence-Based Frameworks for Generative Artificial Intelligence TF Heston Int J Blockchain Technol Appl 4, 34-38, 2026 Citations: 1
Significance, Fragility, and Robustness in Clinical Trials: Stratifying Statistical Evidence TF Heston Cureus 17 (12), 2025 Citations: 7
MOST CITED SCHOLAR PUBLICATIONS
Prompt Engineering in Medical Education TF Heston, C Khun International Medical Education 2 (3), 198-205, 2023 Citations: 337
Barriers to Development of Telemedicine in Developing Countries S Bali, TF Heston (ed) Telehealth, 29-42, 2019 Citations: 157
Safety of large language models in addressing depression TF Heston Cureus 15 (12), e50729, 2023 Citations: 117
A case study in blockchain health care innovation TF Heston International Journal of Current Research 9 (11), 60587-60588, 2017 Citations: 113
SNM practice guideline for breast scintigraphy with breast-specific γ-cameras 1.0 SJ Goldsmith, W Parsons, MJ Guiberteau, LH Stern, L Lanzkowsky, ... Journal of Nuclear Medicine Technology 38 (4), 219-224, 2010 Citations: 108
Gender bias in the evaluation and management of acute nontraumatic chest pain TF Heston, LM Lewis Family Practice Research Journal 12 (4), 383-389, 1992 Citations: 108
Prompt engineering in healthcare R Patil, TF Heston, V Bhuse Electronics 13 (15), 2961, 2024 Citations: 94
Standardizing predictive values in diagnostic imaging research TF Heston Journal of Magnetic Resonance Imaging 33 (2), 505, 2011 Citations: 65
PET scanning M Kapoor, TF Heston, A Kasi StatPearls [Internet], 2025 Citations: 61
Quantifying transient ischemic dilation using gated SPECT TF Heston, DM Sigg Journal of Nuclear Medicine 46 (12), 1990-1996, 2005 Citations: 58
Molecular imaging in thyroid cancer TF Heston, RL Wahl Cancer Imaging 10 (1), 1, 2010 Citations: 52
Predictive power of statistical significance TF Heston, JM King World Journal of Methodology 7 (4), 112, 2017 Citations: 39
Cardiac risk stratification in renal transplantation using a form of artificial intelligence TF Heston, DJ Norman, JM Barry, WM Bennett, RA Wilson The American Journal of Cardiology 79 (4), 415-417, 1997 Citations: 37
Moral injury and the four pillars of bioethics TF Heston, JA Pahang F1000 Research 8, 2019 Citations: 35
Statistical Significance Versus Clinical Relevance: A Head-to-Head Comparison of the Fragility Index and Relative Risk Index TF Heston Cureus 15 (10), e47741, 2023 Citations: 34
Nuclear medicine in oral and maxillofacial diagnosis: a review for the practicing dental professional DA Baur, TF Heston, JI Helman The Journal of Contemporary Dental Practice 5 (1), 94-104, 2006 Citations: 33
Subchondral architecture in bones of the canine shoulder PA Simkin, TF Heston, DJ Downey, RS Benedict, HS Choi Journal of Anatomy 175, 213-227, 1991 Citations: 33
Prompt engineering for students of medicine and their teachers TF Heston arXiv preprint arXiv:2308.11628, 2023 Citations: 32
The cost of living index as a primary driver of homelessness in the United States: a cross-state analysis TF Heston Cureus 15 (10), 2023 Citations: 29