Capacity of ChatGPT, Deepseek, and Gemini in predicting major potential drug interactions in adults within the Intensive Care Unit

DOI: https://doi.org/10.30968/jhphs.2025.161.1262

Abstract

Objective: To evaluate the ability of ChatGPT v.3.5, DeepSeek-V3, and Gemini 2.0 Flash to accurately predict major potential drug-drug interactions (DDIs) in critically ill patients.

Methods: A list of 20 DDIs was compiled from previously published literature. The Micromedex and Drugs.com databases were used as references. A specific prompt was designed to interact with the tools, and the generated responses were stored for subsequent analysis by a pharmacist. Specificity, sensitivity, negative predictive value (NPV), positive predictive value (PPV), accuracy, and agreement were calculated for each tool based on its responses regarding DDI severity, which were categorized into five levels: contraindicated, major, moderate, minor, and no interaction. Additionally, the responses on the mechanism of action and recommended management for each DDI were categorized as “adequate and accurate,” “adequate but inaccurate,” or “inadequate.”

Results: With Micromedex as the reference, ChatGPT performed best, with an accuracy of 75%, while DeepSeek and Gemini reached 70% and 65%, respectively. Overall, the tools performed better when Drugs.com was the reference, with accuracies of 80% for DeepSeek and 75% for both ChatGPT and Gemini. However, the agreement on DDI severity between the tools and the references was 0.354 (weak) for Drugs.com and 0.410 (moderate) for Micromedex. For mechanism of action and recommended management, two “inadequate” and 10 “adequate but inaccurate” responses were observed when compared with Micromedex (14 DDIs analyzed), while eight “inadequate” and 21 “adequate but inaccurate” responses were found when compared with Drugs.com (17 DDIs analyzed).

Conclusion: The tools analyzed show promise for assisting healthcare professionals in predicting DDIs in adults hospitalized in the intensive care unit (ICU). However, they should be used with caution, as they may generate incorrect or inaccurate information. Further advances are required to ensure their reliable application in clinical practice.
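The performance measures named in the Methods can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: that a response counts as "positive" when the severity is contraindicated or major, that agreement is a Cohen's kappa computed over the five severity levels, and that the function names (`binary_metrics`, `cohen_kappa`) and the toy labels are purely illustrative and are not data from the study.

```python
from collections import Counter

# Assumption: "positive" means the interaction is rated contraindicated or major.
POSITIVE_LEVELS = ("contraindicated", "major")


def binary_metrics(reference, predicted, positive=POSITIVE_LEVELS):
    """Sensitivity, specificity, PPV, NPV and accuracy from paired severity labels."""
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        ref_pos, pred_pos = ref in positive, pred in positive
        if ref_pos and pred_pos:
            tp += 1
        elif pred_pos:
            fp += 1
        elif ref_pos:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Return None when a measure is undefined (empty denominator).
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def cohen_kappa(reference, predicted):
    """Unweighted Cohen's kappa over all severity categories (assumed statistic)."""
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    categories = set(ref_counts) | set(pred_counts)
    expected = sum(ref_counts[c] * pred_counts[c] for c in categories) / n**2
    if expected == 1:
        return None  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


# Toy labels for illustration only; these are NOT the study's data.
reference = ["major", "major", "moderate", "none"]
predicted = ["major", "moderate", "moderate", "none"]

metrics = binary_metrics(reference, predicted)
kappa = cohen_kappa(reference, predicted)
print(metrics)
print(kappa)
```

Dichotomizing severity yields the diagnostic-accuracy measures, while kappa uses the full five-level scale. That difference is one way a tool can reach 75% to 80% accuracy and still show only weak-to-moderate agreement on severity.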

Published

2025-03-31

How to Cite

1. Lima TM. Capacity of ChatGPT, Deepseek, and Gemini in predicting major potential drug interactions in adults within the Intensive Care Unit. J Hosp Pharm Health Serv [Internet]. 2025 Mar 31 [cited 2025 Apr 7];16(1):e1262. Available from: https://jhphs.org/sbrafh/article/view/1262

Section

ORIGINAL ARTICLES