Clinical Intelligence Research Press

Search results:
Why Most Sepsis Prediction Models Fail at the Bedside: A Position Paper on the Gap Between AUROC and Clinical Utility
Over the past five years, sepsis prediction models have reported strong retrospective performance, often exceeding AUROC 0.85–0.90 by leveraging vital signs, laboratory data, and machine learning to predict sepsis earlier than clinical recognition. However, despite these results, bedside adoption remains minimal, and external or prospective validations frequently show substantial performance decline, with clinicians still relying on traditional criteria such as qSOFA and SIRS. This position paper argues that AUROC is an insufficient and potentially misleading metric for clinical deployment, as it reflects retrospective rank discrimination rather than real-world utility, calibration, or actionable impact. High AUROC scores often conceal poor threshold selection, excessive alert burden, and clinically unacceptable alarm fatigue, while retrospective evaluations create an overly optimistic view that fails in real-time settings. We propose shifting evaluation toward clinically meaningful metrics such as net benefit, alert burden per patient-day, and number needed to alert at clinician-defined thresholds, alongside earlier incorporation of workflow requirements. Ultimately, the continued dominance of AUROC-centric evaluation represents a systemic mismatch between model development and clinical reality, preventing sepsis prediction tools from achieving meaningful impact at the bedside.
Journal of Artificial Intelligence for Healthcare Systems
Original Research | Open access | 20 January 2022 | Article: 54
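The threshold-based metrics named in this abstract can be illustrated with a minimal sketch. It uses the standard decision-curve definition of net benefit, TP/N - (FP/N) * pt/(1 - pt), and takes number needed to alert as alerts per true positive. The function name `alert_metrics`, the `AlertMetrics` container, and the toy data are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AlertMetrics:
    net_benefit: float
    alerts_per_patient_day: float
    number_needed_to_alert: float


def alert_metrics(risks, outcomes, threshold, patient_days):
    """Summarize alerting performance at one clinician-defined threshold.

    risks:        predicted sepsis probabilities, one per scored encounter
    outcomes:     1 if sepsis occurred, else 0 (same order as risks)
    threshold:    probability cutoff at or above which an alert fires
    patient_days: total patient-days of monitoring covered by the data
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    alerts = tp + fp
    # Net benefit weighs false alerts by the odds implied by the threshold.
    net_benefit = tp / n - (fp / n) * threshold / (1.0 - threshold)
    # Number needed to alert is the reciprocal of positive predictive value.
    nna = alerts / tp if tp else float("inf")
    return AlertMetrics(net_benefit, alerts / patient_days, nna)


# Toy data (hypothetical): eight encounters over sixteen patient-days.
risks = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
metrics = alert_metrics(risks, outcomes, threshold=0.5, patient_days=16)
```

Two models with identical AUROC can differ sharply on all three quantities once a threshold is fixed, which is the gap the abstract describes.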

Reinforcement Learning for Intravenous Fluid Resuscitation in Septic Shock: A Position Paper on Safety Constraints, Reward Design, and Clinical Oversight
Septic shock, defined as sepsis with persistent hypotension despite adequate fluid resuscitation and requiring vasopressors, has a mortality rate of 30–50% despite modern treatment. Intravenous fluids remain the cornerstone of early therapy, with guidelines recommending at least 30 mL/kg of crystalloids within the first three hours. However, both insufficient and excessive fluid administration can be harmful, making individualized, data-driven management essential. Reinforcement learning (RL) has been proposed to optimize fluid and vasopressor dosing in sepsis using retrospective ICU data. While models such as the AI Clinician suggest potential survival benefits, they often prioritize long-term outcomes like mortality and overlook short-term harms such as fluid overload and organ injury, raising safety concerns. Safety constraints and harm-aware reward design are essential in RL systems for septic shock. Pure outcome optimization is insufficient, and clinical AI must include mechanisms to prevent unsafe actions and ensure adherence to safety limits. Offline RL is vulnerable to distributional shift and unsafe extrapolation. Reward functions focused only on survival ignore acute complications, leading to unsafe policies. Human-in-the-loop oversight is necessary to maintain clinical accountability and enable intervention. RL systems should include action constraints, conservative learning with uncertainty estimation, and reward penalties for fluid overload indicators. Regulatory bodies and journals should require safety validation, and clinicians must retain override authority and transparency in decision-making. RL in septic shock management must prioritize patient safety through constraints, harm-aware rewards, and clinical oversight. Without these safeguards, deployment risks patient harm and loss of trust in clinical AI.
Journal of Artificial Intelligence for Healthcare Systems
Original Research | Open access | 20 January 2022 | Article: 59
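The action constraints and overload penalties recommended in this abstract could take roughly the following form. This is a sketch under stated assumptions: the `Step` record, the function names, and every numeric limit and weight are hypothetical placeholders chosen for illustration, not clinical guidance and not the paper's specification.

```python
from dataclasses import dataclass

# Illustrative placeholders only; real limits would be set by clinicians.
MAX_FLUID_ML_PER_STEP = 1000.0   # hard cap on fluid given in one decision step
OVERLOAD_BALANCE_ML = 5000.0     # cumulative balance above which harm is penalized
OVERLOAD_PENALTY_PER_L = 0.5     # reward deducted per litre above that balance


@dataclass
class Step:
    fluid_ml: float                # fluid administered in this step
    cumulative_balance_ml: float   # running fluid balance after this step
    terminal: bool = False         # True on the final step of an episode
    survived: bool = False         # outcome, only meaningful when terminal


def constrain_action(proposed_fluid_ml: float) -> float:
    """Clip a proposed dose into the permitted range before it is acted on."""
    return min(max(proposed_fluid_ml, 0.0), MAX_FLUID_ML_PER_STEP)


def harm_aware_reward(step: Step) -> float:
    """Combine a terminal survival signal with a per-step overload penalty."""
    reward = 0.0
    if step.terminal:
        reward += 1.0 if step.survived else -1.0
    excess_ml = max(step.cumulative_balance_ml - OVERLOAD_BALANCE_ML, 0.0)
    reward -= OVERLOAD_PENALTY_PER_L * excess_ml / 1000.0
    return reward
```

Clipping keeps any learned policy inside the permitted dose range, while the penalty makes fluid overload costly during the episode instead of only through its eventual effect on survival. Clinician override would sit outside both functions.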

Uncertainty Quantification for Postoperative Delirium Prediction: A Position Paper on Why Bayesian Deep Learning Matters for Elderly Surgical Patients
Postoperative delirium affects 10–60% of elderly surgical patients and is linked to longer hospital stays, cognitive decline, and increased mortality. Although machine learning models have been developed to predict this condition using perioperative data, most rely on point predictions that fail to express uncertainty, limiting their clinical reliability in high-stakes surgical decision-making. These models often report a single risk estimate without indicating whether predictions are supported by strong or sparse evidence, which can lead to overconfidence and potential patient harm in vulnerable populations with heterogeneous frailty and comorbidity profiles. We argue that Bayesian deep learning is essential for postoperative delirium prediction because it provides distributional outputs and uncertainty estimates that allow clinicians to assess prediction reliability. Incorporating uncertainty quantification can transform these models from opaque tools into clinically trustworthy decision aids. We recommend that uncertainty reporting be required in all predictive models for postoperative delirium and that regulatory and publication standards enforce the use of Bayesian approaches. Overall, replacing point estimates with distributional predictions is necessary to improve safety and clinical utility in perioperative care of elderly patients.
Journal of Artificial Intelligence for Healthcare Systems
Original Research | Open access | 20 July 2022 | Article: 62
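The contrast between a point estimate and a distributional prediction can be sketched as follows. The sketch assumes that some Bayesian model has already produced repeated risk draws for one patient (for example posterior samples or stochastic forward passes). The helper names, the interval level, and the width cutoff are illustrative assumptions, not values from the paper.

```python
import statistics


def percentile(ordered, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def risk_summary(samples, level=0.9, max_width=0.2):
    """Report mean risk, a central interval, and a low-confidence flag.

    samples:   sampled delirium risk predictions for one patient
    level:     coverage of the central interval (assumed default)
    max_width: interval width above which the estimate is flagged (assumed)
    """
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lower = percentile(ordered, tail)
    upper = percentile(ordered, 1.0 - tail)
    return {
        "mean": statistics.fmean(ordered),
        "lower": lower,
        "upper": upper,
        "uncertain": (upper - lower) > max_width,
    }


# Hypothetical draws: similar mean risk is possible with very different spread.
well_supported = risk_summary([0.10, 0.12, 0.14, 0.16, 0.18], level=0.5)
sparse_evidence = risk_summary([0.05, 0.20, 0.40, 0.60, 0.90], level=0.5)
```

A clinician seeing only the mean could not tell these two patients apart in terms of reliability. The interval and flag expose the difference.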

Causal Forest Models with Double Machine Learning for Heterogeneous Treatment Effects in Antihypertensive Therapy: A Position Paper on Personalized Prescribing from Observational EHR Data
Hypertension affects about 1.4 billion adults globally and is a major modifiable risk factor for cardiovascular disease. Although several first-line antihypertensive drug classes exist, randomized controlled trials typically report only average treatment effects (ATEs), which mask important variability in individual patient responses. As a result, clinical guidelines often assume a homogeneous patient population, leading to trial-and-error prescribing, delayed blood pressure control, and avoidable adverse effects. I argue that causal forest models combined with double machine learning (DML) enable reliable estimation of heterogeneous treatment effects (HTEs) from observational electronic health record data. These methods can approximate randomized trial validity while capturing clinically meaningful variation in treatment response across patients. Compared with traditional approaches, they are computationally feasible and better suited for individualized treatment assessment. Therefore, comparative effectiveness research in hypertension should move beyond ATE-focused analyses toward routine HTE estimation using causal machine learning. This shift would support more precise, data-driven prescribing and improve patient outcomes.
Journal of Artificial Intelligence for Healthcare Systems
Original Research | Open access | 20 January 2024 | Article: 83
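The residual-on-residual step that underlies double machine learning can be sketched in a few lines. This is a deliberate simplification and not the paper's method: it uses one confounder, linear nuisance models, and two-fold cross-fitting, and it estimates a single constant effect, whereas a causal forest would let that effect vary across patients. All names and the simulated data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def dml_effect(x, t, y, n_folds=2):
    """Cross-fitted partialling-out estimate of the effect of t on y.

    Nuisance models for E[t | x] and E[y | x] are fit on the other folds,
    then out-of-fold residuals of y are regressed on residuals of t.
    """
    n = len(x)
    num = den = 0.0
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        held_out = [i for i in range(n) if i % n_folds == k]
        a_t, b_t = fit_line([x[i] for i in train], [t[i] for i in train])
        a_y, b_y = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            res_t = t[i] - (a_t + b_t * x[i])
            res_y = y[i] - (a_y + b_y * x[i])
            num += res_t * res_y
            den += res_t * res_t
    return num / den


def simulate(n=2000, effect=2.0, seed=0):
    """Toy confounded data: x drives both treatment intensity t and outcome y."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    t = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
    y = [effect * ti + 3.0 * xi + rng.gauss(0.0, 1.0) for ti, xi in zip(t, x)]
    return x, t, y


x, t, y = simulate()
estimate = dml_effect(x, t, y)  # should land near the simulated effect of 2.0
```

The sketch recovers the simulated effect only because the single confounder is observed and correctly modeled. With EHR data, unmeasured confounding remains the main threat regardless of the estimator.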

Explainable Graph Neural Networks Integrating Discharge Medications, Social Determinants of Health, and Prior Admissions for Heart Failure Readmission Prediction: A Position Paper
Heart failure affects over 6 million Americans, with 30-day readmission rates remaining 20–25% despite longstanding quality improvement efforts. These readmissions cost about $17 billion annually and are penalized under federal reimbursement programs, yet existing prediction models have not achieved clinically useful performance. Most current models treat patients independently and fail to capture meaningful relationships among patients with similar medication patterns, admission histories, and social circumstances. They also often exclude critical social determinants of health (SDOH), such as housing instability and food insecurity, despite their strong association with readmission risk. In addition, black-box models lack interpretability, limiting clinician trust and usability. I argue that explainable graph neural networks (GNNs) integrating clinical data, SDOH, and prior admissions should replace traditional logistic regression and tree-based models for readmission prediction. Patient similarity graphs can represent clinically relevant relationships that tabular models miss, while graph attention mechanisms provide interpretable, actionable explanations. GNNs enable direct integration of SDOH and prior utilization patterns and offer transparency by highlighting which similar patients most influence predictions. This makes them more suitable for clinical decision support than existing approaches. Overall, persistent readmission rates reflect limitations in current modeling strategies. Explainable GNNs provide a more clinically meaningful and policy-relevant approach to improving prediction and reducing preventable readmissions.
Journal of Artificial Intelligence for Healthcare Systems
Original Research | Open access | 20 July 2024 | Article: 88