Search results:

A Reinforcement-Governed Treatment Policy Architecture for Clinical Workflow Integration

The integration of artificial intelligence into clinical workflows demands architectures that dynamically adapt treatment policies to real-time patient data while ensuring seamless interoperability with existing healthcare systems. This conceptual manuscript proposes a novel reinforcement-governed treatment policy architecture (RGTPA) designed to orchestrate adaptive decision-making in clinical environments. Drawing from reinforcement learning principles, the RGTPA embeds policy optimization mechanisms within electronic health record (EHR) ecosystems, facilitating continuous feedback loops that refine treatment recommendations without empirical training. The architecture comprises layered components for state representation, reward modeling, and policy governance, emphasizing interoperability standards like HL7 FHIR for data exchange. Theoretical analysis highlights how reinforcement signals mitigate decision latency in high-stakes settings such as intensive care, while governance modules monitor for policy drift. By synthesizing literature on clinical AI systems and decision support pipelines, this work outlines infrastructural pathways for embedding RGTPA into workflows, addressing challenges in human-AI collaboration and regulatory compliance. Conceptual formulas illustrate risk propagation and governance load, providing interpretive tools for system designers. Ultimately, RGTPA advances theoretical frameworks for AI-driven healthcare, promoting resilient, adaptive treatment policies that align with clinical imperatives.

Journal of Artificial Intelligence for Healthcare Systems

Original Research | Open access | 20 January 2023 | Article: 12

Reinforcement Learning for Intravenous Fluid Resuscitation in Septic Shock: A Position Paper on Safety Constraints, Reward Design, and Clinical Oversight

Septic shock, defined as sepsis with persistent hypotension despite adequate fluid resuscitation and requiring vasopressors, has a mortality rate of 30–50% despite modern treatment. Intravenous fluids remain the cornerstone of early therapy, with guidelines recommending at least 30 mL/kg of crystalloids within the first three hours. However, both insufficient and excessive fluid administration can be harmful, making individualized, data-driven management essential. Reinforcement learning (RL) has been proposed to optimize fluid and vasopressor dosing in sepsis using retrospective ICU data. While models such as the AI Clinician suggest potential survival benefits, they often prioritize long-term outcomes like mortality and overlook short-term harms such as fluid overload and organ injury, raising safety concerns. Safety constraints and harm-aware reward design are essential in RL systems for septic shock. Pure outcome optimization is insufficient, and clinical AI must include mechanisms to prevent unsafe actions and ensure adherence to safety limits. Offline RL is vulnerable to distributional shift and unsafe extrapolation. Reward functions focused only on survival ignore acute complications, leading to unsafe policies. Human-in-the-loop oversight is necessary to maintain clinical accountability and enable intervention. RL systems should include action constraints, conservative learning with uncertainty estimation, and reward penalties for fluid overload indicators. Regulatory bodies and journals should require safety validation, and clinicians must retain override authority and transparency in decision-making. RL in septic shock management must prioritize patient safety through constraints, harm-aware rewards, and clinical oversight. Without these safeguards, deployment risks patient harm and loss of trust in clinical AI.

Journal of Artificial Intelligence for Healthcare Systems

Original Research | Open access | 20 January 2022 | Article: 59
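
The septic shock abstract above calls for reward penalties on fluid overload indicators in addition to a survival signal. A minimal sketch of such a harm-aware reward follows. The function name, the 50 mL/kg threshold, and the penalty weight are hypothetical values chosen for illustration; none of them come from the paper.

```python
def harm_aware_reward(terminal, survived, fluid_balance_ml, weight_kg,
                      overload_threshold_ml_per_kg=50.0,
                      penalty_per_ml_per_kg=0.01):
    """Survival outcome minus a penalty for cumulative fluid overload.

    All thresholds and weights are illustrative assumptions, not clinical guidance.
    """
    # Outcome term: only non-zero at the end of the episode.
    outcome = 0.0
    if terminal:
        outcome = 1.0 if survived else -1.0

    # Harm term: penalize cumulative fluid balance above the assumed threshold.
    balance_per_kg = fluid_balance_ml / weight_kg
    excess = max(0.0, balance_per_kg - overload_threshold_ml_per_kg)
    return outcome - penalty_per_ml_per_kg * excess


# A non-terminal step with 7000 mL positive balance in a 70 kg patient
# is penalized even though no outcome has been observed yet.
step_reward = harm_aware_reward(False, False, 7000.0, 70.0)
```

The point of the design is that short-term harm shows up in the reward at every step, rather than only through its eventual effect on mortality.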

Deep Reinforcement Learning for Personalized Adaptive Radiation Therapy Planning in Head and Neck Cancer Using Daily Cone-Beam CT and Dosimetric Constraints

Head and neck cancer radiotherapy requires highly precise dose delivery to ensure tumor control while sparing nearby critical structures, but daily anatomical changes such as tumor shrinkage, weight loss, and setup variability often degrade treatment accuracy. Although cone-beam CT provides valuable daily imaging, current adaptive radiotherapy workflows remain largely manual, time-consuming, and infrequent, limiting their ability to respond to ongoing anatomical changes and often resulting in suboptimal target coverage or increased toxicity risk. To address these limitations, we propose a deep reinforcement learning framework for fully automated daily treatment adaptation using cone-beam CT and dosimetric constraints. The problem is formulated as a sequential decision-making task in which an agent adjusts beam parameters based on evolving patient anatomy, cumulative dose, and constraint satisfaction. The state includes daily imaging and dose history, the action space involves fluence or multileaf collimator adjustments, and the reward function balances target coverage, organ-at-risk sparing, and plan stability. A patient-specific simulator based on historical imaging enables training without real-time patient interaction. This framework enables continuous, personalized, and automated plan adaptation that directly responds to anatomical changes while maintaining clinical safety constraints. By leveraging long-horizon optimization, the system can outperform static planning strategies and better manage stochastic anatomical variations in head and neck cancer treatment. Overall, this approach provides a foundation for closed-loop adaptive radiotherapy that could improve treatment accuracy, reduce toxicity, and reduce reliance on manual planning.

Journal of Artificial Intelligence for Healthcare Systems

Original Research | Open access | 20 January 2024 | Article: 76

From Protocols to Preferences: Why Reinforcement Learning from Human Feedback Must Replace Fixed Weaning Protocols for Prolonged Mechanical Ventilation

Prolonged mechanical ventilation (PMV), affecting 5–15% of ICU patients, is associated with high mortality (30–50%), long-term disability, and substantial healthcare costs exceeding $100,000 per admission. These patients often require extended respiratory support beyond 14–21 days and consume significant ICU resources. Current weaning strategies rely on fixed spontaneous breathing trial (SBT) criteria (e.g., RSBI thresholds, oxygenation, respiratory rate), which fail to account for the heterogeneous and evolving physiology of PMV patients. This reduces weaning to discrete events rather than a continuous adaptive process. We propose reinforcement learning from human feedback (RLHF) as a superior framework for weaning, enabling AI systems to learn sequential decision-making policies from clinician preferences across patient trajectories. Traditional protocols ignore temporal dependencies such as prior SBT outcomes, sedation exposure, and respiratory muscle trends. While standard reinforcement learning supports sequential optimization, it depends on difficult-to-define reward functions. RLHF overcomes this by learning reward signals directly from clinician comparisons, aligning model behavior with real-world clinical judgment. Research should shift toward RLHF-based dynamic weaning policies rather than static prediction models. Clinical stakeholders should support data collection and prospective evaluation of RLHF-guided weaning versus standard protocols. RLHF offers a necessary advancement for personalized PMV weaning, addressing limitations of rigid protocols and improving alignment with clinical decision-making.

Journal of Artificial Intelligence for Healthcare Systems

Original Research | Open access | 20 July 2024 | Article: 90
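
The weaning abstract above describes learning reward signals from clinician comparisons. One common way to do this in RLHF is a Bradley-Terry preference model; the abstract does not name a specific model, so its use here is an assumption. The sketch below shows the preference probability and the loss that would be minimized for one comparison between two candidate trajectories, with scalar rewards standing in for the output of a learned reward model.

```python
import math


def preference_probability(reward_a, reward_b):
    """Bradley-Terry probability that trajectory A is preferred over B."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def preference_loss(reward_preferred, reward_rejected):
    """Negative log-likelihood of one clinician comparison."""
    return -math.log(preference_probability(reward_preferred, reward_rejected))


# The loss is small when the reward model already ranks the
# clinician-preferred trajectory higher, and large otherwise.
agreeing = preference_loss(2.0, 0.0)
disagreeing = preference_loss(0.0, 2.0)
```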

Federated Reinforcement Learning for Coordinated Bed Allocation and Nurse Staffing During Pandemic Surges

Pandemic surges can rapidly overwhelm hospital capacity: bed shortages and nurse fatigue contribute directly to excess mortality. Coordinated decision-making across emergency departments, intensive care units, and general wards is therefore essential, yet difficult to achieve under centralized control. Centralized approaches to bed allocation and nurse staffing are limited because each hospital unit holds critical local information, such as real-time patient acuity, staff availability, and infection control status, that cannot be easily shared due to privacy constraints and communication delays during crisis conditions. To address these challenges, we propose a federated multi-agent reinforcement learning framework that enables coordinated decision-making for bed distribution and nurse staffing across hospital units without requiring centralization of sensitive clinical or workforce data. The system consists of local reinforcement learning agents deployed in each unit that participate in federated aggregation, a coordination mechanism that aligns inter-unit policies, and a surge detection module that dynamically switches operational strategies during pandemic escalation periods. This distributed architecture maintains data privacy while supporting adaptive, system-wide coordination under surge conditions, overcoming the limitations of both centralized optimization models and rule-based heuristic approaches.

Journal of Artificial Intelligence for Healthcare Systems

Original Research | Open access | 20 July 2025 | Article: 104
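
The bed allocation abstract above mentions federated aggregation of local agents without specifying the rule. A minimal sketch, assuming a FedAvg-style weighted average of policy parameters, is shown below. The function name, the flat-list parameter layout, and the use of local sample counts as weights are all illustrative assumptions.

```python
def federated_average(local_params, sample_counts):
    """Average per-unit policy parameters, weighted by local sample counts.

    local_params: one flat list of floats per hospital unit (same length each).
    sample_counts: number of local experience samples behind each list.
    Only parameters leave a unit; raw patient or staffing data never does.
    """
    total = sum(sample_counts)
    dim = len(local_params[0])
    return [
        sum(params[i] * n for params, n in zip(local_params, sample_counts)) / total
        for i in range(dim)
    ]


# Two units: the second contributed three times as much experience,
# so the global parameters sit closer to its local values.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```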

Deep Reinforcement Learning with Safety Shielding for Personalized Anticoagulation Management in Atrial Fibrillation Patients at High Bleeding Risk Using INR Measurements

Atrial fibrillation affects over 30 million people worldwide and requires long-term anticoagulation. Warfarin is still widely used because of its efficacy and reversibility, but its narrow therapeutic window (INR 2.0–3.0) makes dosing challenging, especially in patients at high bleeding risk, in whom both under- and over-anticoagulation can lead to serious complications. Conventional dosing approaches rely on population-based nomograms and clinician judgment, and fail to capture individual variability driven by genetics, diet, comorbidities, and drug interactions. To address this limitation, this article proposes a conceptual framework that integrates deep reinforcement learning with a safety-shield mechanism for personalized warfarin dosing. The system uses a deep Q-network trained on historical patient trajectories within an offline Markov decision process to recommend dose adjustments based on INR history and clinical risk factors, while a deterministic rule-based safety layer blocks unsafe actions, such as dose increases when INR exceeds 3.5, and routes extreme adjustments to clinician review. Conservative offline reinforcement learning further reduces the risk of unsafe policy extrapolation by limiting overestimation of out-of-distribution actions. Together, this hybrid architecture aims to improve time in therapeutic range while minimizing bleeding risk, providing a structured and clinically constrained approach for safer, individualized anticoagulation management in high-risk atrial fibrillation patients.

Journal of Artificial Intelligence for Healthcare Systems

Original Research | Open access | 20 July 2025 | Article: 111
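
The anticoagulation abstract above describes a deterministic safety layer that blocks dose increases when INR exceeds 3.5 and holds extreme adjustments for clinician review. A minimal sketch of such a shield follows. Only the 3.5 INR rule comes from the abstract; the 20% relative-change limit, the fallback to the current dose, and all names are hypothetical.

```python
from dataclasses import dataclass

INR_BLOCK_INCREASE = 3.5  # from the abstract: no dose increase above this INR
MAX_REL_CHANGE = 0.20     # hypothetical limit defining an "extreme" adjustment


@dataclass
class ShieldDecision:
    dose_mg: float
    needs_review: bool
    reason: str


def shield(current_dose_mg, proposed_dose_mg, inr):
    """Filter a policy's proposed dose through fixed safety rules."""
    # Rule 1: never raise the dose while INR is above the blocking threshold.
    if inr > INR_BLOCK_INCREASE and proposed_dose_mg > current_dose_mg:
        return ShieldDecision(current_dose_mg, True,
                              "dose increase blocked: INR above 3.5")
    # Rule 2: hold large relative changes until a clinician has reviewed them.
    if current_dose_mg > 0:
        rel_change = abs(proposed_dose_mg - current_dose_mg) / current_dose_mg
        if rel_change > MAX_REL_CHANGE:
            return ShieldDecision(current_dose_mg, True,
                                  "extreme adjustment held for clinician review")
    return ShieldDecision(proposed_dose_mg, False, "accepted")


decision = shield(current_dose_mg=5.0, proposed_dose_mg=6.0, inr=3.8)
```

Because the shield is a plain rule check applied after the learned policy, its behavior can be audited independently of the deep Q-network.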

Hierarchical Reinforcement Learning Framework for Personalized Perioperative Antibiotic Prophylaxis Timing and Intraoperative Redosing

Surgical site infections (SSIs) remain a significant source of postoperative morbidity despite established guidelines for perioperative antibiotic prophylaxis. Current protocols emphasize fixed preoperative timing and interval-based intraoperative redosing, yet fail to account for patient heterogeneity, pharmacokinetic variability, and uncertainty in procedure duration. This study proposes a hierarchical reinforcement learning (HRL) framework for personalized optimization of antibiotic prophylaxis across the perioperative timeline. The framework decomposes decision-making into two coordinated levels: a high-level policy that determines optimal preoperative antibiotic timing based on predicted procedure duration and patient-specific infection risk, and a low-level policy that adaptively manages intraoperative redosing using real-time updates on elapsed time, remaining duration, and cumulative drug exposure. Procedure duration is estimated using machine learning models that provide both point predictions and uncertainty intervals, enabling risk-sensitive decision-making. The problem is formalized as a Markov decision process with a reward structure balancing SSI prevention against antibiotic stewardship, incorporating penalties for unnecessary dosing and suboptimal timing. Off-policy evaluation using historical surgical data is proposed to assess performance relative to guideline-based and clinician-driven strategies. By integrating predictive modeling with multi-timescale decision optimization, the framework aims to reduce SSI incidence while minimizing antibiotic overuse. This approach highlights the potential of reinforcement learning to advance precision perioperative care and improve clinical outcomes through adaptive, data-driven prophylaxis strategies.

Journal of Artificial Intelligence for Healthcare Systems

Original Research | Open access | 20 January 2026 | Article: 119

Reinforcement Learning Framework for Dynamic Optimization of Extracorporeal Membrane Oxygenation Settings Using Real-Time Blood Gas, Hemodynamic, and Pump Flow Measurements

Extracorporeal membrane oxygenation (ECMO) is used to support patients with severe cardiac or respiratory failure, requiring constant manual adjustments of pump flow, sweep gas flow, and oxygen fraction. However, current ECMO management lacks a real-time optimization system tailored to individual patient needs. This manuscript proposes an offline reinforcement learning framework for dynamic ECMO optimization, utilizing real-time measurements of blood gases, hemodynamics, and pump flow. The framework includes a state encoder for various patient data, an action space for adjustments to ECMO settings, and a reward function that balances oxygenation, hemodynamic support, and complication avoidance. A safety shield filters unsafe recommendations before clinician review. The system aims to provide personalized, proactive, and safety-constrained ECMO management, with the goal of guiding future research validation rather than claiming experimental results.

Journal of Artificial Intelligence for Healthcare Systems

Original Research | Open access | 20 July 2026 | Article: 132

Deep Reinforcement Learning with Inverse Reinforcement Learning for Learning Optimal Personalized Rehabilitation Exercise Prescriptions from Physical Therapist Demonstrations

Personalized rehabilitation exercise prescriptions are essential for recovery after neurological injury, orthopedic surgery, and chronic decline. While physical therapists have valuable expertise, translating it into scalable computational systems is challenging. Standard deep reinforcement learning relies on manually defined reward functions, but in rehabilitation, clinically significant goals like movement quality, fatigue, pain, safety, motivation, and adherence are difficult to quantify. This paper introduces a framework combining inverse reinforcement learning (IRL) and deep reinforcement learning (DRL) to learn personalized rehabilitation prescriptions from therapist demonstrations. IRL would derive expert-aligned rewards, and DRL would use these to create adaptive exercise plans. The framework encompasses therapist demonstration collection, movement trajectory representation, reward inference, policy learning, safety constraints, and clinical oversight. Demonstrations would include exercise selection, progression decisions, and therapist responses to patient fatigue, pain, or adherence issues. IRL could capture implicit clinical priorities, while DRL would adjust prescriptions based on patient conditions such as fatigue, progress, and engagement. The framework aims to create scalable, personalized rehabilitation prescriptions, offering a conceptual model for future rehabilitation robotics, exergaming, and home-based digital rehabilitation systems.

Journal of Artificial Intelligence for Healthcare Systems

Original Research | Open access | 20 July 2026 | Article: 139
