Neoadjuvant chemotherapy (NAC) is standard for locally advanced breast cancer, with pathologic complete response (pCR) strongly predicting improved survival. However, only 30–40% of patients achieve pCR, while the rest undergo toxicity and delayed surgery without benefit. Current prediction methods rely on tumor volume at isolated time points or simple pre- and post-treatment comparisons, ignoring continuous tumor dynamics during therapy. Sparse and irregular MRI sampling further limits accurate modeling. We introduce a Neural Ordinary Differential Equation (Neural ODE) framework to model continuous tumor growth from sparse serial MRI during NAC. The model learns a time-continuous function describing tumor evolution and predicts individual response trajectories and final pCR status. The framework includes (1) MRI-based tumor segmentation, (2) construction of sparse longitudinal tumor volume series, (3) Neural ODE modeling of continuous dynamics via a neural network–parameterized derivative function, and (4) classification of the final latent state for pCR prediction. An optional module enables trajectory visualization and interpretability. This approach captures hidden continuous tumor behavior between scans, handles irregular sampling without imputation, and enables earlier response prediction. It is also computationally efficient using adjoint-based training and may reveal distinct growth patterns between responders and non-responders. Neural ODE-based modeling offers a more informative framework for predicting NAC response by capturing continuous tumor dynamics, with potential to improve pCR prediction over conventional volume-based methods.
Breast cancer remains the most commonly diagnosed malignancy among women worldwide, with neoadjuvant chemotherapy administered prior to surgical resection representing a standard treatment approach for locally advanced and high-risk early-stage disease [1, 2]. The primary goal of NAC is to downstage tumors, increase rates of breast-conserving surgery, and eradicate micrometastatic disease, with pathologic complete response serving as a powerful surrogate marker for favorable long-term outcomes [3, 4]. Patients achieving pCR demonstrate significantly improved disease-free survival and overall survival compared to those with residual disease, making pCR a critical endpoint in both clinical practice and therapeutic trials [5, 6].
Despite the prognostic importance of pCR, only 30-40% of patients receiving standard NAC regimens achieve this favorable outcome, meaning that the majority undergo ineffective treatment characterized by toxicity, delayed surgery, and disease progression [7, 8]. The inability to predict which patients will respond to NAC before treatment completion represents a significant clinical gap, as early identification of non-responders could enable treatment modification, switching to alternative regimens, or earlier surgical intervention [9, 10]. Current clinical practice relies on tumor volume changes measured from serial imaging—typically MRI at baseline, mid-treatment, and post-treatment—using criteria such as RECIST, but these approaches provide only coarse, discrete assessments that fail to capture the continuous dynamics of tumor response [11, 12].
The fundamental limitation of existing methods lies in their treatment of tumor evolution as a discrete process measured at isolated time points rather than a continuous dynamical system governed by underlying biological mechanisms [1, 13]. Serial MRI during NAC typically acquires images at only 2-3 time points over the 12-18 week treatment course, creating sparse and irregularly spaced data that conventional machine learning models struggle to handle without arbitrary interpolation or imputation [14, 15]. We propose a conceptual framework that addresses these limitations by employing Neural Ordinary Differential Equations to model continuous tumor growth dynamics from sparse serial MRI measurements, with the aim of more accurate prediction of pathologic complete response and earlier treatment guidance [1, 16].
Standard NAC regimens for breast cancer typically combine anthracycline-based and taxane-based agents administered over 4-6 cycles spanning approximately 12-18 weeks, with the specific regimen chosen based on tumor subtype, stage, and biomarker profile [2, 4]. Pathologic complete response is rigorously defined as ypT0/is ypN0—no residual invasive carcinoma in the breast and no tumor involvement in axillary lymph nodes—though some definitions permit isolated tumor cells, and pCR rates vary substantially by subtype from approximately 20-30% in hormone receptor-positive/HER2-negative tumors to 50-60% in triple-negative and HER2-positive disease [5, 6]. Clinical factors associated with pCR include younger age, higher tumor grade, elevated Ki-67 proliferation index, and specific genomic signatures, but these factors lack sufficient predictive accuracy to guide individual treatment decisions [7, 9].
The clinical significance of pCR extends beyond immediate treatment outcomes, as patients achieving pCR following NAC have consistently demonstrated 5-year disease-free survival rates exceeding 85% compared to approximately 60% for those with residual disease, making pCR an accepted surrogate endpoint for regulatory approval of new neoadjuvant regimens [10, 11]. Conversely, patients with residual disease after NAC face elevated risks of recurrence and mortality, driving intense interest in developing methods to identify non-responders early enough to modify treatment during the therapeutic window [12, 13]. The inability to predict pCR before treatment completion represents a major barrier to personalized neoadjuvant therapy, as current decision-making relies on population-level averages rather than individual tumor biology [14, 15].
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is the preferred imaging modality for monitoring breast tumor response during NAC due to its superior soft tissue contrast, high spatial resolution, and ability to assess tumor vascularity through contrast uptake kinetics [5, 17]. Standard clinical protocols acquire MRI scans at three time points: baseline prior to treatment initiation (week 0), mid-treatment after 2-3 cycles (approximately weeks 3-6), and post-treatment following NAC completion (week 12-18), though the exact timing varies across institutions and clinical trials [18, 19]. Tumor volume is typically measured by manual or semi-automated segmentation of enhancing lesions on post-contrast sequences, with volumes calculated by summing voxel counts multiplied by voxel dimensions, achieving inter-reader reliability coefficients of 0.85-0.95 in experienced centers [20, 21].
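The volume calculation described above (voxel count multiplied by voxel dimensions) can be sketched in a few lines. This is a minimal illustration, assuming a binary mask stored as nested lists and voxel spacing given in millimeters; the function name `tumor_volume_cc` and the toy mask are hypothetical.

```python
def tumor_volume_cc(mask, spacing_mm):
    """Volume of a binary 3D mask (slices x rows x cols) in cubic centimeters."""
    dz, dy, dx = spacing_mm
    voxel_mm3 = dz * dy * dx
    n_voxels = sum(1 for sl in mask for row in sl for v in row if v)
    return n_voxels * voxel_mm3 / 1000.0  # 1 cc = 1000 mm^3

# Toy mask: 2 slices of 3x3 voxels containing 5 tumor voxels in total
mask = [
    [[0, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 1, 0], [0, 0, 0]],
]
volume = tumor_volume_cc(mask, spacing_mm=(2.0, 1.0, 1.0))
print(volume)
```

In practice the mask would come from the segmentation step and the spacing from the image header; both are outside the scope of this sketch.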
The Response Evaluation Criteria in Solid Tumors (RECIST) guidelines define response categories based on percentage change in the sum of target lesion diameters, with partial response requiring at least a 30% decrease from baseline and progressive disease requiring at least a 20% increase over the smallest sum recorded on study [11, 22]. However, RECIST and analogous volume-based criteria were designed for assessing response at a single endpoint rather than modeling the continuous trajectory of tumor evolution during treatment, and they treat all size changes as equally informative regardless of timing or growth dynamics [12, 23]. The sparse temporal resolution of serial MRI (typically only 2-3 measurements over months of treatment) means that conventional methods cannot distinguish between different dynamic patterns that might reach the same final volume but reflect distinct biological responses, such as rapid initial shrinkage versus slow gradual decline [14, 24].
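To make the limitation concrete, the sketch below applies a simplified RECIST 1.1-style rule to two hypothetical patients whose lesions shrink at different speeds but end at the same size. The function name and the example measurements are illustrative assumptions, and the rule covers target lesions only (it ignores new lesions and non-target disease).

```python
def recist_category(baseline_mm, nadir_mm, current_mm):
    """Simplified target-lesion category from sums of diameters in mm."""
    if current_mm == 0:
        return "CR"
    increase = current_mm - nadir_mm
    # Progression: at least 20% above the smallest sum on study and at least 5 mm
    if increase >= 5 and increase >= 0.20 * nadir_mm:
        return "PD"
    # Partial response: at least 30% below baseline
    if (baseline_mm - current_mm) / baseline_mm >= 0.30:
        return "PR"
    return "SD"

# Sums of diameters at baseline, mid-treatment, and post-treatment (hypothetical)
fast_responder = [40.0, 24.0, 22.0]
slow_responder = [40.0, 36.0, 22.0]
categories = {
    name: recist_category(d[0], min(d), d[-1])
    for name, d in (("fast", fast_responder), ("slow", slow_responder))
}
print(categories)
```

Both patients receive the same endpoint category even though their mid-treatment measurements differ markedly, which is the information a trajectory-based model is meant to retain.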
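Mathematical models of tumor growth have been developed for decades, with the exponential model representing the simplest formulation where tumor volume increases at a rate proportional to current volume, yielding the equation

$$\frac{dV}{dt} = rV, \qquad V(t) = V_0 e^{rt},$$

where V(t) is tumor volume at time t, V_0 is the baseline volume, and r is the growth rate constant.
The Gompertz model provides an alternative formulation where the growth rate decays exponentially over time,

$$\frac{dV}{dt} = r_0 e^{-\alpha t} V,$$

where r_0 is the initial growth rate and \alpha is the rate at which it decays.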
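The two classical models can be compared numerically with a short sketch. It assumes the exponential form dV/dt = r V and the time-decaying Gompertz form dV/dt = r0 exp(-alpha t) V; the symbols r, r0, and alpha and the parameter values are illustrative assumptions, with a negative rate standing in for net shrinkage under therapy.

```python
import math

def exponential_volume(v0, r, t):
    """Closed-form solution of dV/dt = r * V."""
    return v0 * math.exp(r * t)

def gompertz_volume(v0, r0, alpha, t):
    """Closed-form solution of dV/dt = r0 * exp(-alpha * t) * V."""
    return v0 * math.exp((r0 / alpha) * (1.0 - math.exp(-alpha * t)))

def euler_gompertz(v0, r0, alpha, t_end, n_steps=20000):
    """Forward-Euler integration of the same Gompertz equation, used as a check."""
    v, dt = v0, t_end / n_steps
    for k in range(n_steps):
        v += dt * r0 * math.exp(-alpha * k * dt) * v
    return v

# 10 cc tumor, initial shrinkage rate 0.3 per week that fades with alpha = 0.1
closed_form = gompertz_volume(10.0, -0.3, 0.1, 12.0)
numerical = euler_gompertz(10.0, -0.3, 0.1, 12.0)
print(closed_form, numerical)
```

Both models have only two or three free parameters, which is what makes them easy to fit and also what constrains the response shapes they can represent.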
Table 1 clarifies why the manuscript’s central contribution is not merely a new classifier but a shift from discrete radiologic response assessment to continuous-time dynamical inference of treatment response.
Table 1. Analytical Comparison of Discrete Response Assessment versus Continuous-Time Neural ODE Tumor Dynamics Modeling
Analytical dimension | Conventional volume-change / RECIST-style approaches | Classical parametric growth models | Proposed Neural ODE framework | Theoretical implication for pCR prediction
--- | --- | --- | --- | ---
Representation of tumor evolution | Treats response as discrete change between isolated scans | Assumes tumor follows a pre-specified mathematical growth law | Treats response as a learned continuous-time dynamical process | Shifts prediction from static assessment to trajectory-based inference |
Temporal information use | Primarily baseline-to-mid or baseline-to-final difference | Uses serial timing but under rigid functional constraints | Uses actual scan times directly and integrates between irregular observations | Preserves clinically meaningful timing effects that discrete summaries discard |
Capacity to model heterogeneous response shapes | Limited; different trajectories may map to identical net volume change | Moderate, but constrained to exponential, logistic, or Gompertz-like behavior | High; nonlinearity is learned from data rather than pre-imposed | Enables separation of early rapid shrinkage, delayed response, plateau, or regrowth patterns |
Handling of irregular scan timing | Weak; often depends on fixed comparison windows or simplification | Possible in principle, but unstable with sparse clinical measurements | Native continuous-time modeling allows patient-specific observation times | Better aligned with real-world NAC imaging schedules |
Dependence on interpolation or imputation | Often implicit when comparing heterogeneous schedules | May require strong assumptions for sparse fitting | Does not require interpolation before modeling | Reduces distortion introduced by arbitrary temporal preprocessing |
Biological flexibility | Low; response treated as geometric size change only | Moderate; embeds simplified biological constraints | Higher; latent dynamics can encode unobserved treatment sensitivity and resistance processes | Better suited to capturing hidden mechanisms linked to eventual pCR |
Individualization through covariates | Usually added only as parallel predictors | Limited unless explicitly embedded in parametric structure | Covariates can condition the initial state and/or derivative function | Allows subtype-specific and patient-specific response dynamics |
Usefulness for early prediction | Limited because reliable classification often requires later endpoint measurements | Limited by poor identifiability from few points | Stronger potential because partial trajectories can be extrapolated to projected treatment completion | Creates a conceptual basis for mid-treatment treatment adaptation |
Interpretability of response pattern | Simple but shallow; percentage reduction is easy to read | Moderate through named parameters like growth rate or carrying capacity | Trajectory visualization plus time-varying growth-rate analysis | Supports clinically meaningful interpretation beyond binary classification |
Main failure mode | Oversimplifies tumor biology and timing | Model misspecification when true dynamics depart from assumed form | Data hunger, latent-state opacity, and dependence on robust training/validation | Improvement in flexibility must be matched by careful evaluation and guardrails |
The proposed framework takes as input serial MRI scans acquired at baseline (week 0), mid-treatment (week 3-6), and post-treatment (week 12-18), processes these images through tumor segmentation to extract volume measurements at each time point, and then uses a Neural ODE to model the continuous trajectory of tumor volume from the sparse observations [1, 10]. The Neural ODE component learns a parameterized function f_θ that predicts the derivative of tumor volume with respect to time, enabling integration from any starting time to any target time to generate a complete growth curve that passes through or near the observed volume points [1, 13]. From this learned continuous trajectory, the framework extracts the final latent state at treatment completion and passes it through a multilayer perceptron classification head to output a probability of achieving pathologic complete response [5, 11].
Figure 1 shows the proposed framework, which conceptualizes pCR prediction as a continuous-time tumor dynamics problem in which sparse serial MRI measurements are transformed into a Neural ODE-derived latent trajectory that supports both response classification and trajectory-level interpretation.

Figure 1. Conceptual Architecture of Neural ODE-Based Continuous Tumor Dynamics Modeling for Pathologic Complete Response Prediction from Serial Breast MRI
The framework additionally incorporates clinical and pathological covariates—including patient age, tumor subtype (hormone receptor status, HER2 status), grade, and Ki-67 proliferation index—as initial condition information to modulate the learned dynamics for individual patients [7, 17]. By conditioning the Neural ODE on these covariates, the model can learn distinct growth dynamics for different tumor subtypes, potentially capturing known biological differences in chemosensitivity between triple-negative, HER2-positive, and hormone receptor-positive disease [18, 19]. The entire framework is trained end-to-end to simultaneously optimize both the accuracy of the reconstructed tumor volume trajectory and the classification of final response status [20, 21].
The framework rests on several core assumptions that must hold for valid application to clinical data, beginning with the assumption that tumor volume can be reliably and reproducibly measured from DCE-MRI using either manual segmentation by expert radiologists or automated deep learning segmentation methods [22, 23]. A second key assumption is that at least two MRI time points are available per patient (baseline and either mid-treatment or post-treatment), as the Neural ODE requires at least two observations to constrain the learned dynamics, though three or more time points provide substantially better parameter identification [1, 24]. The framework also assumes that the underlying tumor growth dynamics during NAC can be approximated by a continuous ordinary differential equation, meaning that volume changes smoothly between observation times without abrupt jumps or discontinuities not captured by the measurement schedule [25, 26].
Additional assumptions concern the identifiability of model parameters from available data, including that the number of patients in the training cohort (on the order of at least 100-200) is sufficient to learn the neural network parameters of f_θ without overfitting, and that the time points across patients are sufficiently aligned to enable batch training while still accommodating individual variability in scan timing [27, 28]. The framework assumes that measurement errors in tumor volume extraction are independent and approximately normally distributed, which may not hold for very small tumors or those with irregular morphology where segmentation is challenging [5, 29]. Finally, the framework assumes that the relationship between tumor volume dynamics and pCR is stable across the clinical settings and patient populations to which the model is applied, requiring external validation before clinical deployment [15, 16].
The first component of the framework extracts tumor volume measurements from serial DCE-MRI scans, requiring accurate segmentation of the enhancing tumor region from surrounding breast tissue, chest wall, and blood vessels [5, 17]. Manual segmentation by expert radiologists remains the reference standard, with typical protocols involving slice-by-slice contouring of tumor boundaries on post-contrast subtraction images, but this approach is time-consuming (requiring 15-30 minutes per scan) and subject to inter-observer variability [18, 20]. Automated deep learning segmentation methods, particularly U-Net and its variants including nnU-Net and attention U-Net, have demonstrated performance approaching or matching human experts for breast tumor segmentation on DCE-MRI, with Dice similarity coefficients of 0.80-0.90 reported in validation studies [22, 23].
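The Dice similarity coefficient cited above compares an automated mask with a reference mask as twice the overlap divided by the total number of labeled voxels. A minimal sketch, assuming both masks have been flattened to equal-length binary lists (the function name and toy masks are hypothetical):

```python
def dice_coefficient(pred, ref):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Two empty masks agree perfectly by convention
    return 1.0 if total == 0 else 2.0 * overlap / total

automated = [0, 1, 1, 1, 0, 0, 1, 0]
manual    = [0, 1, 1, 0, 0, 1, 1, 0]
dice = dice_coefficient(automated, manual)
print(dice)
```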
Quality control procedures are essential regardless of segmentation method, including visual inspection of segmentations to identify obvious errors such as inclusion of pectoral muscle or exclusion of tumor spiculations, and calculation of quality metrics like slice-wise volume consistency [24, 25]. For automated methods, cases with low segmentation confidence or unusual tumor morphology may require manual review and correction, particularly for non-mass enhancing lesions or tumors with ill-defined borders [26, 27]. The extracted volume measurements are recorded in cubic centimeters (cc) or milliliters, with typical breast tumor volumes at baseline ranging from 1-100 cc depending on stage and detection method [5, 28].
Following segmentation, volume measurements are assembled into a time series for each patient, with time indexed from the baseline scan at week 0 and subsequent scans at their actual acquisition dates measured in days or weeks from baseline [5, 17]. The framework accommodates variable timing across patients—for example, mid-treatment scans obtained at week 3 for some patients and week 5 for others—by treating time as a continuous variable rather than requiring fixed intervals [1, 13]. Missing time points are handled naturally by the Neural ODE formulation, which does not require imputation or interpolation prior to modeling, though the presence of at least two time points per patient is necessary for parameter identification [1, 24].
The constructed time series includes not only volume measurements but also the timing of each scan relative to treatment initiation and the specific NAC regimen administered, as different chemotherapy combinations may induce different growth dynamics [14, 26]. For patients with more than three scans (e.g., additional early response assessment at week 2), all available time points are included to provide more constraints on the learned trajectory [1, 27]. The framework optionally incorporates uncertainty estimates for each volume measurement based on segmentation confidence or inter-rater variability, allowing the model to downweight noisy observations during training [28, 29].
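One possible representation of the per-patient series is sketched below. The `VolumeSeries` container, its field names, and the example dates and regimen label are hypothetical; the sketch only shows time being kept as a continuous offset from the baseline scan and the two-scan minimum being enforced.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class VolumeSeries:
    patient_id: str
    times_weeks: list   # weeks elapsed since the baseline scan
    volumes_cc: list    # segmented tumor volume at each scan
    regimen: str

def build_series(patient_id, scans, regimen):
    """scans: list of (acquisition_date, volume_cc); the earliest scan is baseline."""
    scans = sorted(scans, key=lambda s: s[0])
    if len(scans) < 2:
        raise ValueError("at least two time points are required")
    baseline_date = scans[0][0]
    times = [(d - baseline_date).days / 7.0 for d, _ in scans]
    volumes = [v for _, v in scans]
    return VolumeSeries(patient_id, times, volumes, regimen)

series = build_series(
    "P001",
    [(date(2023, 2, 6), 11.5), (date(2023, 1, 2), 24.0), (date(2023, 5, 8), 1.2)],
    regimen="AC-T",
)
print(series.times_weeks, series.volumes_cc)
```

Because times are stored as real-valued offsets, a mid-treatment scan at week 3 and one at week 5 need no special handling downstream.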
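The core of the framework is a Neural Ordinary Differential Equation that models the continuous evolution of tumor state over time, formulated as

$$\frac{dz(t)}{dt} = f_\theta\big(z(t), t\big),$$

where z(t) is the latent tumor state at time t and f_θ is the neural network that parameterizes its time derivative.
The Neural ODE integrates from an initial time t0 to any target time t1 using a numerical ODE solver, typically a fixed-step Runge-Kutta method or an adaptive-step solver such as Dormand-Prince:

$$z(t_1) = z(t_0) + \int_{t_0}^{t_1} f_\theta\big(z(t), t\big)\, dt .$$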
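The integration step can be illustrated with a fixed-step classical Runge-Kutta (RK4) solver written in plain Python. This is a sketch under stated assumptions rather than the framework's implementation: the derivative network is a small tanh MLP with random, untrained weights, and the names `rk4_integrate` and `make_mlp_derivative` are hypothetical. A production system would more likely use an adaptive solver from an ODE library.

```python
import math
import random

def rk4_integrate(f, z0, t0, t1, n_steps=100):
    """Integrate dz/dt = f(z, t) from t0 to t1 with fixed-step classical RK4."""
    z, t = list(z0), t0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        k1 = f(z, t)
        k2 = f([zi + 0.5 * h * k for zi, k in zip(z, k1)], t + 0.5 * h)
        k3 = f([zi + 0.5 * h * k for zi, k in zip(z, k2)], t + 0.5 * h)
        k4 = f([zi + h * k for zi, k in zip(z, k3)], t + h)
        z = [zi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        t += h
    return z

def make_mlp_derivative(dim, hidden=8, seed=0):
    """Hypothetical f_theta: one hidden tanh layer with random (untrained) weights."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.3) for _ in range(dim + 1)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.3) for _ in range(hidden)] for _ in range(dim)]

    def f_theta(z, t):
        x = list(z) + [t]  # the network sees the state and the current time
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    return f_theta

# Sanity check on a known linear decay, dz/dt = -0.2 z, from week 0 to week 18
z_end = rk4_integrate(lambda z, t: [-0.2 * z[0]], [10.0], 0.0, 18.0)
# Forward pass of an untrained two-dimensional latent state over the same window
z_latent = rk4_integrate(make_mlp_derivative(dim=2), [0.1, 0.2], 0.0, 18.0)
print(z_end, z_latent)
```

Swapping the start and end times integrates backward, since the step size simply becomes negative.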
The initial latent state z(0) is encoded from the baseline MRI tumor volume and available clinical covariates using a small encoder neural network that maps the observed volume at week 0 (V0) and covariate vector c to a latent representation [1, 18]. This encoding allows the framework to account for the fact that different tumors with identical baseline volumes may have different growth potentials and chemosensitivity based on their underlying biology, reflected in their initial latent state [5, 19]. The encoder can optionally incorporate additional baseline features such as tumor morphology descriptors (spiculation, margin characteristics) or radiomic texture features extracted from the baseline MRI [20, 21].
For patients with only baseline and post-treatment scans (no mid-treatment), the initial state is still encoded from baseline data, and the Neural ODE integrates forward to the post-treatment time point, with the reconstruction loss comparing predicted and observed final volume [1, 22]. The framework can also be extended to bidirectional dynamics by integrating backward from a later time point to earlier times, though this requires careful handling of temporal causality for prediction tasks [1, 23].
Given the initial latent state z(0) and the learned dynamics f_θ, the ODE solver generates a continuous trajectory z(t) for t ∈ [0, T] where T is the post-treatment time point, from which tumor volume v(t) is extracted as a linear projection of the latent state or as a component of z(t) [1, 24]. The predicted volume at each observed time point (mid-treatment v(t_mid), post-treatment v(T)) is compared to the extracted MRI volumes V_mid and V_post to compute a reconstruction loss, encouraging the learned dynamics to match the observed data [1, 25]. The continuous nature of the trajectory means the framework can predict volumes at any arbitrary time, including unobserved time points such as week 2 or week 10, enabling dense monitoring of tumor evolution from sparse measurements [5, 26].
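Evaluating the trajectory at arbitrary, irregularly spaced times amounts to integrating segment by segment and reading out the volume at each stop. In the sketch below, `toy_dynamics` is a hand-written stand-in for a trained f_θ (volume driven by a decaying kill rate), the volume is read as the first latent component, and a midpoint (RK2) step is used for brevity; all names and parameter values are illustrative assumptions.

```python
import math

def trajectory_at(f, z0, times, steps_per_week=20):
    """Latent states at each time in `times` (increasing; times[0] is the start)."""
    z, t, states = list(z0), times[0], [list(z0)]
    for t_next in times[1:]:
        n = max(1, round((t_next - t) * steps_per_week))
        h = (t_next - t) / n
        for _ in range(n):
            k1 = f(z, t)                                  # slope at the step start
            z_mid = [zi + 0.5 * h * k for zi, k in zip(z, k1)]
            k2 = f(z_mid, t + 0.5 * h)                    # slope at the midpoint
            z = [zi + h * k for zi, k in zip(z, k2)]
            t += h
        t = t_next  # discard accumulated floating-point drift in time
        states.append(list(z))
    return states

def toy_dynamics(z, t):
    """Stand-in for f_theta: z[0] is volume (cc), z[1] is a decaying kill rate."""
    return [-z[1] * z[0], -0.1 * z[1]]

def read_volume(z):
    """Volume taken as the first component of the latent state."""
    return z[0]

query_weeks = [0.0, 2.0, 5.0, 10.0, 18.0]  # weeks 2 and 10 have no scan
volumes = [read_volume(z)
           for z in trajectory_at(toy_dynamics, [20.0, 0.3], query_weeks)]
print(list(zip(query_weeks, volumes)))
```

The reconstruction loss would use only the entries of `volumes` that coincide with real scans; the remaining entries are model estimates between observations.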
The final latent state at treatment completion z(T) serves as the input to a classification head that predicts the probability of pathologic complete response, implemented as a multilayer perceptron with one or two hidden layers followed by a sigmoid activation function producing an output in the [0,1] range [1, 5]. This classification head is trained jointly with the Neural ODE dynamics, allowing gradients from the pCR prediction task to influence the learned representation of tumor growth dynamics [6, 7]. By integrating the prediction task into the same optimization objective as trajectory reconstruction, the framework learns dynamics that are not only faithful to observed volumes but also discriminative of eventual response status [10, 11].
The binary classification output corresponds to the two clinically relevant categories: pCR (ypT0/is ypN0) versus non-pCR (any residual invasive disease), though the framework could be extended to ordinal or multi-class outcomes such as residual cancer burden categories [3, 4]. For patients with incomplete treatment courses where post-treatment MRI is unavailable, the framework can predict pCR probability from mid-treatment data alone by integrating the Neural ODE only to the available time point and applying a modified classification head trained for early prediction [12, 13]. The classifier produces both a binary prediction (pCR vs non-pCR) and a confidence score, the latter being valuable for clinical decision-making where uncertain cases may warrant additional imaging or biopsy [14, 15].
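The classification head can be sketched as a single hidden tanh layer followed by a sigmoid, applied to the final latent state. The weights below are hand-picked placeholders rather than trained values, and the 0.5 decision threshold is an assumption; in practice the threshold would be chosen from validation data.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def mlp_head(z_final, w1, b1, w2, b2):
    """One hidden tanh layer, then a sigmoid: returns an estimated P(pCR)."""
    hidden = [math.tanh(sum(w * z for w, z in zip(row, z_final)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return sigmoid(logit)

# Placeholder weights for a two-dimensional latent state and two hidden units
w1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
w2, b2 = [2.0, 1.0], -0.5

p_pcr = mlp_head([0.5, -1.0], w1, b1, w2, b2)
predicted_label = "pCR" if p_pcr >= 0.5 else "non-pCR"
print(p_pcr, predicted_label)
```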
A key clinical advantage of the framework is its ability to predict final pCR status from mid-treatment data alone, enabling potential treatment modification before NAC completion [16, 17]. Using only baseline and mid-treatment MRI scans (typically weeks 0 and 3-6), the Neural ODE integrates forward to the projected post-treatment time point T (e.g., week 18) based on the learned dynamics from the training cohort, producing an extrapolated trajectory and a predicted final latent state z(T_projected) [1, 18]. The accuracy of this early prediction depends on how well the learned dynamics generalize across patients and whether mid-treatment volume changes reliably signal eventual response, a relationship that the framework learns directly from training data [19, 20].
Early prediction could support several clinical interventions: switching non-responders to alternative chemotherapy regimens, adding targeted therapies, or proceeding directly to surgery without completing ineffective cycles [9, 10]. The framework can also be applied iteratively as additional time points become available, updating predictions as more data accrues and potentially increasing confidence before committing to treatment changes [21, 22]. Simulation studies using retrospective data would be needed to establish the optimal timing for early prediction and the confidence thresholds that justify different clinical actions [5, 23].
The Neural ODE framework offers inherent interpretability through visualization of learned tumor growth trajectories, enabling clinicians to examine how predicted volume evolves over time for individual patients and compare patterns between responders and non-responders [1, 24]. By extracting the predicted continuous volume curve v(t) for each patient, the framework reveals temporal features such as the rate of initial decline, presence of plateau phases, or late regrowth that may distinguish response phenotypes not apparent from discrete volume measurements [5, 25]. Responders typically demonstrate rapid early volume reduction within the first 2-4 weeks followed by continued decline, while non-responders may show slow reduction, stable disease, or early progression, though the specific patterns are learned from data rather than prescribed [14, 26].
Beyond trajectory visualization, the learned neural network f_θ can be analyzed to extract biologically meaningful parameters such as the instantaneous growth rate at any time point, calculated as (1/v) * dv/dt for volume-proportional dynamics [1, 27]. Comparing growth rate trajectories between response groups may reveal critical windows where divergence occurs, potentially identifying the optimal timing for response assessment [15, 28]. The framework can also generate counterfactual predictions—for example, estimating what a patient's tumor volume would have been without treatment by integrating the dynamics learned from an untreated control cohort, providing a personalized estimate of treatment effect [5, 29].
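The instantaneous growth rate (1/v) * dv/dt can be estimated from a densely sampled predicted trajectory by central differences. The sketch assumes the trajectory has already been evaluated on a fine time grid; the function name and the synthetic exponential curve are illustrative.

```python
import math

def specific_growth_rate(times, volumes):
    """Central-difference estimate of (1/v) dv/dt at interior grid points."""
    rates = []
    for i in range(1, len(times) - 1):
        dvdt = (volumes[i + 1] - volumes[i - 1]) / (times[i + 1] - times[i - 1])
        rates.append((times[i], dvdt / volumes[i]))
    return rates

# Synthetic trajectory: 20 cc shrinking at a constant 0.2 per week
weeks = [0.1 * i for i in range(181)]
curve = [20.0 * math.exp(-0.2 * t) for t in weeks]
rates = specific_growth_rate(weeks, curve)
print(rates[0], rates[-1])
```

For a real patient the rate would vary over time, and plotting it for responders and non-responders side by side is one way to look for the window in which the groups diverge.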
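The total loss function combines three components: trajectory reconstruction loss, classification loss, and optional regularization terms, with hyperparameters λ_rec, λ_cls, and λ_reg controlling their relative contributions [1, 5]:

$$\mathcal{L} = \lambda_{rec}\,\mathcal{L}_{rec} + \lambda_{cls}\,\mathcal{L}_{cls} + \lambda_{reg}\,\mathcal{L}_{reg}.$$

The reconstruction loss is typically mean squared error between predicted and observed tumor volumes at each available time point, averaged across all patients and time points:

$$\mathcal{L}_{rec} = \frac{1}{\sum_{i=1}^{N} n_i} \sum_{i=1}^{N} \sum_{j=1}^{n_i} \big(v_i(t_{ij}) - V_{ij}\big)^2,$$

where n_i is the number of scans for patient i, t_ij is the time of that patient's j-th scan, v_i(t_ij) is the predicted volume, and V_ij is the corresponding measured volume.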
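The composite objective can be sketched as a weighted sum of three terms. The text specifies mean squared error for reconstruction; the use of binary cross-entropy for the classification term, an L2 penalty for regularization, and the weight names `lam_rec`, `lam_cls`, and `lam_reg` are assumptions made for illustration.

```python
import math

def total_loss(pred_vols, obs_vols, p_pcr, y_pcr, params,
               lam_rec=1.0, lam_cls=1.0, lam_reg=1e-4):
    """pred_vols / obs_vols: one list per patient, at that patient's scan times only."""
    # Reconstruction: MSE pooled over every observed (patient, time) pair
    sq_err = [(p - o) ** 2
              for pv, ov in zip(pred_vols, obs_vols) for p, o in zip(pv, ov)]
    l_rec = sum(sq_err) / len(sq_err)
    # Classification: binary cross-entropy (assumed), clipped away from log(0)
    eps = 1e-12
    l_cls = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                 for p, y in zip(p_pcr, y_pcr)) / len(y_pcr)
    # Regularization: squared L2 norm of the network parameters (assumed)
    l_reg = sum(w * w for w in params)
    total = lam_rec * l_rec + lam_cls * l_cls + lam_reg * l_reg
    return total, (l_rec, l_cls, l_reg)

# Two patients with 2 and 3 scans respectively (hypothetical values, in cc)
pred = [[10.0, 6.0], [8.0, 5.0, 1.0]]
obs  = [[10.0, 5.0], [8.0, 4.0, 1.0]]
loss, (l_rec, l_cls, l_reg) = total_loss(
    pred, obs, p_pcr=[0.5, 0.5], y_pcr=[1, 0], params=[1.0, 2.0], lam_reg=0.1)
print(loss, l_rec, l_cls, l_reg)
```

Because the reconstruction term pools over observed pairs only, patients with different numbers of scans contribute without any padding or imputation.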
Regularization terms may include weight decay on neural network parameters to prevent overfitting, especially when training cohorts are modest in size (N < 200), and ODE-specific regularization such as penalizing rapid changes in f_θ to encourage smooth dynamics [1, 14]. When gradients are computed with the adjoint sensitivity method, memory use does not grow with the number of solver steps, so no separate memory-saving measures are needed during training [1, 15]. The loss is minimized using stochastic gradient descent or the Adam optimizer, with mini-batches of patients and gradients computed through the ODE solve using the adjoint method [16, 17].
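For reference, the adjoint sensitivity method in its standard Neural ODE form is summarized below for a loss evaluated at the final time T. The adjoint state a(t) is integrated backward in time alongside z(t), which is why intermediate solver states need not be stored. When the loss also includes terms at intermediate scan times, as the reconstruction loss does, the adjoint receives an additional increment at each of those times.

```latex
\begin{aligned}
a(t) &= \frac{\partial \mathcal{L}}{\partial z(t)}, \\
\frac{da(t)}{dt} &= -\,a(t)^{\top}\,
    \frac{\partial f_\theta\big(z(t), t\big)}{\partial z}, \\
\frac{d\mathcal{L}}{d\theta} &= -\int_{T}^{0} a(t)^{\top}\,
    \frac{\partial f_\theta\big(z(t), t\big)}{\partial \theta}\, dt .
\end{aligned}
```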
A fundamental advantage of Neural ODEs is their natural handling of irregularly sampled data, as the ODE solver can integrate between any time points without requiring interpolation or fixed time grids [1, 18]. Each patient can have a different number of MRI time points (2, 3, or more) and different acquisition timings (e.g., mid-treatment at week 3 vs week 5), and the framework processes them all within the same training loop by computing reconstruction loss only at observed times [1, 19]. This flexibility is particularly valuable for clinical data where scan schedules vary due to patient logistics, protocol changes, or image quality issues requiring repeat scans [20, 21].
For patients with only two time points (baseline and post-treatment), the framework still learns meaningful dynamics because the reconstruction loss constrains the integrated trajectory to match both observed volumes, though parameter identifiability is weaker than with three or more time points [1, 22]. The adjoint method enables memory-efficient training even with long integration windows (0 to 18 weeks) and many solver steps, as memory usage does not scale with integration duration [1, 23]. Data augmentation strategies such as time-point dropout (randomly withholding some observed time points during training) can improve robustness to sparse data at test time [24, 25].
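Time-point dropout can be sketched as a small augmentation function applied to each patient's series during training. The choices to always keep the baseline scan and to retain at least two points are assumptions consistent with the two-scan minimum stated earlier; the function name and drop probability are illustrative.

```python
import random

def timepoint_dropout(times, volumes, drop_prob=0.3, rng=random):
    """Randomly withhold non-baseline scans, always keeping at least two points."""
    if len(times) <= 2:
        return list(times), list(volumes)  # nothing can be dropped safely
    keep = [0] + [i for i in range(1, len(times)) if rng.random() >= drop_prob]
    if len(keep) < 2:
        # Everything after baseline was dropped: restore one scan at random
        keep.append(rng.randrange(1, len(times)))
    return [times[i] for i in keep], [volumes[i] for i in keep]

rng = random.Random(0)
times, volumes = [0.0, 2.0, 5.0, 18.0], [20.0, 15.0, 9.0, 2.0]
augmented = [timepoint_dropout(times, volumes, 0.5, rng) for _ in range(5)]
for t_aug, v_aug in augmented:
    print(t_aug, v_aug)
```

Each augmented series is a valid input to the same training loop, since the reconstruction loss is only evaluated at the times that remain.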
The primary evaluation metric for pCR prediction is the area under the receiver operating characteristic curve (AUROC), which measures the model's ability to discriminate between responders and non-responders across all classification thresholds and, unlike accuracy, does not depend directly on class prevalence [5, 11]. Secondary metrics include accuracy (proportion of correct predictions), sensitivity (true positive rate for pCR detection), specificity (true negative rate), positive predictive value, and negative predictive value, all of which should be reported with confidence intervals [6, 7]. For the trajectory reconstruction task, root mean squared error (RMSE) between predicted and observed tumor volumes at mid-treatment and post-treatment time points quantifies how faithfully the Neural ODE captures observed dynamics [1, 12].
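AUROC has a direct pairwise interpretation: it is the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder. The sketch below computes it that way, alongside sensitivity and specificity at a fixed threshold; the scores and labels are made up for illustration, and confidence intervals (for example by bootstrap) are omitted.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called pCR."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]   # predicted P(pCR), hypothetical
labels = [1, 1, 1, 0, 0, 0]               # 1 = pCR, 0 = residual disease
auc = auroc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels)
print(auc, sens, spec)
```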
Comparison to baseline methods is essential to establish the framework's value, with appropriate comparators including: (1) RECIST-style size change (e.g., percent reduction from baseline to mid-treatment), (2) logistic regression using baseline volume and clinical factors, (3) standard deep learning classifiers applied to volume time series without ODE structure, and (4) classical parametric growth models (exponential, logistic, Gompertz) with parameter estimation [3, 13]. Statistical comparisons of AUROC between models evaluated on the same patients should use DeLong's test for correlated ROC curves, with p < 0.05 taken as the threshold for a significant difference [14, 15].
Internal validation using k-fold cross-validation (typically 5 or 10 folds) is the minimum standard, ensuring that reported performance reflects generalization to unseen patients from the same population [16, 17]. Patients should be split at the patient level, not the scan level, to avoid data leakage where scans from the same patient appear in both training and validation sets [18, 19]. External validation on an independent cohort from a different institution or clinical trial is critical before clinical deployment, as performance often degrades when applied to populations with different case mixes, MRI protocols, or treatment regimens [5, 20].
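Splitting at the patient level can be sketched by assigning folds to unique patient identifiers and then mapping every scan to its patient's fold. The identifiers and the round-robin assignment below are illustrative assumptions; stratification by pCR status or subtype, which would usually be desirable, is not shown.

```python
import random

def patient_level_folds(patient_ids, k=5, seed=0):
    """Assign each unique patient (not each scan) to one of k folds."""
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    return {pid: i % k for i, pid in enumerate(unique)}

# One entry per scan; several scans share a patient
scan_patient_ids = ["P1", "P1", "P1", "P2", "P2", "P3",
                    "P3", "P3", "P4", "P4", "P5", "P5"]
fold_of = patient_level_folds(scan_patient_ids, k=5)

splits = []
for fold in range(5):
    val_idx = [i for i, pid in enumerate(scan_patient_ids) if fold_of[pid] == fold]
    train_idx = [i for i, pid in enumerate(scan_patient_ids) if fold_of[pid] != fold]
    splits.append((train_idx, val_idx))
print(splits[0])
```

Because the fold is a property of the patient, no patient's scans can appear on both sides of a split.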
Table 2 translates the manuscript’s conceptual proposal into a clinical-methodological agenda by showing which assumptions are load-bearing, how they can fail, and what validation strategy is needed before translational use.
Table 2. Assumption–Risk–Validation Matrix for Translating Neural ODE-Based pCR Prediction from Conceptual Framework to Clinical Study Design
Framework assumption or design commitment | Why it is necessary in this manuscript | Principal threat if violated | Observable manifestation of failure | Recommended validation or safeguard
--- | --- | --- | --- | ---
Tumor volume can be measured reproducibly from serial DCE-MRI | The model depends on volume trajectories as its primary observed signal | Segmentation noise may be mistaken for biological dynamics | Implausible trajectory oscillations, unstable predictions, poor calibration | Multi-reader quality control, automated confidence scoring, sensitivity analysis with segmentation perturbation |
At least two clinically meaningful time points are available per patient | The dynamics require observed temporal anchors | Underconstrained trajectories become weakly identifiable | Good training fit but poor external generalization and unstable extrapolation | Minimum data-availability criterion, subgroup analysis by number of scans |
Actual scan timing carries biological information and should be retained | The framework’s advantage depends on modeling irregular time directly | Temporal normalization may erase clinically informative response timing | Reduced value over simpler baseline-to-endpoint models | Preserve true acquisition times; compare against interval-normalized ablations |
Tumor response during NAC is reasonably approximable by a smooth ODE | Neural ODEs assume continuous latent evolution between observations | Abrupt biological shifts may not be represented well by smooth dynamics | Systematic underfit around sudden progression or treatment-switch effects | Examine residual structure; test controlled extensions with event-aware covariates |
Clinical covariates meaningfully modulate dynamics | Patient heterogeneity is central to individualized prediction | Covariates may add noise or encode site-specific confounding | Apparent subgroup performance differences that fail externally | Pre-specify covariate set, assess incremental value, test external portability |
Training cohort is large and diverse enough to estimate fθ | Flexible dynamics require sufficient sample support | Overfitting to institution-specific patterns or subtype imbalance | Inflated internal AUROC with marked external degradation | Nested cross-validation, external validation, calibration assessment, subtype-stratified reporting |
The learned latent state is predictive of pCR rather than only reconstructive of volume | The framework aims to link dynamics to outcome, not just fit curves | Good reconstruction may coexist with weak response discrimination | Low AUROC despite visually plausible trajectories | Multi-objective training evaluation, ablation of reconstruction-only versus joint training |
Mid-treatment dynamics are informative enough for early prediction | The clinical promise rests on actionable pre-completion inference | Early predictions may be overconfident or clinically premature | Strong final-time performance but weak week 3–6 performance | Time-specific evaluation protocol, decision-threshold analysis, net-benefit assessment |
Performance gains over conventional methods are clinically meaningful | Conceptual novelty must translate into comparative utility | Improvement may be statistically trivial or clinically irrelevant | Marginal AUROC gain without better sensitivity, specificity, or calibration | Compare with RECIST, logistic regression, and parametric growth baselines using paired tests |
Learned trajectories are interpretable enough for clinical trust | Adoption depends on more than raw predictive accuracy | Black-box dynamics may undermine clinician confidence | Correct predictions without understandable temporal rationale | Patient-level trajectory visualization, growth-rate summaries, representative responder/non-responder archetypes |
Model validity is stable across sites, scanners, and regimens | Clinical deployment requires robustness beyond a single dataset | Domain shift may alter both imaging-derived volumes and response patterns | Calibration drift, subtype-specific collapse, site-dependent errors | External multi-institution validation, scanner/regimen subgroup analysis, recalibration protocol |
Sensitivity analyses should examine how prediction performance varies with the number and timing of available MRI time points, for example comparing models trained using: (A) baseline only, (B) baseline + mid-treatment, (C) baseline + post-treatment, and (D) all three time points [1, 21]. Additional sensitivity analyses should assess robustness to segmentation errors by adding synthetic noise to extracted volumes, and to timing variability by jittering scan time stamps [22, 23]. Subgroup analyses by tumor subtype (triple-negative, HER2-positive, hormone receptor-positive/HER2-negative) are essential, as predictive performance may differ substantially across biologically distinct patient populations [4, 24].
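The perturbation and time-point ablations described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, under several assumptions that are not specified in the manuscript: the function names, the log-normal form of the volume noise, the 10% coefficient of variation, the 0.5-week jitter half-width, and the convention that scan indices 0, 1, and 2 denote baseline, mid-treatment, and post-treatment.

```python
import numpy as np

# Hypothetical scan-subset definitions for configurations (A) to (D);
# indices assume 0 = baseline, 1 = mid-treatment, 2 = post-treatment.
SCAN_SUBSETS = {"A": [0], "B": [0, 1], "C": [0, 2], "D": [0, 1, 2]}


def select_scans(times_wk, volumes_cm3, config):
    """Keep only the scans belonging to one ablation configuration."""
    idx = SCAN_SUBSETS[config]
    return np.asarray(times_wk, float)[idx], np.asarray(volumes_cm3, float)[idx]


def perturb_series(times_wk, volumes_cm3, vol_noise_cv=0.10,
                   time_jitter_wk=0.5, rng=None):
    """Return a perturbed copy of one patient's (time, volume) series.

    vol_noise_cv   : assumed coefficient of variation of segmentation noise
    time_jitter_wk : assumed half-width (weeks) of uniform scan-time jitter
    """
    rng = np.random.default_rng() if rng is None else rng
    times = np.asarray(times_wk, dtype=float)
    vols = np.asarray(volumes_cm3, dtype=float)

    # Multiplicative log-normal noise keeps volumes positive; the mean
    # shift makes the expected multiplicative factor equal to 1.
    sigma = np.sqrt(np.log1p(vol_noise_cv ** 2))
    factor = rng.lognormal(mean=-0.5 * sigma ** 2, sigma=sigma, size=vols.shape)
    noisy_vols = vols * factor

    # Jitter every scan time except baseline, which anchors the time axis.
    jitter = rng.uniform(-time_jitter_wk, time_jitter_wk, size=times.shape)
    jitter[0] = 0.0
    noisy_times = np.maximum(times + jitter, times[0])

    # Re-sort in case jitter swapped two closely spaced scans.
    order = np.argsort(noisy_times, kind="stable")
    return noisy_times[order], noisy_vols[order]


# Example: one synthetic patient scanned at weeks 0, 6, and 12.
rng = np.random.default_rng(0)
t_pert, v_pert = perturb_series([0, 6, 12], [20.0, 12.0, 3.0], rng=rng)
t_c, v_c = select_scans([0, 6, 12], [20.0, 12.0, 3.0], "C")
```

Repeating the perturbation many times per patient and re-evaluating the model on each copy would give a distribution of predictions whose spread indicates how sensitive the output is to segmentation and timing error.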
We have presented a conceptual framework that uses Neural Ordinary Differential Equations to model continuous tumor growth dynamics from sparse serial MRI measurements and to predict pathologic complete response to neoadjuvant chemotherapy in breast cancer. By treating tumor evolution as a continuous dynamical system rather than as a set of discrete volume measurements, the framework is designed to capture the response dynamics that distinguish responders from non-responders, potentially enabling earlier and more accurate prediction than conventional volume-based approaches. The framework naturally accommodates irregularly sampled clinical data, integrates clinical covariates, and provides visualizations of learned growth trajectories that could support clinical decision-making.
The expected advantages of this approach are: continuous modeling of tumor dynamics between sparse observation points; handling of variable scan timing and missing time points without imputation; end-to-end training that jointly optimizes trajectory reconstruction and response prediction; memory-efficient training via the adjoint method, which supports scaling to large cohorts; and interpretability through trajectory visualization and growth-rate extraction. These properties address limitations of current methods that treat MRI time points as independent or rely on simple volume-change calculations, and that therefore ignore the dynamical information contained in how tumors evolve during treatment.
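To make the handling of irregular scan timing concrete, the following minimal sketch integrates a latent state through a small neural derivative function and reads it out at arbitrary scan times. It is an assumption-laden illustration rather than the proposed implementation: the weights are random and untrained, the latent dimension, hidden width, and step size are arbitrary, a fixed-step Runge-Kutta (RK4) solver stands in for an adaptive solver, and no adjoint-based gradient computation is shown.

```python
import numpy as np


def init_mlp(latent_dim=4, hidden=16, seed=0):
    """Random, untrained weights for a one-hidden-layer derivative network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.3, (hidden, latent_dim)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.3, (latent_dim, hidden)),
        "b2": np.zeros(latent_dim),
    }


def f_theta(z, params):
    """Derivative function dz/dt = f_theta(z), here a tanh MLP."""
    h = np.tanh(params["W1"] @ z + params["b1"])
    return params["W2"] @ h + params["b2"]


def rk4_step(z, dt, params):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1 = f_theta(z, params)
    k2 = f_theta(z + 0.5 * dt * k1, params)
    k3 = f_theta(z + 0.5 * dt * k2, params)
    k4 = f_theta(z + dt * k3, params)
    return z + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(z0, scan_times_wk, params, max_dt=0.1):
    """Latent state at each scan time (ascending; first entry is the start).

    Intervals of any length are split into sub-steps no longer than max_dt,
    so unevenly spaced scans need no resampling or imputation.
    """
    times = np.asarray(scan_times_wk, dtype=float)
    z = np.asarray(z0, dtype=float)
    states = [z.copy()]
    for t_prev, t_next in zip(times[:-1], times[1:]):
        n_steps = max(1, int(np.ceil((t_next - t_prev) / max_dt)))
        dt = (t_next - t_prev) / n_steps
        for _ in range(n_steps):
            z = rk4_step(z, dt, params)
        states.append(z.copy())
    return np.stack(states)


# Example: a patient scanned at irregular times (weeks 0, 3.5, and 11.2).
params = init_mlp()
states = integrate(np.zeros(4), [0.0, 3.5, 11.2], params)
```

In a trained model, the final row of `states` would feed the pCR classifier, and the same integrator could be queried on a dense time grid to draw the trajectory between scans.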
Implementation of this framework on existing breast cancer NAC datasets with serial MRI, such as the I-SPY2 trial or the ACRIN 6657 trial, would test whether Neural ODE-based growth modeling improves pCR prediction compared to conventional methods. Future extensions could incorporate additional data modalities, including dynamic contrast-enhanced pharmacokinetic parameters, diffusion-weighted MRI, or liquid biopsy biomarkers, into the latent state representation. As computational oncology moves toward personalized treatment planning, frameworks that respect the continuous, dynamical nature of tumor response to therapy are likely to become increasingly important for precision medicine.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.