Rare diseases identified via chest radiography, such as spontaneous pneumothorax, solitary pulmonary nodules, pleural effusions, and cardiomegaly, occur far less frequently than common conditions like pneumonia or chronic obstructive pulmonary disease. Deep learning models require large, balanced datasets for reliable performance, yet rare pathologies remain underrepresented in clinical repositories, which limits real-world deployment. Conventional augmentation methods (geometric and intensity transformations, elastic deformations) add limited variability and do not create new pathological patterns. GAN-based approaches can generate synthetic images but often suffer from mode collapse and unrealistic artifacts that reduce lesion fidelity, restricting their effectiveness for rare disease augmentation. We propose a framework based on denoising diffusion probabilistic models (DDPMs) for conditional synthesis of high-fidelity chest X-ray images. The model supports generation conditioned on class labels, segmentation masks, or text prompts, enabling controlled synthesis of rare pathologies and improving dataset balance. The framework comprises five components: a forward diffusion process, a U-Net-based reverse denoising model with attention, a multi-modal conditioning mechanism, a lesion-preserving loss function, and an augmentation pipeline that combines real and synthetic data. This design gives control over lesion type, location, size, and severity, which reduces class imbalance and improves classifier performance on rare diseases, as validated through AUC improvements and radiologist assessment. Overall, diffusion-based models provide a scalable and clinically relevant solution for rare disease augmentation in chest radiography, overcoming key limitations of traditional and GAN-based methods and enabling effective use of datasets such as CheXpert, MIMIC-CXR, and ChestX-ray14.
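The forward diffusion process named above can be illustrated with the standard closed-form DDPM noising step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below is a minimal illustration, not the framework's implementation: the linear beta schedule, the 1000-step horizon, the function names, and the toy 16-pixel "image" are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form.

    x0 is a flat list of pixel intensities scaled to [-1, 1].
    Returns the noised pixels and the Gaussian noise that was added,
    which is the regression target when training a denoiser.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal_scale * p + noise_scale * e for p, e in zip(x0, noise)]
    return x_t, noise


# Toy example: a hypothetical flattened 4x4 image patch.
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]
x_mid, eps_mid = forward_diffuse(x0, 499, alpha_bars, rng)
```

Under this assumed schedule, alpha_bar at the final step is close to zero, so x_T is nearly pure Gaussian noise. The reverse model is then trained to predict the returned noise from x_t, the timestep, and the conditioning signal.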
Deep learning models for arrhythmia detection require large, balanced datasets to achieve clinically acceptable performance. Rare arrhythmias such as ventricular tachycardia, ventricular fibrillation, and complete heart block are severely underrepresented in public ECG repositories, leading to classifiers that perform well on normal sinus rhythm but fail on minority classes. Traditional data augmentation techniques, including scaling, noise addition, and time warping, cannot generate new arrhythmia morphologies. Real-world collection of rare arrhythmia events is impractical because of low prevalence, ethical constraints, and the need for expert annotation. We present a diffusion-based generative framework that synthesizes realistic ECG signals with controlled arrhythmia patterns. The architecture is built on a conditional denoising diffusion probabilistic model trained on a small set of labeled arrhythmia examples, enabling on-demand generation of specific arrhythmia types, including atrial fibrillation, ventricular tachycardia, and premature ventricular contractions. The framework has three core components: (1) an ECG diffusion model with a 1D U-Net denoising architecture, (2) a condition encoder that accepts arrhythmia class labels and optional morphological parameters, and (3) a downstream classifier training pipeline that uses synthetic data to correct class imbalance. The approach can generate as many realistic arrhythmia examples as needed while preserving morphological features such as QRS duration, QT interval, and RR interval dynamics. Because the shared signals are generated rather than recorded from patients, the framework is designed to resist membership inference attacks and to offer a privacy-preserving alternative to sharing real patient ECGs. The proposed framework offers a viable pathway toward balanced, privacy-preserving ECG datasets for arrhythmia detection, requiring only a small seed set of labeled rare arrhythmia examples to generate clinically useful synthetic data.
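The third component, correcting class imbalance with synthetic data, comes down to deciding how many samples to generate per class. The sketch below shows one simple way to compute that quota. It is an illustration under stated assumptions, not the pipeline described above: the function name `synthetic_quota`, the optional `max_ratio` cap on synthetic-to-real samples, and the class counts are all hypothetical.

```python
def synthetic_quota(real_counts, target=None, max_ratio=None):
    """Return how many synthetic samples to generate for each class.

    real_counts: dict mapping class label -> number of real examples.
    target:      desired per-class total; defaults to the largest class.
    max_ratio:   optional cap on synthetic samples per real sample
                 (a hypothetical safeguard against a class being
                 dominated by generated data).
    """
    if target is None:
        target = max(real_counts.values())
    quota = {}
    for label, n_real in real_counts.items():
        needed = max(0, target - n_real)
        if max_ratio is not None:
            needed = min(needed, int(max_ratio * n_real))
        quota[label] = needed
    return quota


# Hypothetical class counts, for illustration only.
real_counts = {
    "normal_sinus": 5000,
    "atrial_fibrillation": 800,
    "ventricular_tachycardia": 40,
}

full_balance = synthetic_quota(real_counts)
capped = synthetic_quota(real_counts, max_ratio=10)
```

With these illustrative counts, full balancing requests 4200 synthetic atrial fibrillation records and 4960 synthetic ventricular tachycardia records. With `max_ratio=10`, the ventricular tachycardia quota drops to 400. Whether to cap the synthetic share at all is a design choice that would need to be validated against classifier performance on held-out real data.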