Wearable electrocardiogram (ECG) devices such as smartwatches and ambulatory monitors generate large-scale continuous cardiac data suitable for arrhythmia detection in real-world settings. However, the development of supervised machine learning models is limited by the scarcity of expert-annotated ECG data, class imbalance due to rare arrhythmias, and privacy constraints that restrict data sharing. These challenges make it difficult for traditional deep learning approaches to scale effectively in clinical applications. This work proposes a self-supervised contrastive learning framework that leverages large volumes of unlabeled wearable ECG data to learn meaningful cardiac representations. Using ECG-specific data augmentations, the model is trained to maximize agreement between different views of the same signal while distinguishing between different segments. A deep encoder produces latent embeddings, which are optimized through a contrastive loss, and later adapted for arrhythmia classification using a lightweight classifier with minimal labeled data. The proposed approach reduces dependence on expert annotations, improves generalization across devices and populations, and supports privacy-preserving training. Overall, it offers a scalable and efficient pathway for wearable-based arrhythmia detection, potentially enabling earlier diagnosis and broader deployment of cardiac AI systems in resource-limited healthcare settings.
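The contrastive objective described above can be illustrated with a minimal NumPy sketch. The abstract does not name a specific loss or augmentation set, so the NT-Xent (normalized temperature-scaled cross-entropy) loss used here, the function names `augment` and `nt_xent_loss`, and all parameter values are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np


def augment(segment, rng, noise_std=0.05, max_shift=10):
    """Hypothetical ECG augmentation: amplitude scaling, circular time shift, Gaussian noise."""
    scale = rng.uniform(0.8, 1.2)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    noise = rng.normal(0.0, noise_std, size=segment.shape)
    return scale * np.roll(segment, shift) + noise


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for paired embeddings, where z1[i] and z2[i] are two views of segment i."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a view is never its own positive or negative

    # Row i's positive is the other view of the same segment.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positives].mean())
```

In a full pipeline, `z1` and `z2` would be the encoder outputs for two augmented views of each ECG segment in a batch, and the loss would be minimized by gradient descent in an automatic-differentiation framework; this sketch covers only the forward computation.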
Long COVID (post-acute sequelae of SARS-CoV-2 infection, PASC) affects roughly 10–30% of COVID-19 survivors and is marked by persistent symptoms such as fatigue, cognitive dysfunction (“brain fog”), shortness of breath, loss of smell, and post-exertional malaise that can last for months or years. Its underlying biological mechanisms and validated diagnostic biomarkers remain unclear. The condition is also highly heterogeneous: patients show different recovery patterns, clinical subtypes are not clearly defined, and the scarcity of labeled datasets limits the use of supervised machine learning methods for phenotyping. To address this, we propose a self-supervised contrastive multi-view learning framework that integrates three temporal data modalities: pre-infection electronic health records, acute-phase clinical and biomarker data (e.g., CRP, ferritin, D-dimer, lymphocyte counts), and post-acute symptom trajectories. Each modality has its own encoder, and the encoders are aligned in a shared latent space through contrastive learning, without requiring phenotype labels. Unsupervised clustering of the resulting embeddings then identifies potential subtypes. By exploiting the natural temporal linkage within each patient and the contrasts across patients, this approach enables data-driven discovery of long COVID phenotypes, supports early prediction of subgroup membership, and may ultimately inform personalized treatment strategies, clinical trial design, and improved understanding of disease mechanisms.
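The cross-view alignment step can be sketched as a symmetric InfoNCE loss averaged over every pair of views, where the embeddings of the same patient in two modalities form the positive pair and other patients serve as negatives. The abstract does not specify the exact loss, so this formulation, the names `pairwise_info_nce` and `multi_view_loss`, and the temperature value are assumptions for illustration only.

```python
from itertools import combinations

import numpy as np


def _log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def pairwise_info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE between two views; row i of `a` and row i of `b` are the same patient."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # entry (i, j): patient i in view a vs. patient j in view b
    idx = np.arange(a.shape[0])
    loss_ab = -_log_softmax(logits)[idx, idx].mean()
    loss_ba = -_log_softmax(logits.T)[idx, idx].mean()
    return float(0.5 * (loss_ab + loss_ba))


def multi_view_loss(views, temperature=0.1):
    """Average the pairwise loss over all pairs of views (three pairs for three modalities)."""
    pairs = list(combinations(views, 2))
    return sum(pairwise_info_nce(a, b, temperature) for a, b in pairs) / len(pairs)
```

Here `views` would hold the outputs of the three modality-specific encoders (pre-infection, acute-phase, post-acute) for one batch of patients. After training, the per-patient embeddings could be combined, for example by averaging, and passed to a clustering algorithm; that downstream step is not shown.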
Rare pediatric cancers are difficult to treat due to their very low incidence, which limits drug development and makes experimental screening of therapies slow, costly, and dependent on scarce tumor samples. Traditional supervised machine learning approaches are also constrained by the lack of labeled drug–response data, while rich but unlabeled protein–protein interaction networks remain underutilized. We propose a self-supervised graph representation learning framework that integrates protein interaction networks with patient gene expression data to support drug repurposing. The model builds a heterogeneous graph of drugs, genes, diseases, and proteins, and uses a graph neural network trained with self-supervised objectives such as contrastive learning and masked prediction to learn molecular representations without labeled data. It is then fine-tuned on small pediatric cancer datasets. The framework enables prediction of candidate drug therapies by combining learned biological network representations with disease-specific expression profiles. This approach reduces reliance on large labeled datasets and allows adaptation to rare cancer contexts, offering a scalable strategy for computational drug repurposing in pediatric oncology.
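One of the self-supervised objectives mentioned above, masked prediction, can be sketched as hiding a fraction of graph edges, encoding the remaining graph, and scoring the hidden edges against sampled non-edges. The toy node names, the edge list, the single type-agnostic mean-aggregation layer, and every function name below are hypothetical simplifications; the abstract does not describe the actual architecture, and integration of gene expression data is not shown.

```python
import numpy as np

# Hypothetical toy graph: names and edges are illustrative, not real biology.
NODES = ["drug:A", "drug:B", "gene:G1", "gene:G2", "protein:P1", "protein:P2", "disease:D1"]
EDGES = [
    ("drug:A", "protein:P1"),      # drug targets protein
    ("drug:B", "protein:P2"),
    ("gene:G1", "protein:P1"),     # gene encodes protein
    ("gene:G2", "protein:P2"),
    ("protein:P1", "protein:P2"),  # protein-protein interaction
    ("gene:G1", "disease:D1"),     # gene associated with disease
]


def build_adjacency(nodes, edges):
    """Symmetric adjacency matrix; edge types are ignored in this simplified sketch."""
    index = {name: i for i, name in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)))
    for u, v in edges:
        adj[index[u], index[v]] = adj[index[v], index[u]] = 1.0
    return adj


def split_masked_edges(edges, mask_fraction, rng):
    """Randomly hide a fraction of edges; the hidden ones become prediction targets."""
    order = rng.permutation(len(edges))
    n_masked = max(1, int(round(mask_fraction * len(edges))))
    masked = [edges[i] for i in order[:n_masked]]
    visible = [edges[i] for i in order[n_masked:]]
    return visible, masked


def propagate(adj, features, weight):
    """One mean-aggregation message-passing layer with self-loops and ReLU."""
    adj_hat = adj + np.eye(adj.shape[0])
    adj_norm = adj_hat / adj_hat.sum(axis=1, keepdims=True)
    return np.maximum(adj_norm @ features @ weight, 0.0)


def masked_edge_loss(embeddings, nodes, masked, negatives):
    """Binary cross-entropy on dot-product scores: masked edges are 1, sampled non-edges are 0."""
    index = {name: i for i, name in enumerate(nodes)}

    def score(pairs):
        return np.array([embeddings[index[u]] @ embeddings[index[v]] for u, v in pairs])

    pos, neg = score(masked), score(negatives)
    # softplus(-s) = -log(sigmoid(s)); softplus(s) = -log(1 - sigmoid(s))
    return float(np.mean(np.logaddexp(0.0, -pos)) + np.mean(np.logaddexp(0.0, neg)))
```

This covers only the forward pass. In practice the layer weights would be learned by minimizing this loss with a graph neural network library, and the pretrained encoder would then be fine-tuned on the small labeled pediatric datasets.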