Rare diseases collectively affect more than 300 million people worldwide, yet individual conditions are frequently missed because clinicians rarely encounter them and their non-specific presenting symptoms mimic those of common disorders. Supervised machine learning depends on large numbers of labeled training examples, but most rare diseases have too few diagnosed cases to train condition-specific predictive models this way. We propose a multimodal foundation model, pretrained on 10 million de-identified electronic health records (EHRs) that combine clinical notes and laboratory values, for zero-shot rare disease diagnosis: diagnosing conditions for which no labeled training examples are available. The framework comprises four components: a clinical note encoder based on a large language model, a laboratory value encoder based on a time-series transformer, a multimodal fusion module that uses cross-attention, and a zero-shot classifier that compares patient embeddings with disease descriptions. Pretraining on large-scale EHR data lets the model learn general medical knowledge and disease patterns, so it can identify a rare condition from its clinical manifestations even when no labeled examples of that specific disease were used in training.
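The fusion and zero-shot steps described above can be illustrated with a minimal, dependency-free sketch. It is not the proposed model. It assumes the note and laboratory encoders have already produced token embeddings of equal dimension, and it uses single-head scaled dot-product cross-attention (note tokens as queries, lab tokens as keys and values) with no learned projections. A residual connection and mean pooling produce one patient embedding, and cosine similarity ranks hypothetical disease-description embeddings. All function names, the pooling, the residual, the cosine scoring, and the toy vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention; no learned projections (assumption)."""
    scale = math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return out


def fuse(note_tokens: List[Vector], lab_tokens: List[Vector]) -> Vector:
    """Note tokens attend to lab tokens; residual add, then mean-pool (assumptions)."""
    attended = cross_attention(note_tokens, lab_tokens, lab_tokens)
    fused = [[n + a for n, a in zip(nt, at)] for nt, at in zip(note_tokens, attended)]
    dim = len(fused[0])
    return [sum(tok[j] for tok in fused) / len(fused) for j in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot(a, b) / (na * nb)


def zero_shot_rank(patient: Vector,
                   diseases: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Rank disease-description embeddings by similarity to the patient embedding."""
    scores = {name: cosine(patient, emb) for name, emb in diseases.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy 3-dimensional embeddings (hypothetical, for illustration only).
note_tokens = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
lab_tokens = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
disease_embeddings = {"disease_A": [1.0, 0.1, 0.0], "disease_B": [0.0, 0.0, 1.0]}

patient_embedding = fuse(note_tokens, lab_tokens)
ranking = zero_shot_rank(patient_embedding, disease_embeddings)
print(ranking)
```

Because no disease-specific classifier weights are involved, adding a new condition in this sketch only requires embedding its description and adding it to `disease_embeddings`. That property is what makes the zero-shot setting possible.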