Writing chest X-ray reports is time-consuming and contributes to radiologist workload and burnout, which motivates AI systems that reduce cognitive burden while preserving clinical accuracy. Encoder-decoder models can generate reports from images, but they often hallucinate: lacking explicit grounding in evidence, they describe findings that are not present or omit real abnormalities, which makes them unreliable for clinical use. To address this, we propose a cross-modal retrieval framework that builds reports by retrieving and assembling clinically validated sentences from existing radiology reports rather than generating text from scratch. The system uses contrastive learning to align chest X-ray image patches with report sentences in a shared embedding space, so that the most relevant clinical descriptions can be retrieved for a given image. A patch encoder extracts visual features, a sentence encoder represents report text, and a retrieval module identifies semantically matching sentences, which are then composed into a coherent final report. Because every output sentence is drawn from a real clinical report, the method substantially reduces hallucinations and improves factual reliability and interpretability. This retrieval-based approach offers a scalable, safer alternative to generative models and can be evaluated on datasets such as MIMIC-CXR and CheXpert for clinical accuracy and retrieval performance.
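The alignment and retrieval steps described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: it assumes the patch and sentence encoders have already produced fixed-size embeddings, uses a symmetric InfoNCE objective as one common choice of contrastive loss, and composes the report by simple order-preserving deduplication. The function names (`info_nce_loss`, `retrieve_sentences`), the temperature of 0.07, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def info_nce_loss(patch_emb, sent_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption: row i of patch_emb is the positive match for row i of
    sent_emb; every other row in the batch acts as a negative.
    """
    p = l2_normalize(patch_emb)
    s = l2_normalize(sent_emb)
    logits = p @ s.T / temperature  # (batch, batch) similarity matrix

    def cross_entropy_diag(z):
        z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))  # positives lie on the diagonal

    # Average the patch-to-sentence and sentence-to-patch directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

def retrieve_sentences(patch_emb, bank_emb, bank_sentences, top_k=1):
    """For each patch, pick the top_k most similar bank sentences.

    Duplicates are dropped while preserving first-seen order; this is a
    placeholder for the report-composition step, not a clinical ordering.
    """
    sims = l2_normalize(patch_emb) @ l2_normalize(bank_emb).T
    order = np.argsort(-sims, axis=1)[:, :top_k]
    report = []
    for row in order:
        for idx in row:
            sentence = bank_sentences[idx]
            if sentence not in report:
                report.append(sentence)
    return report

# Toy sentence bank with hand-made 3-d embeddings (hypothetical values).
bank_sentences = [
    "No pleural effusion.",
    "Heart size is normal.",
    "Right lower lobe opacity.",
]
bank_emb = np.eye(3)
patch_emb = np.array([
    [0.1, 0.0, 0.9],  # closest to the opacity sentence
    [0.8, 0.1, 0.0],  # closest to the effusion sentence
    [0.0, 0.2, 0.9],  # closest to the opacity sentence again
])
report = retrieve_sentences(patch_emb, bank_emb, bank_sentences)
print(" ".join(report))
```

In this toy case two patches retrieve the same sentence, so the composed report contains it only once. A practical system would additionally need a similarity threshold or ranking rule to decide how many sentences to keep, which the sketch leaves out.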