Sparse Spectral LoRA: Routed Experts for Medical VLMs

Concordia University, Montreal, Canada
Figure: Overall architecture of MedQwen, including routed LoRA experts and adaptive priors initialization.

Abstract

Large vision-language models excel on general benchmarks but often lack robustness in medical imaging, where heterogeneous supervision induces cross-dataset interference and sensitivity to the training data regime. In realistic clinical workflows, data and tasks also arrive sequentially, making catastrophic forgetting a major challenge. MedQwen addresses these issues with a parameter-efficient medical VLM that combines a spectrally routed Mixture-of-Experts with a theoretically grounded scaling rule that aligns low-rank updates with a full-rank, fully fine-tuned MoE. Each expert is initialized from a distinct non-overlapping SVD segment of the pretrained weights, and a lightweight router activates only the most relevant experts for a given input. Across 23 medical datasets spanning VQA, report generation, radiology classification, and hallucination mitigation, MedQwen achieves strong performance while remaining efficient, approaching full fine-tuning on zero-shot classification with 339× fewer trainable parameters and reducing sequential forgetting to about 5%.

Method

  • Sparse spectral LoRA: partitions pretrained weights into non-overlapping spectral segments and routes inputs to the most relevant experts.
  • Adaptive priors initialization: assigns each expert a distinct SVD-based prior to encourage specialization.
  • Optimization alignment: introduces residual matching and scaling to align LoRA-MoE updates with full MoE training dynamics.
  • Unified medical VLM: supports medical VQA, report generation, zero-shot classification, and hallucination mitigation.
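The first two bullets can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not MedQwen's released implementation: the dimensions, the linear router, and the top-k value are all hypothetical, and each expert's LoRA pair is initialized from a distinct, non-overlapping band of singular triplets of a pretrained weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a pretrained weight W (d_out x d_in), n_experts
# experts, each owning a non-overlapping rank-sized SVD segment of W.
d_out, d_in, n_experts, rank, top_k = 32, 48, 4, 4, 2
W = rng.standard_normal((d_out, d_in))

U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Expert e owns singular triplets [e*rank, (e+1)*rank). Splitting
# sqrt(S) between the two factors makes B @ A reconstruct that band of W.
experts = []
for e in range(n_experts):
    seg = slice(e * rank, (e + 1) * rank)
    B = U[:, seg] * np.sqrt(S[seg])          # (d_out, rank)
    A = np.sqrt(S[seg])[:, None] * Vt[seg]   # (rank, d_in)
    experts.append((B, A))

# Lightweight linear router: score every expert, keep only the top-k,
# renormalize their softmax weights, and mix the active experts' outputs.
W_router = rng.standard_normal((n_experts, d_in)) * 0.02

def routed_lora_update(x):
    logits = W_router @ x
    active = np.argsort(logits)[-top_k:]          # indices of top-k experts
    w = np.exp(logits[active] - logits[active].max())
    w /= w.sum()
    out = np.zeros(d_out)
    for weight, e in zip(w, active):
        B, A = experts[e]
        out += weight * (B @ (A @ x))             # sparse low-rank update
    return out

x = rng.standard_normal(d_in)
delta = routed_lora_update(x)
```

Because the segments are non-overlapping, summing every expert's B @ A recovers exactly the top n_experts × rank singular band of W, so the experts partition the spectrum rather than duplicating it.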
Overview of MedQwen
MedQwen is a parameter-efficient medical VLM that uses spectrally routed LoRA experts to improve robustness across heterogeneous medical datasets and to reduce catastrophic forgetting.
Optimization alignment
Optimization is aligned with the SVD-structured MoE by matching and scaling each expert's low-rank update separately.
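The paper's exact residual-matching rule is not reproduced here; the following is a hedged numpy sketch of the underlying idea for a single expert, using a standard least-squares scaling: choose the scalar s that best matches a low-rank update s·BA to a (hypothetical) full fine-tuning update, leaving a residual orthogonal to the low-rank direction.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, rank = 16, 24, 4

# Hypothetical full fine-tuning update for one expert (illustrative only,
# not a quantity taken from MedQwen's training).
delta_full = rng.standard_normal((d_out, d_in))

# Low-rank LoRA factors for the same expert: delta_lora = s * B @ A.
B = rng.standard_normal((d_out, rank)) * 0.1
A = rng.standard_normal((rank, d_in)) * 0.1
delta_lora = B @ A

# Residual matching via least squares: the scalar minimizing
# ||delta_full - s * BA||_F is <delta_full, BA> / ||BA||_F^2.
s = np.sum(delta_full * delta_lora) / np.sum(delta_lora * delta_lora)
residual = delta_full - s * delta_lora
```

By construction the residual is orthogonal (in the Frobenius inner product) to the scaled update, which is the sense in which the low-rank step is "aligned" with the full-rank one in this simplified picture.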

Selected Results

MedQwen reports strong performance across medical VQA, report generation, zero-shot classification, and continual learning.

Model            VQA-RAD      SLAKE        PathVQA      OMVQA   Avg.
Qwen-2.5-VL 7B   61.8 / 27.2  64.7 / 36.7  60.5 / 33.4  60.8    49.3
HealthGPT-L14    74.5 / 54.5  71.9 / 56.2  75.2 / 42.1  67.2    63.1
MedQwen          78.8 / 59.6  75.3 / 59.9  84.2 / 49.1  70.6    68.2

Catastrophic Forgetting

Sequential fine-tuning on Harvard-FairVLMed followed by PathVQA yields only about a 5% drop for MedQwen, versus much larger drops for standard LoRA and MoELoRA.


Convergence and Rank Scaling

MedQwen converges faster than LoRA-MoE baselines and narrows the gap to full fine-tuning as rank increases.


Qualitative Examples

MedQwen answers medical questions across multiple modalities and generates medical reports for chest X-ray images.

Highlights

Zero-Shot Classification

On nine radiology benchmarks, MedQwen reaches 58.83 average accuracy, about 95.31% of full fine-tuning MoE performance, while using 339× fewer trainable parameters.
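A back-of-envelope comparison shows where parameter savings of this magnitude come from: full fine-tuning trains every entry of a weight matrix, while a set of rank-r LoRA experts trains only the thin factors. The dimensions below are illustrative assumptions, not MedQwen's actual configuration, so the resulting ratio differs from the reported 339×.

```python
# Trainable parameters: full fine-tuning of one weight matrix vs. a set
# of rank-r LoRA experts on the same matrix (illustrative sizes only).
d_out, d_in = 4096, 4096
n_experts, rank = 4, 8

full_ft = d_out * d_in                        # every weight is trainable
lora_moe = n_experts * rank * (d_out + d_in)  # B (d_out x r) + A (r x d_in) per expert
ratio = full_ft / lora_moe

print(f"full: {full_ft:,}  lora-moe: {lora_moe:,}  ratio: {ratio:.0f}x")
# → full: 16,777,216  lora-moe: 262,144  ratio: 64x
```

The ratio grows with matrix size and shrinks with rank and expert count, which is why the exact multiplier depends on the model's configuration.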

Report Generation

MedQwen improves over prior methods on MIMIC-CXR and IU-Xray, with strong gains in F1-RadGraph, BLEU-1, ROUGE, and CheXbert.

BibTeX

@article{nejati2026medqwen,
  title   = {Sparse Spectral LoRA: Routed Experts for Medical VLMs},
  author  = {Omid Nejati Manzari and Hojat Asgariandehkordi and Taha Koleilat and Yiming Xiao and Hassan Rivaz},
  journal = {arXiv preprint},
  year    = {2026}
}