Sparse Spectral LoRA: Routed Experts for Medical VLMs

Concordia University, Montreal, Canada
Figure: Overall architecture of MedQwen, including routed LoRA experts and adaptive priors initialization.

Abstract

Large vision-language models excel on general benchmarks but often lack robustness in medical imaging, where heterogeneous supervision induces cross-dataset interference and sensitivity to the training data regime. In realistic clinical workflows, data and tasks also arrive sequentially, making catastrophic forgetting a major challenge. MedQwen addresses these issues with a parameter-efficient medical VLM that combines a spectrally routed Mixture-of-Experts with a theoretically grounded scaling rule that aligns low-rank updates with a full-rank, fully fine-tuned MoE. Each expert is initialized from a distinct non-overlapping SVD segment of the pretrained weights, and a lightweight router activates only the most relevant experts for a given input. Across 23 medical datasets spanning VQA, report generation, radiology classification, and hallucination mitigation, MedQwen achieves strong performance while remaining efficient, approaching full fine-tuning on zero-shot classification with 339× fewer trainable parameters and reducing sequential forgetting to about 5%.

Method

  • Sparse spectral LoRA: partitions pretrained weights into non-overlapping spectral segments and routes inputs to the most relevant experts.
  • Adaptive priors initialization: assigns each expert a distinct SVD-based prior to encourage specialization.
  • Optimization alignment: introduces residual matching and scaling to align LoRA-MoE updates with full MoE training dynamics.
  • Unified medical VLM: supports medical VQA, report generation, zero-shot classification, and hallucination mitigation.
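The first two bullets can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not MedQwen's released implementation: the dimensions, the linear router, and the top-k value are all hypothetical, and each expert's LoRA pair is initialized from a distinct, non-overlapping band of singular triplets of a pretrained weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a pretrained weight W (d_out x d_in), n_experts
# experts, each owning a non-overlapping rank-sized SVD segment of W.
d_out, d_in, n_experts, rank, top_k = 32, 48, 4, 4, 2
W = rng.standard_normal((d_out, d_in))

U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Expert e owns singular triplets [e*rank, (e+1)*rank). Splitting
# sqrt(S) between the two factors makes B @ A reconstruct that band of W.
experts = []
for e in range(n_experts):
    seg = slice(e * rank, (e + 1) * rank)
    B = U[:, seg] * np.sqrt(S[seg])          # (d_out, rank)
    A = np.sqrt(S[seg])[:, None] * Vt[seg]   # (rank, d_in)
    experts.append((B, A))

# Lightweight linear router: score every expert, keep only the top-k,
# renormalize their softmax weights, and mix the active experts' outputs.
W_router = rng.standard_normal((n_experts, d_in)) * 0.02

def routed_lora_update(x):
    logits = W_router @ x
    active = np.argsort(logits)[-top_k:]          # indices of top-k experts
    w = np.exp(logits[active] - logits[active].max())
    w /= w.sum()
    out = np.zeros(d_out)
    for weight, e in zip(w, active):
        B, A = experts[e]
        out += weight * (B @ (A @ x))             # sparse low-rank update
    return out

x = rng.standard_normal(d_in)
delta = routed_lora_update(x)
```

Because the segments are non-overlapping, summing every expert's B @ A recovers exactly the top n_experts × rank singular band of W, so the experts partition the spectrum rather than duplicating it.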
Overview of MedQwen
MedQwen is a parameter-efficient medical VLM that uses spectrally routed LoRA experts to improve robustness across heterogeneous medical datasets and to reduce catastrophic forgetting.
Optimization alignment
Optimization is aligned with the SVD-structured MoE by matching and scaling each expert's low-rank update separately.
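The paper's exact residual-matching rule is not reproduced here; the following is a hedged numpy sketch of the underlying idea for a single expert, using a standard least-squares scaling: choose the scalar s that best matches a low-rank update s·BA to a (hypothetical) full fine-tuning update, leaving a residual orthogonal to the low-rank direction.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, rank = 16, 24, 4

# Hypothetical full fine-tuning update for one expert (illustrative only,
# not a quantity taken from MedQwen's training).
delta_full = rng.standard_normal((d_out, d_in))

# Low-rank LoRA factors for the same expert: delta_lora = s * B @ A.
B = rng.standard_normal((d_out, rank)) * 0.1
A = rng.standard_normal((rank, d_in)) * 0.1
delta_lora = B @ A

# Residual matching via least squares: the scalar minimizing
# ||delta_full - s * BA||_F is <delta_full, BA> / ||BA||_F^2.
s = np.sum(delta_full * delta_lora) / np.sum(delta_lora * delta_lora)
residual = delta_full - s * delta_lora
```

By construction the residual is orthogonal (in the Frobenius inner product) to the scaled update, which is the sense in which the low-rank step is "aligned" with the full-rank one in this simplified picture.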

Selected Results

MedQwen reports strong performance across medical VQA, report generation, zero-shot classification, and continual learning.

Model            VQA-RAD      SLAKE        PathVQA      OMVQA   Avg.
Qwen-2.5-VL 7B   61.8 / 27.2  64.7 / 36.7  60.5 / 33.4  60.8    49.3
HealthGPT-L14    74.5 / 54.5  71.9 / 56.2  75.2 / 42.1  67.2    63.1
MedQwen          78.8 / 59.6  75.3 / 59.9  84.2 / 49.1  70.6    68.2

Catastrophic Forgetting

Sequential fine-tuning on Harvard-FairVLMed followed by PathVQA yields only about a 5% drop for MedQwen, versus much larger drops for standard LoRA and MoELoRA.


Convergence and Rank Scaling

MedQwen converges faster than LoRA-MoE baselines and narrows the gap to full fine-tuning as rank increases.


Qualitative Examples

MedQwen answers medical questions across multiple modalities and generates medical reports for chest X-ray images.

Highlights

Zero-Shot Classification

On nine radiology benchmarks, MedQwen reaches 58.83 average accuracy, about 95.31% of full fine-tuning MoE performance, while using 339× fewer trainable parameters.
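A back-of-envelope comparison shows where parameter savings of this magnitude come from: full fine-tuning trains every entry of a weight matrix, while a set of rank-r LoRA experts trains only the thin factors. The dimensions below are illustrative assumptions, not MedQwen's actual configuration, so the resulting ratio differs from the reported 339×.

```python
# Trainable parameters: full fine-tuning of one weight matrix vs. a set
# of rank-r LoRA experts on the same matrix (illustrative sizes only).
d_out, d_in = 4096, 4096
n_experts, rank = 4, 8

full_ft = d_out * d_in                        # every weight is trainable
lora_moe = n_experts * rank * (d_out + d_in)  # B (d_out x r) + A (r x d_in) per expert
ratio = full_ft / lora_moe

print(f"full: {full_ft:,}  lora-moe: {lora_moe:,}  ratio: {ratio:.0f}x")
# → full: 16,777,216  lora-moe: 262,144  ratio: 64x
```

The ratio grows with matrix size and shrinks with rank and expert count, which is why the exact multiplier depends on the model's configuration.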

Report Generation

MedQwen improves over prior methods on MIMIC-CXR and IU-Xray, with strong gains in F1-RadGraph, BLEU-1, ROUGE, and CheXbert.

BibTeX

@article{nejati2026medqwen,
  title   = {Sparse Spectral LoRA: Routed Experts for Medical VLMs},
  author  = {Omid Nejati Manzari and Hojat Asgariandehkordi and Taha Koleilat and Yiming Xiao and Hassan Rivaz},
  journal = {arXiv preprint},
  year    = {2026}
}