MERLIN: Building Low-SNR Robust Multimodal LLMs for Electromagnetic Signals

🔥 Accepted to CVPR
Junyu Shen*, Zhendong She*, Chenghanyu Zhang, Yuchuang Sun, Luqing Luo, Dingwei Tan, Zonghao Guo, Bo Guo, Zehua Han, Wupeng Xie, Yaxin Mu, Peng Zhang, Peipei Li, Fengxiang Wang, Yangang Sun†, Maosong Sun
Tsinghua University | Beijing University of Posts and Telecommunications | Tianjin University | IMECAS | HKUST (Guangzhou)
National University of Defense Technology | Beihang University | Beijing Information Science and Technology University
Artificial Intelligence Institute of China Electronics Technology Group Corporation
arXiv Paper | Code | EM-134K | EM-Bench
Code, dataset, and benchmark are now publicly available.
Overview of MERLIN

Abstract

The paradigm of Multimodal Large Language Models (MLLMs) offers a promising blueprint for advancing the electromagnetic (EM) domain. However, fully realizing this potential requires overcoming three main challenges: (1) the scarcity of high-quality paired EM signal and text data, (2) the absence of comprehensive benchmarks, and (3) critical performance degradation in low Signal-to-Noise Ratio (SNR) environments.
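Because low SNR is central to this work, the following minimal sketch shows what a given SNR means for a complex baseband (IQ) signal. It adds white Gaussian noise scaled so that 10·log10(P_signal / P_noise) equals the target value in dB, then measures the SNR it actually produced. The AWGN noise model, the helper names `add_awgn` and `measured_snr_db`, and the tone parameters are illustrative assumptions, not the noise model or data pipeline used for MERLIN or its dataset.

```python
import cmath
import math
import random


def add_awgn(signal: list[complex], snr_db: float, rng: random.Random) -> list[complex]:
    """Add circularly symmetric complex white Gaussian noise at the target SNR (dB)."""
    signal_power = sum(abs(s) ** 2 for s in signal) / len(signal)
    noise_power = signal_power / 10 ** (snr_db / 10)  # from SNR_dB = 10*log10(Ps/Pn)
    sigma = math.sqrt(noise_power / 2)  # split the noise power equally between I and Q
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for s in signal]


def measured_snr_db(clean: list[complex], noisy: list[complex]) -> float:
    """Empirical SNR (dB) computed from the clean reference and its noisy copy."""
    p_signal = sum(abs(s) ** 2 for s in clean) / len(clean)
    p_noise = sum(abs(y - s) ** 2 for s, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


# Illustrative unit-amplitude complex tone: 50 kHz offset at a 1 MHz sample rate.
fs, f0, n = 1_000_000, 50_000, 20_000
tone = [cmath.exp(2j * math.pi * f0 * k / fs) for k in range(n)]
rng = random.Random(0)
for snr_db in (10, 0, -10):
    noisy = add_awgn(tone, snr_db, rng)
    print(f"target {snr_db:+d} dB -> measured {measured_snr_db(tone, noisy):+.2f} dB")
```

At 0 dB the signal and the noise carry equal power; at -10 dB the noise carries ten times the signal power.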

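To address these challenges, we introduce a tripartite contribution: (1) EM-134K, a dataset addressing the scarcity of paired EM signal and text data; (2) EM-Bench, a hierarchical benchmark for evaluating MLLMs on EM signals; and (3) MERLIN, a model and training framework designed for robustness in low-SNR environments.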

Comprehensive experiments show that MERLIN achieves state-of-the-art performance on EM-Bench and remains robust under low-SNR conditions.

EM-Bench: Comprehensive Evaluation Dimensions

The hierarchical evaluation framework of EM-Bench systematically assesses the perception and reasoning capabilities of MLLMs on electromagnetic IQ signals across 3 levels and 14 sub-tasks.

MERLIN Framework


The architecture and training framework of MERLIN. (1) The baseline model consists of a Signal Encoder, a Projector, and an LLM. (2) A knowledge distillation framework improves low-SNR robustness by using a frozen high-SNR teacher model to guide a student model. (3) The Denoising Subspace Module (DSM) enables effective distillation by projecting noisy signal features into a clean, noise-invariant feature space.
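The distillation objective in (2) and (3) can be sketched as a feature-matching loss: the frozen teacher's features for a high-SNR signal act as the target, and the student's features for a noisy version of the same signal are first passed through a stand-in for the DSM before the mean squared error is taken. This is a sketch under stated assumptions only: the DSM is modeled here as a fixed orthogonal projection onto a subspace built with Gram-Schmidt, and the function names and toy feature vectors are hypothetical. The actual DSM design and loss used in MERLIN may differ.

```python
import math

Vector = list[float]


def gram_schmidt(vectors: list[Vector], eps: float = 1e-12) -> list[Vector]:
    """Orthonormalize vectors, dropping any that are numerically linearly dependent."""
    basis: list[Vector] = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = sum(wi * ui for wi, ui in zip(w, u))
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def project(x: Vector, basis: list[Vector]) -> Vector:
    """Orthogonal projection of x onto span(basis); basis must be orthonormal."""
    out = [0.0] * len(x)
    for u in basis:
        c = sum(xi * ui for xi, ui in zip(x, u))
        out = [oi + c * ui for oi, ui in zip(out, u)]
    return out


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def distillation_loss(student_feat: Vector, teacher_feat: Vector, basis: list[Vector]) -> float:
    """Project the student's low-SNR features (hypothetical DSM stand-in), then match
    them to the teacher's high-SNR features, which are treated as fixed targets."""
    return mse(project(student_feat, basis), teacher_feat)


# Toy features: the teacher (high-SNR input) lies in the first two dimensions; the
# student (low-SNR input) has the same content plus noise in the remaining two.
teacher = [1.0, -2.0, 0.0, 0.0]
student = [1.0, -2.0, 0.7, -0.3]
basis = gram_schmidt([[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]])

print("plain MSE:       ", mse(student, teacher))
print("after projection:", distillation_loss(student, teacher, basis))
```

Because the chosen subspace spans only the first two axes, the noise-only components are discarded and the loss drops from about 0.145 to essentially zero. Here the basis is fixed by hand for illustration; a practical module would presumably have to learn or estimate it.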

Evaluation Results

Main results on EM-Bench. We compare MERLIN with leading proprietary and open-source LLMs; all baseline LLMs receive EM signals in a textualized format. The best results are highlighted in bold. "PE" denotes Parameter Estimation tasks, whose sub-tasks are BW (Bandwidth), DC (Duty Cycle), NP (Number of Pulses), PRF (Pulse Repetition Frequency), and PW (Pulse Width). MERLIN achieves state-of-the-art performance on both perception and reasoning tasks.
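Several PE sub-tasks are linked by standard pulse-train relations: the pulse repetition interval is PRI = 1 / PRF, and the duty cycle is DC = PW / PRI = PW × PRF. The minimal sketch below estimates NP, PW, PRF, and DC from a binary pulse envelope sampled at `fs` Hz. It assumes a clean, already-detected envelope (a real low-SNR capture would first need pulse detection), does not cover BW, and uses hypothetical function names; it is not the EM-Bench labeling or scoring code.

```python
def pulse_edges(envelope: list[int]) -> tuple[list[int], list[int]]:
    """Return the sample indices where each pulse starts (0->1) and ends (1->0)."""
    starts, ends = [], []
    prev = 0
    for i, v in enumerate(envelope):
        if v and not prev:
            starts.append(i)
        elif prev and not v:
            ends.append(i)
        prev = v
    if len(ends) < len(starts):  # the last pulse is still on at the end of the capture
        ends.append(len(envelope))
    return starts, ends


def pulse_parameters(envelope: list[int], fs: float) -> dict[str, float]:
    """Estimate NP, PW (s), PRF (Hz), and DC (fraction) from a 0/1 envelope sampled at fs Hz."""
    starts, ends = pulse_edges(envelope)
    if len(starts) < 2:
        raise ValueError("need at least two pulses to estimate the PRI")
    widths = [(e - s) / fs for s, e in zip(starts, ends)]
    pris = [(b - a) / fs for a, b in zip(starts, starts[1:])]
    pw = sum(widths) / len(widths)  # mean pulse width in seconds
    pri = sum(pris) / len(pris)     # mean pulse repetition interval in seconds
    prf = 1.0 / pri                 # pulse repetition frequency in Hz
    return {"NP": len(starts), "PW": pw, "PRF": prf, "DC": pw * prf}


# Synthetic train: 1 MHz sampling, 10-sample (10 us) pulses every 100 samples (100 us).
fs = 1e6
envelope = [0] * 20 + ([1] * 10 + [0] * 90) * 5
print(pulse_parameters(envelope, fs))
```

For this train, PW = 10 µs and PRF = 10 kHz, so DC = 10 µs × 10 kHz = 0.1 (10%), and NP = 5.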
