Radar Tensor-Guided Multi-Model Fusion Framework with Monocular Image for Three-Dimensional Object Detection

Date of Award

5-9-2026

Degree Name

M.S. in Electrical Engineering

Department

Department of Electrical and Computer Engineering

Advisor/Chair

Vijayan Asari

Abstract

Robust object detection in outdoor environments, particularly for autonomous driving, requires reliable performance under diverse and adverse weather conditions. Millimeter-wave (mmWave) radar offers a significant advantage in such scenarios due to its resilience to weather-related degradation. In this work, we propose a transformer-based 3D object detection framework that integrates radar and camera modalities through a novel cross-modal fusion architecture. Specifically, semantic information derived from the 4D radar tensor is used to guide image feature learning, promoting effective cross-modal integration and enhancing detection performance. To extract and fuse information from both modalities, we employ a deformable attention mechanism with a normalized spatial grid, enabling multi-scale feature aggregation based on relative spatial locations. This strategy makes better use of radar’s inherent robustness and improves detection reliability in challenging outdoor environments. Experimental results demonstrate consistent performance gains over baseline and existing multi-modal approaches, with improved detection accuracy on both bird's-eye-view (BEV) and 3D detection tasks.
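
The deformable attention step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the single-channel feature maps stored as nested lists, the corner-aligned mapping from normalized coordinates to pixels, and the joint softmax over all levels and sampling points are assumptions made for illustration. The idea shown is that a reference point and its offsets are expressed in a normalized [0, 1] grid, so the same relative location can be sampled from feature maps of different resolutions and combined with attention weights.

```python
import math


def bilinear_sample(feat, x, y):
    """Bilinearly sample a 2D map (list of rows) at normalized (x, y) in [0, 1]."""
    h, w = len(feat), len(feat[0])
    # Clamp to the grid, then map 0 -> first pixel and 1 -> last pixel.
    px = min(max(x, 0.0), 1.0) * (w - 1)
    py = min(max(y, 0.0), 1.0) * (h - 1)
    x0, y0 = int(math.floor(px)), int(math.floor(py))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = px - x0, py - y0
    top = (1 - fx) * feat[y0][x0] + fx * feat[y0][x1]
    bottom = (1 - fx) * feat[y1][x0] + fx * feat[y1][x1]
    return (1 - fy) * top + fy * bottom


def deformable_aggregate(feature_maps, ref_point, offsets, logits):
    """Aggregate one query's value across scales (hypothetical helper).

    feature_maps: list of L maps, each a list of rows of floats
    ref_point:    (x, y) in normalized coordinates
    offsets:      offsets[l][k] = (dx, dy), normalized sampling offsets
    logits:       logits[l][k] = unnormalized attention score
    """
    # Softmax over all levels and points jointly (assumed normalization).
    flat = [s for level in logits for s in level]
    peak = max(flat)
    norm = sum(math.exp(s - peak) for s in flat)
    out = 0.0
    for feat, level_offsets, level_logits in zip(feature_maps, offsets, logits):
        for (dx, dy), score in zip(level_offsets, level_logits):
            weight = math.exp(score - peak) / norm
            out += weight * bilinear_sample(feat, ref_point[0] + dx, ref_point[1] + dy)
    return out
```

Because the coordinates are normalized, a query at (0.5, 0.5) refers to the centre of every map regardless of its height and width, which is what allows sampled values from several scales to be summed directly.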

Keywords

Electrical Engineering

Comments

OCLC No. 1591830107

Rights Statement

Copyright 2026, author.
