2026, Vol. 36, No. 05, pp. 36-44
Multimodal 3D Object Detection Guided by Millimeter-Wave Radar Physical Priors
Foundation: Shanxi Province Basic Research Program (Youth) (202203021222049)
DOI: 10.20165/j.cnki.ISSN1673-629X.2025.0353
Release date: 2026-01-04
Publication date: 2026-01-04
Online release date: 2026-01-04
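Abstract (translated from the Chinese original):

To address the limited performance of single sensors under complex weather and lighting conditions, multimodal 3D object detection that fuses millimeter-wave radar with visual information has become an effective way to improve system robustness. Current mainstream methods still face sparse radar point clouds, inaccurate image depth estimation, and insufficient interaction between heterogeneous modality features. To this end, a dynamic multimodal fusion enhancement method guided by priors on radar physical characteristics is proposed. The method builds a Radar Prior Enhancement Network (RaPENet), which uses physical attributes such as radar reflection intensity to densify sparse point cloud features through dynamic Gaussian expansion, and combines spatially aware information to optimize depth estimation for the image modality. To alleviate insufficient cross-modal feature interaction during fusion in Bird's-Eye View (BEV) space, a Deformable Cross-Attention with Gated Fusion (DCAGFusion) module is designed, which achieves spatial alignment and adaptive fusion of heterogeneous BEV features through dynamic spatial sampling and a modality-confidence regulation mechanism. Experiments on the nuScenes benchmark dataset show that the method reaches 57.4% NDS and 45.9% mAP, an improvement of 0.6% over the baseline model, validating its effectiveness in detection accuracy and environmental adaptability.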
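
The abstract does not spell out how dynamic Gaussian expansion works, so the following is only a minimal sketch of the general idea under stated assumptions: each sparse radar point is splatted onto a BEV grid as a 2D Gaussian whose spread depends on its reflection strength. The function name `gaussian_densify`, the linear sigma rule, and all parameter values are hypothetical and are not taken from RaPENet.

```python
import numpy as np

def gaussian_densify(points_xy, rcs, grid_size=(64, 64), cell=1.0,
                     base_sigma=1.0, rcs_scale=0.1):
    """Splat sparse radar points onto a BEV grid as 2D Gaussians.

    Assumption (not from the paper): sigma grows linearly with the
    reflection strength, so stronger reflectors spread further.
    """
    h, w = grid_size
    # Cell-centre coordinates in metres; the grid origin is at a corner.
    ys, xs = np.meshgrid((np.arange(h) + 0.5) * cell,
                         (np.arange(w) + 0.5) * cell, indexing="ij")
    bev = np.zeros((h, w), dtype=np.float64)
    for (px, py), r in zip(points_xy, rcs):
        # Hypothetical "dynamic" rule: per-point sigma, floored at half a cell.
        sigma = max(base_sigma + rcs_scale * r, 0.5 * cell)
        d2 = (xs - px) ** 2 + (ys - py) ** 2
        # Keep the strongest response per cell rather than summing.
        bev = np.maximum(bev, np.exp(-d2 / (2.0 * sigma ** 2)))
    return bev

points = np.array([[10.5, 20.5], [40.5, 32.5]])  # (x, y) in metres
rcs = np.array([0.0, 20.0])                      # illustrative reflection strengths
bev = gaussian_densify(points, rcs)
occupied = int((bev > 0.5).sum())                # cells with a strong response
```

Taking the per-cell maximum instead of a sum keeps the map in [0, 1] when points overlap; a learned network would replace the hand-set sigma rule.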

Abstract:

Multimodal 3D object detection has emerged as an effective way to overcome the limitations of single sensors under adverse weather and lighting conditions. However, existing approaches are hindered by sparse radar point clouds, inaccurate image depth estimation, and weak cross-modal feature interaction. To address these challenges, we propose a radar-prior-guided multimodal fusion framework. The framework builds a Radar Prior Enhancement Network (RaPENet), which leverages physical attributes such as radar cross section to densify sparse point clouds through dynamic Gaussian expansion and to improve image depth estimation with spatially aware constraints. To further improve fusion in Bird's-Eye View (BEV) space, we design a Deformable Cross-Attention with Gated Fusion (DCAGFusion) module that enables spatially aligned and confidence-adaptive integration of cross-modal BEV features. Experiments on the nuScenes benchmark show that the proposed method achieves 57.4% NDS and 45.9% mAP, surpassing the baseline model by 0.6%. These results highlight the advantage of incorporating radar physical priors and adaptive fusion for robust and accurate multimodal 3D detection in challenging environments.
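
As an illustration of confidence-adaptive integration, here is a minimal sketch of a generic gated fusion step in NumPy. It is not the authors' DCAGFusion module: the deformable cross-attention (dynamic spatial sampling) is omitted entirely, and the function `gated_fusion`, the 1x1-convolution gate, and the random stand-in weights are assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(cam_bev, radar_bev, w_gate, b_gate):
    """Fuse two BEV feature maps of shape (C, H, W) with a per-cell gate.

    w_gate: (C, 2C) weights of a 1x1 convolution; b_gate: (C,) bias.
    Both are hypothetical stand-ins for trained parameters.
    """
    stacked = np.concatenate([cam_bev, radar_bev], axis=0)  # (2C, H, W)
    # A 1x1 convolution is a per-cell linear map over channels.
    logits = np.einsum("oc,chw->ohw", w_gate, stacked) + b_gate[:, None, None]
    gate = sigmoid(logits)  # in (0, 1): how much to trust the camera branch
    fused = gate * cam_bev + (1.0 - gate) * radar_bev
    return fused, gate

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
cam = rng.normal(size=(C, H, W))
radar = rng.normal(size=(C, H, W))
w = rng.normal(scale=0.1, size=(C, 2 * C))  # stand-in for trained weights
b = np.zeros(C)
fused, gate = gated_fusion(cam, radar, w, b)
```

Because the output is a convex combination, each fused value lies between the corresponding camera and radar values; with all-zero gate parameters the gate is 0.5 everywhere and the fusion reduces to a plain average.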

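Basic information:

CLC number: TN959

Citation:

[1] GUO Jiangtao (郭江涛), GAO Yuan (高媛), ZHAI Shuangjiao (翟双姣), et al. Multimodal 3D object detection guided by millimeter-wave radar physical priors[J]. Computer Technology and Development (计算机技术与发展), 2026, 36(05): 36-44. DOI:10.20165/j.cnki.ISSN1673-629X.2025.0353.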
