Neural Rendering for Sensor Adaptation in 3D Object Detection

Felix Embacher*,1,2 David Holtz*,1,3 Jonas Uhrig1 Marius Cordts1 Markus Enzweiler2
*Equal Contribution
1Mercedes-Benz AG 2Esslingen University of Applied Sciences 3University of Stuttgart
IEEE Intelligent Vehicles Symposium (IV) 2025
Sensor Adaptation Pipeline

Abstract

Autonomous vehicles often have varying camera sensor setups, which is inevitable due to restricted placement options for different vehicle types. Training a perception model on one particular setup and evaluating it on a new, different sensor setup reveals the so-called cross-sensor domain gap, typically leading to a degradation in accuracy. In this paper, we investigate the impact of the cross-sensor domain gap on state-of-the-art 3D object detectors. To this end, we introduce CamShift, a dataset inspired by nuScenes and created in CARLA to specifically simulate the domain gap between subcompact vehicles and sport utility vehicles (SUVs). Using CamShift, we demonstrate significant cross-sensor performance degradation, identify robustness dependencies on model architecture, and propose a data-driven solution to mitigate the effect. On the one hand, we show that model architectures based on a dense Bird’s Eye View (BEV) representation with backward projection, such as BEVFormer, are the most robust against varying sensor configurations. On the other hand, we propose a novel data-driven sensor adaptation pipeline based on neural rendering, which can transform entire datasets to match different camera sensor setups. Applying this approach improves performance across all investigated 3D object detectors, mitigating the cross-sensor domain gap by a large margin and reducing the need for new data collection by enabling efficient data reusability across vehicles with different sensor setups.

Sensor Adaptation Benchmark

To systematically evaluate the effectiveness of our neural sensor adaptation pipeline, we introduce a comprehensive sensor adaptation benchmark. This benchmark is designed to quantify the cross-sensor domain gap and assess the performance of 3D object detectors when models are transferred between different vehicle sensor configurations.

By providing standardized evaluation protocols and datasets, our benchmark enables fair and reproducible comparisons of adaptation strategies across a variety of real and synthetic sensor setups.

Training \ Validation    sim-SUV [a]                   sim-SUB [b]
sim-SUV [A]              SUV baseline                  cross-sensor domain gap
sim-SUB [B]              cross-sensor domain gap       SUB baseline
nerf-SUV [C]             neural rendering domain gap   n/a
nerf-SUB [D]             n/a                           neural sensor adaptation
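
For wiring up experiments, the matrix can be encoded directly in code. The sketch below is plain Python; the split names mirror the table above and the dict layout is our illustration, not any released CamShift tooling.

```python
# Benchmark matrix from the table above, keyed by (training split, validation
# split). Cells marked n/a in the table are simply omitted.
BENCHMARK_MATRIX = {
    ("sim-SUV",  "sim-SUV"): "SUV baseline",                 # [A, a]
    ("sim-SUV",  "sim-SUB"): "cross-sensor domain gap",      # [A, b]
    ("sim-SUB",  "sim-SUV"): "cross-sensor domain gap",      # [B, a]
    ("sim-SUB",  "sim-SUB"): "SUB baseline",                 # [B, b]
    ("nerf-SUV", "sim-SUV"): "neural rendering domain gap",  # [C, a]
    ("nerf-SUB", "sim-SUB"): "neural sensor adaptation",     # [D, b]
}

for (train, val), role in BENCHMARK_MATRIX.items():
    print(f"train on {train}, validate on {val}: {role}")
```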

Qualitative Results

The qualitative examples below illustrate how our neural sensor adaptation pipeline mitigates the cross-sensor domain gap, using StreamPETR as an example.

Consider the scenario where training data has been collected for the SUV carline (sim-SUV). We then wish to deploy StreamPETR on a new carline, specifically a subcompact vehicle (sim-SUB).

  • Red: StreamPETR trained on sim-SUV. Naively deploying StreamPETR to a new carline results in a cross-sensor domain gap, where object detections are significantly misaligned with the ground truth.
  • Green: StreamPETR trained on nerf-SUB. Our neural sensor adaptation pipeline transforms the entire sim-SUV dataset to match the viewpoints of the new carline, resulting in a synthetic dataset (nerf-SUB). Using this synthetic data, we can retrain StreamPETR and substantially improve object detection performance on the subcompact vehicle.
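
To make the viewpoint transformation concrete, the sketch below shows one ingredient of such a pipeline: moving a camera-to-ego extrinsic from the SUV rig to the subcompact rig before asking a neural renderer for the novel view. The mounting positions and function names are illustrative assumptions, not the CamShift calibration or our exact implementation.

```python
import numpy as np

# Assumed mounting positions (metres, ego frame: x forward, y left, z up).
# These numbers are placeholders for illustration only.
SUV_CAM_FRONT = np.array([1.7, 0.0, 1.6])  # assumed SUV front-camera position
SUB_CAM_FRONT = np.array([1.4, 0.0, 1.3])  # assumed subcompact position

def retarget_extrinsic(T_cam_to_ego: np.ndarray,
                       src_pos: np.ndarray,
                       dst_pos: np.ndarray) -> np.ndarray:
    """Shift a 4x4 camera-to-ego transform by the rig offset, keeping rotation."""
    T_new = T_cam_to_ego.copy()
    T_new[:3, 3] += dst_pos - src_pos
    return T_new

# Example: camera with identity orientation mounted at the SUV position.
T_suv = np.eye(4)
T_suv[:3, 3] = SUV_CAM_FRONT
T_sub = retarget_extrinsic(T_suv, SUV_CAM_FRONT, SUB_CAM_FRONT)
# T_sub would then be handed to the neural renderer as the target pose.
print(T_sub[:3, 3])  # -> [1.4 0.  1.3]
```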

Quantitative Results

Spider plot: comparison of 3D object detection performance across different training and adaptation strategies.

Exp.  Training   Validation   mAP ↑ [%]
                              DETR3D         StreamPETR     BEVDet         BEVFormer
Bb    sim-SUB    sim-SUB      46.1           52.7           41.9           61.1
Ab    sim-SUV    sim-SUB      29.7 (−16.4)   17.3 (−35.4)   29.4 (−12.5)   50.8 (−10.3)
Db    nerf-SUB   sim-SUB      43.1 (+13.4)   44.8 (+27.5)   29.5 (+0.1)    52.1 (+1.3)

Parenthesized values give the change relative to the SUB baseline (Bb) for the cross-sensor experiment (Ab), and relative to Ab for the adapted experiment (Db).
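
One way to read the table is to ask what fraction of the cross-sensor gap the adaptation recovers per detector. The short script below computes this from the numbers above; the "gap closed" metric is our reading of the table, not a figure reported in it.

```python
# mAP values copied from the table above; gap closed = (Db - Ab) / (Bb - Ab).
baseline = {"DETR3D": 46.1, "StreamPETR": 52.7, "BEVDet": 41.9, "BEVFormer": 61.1}  # Bb
cross    = {"DETR3D": 29.7, "StreamPETR": 17.3, "BEVDet": 29.4, "BEVFormer": 50.8}  # Ab
adapted  = {"DETR3D": 43.1, "StreamPETR": 44.8, "BEVDet": 29.5, "BEVFormer": 52.1}  # Db

for model in baseline:
    gap = baseline[model] - cross[model]
    recovered = (adapted[model] - cross[model]) / gap
    print(f"{model}: {recovered:.0%} of the cross-sensor gap recovered")
```

Note that detectors with a small gap to begin with, such as BEVFormer, also leave less room for the adaptation to recover.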

CamShift Dataset

We introduce CamShift, a novel dataset designed to systematically evaluate the cross-sensor domain gap in 3D object detection. CamShift consists of two splits, sim-SUV and sim-SUB. The sim-SUV split captures scenes from the perspective of an SUV, while the sim-SUB split captures the same scenes from the viewpoint of a subcompact vehicle. Apart from the camera perspectives, the two splits are identical, so any performance difference between them can be attributed to the sensor setup.
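
Since CamShift is inspired by nuScenes, a natural assumption is that it ships in the nuScenes format; under that assumption (not confirmed above), a split could be loaded with the standard nuscenes-devkit as sketched below. The dataroot path and version string are hypothetical placeholders.

```python
# Hypothetical loading sketch, assuming CamShift uses the nuScenes schema.
# `version` and `dataroot` are placeholders, not documented values.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-trainval",
                dataroot="/data/camshift/sim-SUV",
                verbose=True)

# Quick sanity check: walk the first few scenes and their first samples.
for scene in nusc.scene[:3]:
    sample = nusc.get("sample", scene["first_sample_token"])
    print(scene["name"], "first sample at", sample["timestamp"])
```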

You can download a preview of the CamShift dataset here.

For complete dataset access, please reach out to camshift@gmx.de.

Category Ratio [%]
Ambulance 0.9
Bicycle 3.7
Bus 1.1
Car 60.9
Human 23.6
Motorcycle 5.4
Truck 4.4
Split  # Scenes  Town IDs
Train  750       01, 02, 04, 05, 06, 07, 10, 12, 15
Val    150       03, 13