Autonomous vehicles often have varying camera sensor setups, which is inevitable due to restricted placement options for different vehicle types. Training a perception model on one particular setup and evaluating it on a new, different sensor setup reveals the so-called cross-sensor domain gap, typically leading to a degradation in accuracy. In this paper, we investigate the impact of the cross-sensor domain gap on state-of-the-art 3D object detectors. To this end, we introduce CamShift, a dataset inspired by nuScenes and created in CARLA to specifically simulate the domain gap between subcompact vehicles and sport utility vehicles (SUVs). Using CamShift, we demonstrate significant cross-sensor performance degradation, identify robustness dependencies on model architecture, and propose a data-driven solution to mitigate the effect. On the one hand, we show that model architectures based on a dense Bird’s Eye View (BEV) representation with backward projection, such as BEVFormer, are the most robust against varying sensor configurations. On the other hand, we propose a novel data-driven sensor adaptation pipeline based on neural rendering, which can transform entire datasets to match different camera sensor setups. Applying this approach improves performance across all investigated 3D object detectors, mitigating the cross-sensor domain gap by a large margin and reducing the need for new data collection by enabling efficient data reusability across vehicles with different sensor setups.
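As a rough intuition for why a shifted camera rig creates a domain gap, the sketch below projects the same 3D point through two front cameras that differ only in mounting height. All numbers (intrinsics, camera heights) are illustrative assumptions, not CamShift calibration values.

```python
import numpy as np

# Illustrative only: how the same 3D point lands on different pixels for two camera
# rigs that differ in mounting height (SUV vs. subcompact). Made-up numbers.
K = np.array([[1000.0,    0.0, 800.0],
              [   0.0, 1000.0, 450.0],
              [   0.0,    0.0,   1.0]])   # shared pinhole intrinsics

def project(point_world, cam_height):
    """Project a world point (x forward, y left, z up) into a forward-facing camera."""
    x, y, z = point_world
    p_cam = np.array([-y, cam_height - z, x])   # camera frame: x right, y down, z forward
    u, v, w = K @ p_cam
    return u / w, v / w

obstacle = (20.0, 0.0, 0.5)                     # a point 20 m ahead, 0.5 m above ground
print("SUV camera (1.8 m):", project(obstacle, 1.8))
print("SUB camera (1.4 m):", project(obstacle, 1.4))
```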
To systematically evaluate the effectiveness of our neural sensor adaptation pipeline, we introduce a comprehensive sensor adaptation benchmark.
This benchmark is designed to quantify the cross-sensor domain gap and assess the performance of 3D object detectors when models are transferred between different vehicle sensor configurations.
By providing standardized evaluation protocols and datasets, our benchmark enables fair and reproducible comparisons of adaptation strategies across a variety of real and synthetic sensor setups.
Training | Validation: sim-SUV [a] | Validation: sim-SUB [b]
---|---|---
sim-SUV [A] | SUV baseline | cross-sensor domain gap
sim-SUB [B] | cross-sensor domain gap | SUB baseline
nerf-SUV [C] | neural rendering domain gap | n/a
nerf-SUB [D] | n/a | neural sensor adaptation
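For scripting benchmark runs, the training/validation matrix above can be written down directly; a minimal sketch in Python (the split identifiers simply mirror the table and do not refer to any fixed API):

```python
# Benchmark matrix from the table above: (training split, validation split) -> condition.
EXPERIMENTS = {
    ("sim-SUV [A]", "sim-SUV [a]"): "SUV baseline",
    ("sim-SUV [A]", "sim-SUB [b]"): "cross-sensor domain gap",
    ("sim-SUB [B]", "sim-SUV [a]"): "cross-sensor domain gap",
    ("sim-SUB [B]", "sim-SUB [b]"): "SUB baseline",
    ("nerf-SUV [C]", "sim-SUV [a]"): "neural rendering domain gap",
    ("nerf-SUB [D]", "sim-SUB [b]"): "neural sensor adaptation",
}

for (train_split, val_split), condition in EXPERIMENTS.items():
    print(f"train: {train_split:12} -> validate: {val_split:12} | {condition}")
```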
The following images demonstrate how our neural sensor adaptation pipeline mitigates the cross-sensor domain gap, using StreamPETR as an example.
Consider the scenario where training data has been collected for the SUV car line (sim-SUV). We then wish to deploy StreamPETR on a new car line, specifically a subcompact vehicle (sim-SUB).
Comparison of 3D object detection performance (mAP ↑ [%]) across different training and adaptation strategies.

Exp. | Training | Validation | DETR3D | StreamPETR | BEVDet | BEVFormer
---|---|---|---|---|---|---
Bb | sim-SUB | sim-SUB | 46.1 | 52.7 | 41.9 | 61.1
Ab | sim-SUV | sim-SUB | 29.7 (−16.4) | 17.3 (−35.4) | 29.4 (−12.5) | 50.8 (−10.3)
Db | nerf-SUB | sim-SUB | 43.1 (+13.4) | 44.8 (+27.5) | 29.5 (+0.1) | 52.1 (+1.3)
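The parenthesized deltas can be recomputed from the absolute mAP values: experiment Ab is compared against the in-domain baseline Bb, and Db against Ab. A short sketch:

```python
# mAP [%] values from the table above.
MAP = {
    "Bb": {"DETR3D": 46.1, "StreamPETR": 52.7, "BEVDet": 41.9, "BEVFormer": 61.1},
    "Ab": {"DETR3D": 29.7, "StreamPETR": 17.3, "BEVDet": 29.4, "BEVFormer": 50.8},
    "Db": {"DETR3D": 43.1, "StreamPETR": 44.8, "BEVDet": 29.5, "BEVFormer": 52.1},
}

for det in ("DETR3D", "StreamPETR", "BEVDet", "BEVFormer"):
    gap = MAP["Ab"][det] - MAP["Bb"][det]        # cross-sensor degradation vs. SUB baseline
    recovered = MAP["Db"][det] - MAP["Ab"][det]  # gain from neural sensor adaptation
    remaining = MAP["Db"][det] - MAP["Bb"][det]  # gap still left w.r.t. the SUB baseline
    print(f"{det:10} gap {gap:+5.1f} | adaptation {recovered:+5.1f} | remaining {remaining:+5.1f}")
```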
We introduce the CamShift dataset, a novel dataset designed to systematically evaluate the cross-sensor domain gap in 3D object detection.
CamShift consists of two splits (sim-SUV and sim-SUB). The sim-SUV split captures scenes from the perspective of an SUV, while the sim-SUB split captures the same scenes from the viewpoint of a subcompact vehicle.
Apart from the camera perspectives, the two splits are identical: they cover the same scenes and differ only in the camera sensor setup.
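Because the two splits cover the same scenes, samples can be paired one-to-one across them. The snippet below assumes a hypothetical directory layout (data/camshift/sim-SUV and data/camshift/sim-SUB with matching relative file paths); the actual release may be organized differently.

```python
from pathlib import Path

# Hypothetical layout, for illustration only; the released dataset may be organized differently.
root = Path("data/camshift")
suv_root, sub_root = root / "sim-SUV", root / "sim-SUB"

# Pair SUV and SUB images of the same frame via their shared relative path.
pairs = [
    (suv_img, sub_root / suv_img.relative_to(suv_root))
    for suv_img in sorted(suv_root.rglob("*.jpg"))
    if (sub_root / suv_img.relative_to(suv_root)).exists()
]
print(f"{len(pairs)} paired SUV/SUB frames found")
```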
You can download a preview of the CamShift dataset here.
For complete dataset access, please reach out to camshift@gmx.de.
Category | Ratio [%] |
---|---|
Ambulance | 0.9 |
Bicycle | 3.7 |
Bus | 1.1 |
Car | 60.9 |
Human | 23.6 |
Motorcycle | 5.4 |
Truck | 4.4 |
Split | # Scenes | Town ID |
---|---|---
Train | 750 | 01, 02, 04, 05, 06, 07, 10, 12, 15
Val | 150 | 03, 13 |