Datasets

DDAD - Dense Depth for Autonomous Driving

DDAD is a new autonomous driving benchmark from TRI (Toyota Research Institute) for long range (up to 250m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth (across a full 360 degree field of view) generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting.


Overview

Dataset Name: DDAD - Dense Depth for Autonomous Driving

Organization: Toyota Research Institute (TRI)

Abstract: DDAD is a new autonomous driving benchmark for long range (up to 250m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting.

Locations: United States (San Francisco, Bay Area, Cambridge, Detroit, Ann Arbor) and Japan (Tokyo, Odaiba)

Core Stats: 150 training scenes (12,650 frames), 50 validation scenes (3,950 frames), 3,080 test images, 360° coverage, 250m range, 10 Hz capture rate

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Dataset Specifications

# DDAD dataset configuration
ddad_config = {
    "max_range": "250m",
    "depth_precision": "sub-1cm",
    "lidar_sensor": "Luminar-H2",
    "camera_count": 6,
    "camera_resolution": "2.4MP (1936 x 1216)",
    "camera_type": "Global-shutter",
    "camera_coverage": "360° (60° intervals)",
    "capture_rate": "10 Hz",
    "training_scenes": 150,
    "training_frames": 12650,
    "validation_scenes": 50,
    "validation_frames": 3950,
    "test_images": 3080,
    "training_rgb_images": 75900  # 12,650 frames x 6 cameras
}
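
As a quick sanity check, the RGB image counts follow directly from the frame counts, since every sample contains one image per camera. A minimal sketch using the ddad_config dictionary defined above:

# Recompute RGB image counts from frame counts (one image per camera per sample)
cameras = ddad_config["camera_count"]
for split, frames in [("train", ddad_config["training_frames"]),
                      ("val", ddad_config["validation_frames"])]:
    print(f"{split}: {frames} frames -> {frames * cameras} RGB images")
# train: 12650 frames -> 75900 RGB images
# val: 3950 frames -> 23700 RGB images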

Sample & Results Showcase

Dataset Visualization

[Figure: DDAD dataset visualization]

Panoramic Views

[Figures: panoramic views 1, 2, and 3]

Location Showcases

[Figure: Odaiba]

[Figure: Headquarters]

[Figure: Ann Arbor]

DDAD Depth Challenge

The DDAD depth challenge consists of two tracks:

  • Self-supervised monocular depth estimation
  • Semi-supervised monocular depth estimation

Methods are evaluated against ground truth LiDAR depth, with depth metrics computed per semantic class. The winner is chosen based on the abs_rel metric. Winners receive cash prizes and present their work at the CVPR 2021 Workshop "Frontiers of Monocular 3D Perception".
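
For reference, abs_rel is the mean absolute relative depth error over pixels with valid ground-truth LiDAR depth. A minimal sketch of the metric (the official script additionally handles range capping and the per-class breakdown):

import numpy as np

def abs_rel(pred_depth, gt_depth):
    """Mean of |d_gt - d_pred| / d_gt over valid (non-zero) ground-truth pixels."""
    valid = gt_depth > 0                      # projected LiDAR depth is sparse; 0 marks missing values
    pred, gt = pred_depth[valid], gt_depth[valid]
    return float(np.mean(np.abs(gt - pred) / gt))

# Tiny example with two valid ground-truth pixels
gt = np.zeros((2, 2)); gt[0, 0], gt[1, 1] = 10.0, 50.0
pred = np.full((2, 2), 12.0)
print(abs_rel(pred, gt))  # (|10-12|/10 + |50-12|/50) / 2 = 0.48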


Experiment Description

Sensor Configuration

LiDAR: High-resolution, long-range Luminar-H2 sensors with:

  • Maximum range: 250m
  • Range precision: Sub-1cm
  • Coverage: 360° (90° intervals)
  • Frequency: 10 Hz scans

Cameras: Six calibrated cameras time-synchronized at 10 Hz:

  • Resolution: 2.4MP (1936 x 1216)
  • Type: Global-shutter
  • Orientation: 60° intervals for 360° coverage
  • Datum names: camera_01, camera_05, camera_06, camera_07, camera_08, camera_09

Sensor Placement

[Figure: DDAD sensor placement]

The figure shows the placement of DDAD LiDARs and cameras. Both LiDAR and camera sensors are positioned to provide 360° coverage around the vehicle. All sensor data is time-synchronized and reported at 10 Hz. The Luminar sensors report as a single point cloud in the vehicle frame of reference with origin on the ground below the center of the vehicle rear axle.
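
Because the point cloud is reported in the vehicle frame, projecting it into any of the six cameras only requires that camera's intrinsics and its extrinsics with respect to the vehicle. A minimal numpy sketch, assuming extrinsics is given as a 4x4 camera-to-vehicle transform (the usual convention; verify against the DGP codebase) and K as the 3x3 intrinsics:

import numpy as np

def project_to_camera(points_vehicle, cam_to_vehicle, K, image_wh):
    """Project (N, 3) vehicle-frame LiDAR points into pixel coordinates of one camera.

    cam_to_vehicle: assumed 4x4 camera-to-vehicle transform
    K:              3x3 camera intrinsics
    image_wh:       (width, height) of the camera image, e.g. (1936, 1216)
    """
    vehicle_to_cam = np.linalg.inv(cam_to_vehicle)
    pts_h = np.hstack([points_vehicle, np.ones((len(points_vehicle), 1))])  # homogeneous coordinates
    pts_cam = (vehicle_to_cam @ pts_h.T).T[:, :3]

    in_front = pts_cam[:, 2] > 0                                            # keep points in front of the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = uvw[:, :2] / uvw[:, 2:3]

    w, h = image_wh
    visible = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[visible], pts_cam[in_front][visible][:, 2]                    # pixel coordinates and depths (m)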

Dataset Structure

Training Set:

  • 150 scenes (5 or 10 seconds long)
  • 12,650 individual samples
  • 75,900 RGB images (6 cameras per sample)

Validation Set:

  • 50 scenes (5 or 10 seconds long)
  • 3,950 individual samples
  • 23,700 RGB images (6 cameras per sample)

Test Set:

  • 3,080 images with intrinsic calibration
  • 200 images with panoptic labels (similar to validation split)
  • Ground truth depth and panoptic labels not publicly available

Dataset Statistics

Training Split

Location   Num Scenes (50 frames)   Num Scenes (100 frames)   Total frames
SF         0                        19                        1,900
ANN        23                       53                        6,450
DET        8                        0                         400
Japan      16                       31                        3,900

Total: 150 scenes and 12,650 frames

Validation Split

Location   Num Scenes (50 frames)   Num Scenes (100 frames)   Total frames
SF         1                        10                        1,050
ANN        11                       14                        1,950
Japan      9                        5                         950

Total: 50 scenes and 3,950 frames
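
The per-location totals follow from the 10 Hz capture rate: 5-second scenes contribute 50 frames and 10-second scenes 100 frames. A quick check of the two tables above, with the scene counts hard-coded for illustration:

# (50-frame scenes, 100-frame scenes) per location, taken from the tables above
train = {"SF": (0, 19), "ANN": (23, 53), "DET": (8, 0), "Japan": (16, 31)}
val = {"SF": (1, 10), "ANN": (11, 14), "Japan": (9, 5)}

for name, split in [("train", train), ("val", val)]:
    scenes = sum(n50 + n100 for n50, n100 in split.values())
    frames = sum(50 * n50 + 100 * n100 for n50, n100 in split.values())
    print(f"{name}: {scenes} scenes, {frames} frames")
# train: 150 scenes, 12650 frames
# val: 50 scenes, 3950 frames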

Location Codes:

  • USA: ANN - Ann Arbor, MI; SF - San Francisco Bay Area, CA; DET - Detroit, MI; CAM - Cambridge, MA
  • Japan: Tokyo and Odaiba

Code Implementation

Dataset Loading

The data can be downloaded here: train+val (257 GB, md5 checksum: c0da97967f76da80f86d6f97d0d98904) and test.

To load the dataset, use the TRI Dataset Governance Policy (DGP) codebase:

from dgp.datasets import SynchronizedSceneDataset

# Load synchronized pairs of camera and lidar frames
dataset = SynchronizedSceneDataset(
    '<path_to_dataset>/ddad.json',
    datum_names=('lidar', 'CAMERA_01', 'CAMERA_05'),
    generate_depth_from_datum='lidar',
    split='train'
)

# Iterate through the dataset
for sample in dataset:
    # Each sample contains a list of the requested datums
    lidar, camera_01, camera_05 = sample[0:3]
    
    # Access point cloud data
    point_cloud = lidar['point_cloud']  # Nx3 numpy.ndarray
    
    # Access camera images
    image_01 = camera_01['rgb']  # PIL.Image
    
    # Access depth maps (generated from lidar)
    depth_01 = camera_01['depth']  # (H,W) numpy.ndarray
    
    # Access camera intrinsics
    intrinsics_01 = camera_01['intrinsics']  # 3x3 numpy.ndarray
    
    # Access camera extrinsics
    extrinsics_01 = camera_01['extrinsics']  # camera extrinsics w.r.t. the vehicle frame (dgp Pose)

Evaluation Metrics

For detailed depth evaluation metrics, refer to the Packnet-SfM codebase.

We also provide an evaluation script compatible with our Eval.AI challenge:

cd evaluation
python3 main.py gt_val.zip pred_val_sup.zip semi


Submission Format: a single zip file following the same file naming convention as the test split (000000.png ... 003079.png). Each entry must be a 16-bit, single-channel PNG image. Predictions may be submitted at full image resolution or downsampled (they will be upsampled to full resolution with nearest-neighbor interpolation if needed).
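
As an illustration, a predicted depth map can be written as a 16-bit single-channel PNG as sketched below. The scale factor of 256 (depth in meters times 256) is the common KITTI-style convention and is an assumption here, not an official DDAD specification; check the challenge page for the exact encoding expected.

import numpy as np
from PIL import Image

def save_depth_png16(depth_m, path, scale=256.0):
    """Write a float depth map (meters) as a 16-bit single-channel PNG.

    scale=256.0 is an assumed KITTI-style encoding, not an official DDAD spec.
    """
    depth_u16 = np.clip(depth_m * scale, 0, 65535).astype(np.uint16)
    Image.fromarray(depth_u16).save(path)  # uint16 arrays are written as 16-bit grayscale PNGs

# e.g. save_depth_png16(pred_depth, "000000.png") for the first test image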

IPython Notebook

The associated IPython notebook provides detailed instructions on:

  • Instantiating the dataset with various options
  • Loading frames with context
  • Visualizing RGB and depth images for various cameras
  • Displaying the LiDAR point cloud

DDAD Notebook
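
For example, loading each sample together with one past and one future frame looks roughly like the sketch below (the backward_context/forward_context arguments are taken from the DGP codebase; see the notebook for authoritative usage):

from dgp.datasets import SynchronizedSceneDataset

# Load each CAMERA_01 frame together with one frame of temporal context on either side
dataset = SynchronizedSceneDataset(
    '<path_to_dataset>/ddad.json',
    datum_names=('CAMERA_01',),
    backward_context=1,
    forward_context=1,
    split='train'
)

for sample in dataset:
    # With context, each sample is a list of timesteps [t-1, t, t+1],
    # and each timestep is a list containing the requested datums
    prev_frame, curr_frame, next_frame = sample
    image_prev = prev_frame[0]['rgb']  # PIL.Image at t-1
    image_curr = curr_frame[0]['rgb']  # PIL.Image at t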


Version History

Date   Version   Details
2021   v1.0      Initial release with training, validation, and test splits. CVPR 2021 challenge launch.

Future Releases

  • v1.1 (Planned): Additional urban locations and scenarios
  • v2.0 (Planned): Extended range capabilities and additional sensor modalities

Contact & Support

Organization: Toyota Research Institute (TRI)

Website: https://www.tri.global/

Dataset Repository: TRI-ML/dgp

Challenge Platform: Eval.AI DDAD Challenge

Packnet-SfM Codebase: TRI-ML/packnet-sfm

Getting Help

For questions about dataset usage, evaluation metrics, or integration with your depth estimation pipeline, please refer to the resources listed above: the dataset repository (TRI-ML/dgp), the Eval.AI challenge page, and the Packnet-SfM codebase (TRI-ML/packnet-sfm).


References

3D Packing for Self-Supervised Monocular Depth Estimation (CVPR 2020 oral)

Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos and Adrien Gaidon

@inproceedings{packnet,
  author = {Vitor Guizilini and Rares Ambrus and Sudeep Pillai and Allan Raventos and Adrien Gaidon},
  title = {3D Packing for Self-Supervised Monocular Depth Estimation},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  primaryClass = {cs.CV},
  year = {2020},
}

Privacy

To ensure privacy, the DDAD dataset has been anonymized using state-of-the-art object detectors for license plate and face blurring.


License


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Dataset Impact

DDAD provides the first comprehensive benchmark for long-range (up to 250m) dense depth estimation in autonomous driving scenarios, enabling research into advanced monocular depth estimation methods for self-driving vehicles.
