Datasets

DDAD - Dense Depth for Autonomous Driving

DDAD is a new autonomous driving benchmark from TRI (Toyota Research Institute) for long range (up to 250m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth (across a full 360 degree field of view) generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting.


Overview

Dataset Name: DDAD - Dense Depth for Autonomous Driving

Organization: Toyota Research Institute (TRI)

Abstract: DDAD is a new autonomous driving benchmark for long range (up to 250m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting.

Locations: United States (San Francisco, Bay Area, Cambridge, Detroit, Ann Arbor) and Japan (Tokyo, Odaiba)

Core Stats: 150 training scenes (12,650 frames), 50 validation scenes (3,950 frames), 3,080 test images, 360° coverage, 250m range, 10 Hz capture rate

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Dataset Specifications

# DDAD dataset configuration
ddad_config = {
    "max_range": "250m",
    "depth_precision": "sub-1cm",
    "lidar_sensor": "Luminar-H2",
    "camera_count": 6,
    "camera_resolution": "2.4MP (1936 x 1216)",
    "camera_type": "Global-shutter",
    "camera_coverage": "360° (60° intervals)",
    "capture_rate": "10 Hz",
    "training_scenes": 150,
    "training_frames": 12650,
    "validation_scenes": 50,
    "validation_frames": 3950,
    "test_images": 3080,
    "training_rgb_images": 75900  # 12,650 frames x 6 cameras
}
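
As a quick sanity check, the RGB image counts follow directly from the frame counts, since every sample contains one image per camera. A minimal sketch using the ddad_config dictionary defined above:

# Recompute RGB image counts from frame counts (one image per camera per sample)
cameras = ddad_config["camera_count"]
for split, frames in [("train", ddad_config["training_frames"]),
                      ("val", ddad_config["validation_frames"])]:
    print(f"{split}: {frames} frames -> {frames * cameras} RGB images")
# train: 12650 frames -> 75900 RGB images
# val: 3950 frames -> 23700 RGB images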

Sample & Results Showcase

Dataset Visualization

[Figure: DDAD dataset visualization]

Panoramic Views

[Figures: panoramic views 1, 2, and 3]

Location Showcases

[Figure: Odaiba]

[Figure: Headquarters]

[Figure: Ann Arbor]

DDAD Depth Challenge

The DDAD depth challenge consists of two tracks:

  • Self-supervised monocular depth estimation
  • Semi-supervised monocular depth estimation

Methods are evaluated against ground truth LiDAR depth, with depth metrics computed per semantic class. The winner is chosen based on the abs_rel metric. Winners receive cash prizes and present their work at the CVPR 2021 Workshop "Frontiers of Monocular 3D Perception".
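
For reference, abs_rel is the mean absolute relative depth error over pixels with valid ground-truth LiDAR depth. A minimal sketch of the metric (the official script additionally handles range capping and the per-class breakdown):

import numpy as np

def abs_rel(pred_depth, gt_depth):
    """Mean of |d_gt - d_pred| / d_gt over valid (non-zero) ground-truth pixels."""
    valid = gt_depth > 0                      # projected LiDAR depth is sparse; 0 marks missing values
    pred, gt = pred_depth[valid], gt_depth[valid]
    return float(np.mean(np.abs(gt - pred) / gt))

# Tiny example with two valid ground-truth pixels
gt = np.zeros((2, 2)); gt[0, 0], gt[1, 1] = 10.0, 50.0
pred = np.full((2, 2), 12.0)
print(abs_rel(pred, gt))  # (|10-12|/10 + |50-12|/50) / 2 = 0.48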


Experiment Description

Sensor Configuration

LiDAR: High-resolution, long-range Luminar-H2 sensors with:

  • Maximum range: 250m
  • Range precision: Sub-1cm
  • Coverage: 360° (90° intervals)
  • Frequency: 10 Hz scans

Cameras: Six calibrated cameras time-synchronized at 10 Hz:

  • Resolution: 2.4MP (1936 x 1216)
  • Type: Global-shutter
  • Orientation: 60° intervals for 360° coverage
  • Datum names: camera_01, camera_05, camera_06, camera_07, camera_08, camera_09

Sensor Placement

[Figure: DDAD sensor placement]

The figure shows the placement of DDAD LiDARs and cameras. Both LiDAR and camera sensors are positioned to provide 360° coverage around the vehicle. All sensor data is time-synchronized and reported at 10 Hz. The Luminar sensors report as a single point cloud in the vehicle frame of reference with origin on the ground below the center of the vehicle rear axle.
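
Because the point cloud is reported in the vehicle frame, projecting it into any of the six cameras only requires that camera's intrinsics and its extrinsics with respect to the vehicle. A minimal numpy sketch, assuming extrinsics is given as a 4x4 camera-to-vehicle transform (the usual convention; verify against the DGP codebase) and K as the 3x3 intrinsics:

import numpy as np

def project_to_camera(points_vehicle, cam_to_vehicle, K, image_wh):
    """Project (N, 3) vehicle-frame LiDAR points into pixel coordinates of one camera.

    cam_to_vehicle: assumed 4x4 camera-to-vehicle transform
    K:              3x3 camera intrinsics
    image_wh:       (width, height) of the camera image, e.g. (1936, 1216)
    """
    vehicle_to_cam = np.linalg.inv(cam_to_vehicle)
    pts_h = np.hstack([points_vehicle, np.ones((len(points_vehicle), 1))])  # homogeneous coordinates
    pts_cam = (vehicle_to_cam @ pts_h.T).T[:, :3]

    in_front = pts_cam[:, 2] > 0                                            # keep points in front of the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = uvw[:, :2] / uvw[:, 2:3]

    w, h = image_wh
    visible = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[visible], pts_cam[in_front][visible][:, 2]                    # pixel coordinates and depths (m)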

Dataset Structure

Training Set:

  • 150 scenes (5 or 10 seconds long)
  • 12,650 individual samples
  • 75,900 RGB images (6 cameras per sample)

Validation Set:

  • 50 scenes (5 or 10 seconds long)
  • 3,950 individual samples
  • 23,700 RGB images (6 cameras per sample)

Test Set:

  • 3,080 images with intrinsic calibration
  • 200 images with panoptic labels (similar to validation split)
  • Ground truth depth and panoptic labels not publicly available

Dataset Statistics

Training Split

Location   Num Scenes (50 frames)   Num Scenes (100 frames)   Total frames
SF         0                        19                        1,900
ANN        23                       53                        6,450
DET        8                        0                         400
Japan      16                       31                        3,900

Total: 150 scenes and 12,650 frames

Validation Split

Location   Num Scenes (50 frames)   Num Scenes (100 frames)   Total frames
SF         1                        10                        1,050
ANN        11                       14                        1,950
Japan      9                        5                         950

Total: 50 scenes and 3,950 frames
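
The per-location totals follow from the 10 Hz capture rate: 5-second scenes contribute 50 frames and 10-second scenes 100 frames. A quick check of the two tables above, with the scene counts hard-coded for illustration:

# (50-frame scenes, 100-frame scenes) per location, taken from the tables above
train = {"SF": (0, 19), "ANN": (23, 53), "DET": (8, 0), "Japan": (16, 31)}
val = {"SF": (1, 10), "ANN": (11, 14), "Japan": (9, 5)}

for name, split in [("train", train), ("val", val)]:
    scenes = sum(n50 + n100 for n50, n100 in split.values())
    frames = sum(50 * n50 + 100 * n100 for n50, n100 in split.values())
    print(f"{name}: {scenes} scenes, {frames} frames")
# train: 150 scenes, 12650 frames
# val: 50 scenes, 3950 frames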

Location Codes:

  • USA: ANN - Ann Arbor, MI; SF - San Francisco Bay Area, CA; DET - Detroit, MI; CAM - Cambridge, MA
  • Japan: Tokyo and Odaiba

Code Implementation

Dataset Loading

The data can be downloaded here: train+val (257 GB, md5 checksum: c0da97967f76da80f86d6f97d0d98904) and test.

To load the dataset, use the TRI Dataset Governance Policy (DGP) codebase:

from dgp.datasets import SynchronizedSceneDataset

# Load synchronized pairs of camera and lidar frames
dataset = SynchronizedSceneDataset(
    '<path_to_dataset>/ddad.json',
    datum_names=('lidar', 'CAMERA_01', 'CAMERA_05'),
    generate_depth_from_datum='lidar',
    split='train'
)

# Iterate through the dataset
for sample in dataset:
    # Each sample contains a list of the requested datums
    lidar, camera_01, camera_05 = sample[0:3]
    
    # Access point cloud data
    point_cloud = lidar['point_cloud']  # Nx3 numpy.ndarray
    
    # Access camera images
    image_01 = camera_01['rgb']  # PIL.Image
    
    # Access depth maps (generated from lidar)
    depth_01 = camera_01['depth']  # (H,W) numpy.ndarray
    
    # Access camera intrinsics
    intrinsics_01 = camera_01['intrinsics']  # 3x3 numpy.ndarray
    
    # Access camera extrinsics
    extrinsics_01 = camera_01['extrinsics']  # camera extrinsics w.r.t. the vehicle frame (dgp Pose)

Evaluation Metrics

For detailed depth evaluation metrics, refer to the Packnet-SfM codebase.

We also provide an evaluation script compatible with our Eval.AI challenge:

cd evaluation
python3 main.py gt_val.zip pred_val_sup.zip semi


Submission Format: a single zip file following the same file naming convention as the test split (000000.png ... 003079.png). Each entry must be a 16-bit, single-channel PNG image. Predictions may be submitted at full image resolution or downsampled (they will be upsampled to full resolution with nearest-neighbor interpolation if needed).
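
As an illustration, a predicted depth map can be written as a 16-bit single-channel PNG as sketched below. The scale factor of 256 (depth in meters times 256) is the common KITTI-style convention and is an assumption here, not an official DDAD specification; check the challenge page for the exact encoding expected.

import numpy as np
from PIL import Image

def save_depth_png16(depth_m, path, scale=256.0):
    """Write a float depth map (meters) as a 16-bit single-channel PNG.

    scale=256.0 is an assumed KITTI-style encoding, not an official DDAD spec.
    """
    depth_u16 = np.clip(depth_m * scale, 0, 65535).astype(np.uint16)
    Image.fromarray(depth_u16).save(path)  # uint16 arrays are written as 16-bit grayscale PNGs

# e.g. save_depth_png16(pred_depth, "000000.png") for the first test image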

IPython Notebook

The associated IPython notebook provides detailed instructions on:

  • Instantiating the dataset with various options
  • Loading frames with context
  • Visualizing RGB and depth images for various cameras
  • Displaying the LiDAR point cloud

DDAD Notebook
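
For example, loading each sample together with one past and one future frame looks roughly like the sketch below (the backward_context/forward_context arguments are taken from the DGP codebase; see the notebook for authoritative usage):

from dgp.datasets import SynchronizedSceneDataset

# Load each CAMERA_01 frame together with one frame of temporal context on either side
dataset = SynchronizedSceneDataset(
    '<path_to_dataset>/ddad.json',
    datum_names=('CAMERA_01',),
    backward_context=1,
    forward_context=1,
    split='train'
)

for sample in dataset:
    # With context, each sample is a list of timesteps [t-1, t, t+1],
    # and each timestep is a list containing the requested datums
    prev_frame, curr_frame, next_frame = sample
    image_prev = prev_frame[0]['rgb']  # PIL.Image at t-1
    image_curr = curr_frame[0]['rgb']  # PIL.Image at t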


Version History

Date   Version   Details
2021   v1.0      Initial release with training, validation, and test splits. CVPR 2021 challenge launch.

Future Releases

  • v1.1 (Planned): Additional urban locations and scenarios
  • v2.0 (Planned): Extended range capabilities and additional sensor modalities

Contact & Support

Organization: Toyota Research Institute (TRI)

Website: https://www.tri.global/

Dataset Repository: TRI-ML/dgp

Challenge Platform: Eval.AI DDAD Challenge

Packnet-SfM Codebase: TRI-ML/packnet-sfm

Getting Help

For questions about dataset usage, evaluation metrics, or integration with your depth estimation pipeline, please refer to the resources listed above: the dataset repository (TRI-ML/dgp), the Eval.AI challenge page, and the Packnet-SfM codebase (TRI-ML/packnet-sfm).


References

3D Packing for Self-Supervised Monocular Depth Estimation (CVPR 2020 oral)

Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos and Adrien Gaidon

@inproceedings{packnet,
  author = {Vitor Guizilini and Rares Ambrus and Sudeep Pillai and Allan Raventos and Adrien Gaidon},
  title = {3D Packing for Self-Supervised Monocular Depth Estimation},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  primaryClass = {cs.CV},
  year = {2020},
}

Privacy

To ensure privacy, the DDAD dataset has been anonymized using state-of-the-art object detectors for license plate and face blurring.


License


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Dataset Impact

DDAD provides the first comprehensive benchmark for long-range (up to 250m) dense depth estimation in autonomous driving scenarios, enabling research into advanced monocular depth estimation methods for self-driving vehicles.
