[CVPR'21] Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation

Overview

Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation

Weixiang Yang, Qi Li, Wenxi Liu, Yuanlong Yu, Yuexin Ma, Shengfeng He, Jia Pan

Paper

Accepted to CVPR 2021

图片

Abstract

HD map reconstruction is crucial for autonomous driving. LiDAR-based methods are limited due to the deployed expensive sensors and time-consuming computation. Camera-based methods usually need to separately perform road segmentation and view transformation, which often causes distortion and the absence of content.  To push the limits of the technology, we present a novel framework that enables reconstructing a local map formed by road layout and vehicle occupancy in the bird's-eye view given a front-view monocular image only.  In particular, we propose a cross-view transformation module, which takes the constraint of cycle consistency between views into account and makes full use of their correlation to strengthen the view transformation and scene understanding. Considering the relationship between vehicles and roads, we also design a context-aware discriminator to further refine the results. Experiments on public benchmarks show that our method achieves the state-of-the-art performance in the tasks of road layout estimation and vehicle occupancy estimation. Especially for the latter task, our model outperforms all competitors by a large margin. Furthermore, our model runs at 35 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.

Contributions

  • We propose a novel framework that reconstructs a local map formed by top-view road scene layout and vehicle occupancy using a single monocular front-view image only. In particular, we propose a cross-view transformation module which leverages the cycle consistency between views and their correlation to strengthen the view transformation.
  • We also propose a context-aware discriminator that considers the spatial relationship between vehicles and roads in the task of estimating vehicle occupancies.
  • On public benchmarks, it is demonstrated that our model achieves the state-of-the-art performance for the tasks of road layout and vehicle occupancy estimation.

Approach overview

图片

Repository Structure

cross-view/
├── crossView            # Contains scripts for dataloaders and network/model architecture
└── datasets             # Contains datasets
    ├── argoverse        # argoverse dataset
    ├── kitti            # kitti dataset 
├── log                  # Contains a log of network/model
├── losses               # Contains scripts for loss of network/model
├── models               # Contains the saved model of the network/model
├── output               # Contains output of network/model
└── splits
    ├── 3Dobject         # Training and testing splits for KITTI 3DObject Detection dataset 
    ├── argo             # Training and testing splits for Argoverse Tracking v1.0 dataset
    ├── odometry         # Training and testing splits for KITTI Odometry dataset
    └── raw              # Training and testing splits for KITTI RAW dataset(based on Schulter et. al.)

Installation

We recommend setting up a Python 3.7 and Pytorch 1.0 Virtual Environment and installing all the dependencies listed in the requirements file.

git clone https://github.com/JonDoe-297/cross-view.git

cd cross-view
pip install -r requirements.txt

Datasets

In the paper, we've presented results for KITTI 3D Object, KITTI Odometry, KITTI RAW, and Argoverse 3D Tracking v1.0 datasets. For comparison with Schulter et. al., We've used the same training and test splits sequences from the KITTI RAW dataset. For more details about the training/testing splits one can look at the splits directory. And you can download Ground-truth from Monolayout.

# Download KITTI RAW
./data/download_datasets.sh raw

# Download KITTI 3D Object
./data/download_datasets.sh object

# Download KITTI Odometry
./data/download_datasets.sh odometry

# Download Argoverse Tracking v1.0
./data/download_datasets.sh argoverse

The above scripts will download, unzip and store the respective datasets in the datasets directory.

datasets/
└── argoverse                          # argoverse dataset
    └── argoverse-tracking
        └── train1
            └── 1d676737-4110-3f7e-bec0-0c90f74c248f
                ├── car_bev_gt         # Vehicle GT
                ├── road_gt            # Road GT
                ├── stereo_front_left  # RGB image
└── kitti                              # kitti dataset 
    └── object                         # kitti 3D Object dataset 
        └── training
            ├── image_2                # RGB image
            ├── vehicle_256            # Vehicle GT
    ├── odometry                       # kitti odometry dataset 
        └── 00
            ├── image_2                # RGB image
            ├── road_dense128  # Road GT
    ├── raw                            # kitti raw dataset 
        └── 2011_09_26
            └── 2011_09_26_drive_0001_sync
                ├── image_2            # RGB image
                ├── road_dense128      # Road GT

Training

  1. Prepare the corresponding dataset
  2. Run training
# Corss view Road (KITTI Odometry)
python3 train.py --type static --split odometry --data_path ./datasets/odometry/ --model_name <Model Name with specifications>

# Corss view Vehicle (KITTI 3D Object)
python3 train.py --type dynamic --split 3Dobject --data_path ./datasets/kitti/object/training --model_name <Model Name with specifications>

# Corss view Road (KITTI RAW)
python3 train.py --type static --split raw --data_path ./datasets/kitti/raw/  --model_name <Model Name with specifications>

# Corss view Vehicle (Argoverse Tracking v1.0)
python3 train.py --type dynamic --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications>

# Corss view Road (Argoverse Tracking v1.0)
python3 train.py --type static --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications>
  1. The training model are in "models" (default: ./models)

Testing

  1. Download pre-trained models
  2. Run testing
python3 test.py --type <static/dynamic> --model_path <path to the model directory> --image_path <path to the image directory>  --out_dir <path to the output directory> 
  1. The results are in "output" (default: ./output)

Evaluation

  1. Prepare the corresponding dataset
  2. Download pre-trained models
  3. Run evaluation
# Evaluate on KITTI Odometry 
python3 eval.py --type static --split odometry --model_path <path to the model directory> --data_path ./datasets/odometry --height 512 --width 512 --occ_map_size 128

# Evaluate on KITTI 3D Object
python3 eval.py --type dynamic --split 3Dobject --model_path <path to the model directory> --data_path ./datasets/kitti/object/training

# Evaluate on KITTI RAW
python3 eval.py --type static --split raw --model_path <path to the model directory> --data_path ./datasets/kitti/raw/

# Evaluate on Argoverse Tracking v1.0 (Road)
python3 eval.py --type static --split argo --model_path <path to the model directory> --data_path ./datasets/kitti/argoverse/

# Evaluate on Argoverse Tracking v1.0 (Vehicle)
python3 eval.py --type dynamic --split argo --model_path <path to the model directory> --data_path ./datasets/kitti/argoverse
  1. The results are in "output" (default: ./output)

Pretrained Models

The following table provides links to the pre-trained models for each dataset mentioned in our paper. The table also shows the corresponding evaluation results for these models.

Dataset Segmentation Objects mIOU(%) mAP(%) Pretrained Model
KITTI 3D Object Vehicle 38.85 51.04 link
KITTI Odometry Road 77.47 86.39 link
KITTI Raw Road 68.26 79.65 link
Argoverse Tracking Vehicle 47.87 62.69 link
Argoverse Tracking Road 76.56 87.30 link

Results

图片

Contact

If you meet any problems, please describe them in issues or contact:

License

This project is released under the MIT License (refer to the LICENSE file for details).This project partially depends on the sources of Monolayout

Direct Multi-view Multi-person 3D Human Pose Estimation

Implementation of NeurIPS-2021 paper: Direct Multi-view Multi-person 3D Human Pose Estimation [paper] [video-YouTube, video-Bilibili] [slides] This is

Sea AI Lab 251 Dec 30, 2022
Official implementation of the paper ``Unifying Nonlocal Blocks for Neural Networks'' (ICCV'21)

Spectral Nonlocal Block Overview Official implementation of the paper: Unifying Nonlocal Blocks for Neural Networks (ICCV'21) Spectral View of Nonloca

91 Dec 14, 2022
Language Models for the legal domain in Spanish done @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish legal domain Language Model ⚖️ This repository contains the page for two main resources for the Spanish legal domain: A RoBERTa model: https:/

Plan de Tecnologías del Lenguaje - Gobierno de España 12 Nov 14, 2022
Prompt Tuning with Rules

PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art

THUNLP 118 Dec 30, 2022
FocusFace: Multi-task Contrastive Learning for Masked Face Recognition

FocusFace This is the official repository of "FocusFace: Multi-task Contrastive Learning for Masked Face Recognition" accepted at IEEE International C

Pedro Neto 21 Nov 17, 2022
This is a simple framework to make object detection dataset very quickly

FastAnnotation Table of contents General info Requirements Setup General info This is a simple framework to make object detection dataset very quickly

Serena Tetart 1 Jan 24, 2022
A Python framework for developing parallelized Computational Fluid Dynamics software to solve the hyperbolic 2D Euler equations on distributed, multi-block structured grids.

pyHype: Computational Fluid Dynamics in Python pyHype is a Python framework for developing parallelized Computational Fluid Dynamics software to solve

Mohamed Khalil 21 Nov 22, 2022
CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection

CLOCs is a novel Camera-LiDAR Object Candidates fusion network. It provides a low-complexity multi-modal fusion framework that improves the performance of single-modality detectors. CLOCs operates on

Su Pang 254 Dec 16, 2022
Official PyTorch(Geometric) implementation of DPGNN(DPGCN) in "Distance-wise Prototypical Graph Neural Network for Node Imbalance Classification"

DPGNN This repository is an official PyTorch(Geometric) implementation of DPGNN(DPGCN) in "Distance-wise Prototypical Graph Neural Network for Node Im

Yu Wang (Jack) 18 Oct 12, 2022
Py-faster-rcnn - Faster R-CNN (Python implementation)

py-faster-rcnn has been deprecated. Please see Detectron, which includes an implementation of Mask R-CNN. Disclaimer The official Faster R-CNN code (w

Ross Girshick 7.8k Jan 03, 2023
An open-source Deep Learning Engine for Healthcare that aims to treat & prevent major diseases

AlphaCare Background AlphaCare is a work-in-progress, open-source Deep Learning Engine for Healthcare that aims to treat and prevent major diseases. T

Siraj Raval 44 Nov 05, 2022
Empowering journalists and whistleblowers

Onymochat Empowering journalists and whistleblowers Onymochat is an end-to-end encrypted, decentralized, anonymous chat application. You can also host

Samrat Dutta 19 Sep 02, 2022
Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Overview PyTorch 0.4.1 | Python 3.6.5 Annotated implementations with comparative introductions for minimax, non-saturating, wasserstein, wasserstein g

Shayne O'Brien 471 Dec 16, 2022
EMNLP'2021: SimCSE: Simple Contrastive Learning of Sentence Embeddings

SimCSE: Simple Contrastive Learning of Sentence Embeddings This repository contains the code and pre-trained models for our paper SimCSE: Simple Contr

Princeton Natural Language Processing 2.5k Dec 29, 2022
Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"

FAME: Feature-based Adversarial Meta-Embeddings This is the companion code for the experiments reported in the paper "FAME: Feature-Based Adversarial

Bosch Research 11 Nov 27, 2022
Fast and robust clustering of point clouds generated with a Velodyne sensor.

Depth Clustering This is a fast and robust algorithm to segment point clouds taken with Velodyne sensor into objects. It works with all available Velo

Photogrammetry & Robotics Bonn 957 Dec 21, 2022
Flow is a computational framework for deep RL and control experiments for traffic microsimulation.

Flow Flow is a computational framework for deep RL and control experiments for traffic microsimulation. See our website for more information on the ap

867 Jan 02, 2023
Structural Constraints on Information Content in Human Brain States

Structural Constraints on Information Content in Human Brain States Code accompanying the paper "The information content of brain states is explained

Leon Weninger 3 Sep 07, 2022
Reinfore learning tool box, contains trpo, a3c algorithm for continous action space

RL_toolbox all the algorithm is running on pycharm IDE, or the package loss error may exist. implemented algorithm: trpo a3c a3c:for continous action

yupei.wu 44 Oct 10, 2022
Code repository accompanying the paper "On Adversarial Robustness: A Neural Architecture Search perspective"

On Adversarial Robustness: A Neural Architecture Search perspective Preparation: Clone the repository: https://github.com/tdchaitanya/nas-robustness.g

Chaitanya Devaguptapu 4 Nov 10, 2022