Unsupervised Confidence for LiDAR Depth Maps and Applications

🎉 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022) 🎉

Andrea Conti · Matteo Poggi · Filippo Aleotti · Stefano Mattoccia

Unsupervised Confidence for LiDAR Depth Maps and Applications. Given an image (a) and a LiDAR point cloud (b), the projection of the latter onto the image plane does not properly handle occlusions between the two points of view, assigning wrong depth values to the foreground (c). Our method (d) learns to remove these outliers reliably and without supervision.

Overview

Depth perception is pivotal in many fields, such as robotics and autonomous driving. Consequently, depth sensors such as LiDARs have rapidly spread across many applications. The 3D point clouds generated by these sensors must often be coupled with an RGB camera to understand the framed scene semantically. Usually, the point cloud is projected onto the camera image plane, yielding a sparse depth map. Unfortunately, this process, coupled with the intrinsic issues affecting all depth sensors, yields noise and gross outliers in the final output. As an example, the image below shows the outlier formation process due to visual occlusions between the camera and the depth sensor.

Outlier formation process due to occlusion. When a LiDAR and an RGB camera acquire the scene from different viewpoints, projecting the point cloud into a depth map (a) on the image (b) introduces outliers (blue oval), i.e. points visible to the LiDAR but occluded to the camera (red), yet projected near foreground points visible to both (green).
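For reference, the snippet below is a minimal sketch of this projection step (not the repository's code), assuming known camera intrinsics `K` and a hypothetical LiDAR-to-camera transform `T_cam_lidar`. Note how a simple per-pixel z-buffer cannot remove occluded background points that land on neighbouring pixels, which is exactly the source of the outliers discussed above.

```python
import numpy as np

def project_lidar_to_depth(points, K, T_cam_lidar, height, width):
    """Project a LiDAR point cloud onto the camera image plane,
    producing a sparse depth map (0 where no point projects)."""
    # Transform points from LiDAR to camera coordinates (homogeneous)
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera
    pts_cam = pts_cam[pts_cam[:, 2] > 0]

    # Perspective projection with intrinsics K
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    z = pts_cam[:, 2]

    # Discard projections falling outside the image
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[valid], v[valid], z[valid]

    # When several points land on the same pixel, keep the closest one.
    # This z-buffering does NOT remove occluded points projected onto
    # nearby pixels -- those are the outliers our method filters out.
    depth = np.zeros((height, width), dtype=np.float32)
    order = np.argsort(-z)  # far to near, so nearer points overwrite farther ones
    depth[v[order], u[order]] = z[order]
    return depth
```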

We propose an effective unsupervised framework that explicitly addresses this issue by learning to estimate the confidence of the sparse LiDAR depth map, thus allowing the outliers to be filtered out.

Proposed Architecture. A convolutional encoder (orange) extracts features at different resolutions. We query the features for each pixel with a valid LiDAR value and concatenate them (+) into a vector, fed to an MLP (blue) that estimates confidence only at the valid LiDAR coordinates.
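A minimal PyTorch sketch of this design follows. The layer widths, the use of RGB plus sparse depth as input, and the upsampling strategy are our assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConfidenceNet(nn.Module):
    """Sketch: CNN encoder + per-pixel MLP predicting confidence
    only at coordinates with a valid LiDAR depth value."""
    def __init__(self, feat_channels=(32, 64, 128)):
        super().__init__()
        self.blocks = nn.ModuleList()
        in_ch = 4  # RGB image + sparse depth (assumed input layout)
        for out_ch in feat_channels:
            self.blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1),
                nn.ReLU(inplace=True)))
            in_ch = out_ch
        self.mlp = nn.Sequential(
            nn.Linear(sum(feat_channels), 128), nn.ReLU(inplace=True),
            nn.Linear(128, 1))

    def forward(self, image, sparse_depth):
        x = torch.cat([image, sparse_depth], dim=1)
        feats = []
        for block in self.blocks:
            x = block(x)
            # Bring each scale back to full resolution so multi-scale
            # features can be queried at the valid LiDAR pixels
            feats.append(F.interpolate(x, size=image.shape[-2:],
                                       mode='bilinear', align_corners=False))
        feats = torch.cat(feats, dim=1)               # B x sum(C) x H x W

        b, v, u = torch.nonzero(sparse_depth[:, 0] > 0, as_tuple=True)
        queried = feats[b, :, v, u]                   # one feature vector per valid pixel
        sigma = F.softplus(self.mlp(queried)) + 1e-3  # positive standard deviation
        return sigma, (b, v, u)
```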

To train our framework, we model the confidence of the LiDAR depth $d$ assuming a Gaussian distribution and minimize the negative log-likelihood function

$$ \mathcal{L}_G = - \ln \left( \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(d - d^*)^2}{2\sigma^2}} \right) $$

which, discarding the constant additive term, can be rewritten as follows

$$ \mathcal{L}_G \approx \ln(\sigma) + \frac{(d - d^*)^2}{2\sigma^2} $$
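In code, minimizing this term over the valid LiDAR pixels could look like the following sketch (tensor names and shapes are hypothetical):

```python
import torch

def gaussian_nll_loss(sigma, lidar_depth, proxy_depth):
    """Negative log-likelihood of the LiDAR depth under a Gaussian
    centered at the proxy label, up to an additive constant.
    All inputs are 1D tensors indexed over the valid LiDAR pixels."""
    return (torch.log(sigma) +
            (lidar_depth - proxy_depth) ** 2 / (2 * sigma ** 2)).mean()
```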

Applying the loss function above requires predicting both $d^*$ and $\sigma$. However, doing so means learning the confidence $\sigma$ of the network output $d^*$, which is not our goal. Thus, instead of predicting $d^*$, we employ a proxy label, computed as follows, representing a plausibly correct depth for each original LiDAR depth value.

$$ d^*_x = \min \ \{ d : d \in P(x), d > 0 \} $$

where $x$ is a valid coordinate and $P(x)$ is an $N \times N$ patch centered at $x$. Using the minimum depth value correctly selects foreground points as reliable in the presence of occlusions. As a drawback, it may indiscriminately flag as outliers most background pixels, even those not occluded. However, in practice we show that the network is not severely affected by this approximation, which is, on the other hand, fast and unsupervised. Further details are described in our paper.
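One possible way to compute this proxy label is min-pooling over the valid depths, as in the sketch below (the window size and the PyTorch-based formulation are assumptions for illustration; see the paper for the actual value of N):

```python
import torch
import torch.nn.functional as F

def proxy_labels(sparse_depth, patch_size=7):
    """For each valid pixel of a B x 1 x H x W sparse depth map, take the
    minimum positive depth inside an N x N window as the proxy label d*."""
    # Replace empty pixels (0) with +inf so they never win the minimum
    depth = torch.where(sparse_depth > 0, sparse_depth,
                        torch.full_like(sparse_depth, float('inf')))
    # Min-pooling implemented as negated max-pooling over the patch
    pad = patch_size // 2
    d_min = -F.max_pool2d(-depth, kernel_size=patch_size, stride=1, padding=pad)
    # Keep proxy labels only at the originally valid coordinates
    return torch.where(sparse_depth > 0, d_min, torch.zeros_like(sparse_depth))
```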

Qualitative Results

In this section we report a small set of examples. For each one we show, respectively, the image, the raw LiDAR, the LiDAR filtered with our approach, and our sparse confidence map.

KITTI Drive 05 26/09/2011.

KITTI Drive 22 26/09/2011.

KITTI Drive 71 29/09/2011.

Reference

@inproceedings{aconti2022lidarconf,
  title={Unsupervised confidence for LiDAR depth maps and applications},
  author={Conti, Andrea and Poggi, Matteo and Aleotti, Filippo and Mattoccia, Stefano},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems},
  note={IROS},
  year={2022}
}