
Range-Agnostic Multi-View Depth Estimation With Keyframe Selection

Methods for 3D reconstruction from posed frames require prior knowledge about the metric range of the scene, usually to recover matching cues along the epipolar lines and narrow the search range. However, such a prior might not be directly available, or might be estimated inaccurately, in real scenarios – e.g., outdoor 3D reconstruction from video sequences – heavily hampering performance. In this paper, we focus on multi-view depth estimation without requiring prior knowledge about the metric range of the scene, by proposing an efficient and purely 2D framework that reverses the order of the depth estimation and matching steps. Moreover, we demonstrate the capability of our framework to provide rich insights about the quality of the views used for prediction. We achieve state-of-the-art performance on Blended and TartanAir, two challenging benchmarks featuring posed video frames in various scenarios, and demonstrate generalization capabilities and stereo perception applicability on UnrealStereo4K. Finally, we show that our framework is accurate in controlled environments with fixed depth ranges, such as those featured in the DTU dataset.

Sparsity Agnostic Depth Completion

State-of-the-art depth completion approaches yield accurate results only when processing a specific density and distribution of input points, i.e. the one observed during training, narrowing their deployment in real use cases. We present a framework robust to uneven distributions and extremely low densities, as it is not trained on a fixed input pattern and density as competitors are.

Unsupervised Confidence for LiDAR Depth Maps and Applications

Depth perception is pivotal in many fields, such as robotics and autonomous driving, to name a few. Consequently, depth sensors such as LiDARs have rapidly spread in many applications. The 3D point clouds generated by these sensors must often be coupled with an RGB camera to understand the framed scene semantically. Usually, the former is projected over the camera image plane, leading to a sparse depth map. Unfortunately, this process, coupled with the intrinsic issues affecting all depth sensors, yields noise and gross outliers in the final output. To this end, in this paper, we propose an effective unsupervised framework aimed at explicitly addressing this issue by learning to estimate the confidence of the LiDAR sparse depth map, thus allowing for filtering out the outliers. Experimental results on the KITTI dataset highlight that our framework excels at this purpose. Moreover, we demonstrate how this achievement can improve a wide range of tasks.

Active Stereo Without Pattern Projector

This paper proposes a novel framework integrating the principles of active stereo into standard passive cameras, yet in the absence of a physical pattern projector. Our methodology virtually projects a pattern over the left and right images, according to sparse measurements obtained from a depth sensor. Any such device can be seamlessly plugged into our framework, allowing for the deployment of a virtual active stereo setup in any possible environment, overcoming the limitations of physical patterns, such as a limited working range. Exhaustive experiments on indoor/outdoor datasets, featuring both long and close range, support the seamless effectiveness of our approach, boosting the accuracy of both stereo algorithms and deep networks.

Boosting Multi-Modal Unsupervised Domain Adaptation for LiDAR Semantic Segmentation by Self-Supervised Depth Completion

LiDAR semantic segmentation is receiving increased attention due to its deployment in autonomous driving applications. As LiDARs often come paired with other sensors such as RGB cameras, multi-modal approaches for this task have been developed, which however suffer from the domain shift problem, as do other deep learning approaches. To address this, we propose a novel Unsupervised Domain Adaptation (UDA) technique for multi-modal LiDAR segmentation. Unlike previous works in this field, we leverage depth completion as an auxiliary task to align features extracted from 2D images across domains, and as a powerful data augmentation for LiDARs. We validate our method on three popular multi-modal UDA benchmarks, achieving better performance than competing methods.

Rotation Quaternions

A quaternion is a 4-tuple that provides a concise and efficient representation of a rotation. The set of quaternions, together with the two operations of addition and multiplication, forms a non-commutative ring.
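As a minimal sketch of these two facts (not code from the post itself), the snippet below implements the Hamilton product for quaternions stored as `(w, x, y, z)` tuples, rotates a vector via the sandwich product q · (0, v) · q⁻¹, and shows that multiplication is not commutative:

```python
import math

def q_mult(a, b):
    # Hamilton product of two quaternions (w, x, y, z)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    )

def q_conj(q):
    # Conjugate; for a unit quaternion this is also its inverse
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, v):
    # Rotate 3D vector v by unit quaternion q: q * (0, v) * q^-1
    _, x, y, z = q_mult(q_mult(q, (0.0, *v)), q_conj(q))
    return (x, y, z)

# 90° rotation about the z-axis: q = (cos(θ/2), axis · sin(θ/2))
theta = math.pi / 2
q = (math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))
print(rotate(q, (1.0, 0.0, 0.0)))  # ≈ (0, 1, 0)

# Non-commutativity: i*j = k but j*i = -k
i, j = (0, 1, 0, 0), (0, 0, 1, 0)
print(q_mult(i, j), q_mult(j, i))  # (0, 0, 0, 1) vs (0, 0, 0, -1)
```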

Weighted Linear Regression

If you are here, chances are high that you already know how simple linear regression works: it is the first and simplest algorithm you meet on your machine learning journey. Let's recap it anyway, since it will be useful later to introduce its weighted form. Say you have a set of values $X$ and, for each of them, a _target_ value $Y$; if you plot them, you can easily see that they could be approximated by a simple straight line.
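As a quick illustrative sketch (my own example, not taken from the post), weighted linear regression generalizes this fit by giving each sample a weight — e.g., down-weighting points you trust less — and solving the closed-form normal equations $\beta = (X^\top W X)^{-1} X^\top W y$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of y = 2x + 1; the last few points are gross outliers
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=x.shape)
y[-5:] += 20.0

# Down-weight the outliers (in practice, weights often come from
# measurement variance: w_i = 1 / sigma_i^2)
w = np.ones_like(x)
w[-5:] = 1e-3

# Closed-form weighted least squares: beta = (X^T W X)^-1 X^T W y
X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta)  # ≈ [1.0, 2.0]  (intercept, slope)
```

With uniform weights this reduces to ordinary least squares, and the outliers would drag the fitted line away from the true one.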