Range-Agnostic Multi-View Depth Estimation With Keyframe Selection

Andrea Conti · Matteo Poggi · Valerio Cambareri · Stefano Mattoccia

[Paper] [Code] [Demo]

Overview

Multi-View 3D reconstruction techniques process a set of source views together with a reference view to estimate a depth map for the latter. Unfortunately, state-of-the-art frameworks

  1. require a priori knowledge of the depth range of the scene, in order to sample a set of depth hypotheses and build a meaningful cost volume;
  2. do not take keyframe selection into account.

In this paper, we propose a novel framework that is free from prior knowledge of the scene depth range and capable of identifying the most meaningful source frames. The proposed method unlocks multi-view depth estimation in a wider range of scenarios, such as large-scale outdoor environments and top-view buildings.

Method

Our method relies on an iterative approach: starting from a zero-initialized depth map, we extract geometric correlation cues and update the prediction. At each iteration, we also feed information extracted from the reference view alone (the one for which we want to compute depth). Moreover, at each iteration we use a different source view, exploiting multi-view information in a round-robin fashion. For more details, please refer to the paper.

Framework Description. Our model instantiates an initial depth map and builds a pair-wise correlation table for each source image. Then, deformable sampling is iteratively performed over these tables, and the depth state is updated accordingly. The final depth prediction is upsampled through convex upsampling.
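The round-robin update loop above can be sketched as follows. This is a minimal illustrative sketch, not the actual implementation: `refine_depth`, the feature tensors, and the placeholder correlation and update steps are all hypothetical stand-ins for the learned modules described in the paper.

```python
import numpy as np

def refine_depth(ref_feat, src_feats, num_iters=6):
    """Hypothetical sketch of the iterative round-robin refinement.

    ref_feat:  (H, W, C) features from the reference view.
    src_feats: list of (H, W, C) features, one per source view.
    """
    h, w = ref_feat.shape[:2]
    depth = np.zeros((h, w))  # zero-initialized depth state

    for it in range(num_iters):
        # Pick a different source view each iteration, round-robin.
        src = src_feats[it % len(src_feats)]
        # Placeholder for the pair-wise correlation cue between views.
        corr = ref_feat * src
        # Placeholder for the learned update predicted from the cues.
        delta = 0.1 * corr.mean(axis=-1)
        depth = depth + delta  # update the depth state

    return depth
```

In the real model, the correlation cue is sampled with deformable offsets and the update is predicted by a recurrent network; the sketch only conveys the control flow of cycling through source views while keeping a single evolving depth state.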

Qualitatives

We provide a wide set of qualitative results from different scenarios, since our approach generalizes easily, not being tied to a depth range known a priori.

Blended

The Blended dataset is a large dataset providing ground-truth depth for wide aerial scenes and short-range objects. Our framework produces highly accurate predictions without any knowledge of the scene depth range.

TartanAir

The TartanAir dataset is a large synthetic dataset simulating drone flights in diverse scenarios. Moreover, within the same scene, the depth range of each view may vary greatly. We report a set of examples from indoor and underwater scenes.

UnrealStereo4K

Since our framework is not tied to a specific depth range, it can be deployed in stereo scenarios with no information other than the rectified stereo frames and the baseline. To assess this capability, we also test on the UnrealStereo4K dataset.
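In the rectified stereo setting, the baseline is all that is needed to relate disparity and metric depth via the standard triangulation formula. The snippet below illustrates that relation; the function name and example values are ours, for illustration only.

```python
def disparity_to_depth(focal_px, baseline_m, disparity_px):
    """Standard rectified-stereo relation: depth = f * b / d.

    focal_px:     focal length in pixels.
    baseline_m:   distance between the two camera centers, in meters.
    disparity_px: horizontal pixel offset of a point between the views.
    """
    return focal_px * baseline_m / disparity_px

# Example: f = 1000 px, b = 0.2 m, d = 50 px -> depth of 4.0 m.
z = disparity_to_depth(1000.0, 0.2, 50.0)
```

Because the two rectified frames are just a calibrated two-view pair, a range-agnostic multi-view model can consume them directly, while the baseline anchors the predicted depth to metric scale.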

DTU

Finally, we also test on DTU, one of the most commonly used datasets in the Multi-View Stereo literature. This dataset consists of small objects framed from multiple views by means of a robotic arm.

Reference

@InProceedings{Conti_2024_3DV,
    author    = {Conti, Andrea and Poggi, Matteo and Cambareri, Valerio and Mattoccia, Stefano},
    title     = {Range-Agnostic Multi-View Depth Estimation With Keyframe Selection},
    booktitle = {International Conference on 3D Vision},
    month     = {March},
    year      = {2024},
}