Multi-View Guided Multi-View Stereo

🎉 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022) 🎉

Matteo Poggi* · Andrea Conti* · Stefano Mattoccia *joint authorship

Multi-View Guided Multi-View Stereo. Deep MVS networks struggle to generalize from synthetic to real images, yielding inaccurate depth maps and poor 3D reconstructions. Guiding the network with a set of sparse depth measurements, aggregated over multiple views, greatly improves the results. Sparse depth hints are densified by a 2 x 2 dilation filter to ease visualization.

Overview

This paper introduces a novel deep framework for dense 3D reconstruction from multiple image frames, leveraging a sparse set of depth measurements gathered jointly with image acquisition, as shown in the image below.

Hints filtering and aggregation. Depth hints from many views (left) can be aggregated on the reference image viewpoint by means of pose information.
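
The aggregation step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a pinhole camera with intrinsics `K` shared by both views, a 4x4 rigid transform `T_ref_src` from the source to the reference camera frame, and hint maps where 0 marks a missing measurement. The function name `aggregate_hints` is hypothetical, the filtering of unreliable hints is omitted, and collisions are resolved by simply keeping the closest depth.

```python
import numpy as np

def aggregate_hints(ref_hints, src_hints, K, T_ref_src):
    """Warp sparse depth hints from a source view into the reference view.

    ref_hints, src_hints: (H, W) depth maps, 0 where no hint is available.
    K: (3, 3) pinhole intrinsics, assumed shared by both views.
    T_ref_src: (4, 4) rigid transform from source to reference camera frame.
    """
    H, W = src_hints.shape
    out = ref_hints.astype(np.float64).copy()

    # Pixel coordinates (u = column, v = row) of the valid source hints.
    v, u = np.nonzero(src_hints > 0)
    z = src_hints[v, u].astype(np.float64)

    # Back-project the hints to 3D points in the source camera frame.
    pix = np.stack([u, v, np.ones_like(u)], axis=0).astype(np.float64)
    pts_src = (np.linalg.inv(K) @ pix) * z

    # Move the points into the reference camera frame.
    pts_ref = T_ref_src[:3, :3] @ pts_src + T_ref_src[:3, 3:4]
    z_ref = pts_ref[2]

    # Project points in front of the camera and round to the nearest pixel.
    front = z_ref > 0
    proj = K @ pts_ref[:, front]
    u_ref = np.rint(proj[0] / proj[2]).astype(int)
    v_ref = np.rint(proj[1] / proj[2]).astype(int)
    z_ref = z_ref[front]

    inside = (u_ref >= 0) & (u_ref < W) & (v_ref >= 0) & (v_ref < H)
    for uu, vv, zz in zip(u_ref[inside], v_ref[inside], z_ref[inside]):
        # Assumption: keep the closest hint when several land on one pixel.
        if out[vv, uu] == 0 or zz < out[vv, uu]:
            out[vv, uu] = zz
    return out
```

Calling the function once per source view, each time passing the previous output as `ref_hints`, accumulates the hints of all views on the reference viewpoint.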

Given a deep multi-view stereo network, our framework uses these sparse depth hints to guide it by modulating the plane-sweep cost volume built during the forward pass. The modulation is defined as

$$ \mathcal{V}'_s(z_s) = \left[ 1 - v_s + v_s \cdot k \cdot \left( 1 - e^{-\frac{(z_s - z_s^*)^2}{2c^2}} \right) \right] \cdot \mathcal{V}_s(z_s) $$

where $v_s$ and $z_s^*$ are, respectively, the binary mask $v$ and the depth hints map $z^*$ downsampled to resolution $s$ with nearest-neighbor interpolation. For further details, we refer to the main paper.
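
As a rough illustration of the modulation, the NumPy sketch below applies a Gaussian-shaped weight to a cost volume. It rests on several assumptions that are not stated on this page: the exponent is taken to be the squared difference $(z_s - z_s^*)^2$, the bracketed weight is taken to multiply the original volume $\mathcal{V}_s$, lower costs are taken to mean better matches, and $k$ and $c$ are read as the amplitude and width of the Gaussian. The function names and the default values of `k` and `c` are placeholders, and the strided slicing is only a simple stand-in for nearest-neighbor downsampling.

```python
import numpy as np

def downsample_nearest(x, factor):
    """Strided sampling, a simple form of nearest-neighbor downsampling."""
    return x[::factor, ::factor]

def modulate_cost_volume(volume, depth_planes, hints, mask, k=10.0, c=0.1):
    """Modulate a plane-sweep cost volume with sparse depth hints.

    volume: (D, H, W) matching costs, assumed lower-is-better.
    depth_planes: (D,) depth hypothesis z_s of each plane.
    hints: (H, W) depth hints z_s^*; mask: (H, W) binary validity mask v_s.
    k, c: amplitude and width of the Gaussian (placeholder defaults).
    """
    z = depth_planes[:, None, None]                       # (D, 1, 1)
    gauss = np.exp(-((z - hints[None]) ** 2) / (2.0 * c ** 2))
    # Pixels without a hint (mask == 0) keep a weight of exactly 1.
    weight = 1.0 - mask[None] + mask[None] * k * (1.0 - gauss)
    return weight * volume
```

Under these assumptions, a hinted pixel has its cost driven toward zero at the plane closest to the hint and amplified by roughly `k` elsewhere, while pixels without hints are left unchanged.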

Qualitative Results

Qualitative results. A few qualitative samples from MVSNet trained with and without modulation on Blended-MVS and tested on Blended-MVS and DTU.

Reference

@inproceedings{Poggi_2022_IROS,
  title={Multi-View Guided Multi-View Stereo},
  author={Poggi, Matteo and Conti, Andrea and Mattoccia, Stefano},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems},
  note={IROS},
  year={2022}
}