Multi-View Guided Multi-View Stereo
🎉 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022) 🎉
Matteo Poggi* · Andrea Conti* · Stefano Mattoccia *joint authorship
[Figure panels: RGB image · Dilated sparse depth hints · Prediction without hints · Prediction with hints]
Overview
This paper introduces a novel deep framework for dense 3D reconstruction from multiple image frames, leveraging a sparse set of depth measurements gathered jointly with image acquisition, as shown in the figure below.
[Figure: panels (a) and (b)]
Given a deep multi-view stereo network, our framework uses these sparse depth hints to guide the network by modulating the plane-sweep cost volume built during the forward pass. The modulation is defined as
$$ \mathcal{V}'_s(z_s) = \left[ 1 - v_s + v_s \cdot k \cdot \left( 1 - e^{-\frac{(z_s - z_s^*)^2}{2c^2}} \right) \right] \cdot \mathcal{V}_s(z_s) $$
where $v_s$ and $z_s^*$ are, respectively, the binary mask $v$ and the depth hints map $z^*$ downsampled to resolution $s$ with nearest-neighbor interpolation. For further details, we refer to the main paper.
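
To make the modulation concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that the Gaussian uses the squared difference $(z_s - z_s^*)^2$, that the bracketed weight multiplies the original cost volume element-wise, and that lower cost means a better match. The names `downsample_nearest` and `modulate_cost_volume`, the `(D, H, W)` array layout, and the default values of `k` and `c` (read here as the amplitude and width of the Gaussian) are illustrative placeholders, not values taken from the paper.

```python
import numpy as np


def downsample_nearest(x, factor):
    """Strided nearest-neighbor downsampling of an (H, W) map (hypothetical helper)."""
    return x[::factor, ::factor]


def modulate_cost_volume(cost, depth_planes, hints, mask, k=10.0, c=0.1):
    """Modulate a plane-sweep cost volume with sparse depth hints.

    cost:         (D, H, W) cost volume at resolution s (lower = better match)
    depth_planes: (D,) depth hypothesis z_s for each plane
    hints:        (H, W) depth hints z_s^*, read only where mask == 1
    mask:         (H, W) binary mask v_s, 1 where a hint is available
    k, c:         assumed Gaussian amplitude and width (placeholder values)
    """
    z = depth_planes[:, None, None]   # (D, 1, 1), broadcast over pixels
    z_star = hints[None, :, :]        # (1, H, W), broadcast over planes
    v = mask[None, :, :].astype(cost.dtype)

    # Gaussian centered on the hinted depth, evaluated for every plane.
    gauss = np.exp(-((z - z_star) ** 2) / (2.0 * c ** 2))

    # Pixels without a hint get weight 1 (unchanged). Pixels with a hint get
    # a weight close to 0 near the hinted depth and close to k far from it.
    weight = 1.0 - v + v * k * (1.0 - gauss)
    return weight * cost
```

With this form, the cost at hypotheses near a hint is suppressed while the cost at distant hypotheses is amplified by up to `k`, so the network is steered toward the hinted depth only at pixels where the mask is set.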
Qualitative Results
[Figure: qualitative comparison, with columns RGB · w/o Hints · with Hints · Ground Truth]
Reference
@inproceedings{Poggi_2022_IROS,
title={Multi-View Guided Multi-View Stereo},
author={Poggi, Matteo and Conti, Andrea and Mattoccia, Stefano},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems},
note={IROS},
year={2022}
}