Description
Dense optical flow estimation is complex and time-consuming, with state-of-the-art methods relying either on large synthetic data sets or on pipelines requiring up to a few minutes per frame pair. In this paper, we address the problem of optical flow estimation in the automotive scenario in a self-supervised manner. We argue that optical flow can be cast as a geometrical warping between two successive video frames and devise a deep architecture to estimate such a transformation in two stages. First, a dense pixel-level flow is computed with a projective bootstrap on rigid surfaces. We show how such a global transformation can be approximated with a homography and extend spatial transformer layers so that they can be employed to compute the flow field implied by such a transformation. Subsequently, we refine the prediction with a second, deeper network that accounts for moving objects. A final reconstruction loss compares the warping of frame $X_{t}$ with the subsequent frame $X_{t+1}$ and guides both estimates. The model has the speed advantages of end-to-end deep architectures while achieving competitive performance, both outperforming recent unsupervised methods and showing good generalization capabilities on new automotive data sets.
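The two geometric ingredients the abstract describes can be illustrated with a short sketch: a 3x3 homography induces a dense flow field over the image grid, and a photometric reconstruction loss compares the warped frame $X_t$ with $X_{t+1}$. The snippet below is a minimal PyTorch sketch of these ideas, not the authors' implementation; the helper names `homography_flow` and `photometric_loss`, the framework choice, and the use of a plain L1 loss are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def homography_flow(H, height, width):
    """Dense flow field induced by a 3x3 homography H.
    Hypothetical helper: a sketch of the flow implied by a
    projective (homography) warp, not the paper's layer."""
    ys, xs = torch.meshgrid(
        torch.arange(height, dtype=torch.float32),
        torch.arange(width, dtype=torch.float32),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    pts = torch.stack([xs, ys, ones], dim=-1).reshape(-1, 3)  # (H*W, 3) homogeneous coords
    warped = pts @ H.T                                        # apply the homography
    warped = warped[:, :2] / warped[:, 2:3]                   # dehomogenize
    return (warped - pts[:, :2]).reshape(height, width, 2)    # per-pixel displacement

def photometric_loss(x_t, x_t1, flow):
    """Warp x_t by `flow` and compare to x_t1 with an L1 penalty
    (a stand-in for the paper's reconstruction loss).
    x_t, x_t1: (1, C, H, W) tensors; flow: (H, W, 2)."""
    _, _, h, w = x_t.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32, device=x_t.device),
        torch.arange(w, dtype=torch.float32, device=x_t.device),
        indexing="ij",
    )
    # grid_sample expects sampling coordinates normalized to [-1, 1]
    grid_x = (xs + flow[..., 0]) / (w - 1) * 2 - 1
    grid_y = (ys + flow[..., 1]) / (h - 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1).unsqueeze(0)  # (1, H, W, 2)
    warped = F.grid_sample(x_t, grid, align_corners=True)
    return (warped - x_t1).abs().mean()

# Sanity check: an identity homography yields zero flow, so the loss
# reduces to the direct frame difference.
x_t = torch.rand(1, 3, 64, 64)
x_t1 = torch.rand(1, 3, 64, 64)
flow = homography_flow(torch.eye(3), 64, 64)
print(photometric_loss(x_t, x_t1, flow))
```

Because warping and sampling are differentiable (via `grid_sample`), such a loss can supervise both the homography estimate and the refinement network without ground-truth flow, which is the self-supervised principle the abstract relies on.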
Published in
- IEEE Transactions on Intelligent Transportation Systems, vol. 20, pp. 3294-3302, 2019-09-01
- Institute of Electrical and Electronics Engineers (IEEE)
Details
- CRID: 1871428068059023232
- ISSN: 1558-0016, 1524-9050
- HANDLE: 11380/1167357
- Data source: OpenAIRE