RESEARCH PRODUCT

Indoor Scene Understanding using Non-Conventional Cameras

Clara Fernandez-Labrador

subject

Artificial intelligence; Mobile robotics; Computer vision; [INFO.INFO-TI] Computer Science [cs]/Image Processing [eess.IV]; [INFO.INFO-RB] Computer Science [cs]/Robotics [cs.RO]; [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]

description

Humans understand environments effortlessly, under a wide variety of conditions, by virtue of visual perception. Comparable visual understanding in computers is highly desirable, so that machines can perform complex tasks by interacting with the real world, to assist or entertain humans. In this regard, we are particularly interested in indoor environments, where humans spend nearly all of their lives.
This thesis addresses the problems that arise in the pursuit of a hierarchical visual understanding of indoor scenes. On the sensing side, to capture the wide 3D world, we propose to use non-conventional cameras, namely 360° imaging and 3D sensors. On the understanding side, we target three key aspects: room layout estimation; object detection, localization and segmentation; and object category shape modeling. Novel and efficient solutions are provided for each.
The thesis focuses on the following underlying challenges. First, we investigate the estimation of the 3D room layout from a single 360° image, which serves as the highest level of scene modelling and understanding. We exploit the Manhattan World assumption and deep learning techniques to propose models that handle parts of the room that are not visible in the image and that generalize to more complex layouts. At the same time, we propose new methods for working with 360° images, most notably a special convolution that compensates for the distortions of equirectangular images.
Second, considering the importance of context for scene understanding, we study the problem of object localization and segmentation, adapting it to leverage 360° images. We also exploit the interaction between layout and objects to lift detected 2D objects into the 3D room model.
The final line of work focuses on 3D object shape analysis. We use explicit modelling of non-rigidity and a high-level notion of object symmetry to learn, in an unsupervised manner, 3D keypoints that are order-wise correspondent as well as geometrically and semantically consistent across the objects in a category.
Our models advance the state of the art on the aforementioned tasks, each evaluated on its respective reference benchmark.

https://hal.archives-ouvertes.fr/tel-03097628v2/document