Author: Zongwei Wu
Depth Attention for Scene Understanding
Deep learning models can nowadays enable a machine to perform a wide range of tasks, sometimes with better precision than human beings. Among all the modules of an intelligent machine, perception is the most essential: without it, the other action modules struggle to safely and precisely accomplish the target task in complex scenes. Conventional perception systems are based on RGB images, which provide rich texture information about the 3D scene. However, the quality of RGB images depends heavily on environmental factors, which in turn affect the performance of deep learning models. Therefore, in this thesis, we aim to improve the performance and robustness of RGB models with comple…
QaQ: Robust 6D Pose Estimation via Quality-Assessed RGB-D Fusion
RGB-D 6D pose estimation has recently drawn great research attention thanks to the complementary depth information. However, both the depth and the color image are often noisy in real industrial scenarios, which makes it challenging for the many existing methods that fuse RGB and depth features equally. In this paper, we present a novel fusion design to adaptively merge RGB-D cues. Specifically, we design a quality-assessment block that estimates the global quality of the input modalities. This quality, represented as a parameter α, is then used to reinforce the fusion. We thus obtain a simple and effective way to improve robustness to low-quality depth and RGB inputs. Exte…
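One plausible reading of the quality-assessed fusion, sketched below rather than the paper's actual architecture, is a small block that pools both feature maps, predicts a per-sample quality score α, and uses it as a convex fusion weight; all module and variable names here are hypothetical.

```python
import torch
import torch.nn as nn

class QualityAssessedFusion(nn.Module):
    """Hypothetical sketch: predict a global quality score alpha from
    pooled RGB-D statistics and use it to reweight the fusion."""
    def __init__(self, channels: int):
        super().__init__()
        # Tiny MLP mapping pooled RGB+depth statistics to a scalar in (0, 1).
        self.quality = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor) -> torch.Tensor:
        # Global average pooling summarizes each modality.
        g = torch.cat([f_rgb.mean(dim=(2, 3)), f_depth.mean(dim=(2, 3))], dim=1)
        alpha = self.quality(g).view(-1, 1, 1, 1)  # per-sample quality weight
        # High alpha trusts depth more; low alpha falls back to RGB.
        return (1 - alpha) * f_rgb + alpha * f_depth

fused = QualityAssessedFusion(channels=64)(torch.randn(2, 64, 32, 32),
                                           torch.randn(2, 64, 32, 32))
```

Because α is predicted per sample rather than fixed, a batch with a degraded depth map can automatically lean on the RGB stream, which matches the robustness claim of the abstract.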
Modality-Guided Subnetwork for Salient Object Detection
Recent RGBD-based models for saliency detection have attracted research attention. Depth cues such as boundaries, surface normals, and shape attributes contribute to the identification of salient objects in complicated scenarios. However, most RGBD networks require both modalities at the input and feed them separately through a two-stream design, which inevitably incurs extra costs for depth sensors and computation. To tackle these inconveniences, we present in this paper a novel fusion design named modality-guided subnetwork (MGSnet). It has the following superior designs: 1) our model works for both RGB and RGBD data, and dynamically estimates depth if not availabl…
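The abstract's key point, a single network that consumes RGB and falls back to an estimated depth when no sensor depth is given, can be sketched as below; the layer sizes and the depth-prediction subnetwork are placeholder assumptions, not the actual MGSnet design.

```python
import torch
import torch.nn as nn
from typing import Optional

class ModalityGuidedNet(nn.Module):
    """Hypothetical sketch: use sensor depth if provided, otherwise
    estimate a pseudo-depth map from RGB with a small subnetwork."""
    def __init__(self):
        super().__init__()
        self.rgb_encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        # Lightweight monocular depth estimator (placeholder).
        self.depth_estimator = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                                             nn.ReLU(),
                                             nn.Conv2d(16, 1, 3, padding=1))
        self.depth_encoder = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, 1, 1)  # saliency prediction head

    def forward(self, rgb: torch.Tensor, depth: Optional[torch.Tensor] = None):
        if depth is None:
            depth = self.depth_estimator(rgb)  # dynamically estimated pseudo-depth
        feats = torch.cat([self.rgb_encoder(rgb), self.depth_encoder(depth)], dim=1)
        return torch.sigmoid(self.head(feats))

model = ModalityGuidedNet()
saliency_rgb_only = model(torch.randn(1, 3, 64, 64))           # RGB-only input
saliency_rgbd = model(torch.randn(1, 3, 64, 64),
                      torch.randn(1, 1, 64, 64))               # RGB-D input
```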
RGB-Event Fusion for Moving Object Detection in Autonomous Driving
Moving Object Detection (MOD) is a critical vision task for successfully achieving safe autonomous driving. Despite the plausible results of deep learning methods, most existing approaches are only frame-based and may fail to reach reasonable performance when dealing with dynamic traffic participants. Recent advances in sensor technologies, especially the event camera, can naturally complement the conventional camera approach to better model moving objects. However, event-based works often adopt a pre-defined time window for event representation and simply integrate it to estimate image intensities from events, neglecting much of the rich temporal information from the available asynchronous ev…
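To make the event-representation point concrete, here is a common voxel-grid style discretization of an asynchronous event stream into temporal bins; this is a generic technique rather than this paper's specific design, and the tensor layout is an assumption.

```python
import torch

def events_to_voxel_grid(events: torch.Tensor, num_bins: int,
                         height: int, width: int) -> torch.Tensor:
    """Generic sketch: bin asynchronous events (x, y, t, polarity) into a
    (num_bins, H, W) voxel grid, preserving coarse temporal structure.
    `events` is an (N, 4) tensor; polarity is +1/-1."""
    x, y = events[:, 0].long(), events[:, 1].long()
    t, p = events[:, 2], events[:, 3]
    # Normalize timestamps into [0, num_bins - 1] and pick a bin per event.
    t_norm = (t - t.min()) / (t.max() - t.min() + 1e-9) * (num_bins - 1)
    b = t_norm.long().clamp(0, num_bins - 1)
    grid = torch.zeros(num_bins, height, width)
    # Accumulate signed polarities per (bin, pixel) cell.
    grid.index_put_((b, y, x), p, accumulate=True)
    return grid

# Example: 1000 random events on a 64x64 sensor, 5 temporal bins.
ev = torch.rand(1000, 4)
ev[:, 0] *= 63; ev[:, 1] *= 63           # pixel coordinates
ev[:, 3] = torch.sign(ev[:, 3] - 0.5)    # polarity in {-1, +1}
voxels = events_to_voxel_grid(ev, num_bins=5, height=64, width=64)
```

In contrast to collapsing all events into a single intensity estimate, multiple bins retain when within the window each event fired, which is exactly the temporal information the abstract notes is otherwise lost.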
OLF: RGB-D Adaptive Late Fusion for Robust 6D Pose Estimation
RGB-D 6D pose estimation has recently gained significant research attention due to the complementary information provided by depth data. However, in real-world scenarios, especially in industrial applications, the depth and color images are often noisy. Existing methods typically employ fusion designs that average RGB and depth features equally, which may not be optimal. In this paper, we propose a novel fusion design that adaptively merges RGB-D cues. Our approach assigns two learnable weights, α₁ and α₂, to adjust the RGB and depth contributions with respect to the network depth. This enables us to improve the robustness against low-quality depth input in a simple yet effec…
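A minimal sketch of the learnable-weight idea, assuming one pair (α₁, α₂) per fusion stage that is normalized and trained end to end; this is an illustration of the mechanism, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class AdaptiveLateFusion(nn.Module):
    """Hypothetical sketch: one learnable weight pair per fusion stage,
    softmax-normalized so the RGB/depth contributions sum to one."""
    def __init__(self, num_stages: int):
        super().__init__()
        # alpha[s] = (alpha_1, alpha_2) for stage s; learned with the network.
        self.alpha = nn.Parameter(torch.zeros(num_stages, 2))

    def forward(self, stage: int, f_rgb: torch.Tensor, f_depth: torch.Tensor):
        w = torch.softmax(self.alpha[stage], dim=0)  # (alpha_1, alpha_2)
        return w[0] * f_rgb + w[1] * f_depth

fusion = AdaptiveLateFusion(num_stages=4)
out = fusion(2, torch.randn(1, 128, 16, 16), torch.randn(1, 128, 16, 16))
```

Because each stage owns its own weight pair, deeper layers can learn to downweight depth where it is noisy while shallow layers still exploit it, matching the "with respect to the network depth" phrasing above.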
Depth-Adapted CNN for RGB-D cameras
Conventional 2D Convolutional Neural Networks (CNNs) extract features from an input image by applying linear filters. These filters compute spatial coherence by weighting the photometric information over a fixed neighborhood, without taking the geometric information into account. We tackle the problem of improving classical RGB CNN methods by using the depth information provided by RGB-D cameras. State-of-the-art approaches use depth as an additional channel or image (HHA), or move from 2D CNNs to 3D CNNs. This paper proposes a novel and generic procedure to articulate both photometric and geometric information in a CNN architecture. The depth data is represented as a 2D offset to adapt …
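One way to read "depth as a 2D offset" is through deformable convolution, where the sampling grid of a standard convolution is displaced per pixel; the sketch below derives toy offsets from a depth map and feeds them to torchvision's deform_conv2d. The offset formula here is a placeholder, not the paper's geometry-driven computation.

```python
import torch
from torchvision.ops import deform_conv2d

def depth_to_offsets(depth: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Placeholder mapping from depth to per-pixel 2D sampling offsets.
    A real depth-adapted CNN derives these from scene geometry; here we
    simply scale the canonical grid by inverse depth as a toy example."""
    scale = 1.0 / depth.clamp(min=0.1) - 1.0       # toy geometric factor
    # One (dy, dx) pair per kernel tap, modulated by the local depth.
    return scale.repeat(1, 2 * k * k, 1, 1) * 0.5

x = torch.randn(1, 3, 32, 32)
depth = torch.rand(1, 1, 32, 32) + 0.5
weight = torch.randn(8, 3, 3, 3)                   # out_ch=8, in_ch=3, 3x3 kernel
offset = depth_to_offsets(depth, k=3)
y = deform_conv2d(x, offset, weight, padding=1)    # depth-adapted sampling
print(y.shape)  # torch.Size([1, 8, 32, 32])
```

The appeal of this formulation is that the convolution weights stay 2D and pretrained-compatible; only where the filter samples changes with the geometry.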
Robust RGB-D Fusion for Saliency Detection
Efficiently exploiting multi-modal inputs for accurate RGB-D saliency detection is a topic of high interest. Most existing works leverage cross-modal interactions to fuse the two streams of RGB-D for intermediate feature enhancement. In this process, a practical aspect, the low quality of the available depths, has not yet been fully considered. In this work, we aim for RGB-D saliency detection that is robust to low-quality depths, which primarily appear in two forms: inaccuracy due to noise and misalignment with RGB. To this end, we propose a robust RGB-D fusion method that benefits from (1) layer-wise and (2) trident spatial attention mechanisms. On the one hand, layer-wise atten…
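The layer-wise attention mentioned above can be sketched as a per-layer gate deciding how much of the depth feature to inject into the RGB stream; this is an assumed simplification of the paper's mechanism, with hypothetical names throughout.

```python
import torch
import torch.nn as nn

class LayerWiseAttention(nn.Module):
    """Hypothetical sketch: per-layer gate on the depth contribution,
    so a noisy depth stream can be suppressed at that layer."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor):
        stats = torch.cat([f_rgb.mean(dim=(2, 3)), f_depth.mean(dim=(2, 3))], dim=1)
        lam = self.gate(stats).view(-1, 1, 1, 1)   # layer-wise trust in depth
        return f_rgb + lam * f_depth               # depth injected proportionally

# One attention module per decoder layer; each learns its own trust level.
layers = nn.ModuleList(LayerWiseAttention(c) for c in (64, 128, 256))
```

Gating the depth residual per layer addresses the noise form of low quality; handling misalignment would additionally require a spatial mechanism, which the abstract attributes to the trident spatial attention.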