RESEARCH PRODUCT

Deep multimodal fusion for semantic image segmentation: A survey

Desire Sidibe, Olivier Morel, Yifei Zhang, Fabrice Meriaudeau

subject

Computer science; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 020207 software engineering; Machine learning; Deep learning; Artificial intelligence; Computer Vision and Pattern Recognition; Signal Processing; Image fusion; Multimodal fusion; Multi-modal; Segmentation; Image segmentation; Semantic segmentation; Heuristic; Variety (cybernetics); Benchmark (computing); Performance improvement; [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; [INFO.INFO-TI] Computer Science [cs]/Image Processing [eess.IV]

description

Recent advances in deep learning have shown excellent performance in various scene understanding tasks. However, in some complex environments or under challenging conditions, it is necessary to employ multiple modalities that provide complementary information on the same scene. A variety of studies have demonstrated that deep multimodal fusion for semantic image segmentation achieves significant performance improvement. These fusion approaches exploit multiple information sources and automatically generate a joint prediction. This paper describes the essential background concepts of deep multimodal fusion and the relevant applications in computer vision. In particular, we provide a systematic survey of multimodal fusion methodologies, multimodal segmentation datasets, and quantitative evaluations on the benchmark datasets. Existing fusion methods are summarized according to a common taxonomy: early fusion, late fusion, and hybrid fusion. Based on their performance, we analyze the strengths and weaknesses of different fusion strategies. Current challenges and design choices are discussed, aiming to provide the reader with a comprehensive and heuristic view of deep multimodal image segmentation.
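The early/late distinction in the taxonomy above can be illustrated with a toy per-pixel classifier. This is only a minimal sketch under stated assumptions, not a method from the survey: the function names, the linear scorer, the weights, and the RGB and depth values are all hypothetical, and real systems use deep networks rather than hand-set dot products. Hybrid fusion, which combines features at several stages, is not shown.

```python
# Toy illustration of early vs. late fusion for one pixel and two classes.
# All names, weights, and values are hypothetical, chosen for illustration.

def linear_scores(features, weights):
    """Return one score per class: dot product of features with that class's weights."""
    return [sum(f * w for f, w in zip(features, class_w)) for class_w in weights]

def argmax(scores):
    """Index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])

def early_fusion(rgb_pixel, depth_pixel, joint_weights):
    """Concatenate the modality features first, then apply a single classifier."""
    fused = list(rgb_pixel) + list(depth_pixel)
    return argmax(linear_scores(fused, joint_weights))

def late_fusion(rgb_pixel, depth_pixel, rgb_weights, depth_weights):
    """Score each modality separately, then average the class scores."""
    rgb_scores = linear_scores(rgb_pixel, rgb_weights)
    depth_scores = linear_scores(depth_pixel, depth_weights)
    averaged = [(r + d) / 2 for r, d in zip(rgb_scores, depth_scores)]
    return argmax(averaged)

rgb_pixel = [0.2, 0.7, 0.1]      # assumed RGB features for one pixel
depth_pixel = [0.9]              # assumed depth feature for the same pixel
rgb_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
depth_weights = [[1.0], [0.0]]
joint_weights = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]

rgb_only_label = argmax(linear_scores(rgb_pixel, rgb_weights))
early_label = early_fusion(rgb_pixel, depth_pixel, joint_weights)
late_label = late_fusion(rgb_pixel, depth_pixel, rgb_weights, depth_weights)
```

With these assumed values, the RGB-only scorer picks class 1, while both fusion variants pick class 0 because the depth feature contributes complementary evidence. The two variants differ only in where the modalities are combined: before the classifier (early) or after per-modality scoring (late).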

https://doi.org/10.1016/j.imavis.2020.104042