

Depression Assessment by Fusing High and Low Level Features from Audio, Video, and Text

Anastasia Pampouchidou, Georgios Giannakakis, Alexandros Roniotis, Olympia Simantiraki, Matthew Pediaditis, Fabrice Meriaudeau, Fan Yang, Amir Fazlollahi, Dimitris Manousos, Kostas Marias, Panagiotis G. Simos, Manolis Tsiknakis

subject

Computer science; Speech recognition; Posterior probability; Multimodal fusion; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Image processing; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020206 networking & telecommunications; 020201 artificial intelligence & image processing; [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; [SPI] Engineering Sciences [physics]; [SPI.ACOU] Engineering Sciences [physics]/Acoustics [physics.class-ph]; AVEC 2016; Histogram; Feature (machine learning); Affective computing; Speech processing; Modality (human–computer interaction); Pattern recognition; Pattern recognition (psychology); Statistical classification; Depression assessment

description

Depression is a major cause of disability worldwide. The present paper reports on the results of our participation in the depression sub-challenge of the sixth Audio/Visual Emotion Challenge (AVEC 2016), which was designed to compare feature modalities (audio, visual, interview transcript-based) in gender-based and gender-independent modes using a variety of classification algorithms. In our approach, both high- and low-level features were assessed in each modality. Audio features were extracted from the low-level descriptors provided by the challenge organizers. Several visual features were extracted and assessed, including dynamic characteristics of facial elements (using Landmark Motion History Histograms and Landmark Motion Magnitude), global head motion, and eye blinks. These features were combined with statistical features derived from the pre-extracted features (emotions, action units, gaze, and pose). Both speech rate and word-level semantic content were also evaluated. Classification results are reported using four different classification schemes: i) gender-based models for each individual modality, ii) the feature fusion model, iii) the decision fusion model, and iv) the posterior probability classification model. Proposed approaches that outperformed the reference classification accuracy include the one utilizing statistical descriptors of low-level audio features; this approach achieved F1-scores of 0.59 for identifying depressed and 0.87 for identifying non-depressed individuals on the development set, and 0.52 and 0.81, respectively, on the test set.
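To make the posterior-probability and decision-fusion schemes named in the abstract concrete, here is a minimal Python sketch of late fusion via averaged per-modality posteriors, with per-class F1 reporting. This is not the authors' implementation: the feature matrices, labels, classifier choice, and split are hypothetical placeholders standing in for the per-modality AVEC 2016 pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for the per-modality feature matrices
# (hypothetical data, NOT the AVEC 2016 corpus).
n = 200
X_audio = rng.normal(size=(n, 30))   # e.g., statistics of audio low-level descriptors
X_visual = rng.normal(size=(n, 40))  # e.g., landmark-motion descriptors
X_text = rng.normal(size=(n, 20))    # e.g., transcript-derived features
y = rng.integers(0, 2, size=n)       # 1 = depressed, 0 = not depressed

idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=0)

# Train one classifier per modality and collect its class posteriors.
posteriors = []
for X in (X_audio, X_visual, X_text):
    clf = RandomForestClassifier(random_state=0)
    clf.fit(X[idx_train], y[idx_train])
    posteriors.append(clf.predict_proba(X[idx_test]))  # P(class | x) per test sample

# Decision fusion: average the per-modality posteriors, then take the argmax.
fused = np.mean(posteriors, axis=0)
y_pred = fused.argmax(axis=1)

# Per-class F1, matching the depressed / not-depressed scores reported above.
print("F1 (depressed):    ", f1_score(y[idx_test], y_pred, pos_label=1))
print("F1 (not depressed):", f1_score(y[idx_test], y_pred, pos_label=0))
```

Averaging posteriors is just one common late-fusion rule; the paper's actual combination rule, classifiers, and gender-based model splits may differ from this sketch.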

https://hal-univ-bourgogne.archives-ouvertes.fr/hal-01464064