Implicit Wiener Filtering for Speech Enhancement In Non-Stationary Noise
Daniel Romero, Rahul Jaiswal
subject
Noise power; Computer science; Speech recognition; Wiener filter; Spectral density; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Context (language use); Background noise; Speech enhancement; Noise; Computer Science::Sound; Frequency domain
description
Speech quality is degraded in the presence of background noise, which reduces the quality of experience (QoE) of the end-user and therefore motivates the use of speech enhancement algorithms. A large number of approaches have been proposed in this context. However, most of them focus on the case where the noise is stationary, an assumption that seldom holds in practice. For instance, in mobile telephony, noise sources with a marked non-stationary spectral signature include vehicles, machines, and other speakers, to name a few. In addition, the use of frequency-domain information in existing algorithms for speech enhancement in non-stationary noise environments can be made more effective by leveraging the increased flexibility introduced by implicit Wiener filters, which allow the spectral reconstruction of the speech signal to be controlled through the adjustment of hyperparameters. To address these limitations, the present paper develops an algorithm that recursively estimates the noise power spectral density and reconstructs the target speech signal in the frequency domain by means of an implicit Wiener filter with judiciously selected hyperparameters. The recursive noise estimation approach relies on the past and the present power spectral values. To evaluate the performance of the speech enhancement algorithm, speech uttered by a male and a female speaker is degraded by non-stationary noise produced, e.g., by babbling, cars, streets, trains, restaurants, and airports. To this end, the NOIZEUS corpus is used. Objective speech quality measures such as the log-likelihood ratio (LLR), the cepstral distance (CD), and the weighted spectral slope distance (WSS) are evaluated for the enhanced speech signals and compared to those obtained with the conventional spectral subtraction method. Results demonstrate that the proposed algorithm provides consistent and improved enhancement performance for all tested noise types.
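The two steps named in the description, recursive noise estimation from past and present power spectral values followed by Wiener-type reconstruction in the frequency domain, can be illustrated with a minimal sketch. The abstract does not give the update rule or the filter expression, so everything below is an assumption made for illustration: a first-order recursive average with a hypothetical `smoothing` factor, a simple per-bin guard (`update_ratio`) that freezes the estimate when a bin is likely dominated by speech, and a common parametric Wiener gain with hypothetical hyperparameters `alpha`, `beta`, and `floor`. It is not the authors' algorithm, and the input is assumed to be a list of already computed STFT frames.

```python
from typing import List, Sequence


def update_noise_psd(prev_noise_psd: Sequence[float],
                     frame_power: Sequence[float],
                     smoothing: float = 0.9,
                     update_ratio: float = 2.0) -> List[float]:
    """Recursively blend the past noise estimate with the present power.

    Assumed form: first-order averaging per frequency bin. A bin is only
    updated when its power stays below update_ratio times the previous
    estimate, a crude stand-in for speech-presence detection.
    """
    updated = []
    for past, present in zip(prev_noise_psd, frame_power):
        if present <= update_ratio * past:
            updated.append(smoothing * past + (1.0 - smoothing) * present)
        else:
            updated.append(past)  # likely speech: keep the old estimate
    return updated


def parametric_wiener_gain(frame_power: Sequence[float],
                           noise_psd: Sequence[float],
                           alpha: float = 1.0,
                           beta: float = 1.0,
                           floor: float = 0.05) -> List[float]:
    """Per-bin gain (S / (S + alpha * N)) ** beta with a spectral floor.

    S is the estimated speech power max(Y - N, 0). alpha, beta and floor
    are illustrative hyperparameters, not values taken from the paper.
    """
    gains = []
    for noisy, noise in zip(frame_power, noise_psd):
        speech = max(noisy - noise, 0.0)
        denom = speech + alpha * noise
        gain = (speech / denom) ** beta if denom > 0.0 else 0.0
        gains.append(max(gain, floor))
    return gains


def enhance(spectra: Sequence[Sequence[complex]],
            smoothing: float = 0.9,
            alpha: float = 1.0,
            beta: float = 1.0,
            floor: float = 0.05) -> List[List[complex]]:
    """Apply the gain to each STFT frame; phase is left unchanged.

    Assumption: the first frame contains noise only and initialises
    the noise power spectral density estimate.
    """
    noise_psd = [abs(c) ** 2 for c in spectra[0]]
    enhanced = []
    for frame in spectra:
        power = [abs(c) ** 2 for c in frame]
        noise_psd = update_noise_psd(noise_psd, power, smoothing)
        gains = parametric_wiener_gain(power, noise_psd, alpha, beta, floor)
        enhanced.append([g * c for g, c in zip(gains, frame)])
    return enhanced
```

In this sketch, raising `alpha` suppresses more noise at the cost of more speech distortion, while `beta` reshapes the gain curve; setting both to 1 reduces the expression to the classical Wiener gain under the stated power-subtraction estimate of the speech.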
year | journal | country | edition | language |
---|---|---|---|---|
2021-05-21 | 2021 11th International Conference on Information Science and Technology (ICIST) | | | |