Improving Speech Intelligibility Using Psychoacoustic Noise Reduction

Overview

The following sections present results from several noise reduction algorithms. Starting from the classical Wiener filter approach, the level of artefacts in the processed signal is reduced step by step, while speech intelligibility and noise reduction are kept as high as possible.



Download the thesis (in German): thesis.

Clear Speech and Noisy Signals

The spectrograms below show the clear speech signal, a noisy signal consisting of the clear speech signal plus white noise, and a second noisy signal created by superimposing a jackhammer noise on the clear speech signal. Both noisy signals have a signal-to-noise ratio of 5dB. The added white noise represents the stationary case of an undesired signal, the jackhammer noise the transient case. Noise reduction aims to reconstruct the unknown clear speech signal as well as possible in order to increase speech intelligibility. A mobile phone call in which the person you are talking to is in a noisy environment and can hardly be understood is one of many everyday examples. The approaches are based on estimates of the clear speech component and the noise component of the noisy signal. The quality of these estimates determines the intelligibility of the processed signal, which ideally is noise free and close to the original clear speech signal, i.e. the interlocutor's voice.
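The construction of a noisy signal with a given signal-to-noise ratio can be sketched as follows. This is a minimal illustration, not the procedure used in the thesis: the function names `power` and `mix_at_snr`, the 16 kHz sampling rate, and the 200 Hz tone standing in for speech are all assumptions made for the example.

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio is `snr_db`, then add it."""
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / power(noise))
    scaled_noise = [scale * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


# Hypothetical stand-ins: a 200 Hz tone instead of speech, Gaussian samples as white noise.
fs = 16000  # assumed sampling rate in Hz
speech = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

noisy, scaled_noise = mix_at_snr(speech, noise, 5.0)
snr_db = 10 * math.log10(power(speech) / power(scaled_noise))
```

The same scaling applies to a transient noise such as the jackhammer recording, although a single overall SNR figure then says less about any individual moment of the signal.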

Conventional noise reduction algorithms suffer from very poor estimates of the noise component, especially for frequency components with a low signal-to-noise ratio. These frequencies are mostly located in the upper part of the frequency range of the human voice, i.e. > 4kHz. The energy of the human voice is unevenly distributed across the spectrum: the main part of the energy (90% and more) lies in the lower band (80Hz to 4kHz). The signal-to-noise ratios in the upper band are therefore rather low, which is easy to see for white noise, whose energy is distributed uniformly over the whole frequency scale. Erroneous estimates of the noise component produce so-called 'musical noise': isolated peaks in the spectrum. Each peak represents a sinusoidal tone, which is perceived as a very unpleasant sound.
a) clear speech
b) noisy signal w/ white noise
c) noisy signal w/ jackhammer noise
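The argument that the upper band has a low signal-to-noise ratio under white noise can be checked with a small calculation. The sketch below rests on assumptions that are not stated in the text: an analysis bandwidth of 8 kHz, exactly 90% of the speech power in the 80Hz to 4kHz band, and the remaining 10% entirely above 4kHz. The helper names are hypothetical.

```python
import math

BANDWIDTH_HZ = 8000.0        # assumed analysis bandwidth
LOW_BAND = (80.0, 4000.0)    # lower speech band from the text
HIGH_BAND = (4000.0, 8000.0)  # assumed upper band
SPEECH_LOW_FRACTION = 0.9    # "90% and more" taken as exactly 90%
OVERALL_SNR_DB = 5.0


def white_noise_fraction(band):
    """Fraction of white noise power in a band (flat spectrum over the bandwidth)."""
    low, high = band
    return (high - low) / BANDWIDTH_HZ


def band_snr_db(overall_snr_db, speech_fraction, noise_fraction):
    """SNR inside one band, given the share of speech and noise power it contains."""
    return overall_snr_db + 10.0 * math.log10(speech_fraction / noise_fraction)


snr_low = band_snr_db(OVERALL_SNR_DB, SPEECH_LOW_FRACTION,
                      white_noise_fraction(LOW_BAND))
snr_high = band_snr_db(OVERALL_SNR_DB, 1.0 - SPEECH_LOW_FRACTION,
                       white_noise_fraction(HIGH_BAND))
```

Under these assumptions the lower band ends up at roughly 7.6 dB and the upper band at roughly -2 dB, so the noise dominates above 4kHz even though the overall ratio is 5dB.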

Conventional Noise Reduction

Wiener Filter

d) processed signal from b) using Wiener filter
e) processed signal from c) using Wiener filter
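How estimation errors lead to musical noise can be illustrated with the textbook form of the Wiener gain, G = Ps / (Ps + Pn), applied per frequency bin. This is a sketch under assumptions: the exact formulation used in the thesis is not given on this page, the speech power is estimated here by simple power subtraction, and the function name `wiener_gains` and the example values are hypothetical.

```python
def wiener_gains(noisy_power, noise_power_est, floor=0.0):
    """Per-bin Wiener gain G = Ps / (Ps + Pn), with Ps estimated by power subtraction."""
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power_est):
        p_s = max(p_y - p_n, 0.0)  # estimated speech power, clipped at zero
        total = p_s + p_n
        gain = p_s / total if total > 0.0 else 0.0
        gains.append(max(gain, floor))
    return gains


# Hypothetical frame: bin 0 contains speech, bins 1 to 5 contain only noise.
# The noise estimate is a flat long-term average, but the actual noise power
# fluctuates from frame to frame around that average.
noisy_power = [10.0, 1.0, 1.3, 0.9, 2.5, 1.0]
noise_power_est = [1.0] * 6

gains = wiener_gains(noisy_power, noise_power_est)
```

In this example bin 4 contains only noise, yet it receives a gain of about 0.6 because its momentary power exceeds the averaged estimate, while its neighbours are suppressed almost completely. Such an isolated surviving bin is the kind of spectral peak that is heard as a short tone.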

Two Stage Noise Reduction w/ Psychoacoustic Model

Threshold of Hearing Based Filter Rule

f) processed signal (noisy signal w/ white noise) using threshold based filter weights
g) processed signal (noisy signal w/ white noise) using threshold based filter weights

Excitation Based Filter Rule

h) processed signal (noisy signal w/ white noise) using excitation based filter weights
i) processed signal (noisy signal w/ white noise) using excitation based filter weights

Loudness Based Filter Rule

j) processed signal (noisy signal w/ white noise) using loudness based filter weights
k) processed signal (noisy signal w/ white noise) using loudness based filter weights

Noise Reduction w/ Combined Loudness and Excitation Based Filtering

The spectrograms below show the denoised signal of a noise reduction system using two-stage filtering based on loudness and excitation. The second stage is based on a psychoacoustic model of human hearing. This was just a quick trial that combines the aggressive loudness based filter rule for frequencies below 1.6kHz with an excitation based rule for frequencies above. In the white noise case (left spectrogram), the abrupt change of filter rule at 1.6kHz introduces an audible tone containing residual noise and speech components, which is also clearly visible in the spectrogram. For future work, a combined filter rule without such discontinuities is desirable.
l) processed signal (noisy signal w/ white noise) using two combined psychoacoustic filter weights
m) processed signal (noisy signal w/ white noise) using two combined psychoacoustic filter weights
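The abrupt change of filter rule at 1.6kHz, and one possible way to avoid it, can be sketched as follows. This is only an illustration: the raised-cosine crossfade is not taken from the thesis, the 400 Hz transition width is arbitrary, and the constant gains 0.2 and 0.8 are placeholders for the real loudness based and excitation based filter weights.

```python
import math

CROSSOVER_HZ = 1600.0


def combine_hard(freqs, g_loudness, g_excitation):
    """Hard switch: loudness based gain below the crossover, excitation based above."""
    return [gl if f < CROSSOVER_HZ else ge
            for f, gl, ge in zip(freqs, g_loudness, g_excitation)]


def combine_smooth(freqs, g_loudness, g_excitation, width_hz=400.0):
    """Crossfade between the two gains with a raised-cosine weight around the crossover."""
    start = CROSSOVER_HZ - width_hz / 2.0
    combined = []
    for f, gl, ge in zip(freqs, g_loudness, g_excitation):
        x = min(max((f - start) / width_hz, 0.0), 1.0)
        w = 0.5 - 0.5 * math.cos(math.pi * x)  # weight of the excitation based gain
        combined.append((1.0 - w) * gl + w * ge)
    return combined


# Placeholder gains on a 100 Hz grid from 0 to 4 kHz.
freqs = [i * 100.0 for i in range(41)]
g_loudness = [0.2] * len(freqs)
g_excitation = [0.8] * len(freqs)

hard = combine_hard(freqs, g_loudness, g_excitation)
smooth = combine_smooth(freqs, g_loudness, g_excitation)

max_step_hard = max(abs(b - a) for a, b in zip(hard, hard[1:]))
max_step_smooth = max(abs(b - a) for a, b in zip(smooth, smooth[1:]))
```

With these placeholder values the hard switch jumps by 0.6 between two neighbouring bins, while the crossfade spreads the same change over several bins. Whether a crossfade of this kind removes the audible tone would have to be verified by listening tests.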

Comparison

Perceptive Speech Quality vs. Perceptive Grade of Speech Distortions

                            Wiener Filter    Combined Filter Rule
Perceivable Musical Noise   - -              ++
Perceptive Speech Quality   ++               - - (damped high frequency components > 8kHz ➝ muffled sound)