Does denoising help, and does it survive the codecs?

One noisy speech clip, cleaned with DeepFilterNet3 and then sent through two low-bitrate voice codecs. Listen at each stage, see it in the spectrograms, and check the numbers below.

Clip: 5.0 s, 48 kHz mono · Denoiser: DeepFilterNet3 · Codec A: Codec2 3200 bps · Codec B: Opus WB 12 kbps · Packet loss: 15%

What we found

Bottom line

On this clip, DeepFilterNet3 makes the speech clearly cleaner. The full +6.4 dB SNR gain applies only when no codec is in the path; through Codec2 it narrows to +1.3 dB and through Opus to +2.5 dB, because the codec's own distortion dominates the error (even the clean reference scores only -2.5 dB after Codec2). Denoising runs faster than real time, so it is cheap to add ahead of a live link.

This is a single 5-second example, so treat it as an early signal to test more widely, not a finished benchmark.

SNR improvement from denoising

Direct (no codec)

+6.4 dB

Through Codec2 3200

+1.3 dB

Through Opus 12 kbps

+2.5 dB

Measured in each path against the noisy signal carried the same way.

Hear it for yourself

In each stage, compare Noisy (the input) with DeepFilterNet3 (the denoised version, marked Under test). Clean is the ideal reference.

Only one clip plays at a time, so play Noisy then DeepFilterNet3 back to back for a clean A/B. The denoised track is a little quieter, so nudge your volume up.
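
For a fairer A/B than adjusting the volume by hand, the two tracks can be level-matched before listening. The following is a minimal Python sketch with NumPy, assuming the audio is already loaded as float arrays scaled to [-1, 1]. The names `rms_dbfs` and `match_level` and the synthetic tones are illustrative stand-ins, not part of this page's tooling.

```python
import numpy as np

def rms_dbfs(x, eps=1e-12):
    """RMS level in dB relative to a full scale of 1.0."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))) + eps)

def match_level(track, target_dbfs):
    """Scale `track` so that its RMS level equals `target_dbfs`.

    Returns the scaled track and the gain that was applied, in dB.
    """
    gain_db = target_dbfs - rms_dbfs(track)
    return track * 10.0 ** (gain_db / 20.0), gain_db

# Synthetic stand-ins for the two tracks (hypothetical data, not the real clip).
t = np.arange(48000) / 48000.0
noisy = 0.24 * np.sin(2 * np.pi * 220.0 * t)
denoised = 0.12 * np.sin(2 * np.pi * 220.0 * t)  # half the amplitude

matched, gain_db = match_level(denoised, rms_dbfs(noisy))
print(f"gain applied: {gain_db:+.2f} dB")
```

Applying the same arithmetic to the loudness values in the measurements table for the direct path (-15.4 dBFS noisy, -20.9 dBFS denoised) gives a gain of about 5.5 dB. A real implementation would also need to guard against clipping after the gain is applied.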

Direct denoising

No transmission codec

DeepFilterNet3 applied straight to the noisy capture, before any codec.

Noisy

Input recording

DeepFilterNet3 (Under test)

Denoised output

Clean

Reference target

Spectrograms: Noisy (input recording), DeepFilterNet3 (denoised output), Clean (reference target). Brighter areas are more sound energy; haze between words is noise.

Color: quieter to louder (energy) · vertical = pitch, 0 to 12 kHz · horizontal = time, 0 to 5 s

After Codec2 (3200 bps)

3.2 kbps vocoder

Each signal through Codec2 at 3200 bps, a very low bitrate vocoder used on HF/VHF radio that rebuilds speech from parameters.

Noisy

Input through Codec2

DeepFilterNet3 (Under test)

Denoised through Codec2

Clean

Reference through Codec2

Spectrograms: Noisy (input through Codec2), DeepFilterNet3 (denoised through Codec2), Clean (reference through Codec2). Brighter areas are more sound energy; haze between words is noise.

After Opus (12 kbps, 15% loss)

Wideband, packet loss

Each signal through Opus wideband at 12 kbps with 15% packet loss, rebuilt by Opus's neural concealment (FARGAN).
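
To make the 15% figure concrete, here is a minimal Python sketch of a random loss pattern. It assumes 20 ms packets and a fixed number of lost packets chosen uniformly at random with a fixed seed. The page does not state the packet size or the loss model, so both are assumptions, and `loss_mask` is an illustrative name. The sketch only decides which packets go missing; filling the gaps is the decoder's job.

```python
import numpy as np

def loss_mask(duration_s, loss_rate, frame_ms=20.0, seed=0):
    """Return a boolean array with True for each packet that is dropped.

    frame_ms and seed are assumptions made for illustration only.
    """
    n_frames = int(round(duration_s * 1000.0 / frame_ms))
    n_lost = int(round(loss_rate * n_frames))
    rng = np.random.default_rng(seed)
    lost = np.zeros(n_frames, dtype=bool)
    # Pick n_lost distinct packet indices uniformly at random.
    lost[rng.choice(n_frames, size=n_lost, replace=False)] = True
    return lost

mask = loss_mask(5.04, 0.15)
print(f"{mask.sum()} of {mask.size} packets dropped")
```

Under these assumptions a 5.04 s clip is 252 packets, of which 38 are dropped. Real networks tend to lose packets in bursts, which stresses concealment more than the scattered losses modelled here.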

Noisy

Input through Opus

DeepFilterNet3 (Under test)

Denoised through Opus

Clean

Reference through Opus

Spectrograms: Noisy (input through Opus), DeepFilterNet3 (denoised through Opus), Clean (reference through Opus). Brighter areas are more sound energy; haze between words is noise.

The measurements

SNR and SI-SDR are measured against the clean reference; loudness is the RMS level of each track in dBFS. The denoised row is highlighted in each path.

Track	Duration (s)	Loudness (dBFS)	SNR (dB)	Δ SNR (dB)	SI-SDR (dB)	Δ SI-SDR (dB)
1. Direct denoising
Noisy	5.00	-15.4	-0.3	baseline	-0.3	baseline
DeepFilterNet3	4.97	-20.9	6.1	▲ +6.4	5.2	▲ +5.5
Clean	5.00	-18.6	reference	n/a	reference	n/a
2. After Codec2 (3200 bps)
Noisy	5.00	-17.7	-3.3	baseline	-29.8	n/a
DeepFilterNet3	4.96	-21.0	-2.0	▲ +1.3	-39.5	see note *
Clean	5.00	-20.1	-2.5	n/a	-30.6	n/a
3. After Opus (12 kbps, 15% loss)
Noisy	5.04	-16.2	-4.5	baseline	-29.6	n/a
DeepFilterNet3	4.98	-20.7	-2.1	▲ +2.5	-52.7	see note *
Clean	5.04	-18.9	-2.9	n/a	-35.1	n/a

Δ is the denoised minus the noisy track in the same path; positive means denoising helped. * SI-SDR is unreliable through a codec, which rebuilds the waveform, so judge the codec stages by SNR and by ear.
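
To see why a sample-aligned metric breaks down once the waveform is rebuilt, the sketch below computes SI-SDR for a test tone against itself and against slightly delayed copies. It is a minimal Python sketch using one common SI-SDR definition (project the estimate onto the reference, then compare target and residual energy). The page does not say which implementation produced the table, so treat `si_sdr` as illustrative.

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB (one common definition, assumed here)."""
    reference = reference - np.mean(reference)
    estimate = estimate - np.mean(estimate)
    # Best scaling of the reference to explain the estimate.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(residual**2) + eps))

fs = 48000
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz tone, exactly 1000 cycles

aligned = si_sdr(tone, tone)                  # identical signals: very high
shifted_6 = si_sdr(tone, np.roll(tone, 6))    # 0.125 ms delay: about 0 dB
shifted_12 = si_sdr(tone, np.roll(tone, 12))  # 0.25 ms delay: collapses

print(f"{aligned:.1f} dB, {shifted_6:.1f} dB, {shifted_12:.1f} dB")
```

The delayed copies would sound identical to the original, yet the score falls from very high to around 0 dB and then to a large negative value. A pure tone is a deliberately extreme case; speech degrades less abruptly, but in the same direction.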

What the metrics mean

SNR (signal-to-noise ratio): How far the speech sits above the noise, measured against the clean reference. Higher is better; each +3 dB roughly halves the noise power relative to the speech.
SI-SDR: A stricter, sample-aligned similarity to the clean reference. Very sensitive to timing, so it stops being meaningful once a codec rebuilds the waveform (see the note above the table).
Δ (delta): The denoised track minus the noisy track within the same path. A positive delta means denoising helped at that stage.
Loudness (RMS, dBFS): Average level of the clip, shown as Loudness in the table. 0 dBFS is the digital maximum, so more negative is quieter.
Codec2 3200: A 3.2 kbps vocoder for HF/VHF radio. It transmits speech parameters and resynthesizes the voice.
Opus WB 12 kbps, 15% loss: Wideband Opus at 12 kbps with 15% of packets dropped and reconstructed by its neural concealment (FARGAN).
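
The definitions above can be sketched in a few lines of Python with NumPy. This assumes the simplest reading of each metric: everything that differs from the clean reference counts as noise, and level is RMS relative to a full scale of 1.0. The page does not publish its measurement code, so the function names and the synthetic signals are illustrative only.

```python
import numpy as np

def snr_db(reference, estimate, eps=1e-12):
    """SNR in dB, treating (estimate - reference) as the noise."""
    noise = estimate - reference
    return 10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(noise**2) + eps))

def rms_dbfs(x, eps=1e-12):
    """RMS level in dB relative to a full scale of 1.0."""
    return 20.0 * np.log10(np.sqrt(np.mean(x**2)) + eps)

fs = 48000
t = np.arange(fs) / fs
clean = 0.25 * np.sin(2 * np.pi * 200.0 * t)   # stand-in for speech
noise = 0.25 * np.cos(2 * np.pi * 3000.0 * t)  # same power as `clean`

noisy = clean + noise
denoised = clean + noise / np.sqrt(2.0)        # noise power cut in half

snr_noisy = snr_db(clean, noisy)               # about 0 dB
snr_denoised = snr_db(clean, denoised)         # about +3 dB
delta = snr_denoised - snr_noisy               # the table's delta: denoised minus noisy

print(f"SNR {snr_noisy:.2f} -> {snr_denoised:.2f} dB, delta {delta:+.2f} dB")
print(f"clean level: {rms_dbfs(clean):.2f} dBFS")
```

Halving the noise power moves the SNR by 10·log10(2), about 3.01 dB, which is where the "+3 dB" rule of thumb comes from.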