LCT-GAN: Lightweight, Causal, Transformer-based Network for Single-Channel Speech Enhancement



Audio Samples

This page presents audio samples with corresponding spectrogram plots for our submission 'Investigation on Transformer Ladder Architecture for Light Single-Channel Speech Enhancement' to EUSIPCO 2025. The samples come from (1) an unseen synthetic test set generated from DNS3, for which clean speech is available as a reference and samples at various SNRs are provided, and (2) the official DNS3 blind real-recorded test set. In addition, (3) samples from the predictive LCT and from LCT-GAN on the official DNS3 blind real-recorded test set are compared to show the contribution of the discriminators. All experimental parameters are as described in our submission. As a recap, the complexity and parameter counts of the proposed LCT-GAN and of DeepFilterNet 1, 2, and 3 are listed below.

Complexity and Parameter Counts of the Proposed LCT-GAN and the Baselines

| Metric | DeepFilterNet1 | DeepFilterNet2 | DeepFilterNet3 | Proposed LCT-GAN |
| --- | --- | --- | --- | --- |
| MACs [G/s] | 0.35 | 0.36 | 0.36* | 0.35 |
| Parameters [M] | 1.78 | 2.31 | 2.31* | 0.14 |

* MACs and parameter counts for DeepFilterNet3 are not explicitly reported, but they are expected to be similar to those of DeepFilterNet2 because the modifications are very slight [10].
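
To put the table above in relative terms, the short sketch below computes how much smaller the proposed model is than each baseline. The numbers are copied from the table; the dictionary layout and the helper name `relative_reduction` are illustrative choices only, and the DeepFilterNet3 figures are the assumed values marked with an asterisk.

```python
# Values copied from the table above (MACs in G/s, parameters in millions).
models = {
    "DeepFilterNet1": {"macs": 0.35, "params": 1.78},
    "DeepFilterNet2": {"macs": 0.36, "params": 2.31},
    # Assumed equal to DeepFilterNet2, as stated in the table footnote.
    "DeepFilterNet3": {"macs": 0.36, "params": 2.31},
    "LCT-GAN": {"macs": 0.35, "params": 0.14},
}


def relative_reduction(baseline, proposed):
    """Return the fraction by which `proposed` is smaller than `baseline`."""
    return 1.0 - proposed / baseline


proposed_params = models["LCT-GAN"]["params"]
reductions = {
    name: relative_reduction(values["params"], proposed_params)
    for name, values in models.items()
    if name != "LCT-GAN"
}

for name, reduction in reductions.items():
    print(f"{name}: LCT-GAN has {reduction:.1%} fewer parameters")
```

With these figures, LCT-GAN has roughly 92% fewer parameters than DeepFilterNet1 and roughly 94% fewer than DeepFilterNet2, while the MACs are nearly the same across all models.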

1. DNS3 unseen synthetic test set samples

Samples 1-8, at various SNRs, are drawn from the unseen synthetic dataset, which is generated from the DNS3 dataset. The proposed LCT-GAN is compared with DeepFilterNet 1, 2, and 3. Both noisy and clean files are provided for reference.
a. The proposed LCT-GAN model consistently demonstrates strong noise suppression and effective artefact elimination across all samples.
b. The proposed LCT-GAN model also shows superior speech preservation in some samples (at the beginning of Samples 5, 6, and 7).
c. The elimination of spectral artefacts by the proposed LCT-GAN model can be observed in Samples 2 and 4.
Samples Noisy DeepFilterNet1 DeepFilterNet2 DeepFilterNet3 Proposed LCT-GAN Clean
Sample 1 (SNR = -5 dB)
Sample 2 (SNR = 0 dB)
Sample 3 (SNR = 1 dB)
Sample 4 (SNR = 5 dB)
Sample 5 (SNR = 11 dB)
Sample 6 (SNR = 13 dB)
Sample 7 (SNR = 16 dB)
Sample 8 (SNR = 19 dB)
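
For readers unfamiliar with how an SNR label such as those listed for Samples 1-8 relates to a noisy mixture, the following is a minimal sketch of mixing a clean signal with noise at a target SNR. It is not the pipeline used to build the synthetic test set; the function names, the 16 kHz sampling rate, and the toy sine-wave "speech" are all assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add.

    Both inputs are assumed to have the same length.
    """
    # Gain g chosen so that 10*log10(P_clean / (g^2 * P_noise)) == snr_db.
    gain = math.sqrt(power(clean) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10.0 * math.log10(power(clean) / power(residual))


# Toy example: a 220 Hz tone stands in for speech, Gaussian noise for the interferer.
sample_rate = 16000  # assumed sampling rate, for illustration only
random.seed(0)
clean = [0.5 * math.sin(2.0 * math.pi * 220.0 * i / sample_rate) for i in range(sample_rate)]
noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]

noisy = mix_at_snr(clean, noise, snr_db=-5.0)
print(f"measured SNR: {measured_snr_db(clean, noisy):.2f} dB")
```

At -5 dB, as in Sample 1, the noise carries about three times the power of the speech, whereas at 19 dB, as in Sample 8, the speech power is roughly 80 times that of the noise.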



2. DNS3 blind test set samples

Samples 9-12 from the DNS3 blind real-recorded test set, enhanced by the proposed LCT-GAN, are shown in comparison with DeepFilterNet 1, 2, and 3. Noisy files are also provided for reference.
a. Our LCT-GAN model shows strong noise suppression and effective artefact elimination (observed in all samples, and most clearly in Sample 9).
b. Our model also attenuates some speech components, owing to the trade-off between noise suppression and signal preservation in certain noise scenarios (for example, the word 'my' at around 6.3 seconds in Sample 12). This indicates room for refinement in future work.
Samples Noisy DeepFilterNet1 DeepFilterNet2 DeepFilterNet3 Proposed LCT-GAN
Sample 9
Sample 10
Sample 11
Sample 12


3. Samples with and without discriminators

Samples 13-16 from the DNS3 blind real-recorded test set, enhanced by the proposed predictive LCT and by LCT-GAN, are shown to evaluate the contribution of the discriminators.
a. Samples 13 and 14 indicate that the discriminators contribute to improved signal preservation.
b. The LCT-GAN model also shows strong noise suppression and effective artefact elimination (observed in Samples 15 and 16).
Samples Noisy LCT Proposed LCT-GAN
Sample 13
Sample 14
Sample 15
Sample 16