Coded Speech Enhancement Model Using Auxiliary Utterance-Level Information
Audio Samples
This page presents audio samples for our submission 'Coded Speech Enhancement Model Using Auxiliary Utterance-Level Information' to the EURASIP Journal on Audio, Speech, and Music Processing.
In Sec. 1, we present samples of the proposed LCT-CSE model based on the Opus codec to show the enhancement performance of the LCT-CSE architecture across various bitrates.
In Sec. 2, samples of the DLM-based model in tandem coding enhancement are presented to highlight the generalisation capability of the proposed DLM information incorporation method against the MCT and LBT baselines.
In Appendix 1, samples of the proposed LCT-CSE model based on other widely used codecs (AMR-WB, LC3+, and EVS) are provided to show the consistent performance of the LCT-CSE architecture across various codecs.
In Appendix 2, the audio samples used for both MUSHRA tests A and B are presented for broader reproducibility and accessibility.
All experiment parameters are as described in our submission.
1. Samples of the proposed LCT-CSE model on Opus codec
Samples of the proposed LCT-CSE model based on the Opus codec are presented in this section to show the enhancement performance of the LCT-CSE architecture across various bitrates.
Bitrate and Sample ID
Coded Speech
Enhanced Speech by the Proposed LCT-CSE Model
Clean Speech
6 kbps, Sample 1
6 kbps, Sample 2
6 kbps, Sample 3
9 kbps, Sample 1
9 kbps, Sample 2
9 kbps, Sample 3
12 kbps, Sample 1
12 kbps, Sample 2
12 kbps, Sample 3
16 kbps, Sample 1
16 kbps, Sample 2
16 kbps, Sample 3
2. Samples of the Tandem Coding
(a) The MCT baseline exhibits severe performance deterioration for coded speech signals that are heavily compressed by the previous (the first) codec.
(b) The LBT baseline exhibits significant performance deterioration for coded speech signals that are distorted relatively mildly.
(c) The proposed DLM method consistently demonstrates improved performance and generalisation capability across all encoding conditions.
2.1. Samples of the baselines and the proposed DLM method, for coded speech signals that are heavily compressed by the previous (the first) codec.
For coded speech signals that are heavily compressed by the previous (the first) codec, the MCT baseline samples sound more distorted than those of the LBT baseline and the proposed DLM method. This perceptual observation reflects the consistent trend discussed in the submission.
Sample ID
Coded Speech
Enhanced Speech by MCT Baseline
Enhanced Speech by LBT Baseline
Enhanced Speech by the Proposed DLM
Clean Speech
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Sample 6
2.2. Samples of the LBT baseline and the proposed DLM method, for coded speech signals that are distorted relatively mildly.
For coded speech signals that are distorted relatively mildly, the LBT baseline samples sound more distorted than those of the MCT baseline and the proposed DLM method. This perceptual observation reflects the consistent trend discussed in the submission.
Sample ID
Coded Speech
Enhanced Speech by MCT Baseline
Enhanced Speech by LBT Baseline
Enhanced Speech by the Proposed DLM
Clean Speech
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Sample 6
Appendix 1: Samples of the proposed LCT-CSE model on AMR-WB, LC3+, and EVS codecs
Samples of the proposed LCT-CSE model based on the AMR-WB, LC3+, and EVS codecs are presented in this section to show the enhancement performance of the LCT-CSE architecture across various bitrates.
A.1.1 Samples on AMR-WB codec
Bitrate and Sample ID
Coded Speech
Enhanced Speech by the Proposed LCT-CSE Model
Clean Speech
6.65 kbps, Sample 1
6.65 kbps, Sample 2
6.65 kbps, Sample 3
8.85 kbps, Sample 1
8.85 kbps, Sample 2
8.85 kbps, Sample 3
12.65 kbps, Sample 1
12.65 kbps, Sample 2
12.65 kbps, Sample 3
14.25 kbps, Sample 1
14.25 kbps, Sample 2
14.25 kbps, Sample 3
15.85 kbps, Sample 1
15.85 kbps, Sample 2
15.85 kbps, Sample 3
A.1.2 Samples on LC3+ codec
Bitrate and Sample ID
Coded Speech
Enhanced Speech by the Proposed LCT-CSE Model
Clean Speech
16 kbps, Sample 1
16 kbps, Sample 2
16 kbps, Sample 3
24 kbps, Sample 1
24 kbps, Sample 2
24 kbps, Sample 3
A.1.3 Samples on EVS codec
Bitrate and Sample ID
Coded Speech
Enhanced Speech by the Proposed LCT-CSE Model
Clean Speech
5.9 kbps, Sample 1
5.9 kbps, Sample 2
5.9 kbps, Sample 3
7.2 kbps, Sample 1
7.2 kbps, Sample 2
7.2 kbps, Sample 3
8.0 kbps, Sample 1
8.0 kbps, Sample 2
8.0 kbps, Sample 3
9.6 kbps, Sample 1
9.6 kbps, Sample 2
9.6 kbps, Sample 3
13.2 kbps, Sample 1
13.2 kbps, Sample 2
13.2 kbps, Sample 3
16.4 kbps, Sample 1
16.4 kbps, Sample 2
16.4 kbps, Sample 3
Appendix 2: Samples for MUSHRA Tests
Samples for MUSHRA tests A and B are provided in this section. Researchers seeking to reproduce these tests may request the complete audio labels and configuration details via email.
Test A