We propose cNAC-SE, a generative speech enhancement model built on a neural audio codec (NAC) framework. Unlike approaches that model discrete latent tokens, cNAC-SE operates in the continuous latent space of a VQ-based codec, enabling robust, high-fidelity noise suppression.
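The distinction between discrete and continuous latent modelling can be illustrated with a toy sketch. This is not the cNAC-SE implementation: the codebook, the frame values, and every function name below are hypothetical. The sketch also assumes that the continuous latents are the codec's unquantized vectors, which is one plausible reading of the description above. It shows only that mapping a latent to its nearest codebook entry discards information that a continuous representation retains.

```python
import math

# Hypothetical toy codebook: 4 code vectors in a 2-D latent space.
# A real VQ-based codec uses far larger codebooks and latent dimensions.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def quantize(latent):
    """Return (index, code vector) of the nearest codebook entry."""
    index = min(range(len(CODEBOOK)),
                key=lambda i: math.dist(latent, CODEBOOK[i]))
    return index, CODEBOOK[index]


def discrete_view(latents):
    """Discrete modelling target: one token index per frame."""
    return [quantize(z)[0] for z in latents]


def continuous_view(latents):
    """Continuous modelling target: the unquantized latent vectors."""
    return [tuple(z) for z in latents]


def quantization_error(latents):
    """Mean Euclidean distance between each latent and its nearest code."""
    total = sum(math.dist(z, quantize(z)[1]) for z in latents)
    return total / len(latents)


# Three made-up latent frames (assumed values, for illustration only).
frames = [(0.1, 0.2), (0.9, 0.4), (0.6, 0.9)]
tokens = discrete_view(frames)
error = quantization_error(frames)

print("tokens:", tokens)
print("mean quantization error:", round(error, 3))
```

In this toy setting the discrete view reduces each frame to a single index, while the continuous view keeps the full vector. The non-zero quantization error is the detail the discrete tokens cannot carry.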
Below we provide listening examples from the DNS3 public test set, comparing the noisy input, the open-sourced StoRM baseline, the discrete variant dNAC-SE, and our proposed cNAC-SE. In these examples, dNAC-SE achieves substantial noise reduction but occasionally loses speech brightness. In contrast, cNAC-SE leverages continuous latent-space modelling to better preserve speech fidelity and high-frequency detail. It also improves noise suppression over both StoRM and dNAC-SE, producing cleaner enhanced speech with less residual noise and better overall listening quality.