Welcome to the Symposium on Advances in Audio Signal Processing, where we bring together leading researchers in the field to share their latest insights into speech, voice, and audio technologies. This event will cover a range of topics, from auditory feedback modulation to speaker recognition and deep learning-based audio coding. Below, you will find an overview of our distinguished speakers and their talks.
Join us for an insightful day of discussions and networking as we explore the latest advancements in audio signal processing!
Here is the program:
| Time | Speaker | Topic |
|---|---|---|
| 9:55-10:00 | Nilesh Madhu | Introduction |
| 10:00-10:30 | Byeong Hyeon Kim (Yonsei University, KR) | DNN-Based Audio Coding with Prior Knowledge in Signal Processing and Psychoacoustics. Abstract: This work provides an overview of DNN-based audio coding research conducted at the DSP&AI Lab. Audio coding aims to represent signals with minimal bits while preserving quality, formulated as an optimization problem constrained by a limited bit budget. Traditional codecs have been designed by leveraging digital signal processing theory and psychoacoustics, which analyzes how humans perceive sound. DNN-based audio coding can also benefit from this prior knowledge to address such constrained optimization problems. For instance, psychoacoustic models can enhance the performance of neural audio coding by weighting loss functions and discriminators, or by serving as differentiable training objectives. Compared to conventional loss functions, these objectives align more closely with human perception, improving performance under limited bitrates and model capacities. Additionally, the expressive power of DNNs and generative models can be better utilized with guidance from psychoacoustics. By applying DNNs selectively to quantization, or by using generative models only for perceptually irrelevant components, the codec pipeline can effectively combine the strengths of traditional codecs and DNNs. |
| 10:30-10:35 | Break | |
| 10:35-11:15 | Dr. Isabel Schiller (RWTH Aachen University, DE) | Auditory Feedback Modulation with VQ-Synth: A New Tool for Voice Therapy and Training? Abstract: The ability to manipulate voice quality in real time has significant implications for both research and clinical applications, particularly in the context of auditory feedback modulation (AFM). In AFM experiments, participants hear an acoustically altered version of their own voice through headphones as they speak, with these perturbations typically triggering vocal adaptations in response. In this talk, I will give an overview of the VQ-Synth project, in which we investigated how different voice-quality perturbations affect speakers’ auditory perception and vocal responses. The AFM system used for this purpose, VQ-Synth, was developed in collaboration with Kiel University. Initially implemented in MATLAB, it was later reimplemented in ANSI C to minimize processing delay. Integrated with a graphical user interface (GUI), VQ-Synth provides a robust framework for psychological experiments. As part of this project, we conducted a series of participant experiments to identify resynthesis settings that reliably induce the perception of hoarseness (dysphonia) in participants’ auditory feedback, and to determine whether this triggers compensatory vocal responses leading to voice-quality improvements. The talk will present our first findings as well as a discussion of potential applications of the VQ-Synth system in voice therapy and training. |
| 11:15-11:20 | Break | |
| 11:20-12:00 | Jenthe Thienpondt (Ghent University, BE) | Speaker Embeddings: Advances, Challenges, and Real-World Applications. Abstract: After the performance leap induced by deep learning models in various scientific fields, the speaker recognition community followed swiftly. Current deep neural network-based speaker embeddings are able to robustly capture a wide range of speaker characteristics, including gender, language, and emotional tonality, from surprisingly short speech utterances. In this presentation, we will provide a concise overview of recent advancements in speaker embeddings. Subsequently, we will discuss our recent research investigating their potential application in monitoring speech-related pathologies. |
| 12:00-12:05 | Break | |
| 12:05-12:45 | Chuan Wen | Sound Quality in DNN-based Hearing-Aid Algorithms. Abstract: Current hearing aids typically focus on addressing outer-hair-cell (OHC) damage and the associated loss of hearing sensitivity, but do not consider age- or noise-exposure-related damage to auditory-nerve fibers (i.e., cochlear synaptopathy, CS). To compensate for individual and combined CS and OHC damage patterns, closed-loop systems that include biophysical models of (impaired) auditory signal processing can generate personalized sound-processing algorithms. These systems are particularly powerful when implemented as deep neural networks (DNNs), allowing the sound-processing algorithms to be optimized via backpropagation. One such system, CoNNear, employs autoencoder-based models of auditory processing that simulate cochlear mechanics, inner-hair-cell function, and auditory-nerve fiber activity. These systems are trained to minimize the differences in auditory processing between normal and impaired hearing models, making them suitable for AI hardware integration. However, such end-to-end systems introduce different artifacts than traditional sound processors, such as tonal artifacts arising from the transposed convolutions in CNN-based auditory modules. These artifacts propagate within the closed-loop framework and ultimately become overamplified and audible in the resulting hearing-aid algorithm. To address this challenge, we propose a dilated CNN-based architecture (dCoNNear) comprising a sequence of stacked memory blocks, which avoids such artifacts and is well suited for closed-loop audio processing. We applied the dCoNNear architecture to all auditory elements inside the closed-loop system as well as to the sound processors, and evaluated the sound quality and compensation accuracy of the resulting algorithms. Our results show that dCoNNear not only accurately simulates all processing stages of a non-DNN-based state-of-the-art biophysical auditory processing system, but does so without introducing spurious, audible artifacts in the resulting sound processors. The predicted restoration accuracy for simulated auditory-nerve population responses suggests that our algorithms can be used for both OHC and CS pathologies. |
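To make the idea of psychoacoustically weighted training objectives from the audio-coding talk concrete, here is a minimal numpy sketch. The weighting scheme (emphasising bins below 4 kHz) is a crude illustrative stand-in, not the DSP&AI Lab's actual psychoacoustic model, and all function and variable names here are our own illustrative choices.

```python
import numpy as np

def weighted_spectral_loss(ref, est, weights):
    """Mean squared error between magnitude spectra, weighted per
    frequency bin by (here: toy) perceptual-importance weights."""
    ref_mag = np.abs(np.fft.rfft(ref))
    est_mag = np.abs(np.fft.rfft(est))
    return float(np.mean(weights * (ref_mag - est_mag) ** 2))

# Toy perceptual weights: emphasise bins below 4 kHz, where hearing is
# most sensitive -- a crude stand-in for a real psychoacoustic model.
sr, n = 16000, 512
freqs = np.fft.rfftfreq(n, d=1.0 / sr)
weights = np.where(freqs < 4000.0, 1.0, 0.1)

# Reference signal and a slightly degraded "coded" version of it.
t = np.arange(n) / sr
ref = np.sin(2 * np.pi * 440.0 * t)
est = ref + 0.01 * np.random.default_rng(0).standard_normal(n)

loss = weighted_spectral_loss(ref, est, weights)
```

In a real neural codec this weighted loss would be differentiable (e.g. in PyTorch) and used directly as a training objective; the numpy version only illustrates the weighting itself.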
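The speaker-embedding talk mentions verification from short utterances; a trial is commonly scored by the cosine similarity between an enrolment embedding and a test embedding. The sketch below uses random vectors as placeholder embeddings (a real system would extract them with a trained network); the 192-dimensional size and all names are illustrative assumptions.

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two speaker embeddings."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

rng = np.random.default_rng(0)
spk1 = rng.standard_normal(192)                     # enrolment embedding
spk1_test = spk1 + 0.1 * rng.standard_normal(192)   # same speaker, new utterance
spk2 = rng.standard_normal(192)                     # a different speaker

same = cosine_score(spk1, spk1_test)
diff = cosine_score(spk1, spk2)
# A decision threshold on the score then accepts or rejects the trial.
```

The same scoring also underpins monitoring applications: tracking how a patient's embeddings drift relative to an enrolment recording over time.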
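The tonal artifacts attributed to transposed convolutions in the hearing-aid talk can be reproduced in a few lines: with stride 2 and kernel length 3, the overlap between shifted kernel copies is uneven, so even a constant input yields a periodic output. This is a self-contained numpy illustration of the mechanism, not code from the CoNNear or dCoNNear systems.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride):
    """Naive 1-D transposed convolution (no output trimming):
    each input sample adds a scaled copy of the kernel at stride spacing."""
    out = np.zeros(stride * (len(x) - 1) + len(kernel))
    for i, v in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += v * kernel
    return out

x = np.ones(16)       # constant input: the ideal output is constant too
kernel = np.ones(3)   # kernel length 3 with stride 2 -> uneven overlap
y = transposed_conv1d(x, kernel, stride=2)

interior = y[2:-2]    # ignore edge effects at both ends
# The interior alternates 2, 1, 2, 1, ...: a periodic ripple on a flat
# input, which in audio is heard as a tonal artifact near fs / stride.
```

Dilated convolutions, by contrast, keep a fixed stride of one and grow the receptive field via the dilation factor, so this uneven-overlap pattern never arises; this is the intuition behind the dCoNNear design described in the abstract.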