Welcome to the Symposium on Advances in Audio Signal Processing, where we bring together leading researchers in the field to share their latest insights into speech, voice, and audio technologies. This event will cover a range of topics, from auditory feedback modulation to speaker recognition and deep learning-based audio coding. Below, you will find an overview of our distinguished speakers and their talks.
Detailed schedule and further information about the talks can be found below. Join us for an insightful day of discussions and networking as we explore the latest advancements in audio signal processing!
Here is the program:
Time | Speaker | Topic |
---|---|---|
9:55-10:00 | Nilesh Madhu | Introduction |
10:00-10:30 | Byeong Hyeon Kim (Yonsei University, KR) | DNN-Based Audio Coding with Prior Knowledge in Signal Processing and Psychoacoustics |
10:30-10:35 | Break | |
10:35-11:15 | Dr. Isabel Schiller (RWTH Aachen University, DE) | Auditory Feedback Modulation with VQ-Synth: A New Tool for Voice Therapy and Training? |
11:15-11:20 | Break | |
11:20-12:00 | Jenthe Thienpondt (Ghent University, BE) | Speaker Embeddings: Advances, Challenges, and Real-World Applications |
12:00-12:05 | Break | |
12:05-12:45 | Morgan Thienpont | To be updated soon |

Here are the abstracts:

Byeong Hyeon Kim (Yonsei University, KR), "DNN-Based Audio Coding with Prior Knowledge in Signal Processing and Psychoacoustics": This work provides an overview of DNN-based audio coding research conducted at the DSP&AI Lab. Audio coding aims to represent signals with as few bits as possible while preserving quality; this can be formulated as an optimization problem constrained by a limited bit budget. Traditional codecs have been designed by leveraging digital signal processing theory and psychoacoustics, the study of how humans perceive sound. DNN-based audio coding can also benefit from this prior knowledge when addressing such constrained optimization problems. For instance, psychoacoustic models can enhance the performance of neural audio coding by weighting loss functions and discriminators, or by serving as differentiable training objectives. Compared to conventional loss functions, these objectives align more closely with human perception, improving performance under limited bitrates and model capacities. Additionally, the expressive power of DNNs and generative models can be exploited more fully with guidance from psychoacoustics: by applying DNNs selectively to quantization, or by using generative models only for perceptually irrelevant components, the codec pipeline can combine the strengths of traditional codecs and DNNs.

Dr. Isabel Schiller (RWTH Aachen University, DE), "Auditory Feedback Modulation with VQ-Synth: A New Tool for Voice Therapy and Training?": The ability to manipulate voice quality in real time has significant implications for both research and clinical applications, particularly in the context of auditory feedback modulation (AFM). In AFM experiments, participants hear an acoustically altered version of their own voice through headphones as they speak, and these perturbations typically trigger vocal adaptations in response. In this talk, I will give an overview of the VQ-Synth project, in which we investigated how different voice-quality perturbations affect speakers' auditory perception and vocal responses. The AFM system used for this purpose, VQ-Synth, was developed in collaboration with Kiel University. Initially implemented in MATLAB, it was later optimized in ANSI C to minimize processing delay. Integrated with a graphical user interface (GUI), VQ-Synth provides a robust framework for psychological experiments. As part of this project, we conducted a series of participant experiments to identify resynthesis settings that successfully induce the perception of hoarseness (dysphonia) in participants' auditory feedback, and to determine whether this triggers compensatory vocal responses that improve voice quality. The talk will present our first findings as well as a discussion of potential applications of the VQ-Synth system in voice therapy and training.

Jenthe Thienpondt (Ghent University, BE), "Speaker Embeddings: Advances, Challenges, and Real-World Applications": After the performance leap induced by deep learning models in various scientific fields, the speaker recognition community followed swiftly. Current deep neural network-based speaker embeddings are able to robustly capture a wide range of speaker characteristics, including gender, language, and emotional tonality, from surprisingly short speech utterances. In this presentation, we will provide a concise overview of recent advancements in speaker embeddings. Subsequently, we will discuss our recent research investigating their potential application in monitoring speech-related pathologies.

Morgan Thienpont: abstract to be updated soon.
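To make a few of the techniques touched on above more concrete, we close with three small illustrative Python sketches. They are simplified examples written for this overview, not code from the speakers' systems. The first picks up an idea from Byeong Hyeon Kim's abstract: psychoacoustic knowledge entering neural audio coding through a perceptually weighted loss. The masking-style weighting below (down-weighting errors in loud spectral bins) is a deliberately crude stand-in for a real psychoacoustic model, and every parameter value is an arbitrary choice of ours.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window (simplified: no padding or centering)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=-1))

def perceptual_weights(ref_mag, floor_db=-60.0):
    """Crude masking-style weights in [0, 1]: errors in loud bins are assumed
    better masked, so the loudest bins get the smallest weights."""
    rel_db = 20.0 * np.log10(ref_mag / (ref_mag.max() + 1e-12) + 1e-12)
    rel_db = np.clip(rel_db, floor_db, 0.0)
    return rel_db / floor_db  # 0 dB -> weight 0, floor_db -> weight 1

def weighted_spectral_loss(ref, est):
    """Mean perceptually weighted squared spectral error."""
    R, E = stft_mag(ref), stft_mag(est)
    return float(np.mean(perceptual_weights(R) * (R - E) ** 2))

# Toy comparison: a clean tone versus a noisy "decoded" version of it.
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)
decoded = clean + 0.01 * np.random.default_rng(0).standard_normal(t.shape)
print("plain spectral MSE    :", float(np.mean((stft_mag(clean) - stft_mag(decoded)) ** 2)))
print("weighted spectral MSE :", weighted_spectral_loss(clean, decoded))
```

In an actual neural codec, such weights would come from a calibrated masking model, and the weighted loss would serve as a differentiable training objective for the encoder and decoder rather than a one-off comparison.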
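The second sketch relates to Dr. Schiller's talk. VQ-Synth itself is a MATLAB/ANSI-C system with carefully minimized processing delay; the snippet below is only a schematic illustration of the AFM loop it implements: capture the speaker's voice, perturb it, and return it over headphones with a small block size to keep latency low. The perturbation (envelope-modulated noise as a rough breathiness cue for hoarseness) and all parameters are our own placeholders, and the third-party sounddevice package plus a duplex audio device are assumed.

```python
import numpy as np
import sounddevice as sd

NOISE_GAIN = 0.3  # arbitrary perturbation depth, not a VQ-Synth parameter
rng = np.random.default_rng()

def callback(indata, outdata, frames, time, status):
    """Return the speaker's voice with envelope-modulated noise added,
    a crude stand-in for a hoarseness (breathiness) perturbation."""
    if status:
        print(status)
    voice = indata[:, 0]
    envelope = np.abs(voice)                    # rough amplitude envelope
    noise = rng.standard_normal(frames) * envelope
    outdata[:, 0] = voice + NOISE_GAIN * noise  # altered auditory feedback

# 64 samples at 16 kHz is a 4 ms block; AFM needs the loop delay to stay low.
with sd.Stream(samplerate=16000, blocksize=64, channels=1, callback=callback):
    sd.sleep(5000)  # run the altered-feedback loop for five seconds
```

A 4 ms block is only part of the total input-to-output delay, which is exactly why, as the abstract notes, VQ-Synth was ported from MATLAB to ANSI C.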
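Finally, a sketch for Jenthe Thienpondt's talk: once a DNN has mapped an utterance to a fixed-dimensional speaker embedding, a typical verification pipeline reduces to cosine scoring against a tuned threshold. The extract_embedding function below is a hypothetical stub standing in for a pretrained speaker model (its 192-dimensional output is just a common embedding size); only the comparison logic is the point.

```python
import zlib
import numpy as np

def extract_embedding(waveform: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a pretrained speaker-embedding DNN: it just
    derives a deterministic pseudo-random 192-dim vector from the samples."""
    seed = zlib.crc32(waveform.tobytes())
    return np.random.default_rng(seed).standard_normal(192)

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings; higher means more similar."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def same_speaker(wav_a: np.ndarray, wav_b: np.ndarray, threshold: float = 0.5) -> bool:
    """Verification decision. The threshold would be tuned on held-out trials
    (e.g. at the equal error rate); 0.5 here is arbitrary."""
    return cosine_score(extract_embedding(wav_a), extract_embedding(wav_b)) >= threshold

# Toy trial with placeholder 1-second "recordings" at 16 kHz.
rng = np.random.default_rng(1)
utt_a = rng.standard_normal(16000).astype(np.float32)
utt_b = rng.standard_normal(16000).astype(np.float32)
print(cosine_score(extract_embedding(utt_a), extract_embedding(utt_b)))
print(same_speaker(utt_a, utt_b))
```

With embeddings from a real model, this same scoring machinery underlies both speaker verification and, potentially, the pathology-monitoring applications mentioned in the abstract.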