Setup

For this example, we consider a 3 m × 4 m × 3 m room with a reverberation time (RT60) of 0.3 s. The critical distances of the speakers overlap here, which makes this setup more challenging than the previous examples.
Room image
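To give a feel for why the critical distances can overlap in such a small room, the sketch below evaluates the common Sabine-based approximation r_c ≈ 0.057 · sqrt(Q · V / T60) (r_c in m, V in m³, T60 in s). This is an assumption for illustration: it presumes a diffuse sound field and an omnidirectional source (directivity factor Q = 1), and the exact definition used for this setup is not stated here. The function name `critical_distance` is hypothetical.

```python
import math


def critical_distance(volume_m3: float, rt60_s: float, directivity: float = 1.0) -> float:
    """Approximate critical distance in metres (assumed Sabine-based formula).

    r_c ~= 0.057 * sqrt(Q * V / T60), with Q the source directivity factor
    (Q = 1 for an omnidirectional source, which is assumed by default).
    """
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)


# Room from the setup: 3 m x 4 m x 3 m, i.e. V = 36 m^3, RT60 = 0.3 s.
room_volume = 3.0 * 4.0 * 3.0
rc = critical_distance(room_volume, 0.3)
print(f"critical distance ~ {rc:.2f} m")  # prints "critical distance ~ 0.62 m"

# Two spherical critical-distance regions of radius rc overlap whenever the
# speakers are closer to each other than 2 * rc (about 1.25 m here).
print(f"overlap if speaker spacing < {2 * rc:.2f} m")
```

Under these assumptions, the critical distance is only about 0.6 m, so two speakers standing less than roughly 1.25 m apart already have overlapping critical-distance regions in a room of this size.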

Mod-MFCC-Based Clusters

Mod-MFCC-based clusters image (panels: Cluster 1, Cluster 2, Background cluster)
Audio examples for clusters 1 and 2: reference microphone, masked reference signal, DSB signal
Speaker Embedding Clusters

Speaker embedding (SpVer) clusters image (panels: Cluster 1, Cluster 2, Background cluster)
Audio examples for clusters 1 and 2: reference microphone, masked reference signal, DSB signal
Discussion

Even in this challenging case, the speaker embeddings prove to be good clustering features and yield plausible clusters. In contrast, the clustering based on the Mod-MFCC features shows no apparent logic. In this example, the Mod-MFCC features appear to group the microphones according to their SINR rather than according to speaker-specific characteristics. Note also that the clustering based on speaker embeddings avoids selecting microphones located in the region where the critical distances of the speakers overlap.