Low-dimensional Audio Inpainting with mixed generative-predictive model

C. Aironi, L. Gabrielli, S. Squartini (2024)


This page contains listening samples for evaluating the performance of LDAI, a mixed generative-predictive DNN framework aimed at repairing very long corrupted audio fragments in musical sequences. The proposed pipeline combines a convolutional encoder-decoder structure for timbre mapping, working in the time domain, with a conditional WGAN for semantic inpainting, working at a lower level. The motivation for splitting the process into a predictive task and a generative task is to reduce the effort required by the latter, thus ensuring robust and fast training convergence while providing richer musical note generation.

LDAI framework

When listening to the following examples, keep in mind that generative inpainting does not aim to replicate the missing waveform exactly, but rather to offer a replacement that is perceptually "pleasing" to the listener’s ears.
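To make the corruption concrete, the snippet below is a minimal sketch of how a "corrupted" excerpt with a 1024 ms gap could be produced from a clean one. It is only an illustration: the 16 kHz sample rate, the zero-filling of the gap, the gap position, the 440 Hz test tone, and the helper name `make_gap` are all assumptions made here, not details taken from the LDAI pipeline.

```python
import math


def make_gap(signal, sample_rate, gap_ms, gap_start_s):
    """Zero out a gap of `gap_ms` milliseconds starting at `gap_start_s` seconds.

    Hypothetical helper: returns (corrupted, mask), where mask is 1 for
    samples that were kept and 0 for samples inside the gap.
    """
    gap_len = round(sample_rate * gap_ms / 1000)
    start = round(sample_rate * gap_start_s)
    end = min(start + gap_len, len(signal))  # clip the gap to the signal length

    corrupted = list(signal)  # copy, so the clean reference is left untouched
    mask = [1] * len(signal)
    for i in range(start, end):
        corrupted[i] = 0.0  # assumption: the missing region is zero-filled
        mask[i] = 0
    return corrupted, mask


SAMPLE_RATE = 16000  # assumed sample rate in Hz

# Stand-in for a clean musical excerpt: a 3-second 440 Hz sine tone.
clean = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
         for n in range(3 * SAMPLE_RATE)]

corrupted, mask = make_gap(clean, SAMPLE_RATE, gap_ms=1024, gap_start_s=1.0)
missing = mask.count(0)  # number of samples an inpainting model must fill
```

Under the assumed 16 kHz rate, a 1024 ms gap corresponds to 16384 missing samples, which gives a sense of how much material the restored examples have to synthesize.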

1024 ms gap

clean (reference)

corrupted

restored


clean (reference)

corrupted

restored


clean (reference)

corrupted

restored


clean (reference)

corrupted

restored


clean (reference)

corrupted

restored


clean (reference)

corrupted

restored


clean (reference)

corrupted

restored


clean (reference)

corrupted

restored


clean (reference)

corrupted

restored