Low-dimensional Audio Inpainting with mixed generative-predictive model

C. Aironi, L. Gabrielli, S. Squartini (2024)

This page contains listening samples for evaluating the performance of LDAI (Low-dimensional Audio Inpainting), a mixed generative-predictive DNN framework for repairing very long corrupted audio fragments in musical sequences. The proposed pipeline combines a convolutional encoder-decoder structure for timbre mapping, working in the time domain, with a low-level conditional WGAN for semantic inpainting. Splitting the process into a predictive task and a generative task reduces the effort required of the latter, which ensures robust and fast training convergence while providing richer musical note generation.
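To make the split between the predictive and the generative task concrete, here is a minimal sketch of the data flow only. It is not the LDAI implementation: the frame size, the function names, and the three stand-in functions (a per-frame RMS "encoder", a noise-based "generator", and a sample-repeating "decoder") are all assumptions chosen for illustration, with no trained networks involved.

```python
import numpy as np

FRAME = 64  # hypothetical frame size: one low-dimensional value per 64 samples


def encode(wave):
    """Stand-in for the convolutional encoder: waveform -> low-dimensional sequence."""
    frames = wave[: len(wave) // FRAME * FRAME].reshape(-1, FRAME)
    return np.sqrt((frames ** 2).mean(axis=1))  # per-frame RMS, an assumption


def generate_gap(latent, gap, rng):
    """Stand-in for the conditional generator: fill latent[start:stop] from the context."""
    start, stop = gap
    context = np.concatenate([latent[:start], latent[stop:]])
    filled = latent.copy()
    noise = rng.standard_normal(stop - start)
    # Draw plausible values around the context statistics (not a trained WGAN).
    filled[start:stop] = np.abs(context.mean() + context.std() * noise)
    return filled


def decode(latent):
    """Stand-in for the decoder: expands the sequence back to waveform length.

    This only produces a piecewise-constant envelope, not real audio.
    """
    return np.repeat(latent, FRAME)


rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s test tone at 16 kHz
wave[6400:9600] = 0.0  # simulated corrupted region (200 ms)

latent = encode(wave)                  # generative task works on this shorter sequence
gap = (6400 // FRAME, 9600 // FRAME)   # gap position expressed in frames
repaired = decode(generate_gap(latent, gap, rng))
```

The point of the sketch is that the generative step only has to fill 50 low-dimensional values rather than 3200 waveform samples, which is the kind of reduction in effort the description above refers to.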

When listening to the following examples, keep in mind that generative inpainting does not aim to replicate the missing waveform exactly, but to offer a replacement that is perceptually "pleasing" to the listener’s ears.