Real-time Music Restoration through Linear Prediction and Generative Spectral Inpainting

C. Aironi, L. Gabrielli, S. Cornell, S. Squartini (2026)

This paper presents a Packet Loss Concealment (PLC) method designed to restore audio integrity in scenarios such as Networked Music Performance (NMP), where packet loss can severely degrade the listening experience. Our approach integrates the modeling strength of Linear Predictive Coding (LPC) with the generative power of a specialized Generative Adversarial Network (GAN) for spectral inpainting. Specifically, the method uses an LPC model to generate an initial, coarse estimate of the lost audio segment. This estimate then conditions a bin2bin convolutional GAN, which refines the reconstruction in the time-frequency domain by learning to generate high-fidelity magnitude spectrograms. The architecture, termed bin2bin-v2, is a computationally efficient U-Net that uses depthwise separable convolutions to significantly reduce the model's parameter count and computational cost while maintaining high modeling capacity. The final output is obtained by combining the GAN's magnitude reconstruction with the phase predicted by the LPC model. The system operates in a fully causal setting and is capable of real-time execution on standard consumer hardware. The method was developed and evaluated in the context of the IEEE-IS² Music PLC Challenge, where it achieved first place, outperforming all competing systems in subjective evaluations. Additionally, when compared to an earlier PLC model, it yielded a 9.68% improvement in the objective PLCMOS metric, a 6.48% increase in DNSMOS, and an 8.53% improvement in the ODG score. Furthermore, a 2.81% reduction in the Fréchet Audio Distance indicates that the reconstructed audio more closely matches the statistical properties of real music.
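The two signal-processing steps that frame the GAN, LPC extrapolation of the lost segment and recombination of a refined magnitude with the LPC phase, can be illustrated with a minimal NumPy sketch. It is not the implementation evaluated in the paper. The function names, the least-squares (covariance-style) coefficient fit, the model order, and the use of a single FFT frame in place of a full spectrogram are all assumptions made for illustration, and the refined magnitude stands in for the output of the bin2bin-v2 network, which is not reproduced here.

```python
import numpy as np


def lpc_coefficients(history, order):
    """Least-squares fit of a[k] so that x[n] ~ sum_k a[k] * x[n-1-k].

    Assumption: a covariance-style fit is used here for simplicity;
    the paper does not state which LPC estimator it relies on.
    """
    rows = np.array(
        [history[n - order:n][::-1] for n in range(order, len(history))]
    )
    targets = history[order:]
    coeffs, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return coeffs


def lpc_extrapolate(history, order, num_samples):
    """Causally predict num_samples samples that follow `history`."""
    coeffs = lpc_coefficients(history, order)
    buf = list(history[-order:])
    out = []
    for _ in range(num_samples):
        recent = np.array(buf[-order:][::-1])  # most recent sample first
        nxt = float(np.dot(coeffs, recent))
        out.append(nxt)
        buf.append(nxt)  # feed the prediction back into the predictor
    return np.array(out)


def combine_magnitude_and_phase(refined_magnitude, coarse_estimate):
    """Pair a refined magnitude spectrum with the phase of the LPC estimate.

    `refined_magnitude` is a hypothetical stand-in for the GAN output and
    must have len(coarse_estimate) // 2 + 1 bins (one-sided spectrum).
    """
    phase = np.angle(np.fft.rfft(coarse_estimate))
    spectrum = refined_magnitude * np.exp(1j * phase)
    return np.fft.irfft(spectrum, n=len(coarse_estimate))
```

In this sketch, a lost packet of `num_samples` samples would first be filled by `lpc_extrapolate` from the most recent correctly received audio. The magnitude of that coarse estimate would then be passed to the generative model, and its output handed to `combine_magnitude_and_phase` together with the coarse estimate to produce the concealed waveform. Only past samples are used, which is consistent with the causal setting described above.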