Musical Instrument Separation on Shift-Invariant Spectrograms via Stochastic Dictionary Learning

06/01/2018
by Sören Schulze, et al.

We propose a method for the blind separation of audio signals from musical instruments. Non-negative matrix factorization (NMF) has been studied extensively for this task, but it does not exploit the pitch-invariance that instruments exhibit. Tensor factorization, which first motivated the use of log-frequency spectrograms, overcomes this limitation, but it still requires the specific tuning of the instruments to be hard-coded into the algorithm. We develop a time-frequency representation that is both shift-invariant and frequency-aligned, with a variant that can also be used for wideband signals. Our separation algorithm exploits this shift-invariance to find patterns of peaks related to specific instruments. Non-linear optimization enables it to represent arbitrary frequencies and to incorporate inharmonicity, and a sparsity condition keeps the representation plausible. The relative amplitudes of the harmonics are stored in a dictionary, which is trained via a modified version of ADAM. For a realistic monaural piece with acoustic recorder and violin, we achieve qualitatively good separation, with an average signal-to-distortion ratio (SDR) of 12.7 dB, signal-to-interference ratio (SIR) of 27.0 dB, and signal-to-artifacts ratio (SAR) of 12.9 dB.
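The shift-invariance that log-frequency spectrograms provide can be illustrated with a small numerical sketch. It is not the representation developed in the paper. It only assumes an ideal harmonic tone (no inharmonicity) and a hypothetical helper, `harmonic_log_positions`, with 12 bins per octave chosen for illustration.

```python
import numpy as np

def harmonic_log_positions(f0, n_harmonics=5, bins_per_octave=12):
    """Log-frequency positions (fractional bins relative to 1 Hz)
    of the first harmonics of an ideal tone with fundamental f0.
    Hypothetical helper for illustration only."""
    h = np.arange(1, n_harmonics + 1)
    return bins_per_octave * np.log2(f0 * h)

# Two notes with assumed fundamentals: A4 (440 Hz) and C5 (523.25 Hz).
a4 = harmonic_log_positions(440.0)
c5 = harmonic_log_positions(523.25)

# On the log-frequency axis, every harmonic moves by the same amount,
# so the harmonic pattern of one note is a pure shift of the other.
log_shift = c5 - a4

# On a linear axis, the gap between corresponding harmonics grows with
# the harmonic number, so the pattern is stretched rather than shifted.
h = np.arange(1, 6)
linear_gap = 523.25 * h - 440.0 * h

print("log-axis shift per harmonic:", np.round(log_shift, 3))
print("linear-axis gap per harmonic (Hz):", np.round(linear_gap, 2))
```

A constant shift on the log axis is what lets a single pattern of relative harmonic amplitudes describe an instrument at every pitch. The abstract's non-linear optimization and inharmonicity handling go beyond this idealized picture.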
