In this study, we move away from relying on large pre-trained models for feature representation, or on mutual information minimization, to disentangle diverse features. Instead, we revisit decoupling methods based on instance normalization. To this end, we introduce a novel feature coupling module, cross-adaptive instance normalization (CAIN), which extends adaptive instance normalization (AdaIN). Beyond offering style injection similar to AdaIN, CAIN is explicitly designed to maintain content consistency by reconstructing frame-level statistics of mel-spectrograms. The results show that CAIN, used as a lightweight plug-in, significantly improves conventional instance-normalization-driven approaches. Building on this, we introduce RLVC, which achieves robust performance with only 5.29M parameters. For audio samples, please refer to our demo page.
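For readers unfamiliar with AdaIN, here is a minimal sketch of the standard formulation that CAIN builds on: each channel (mel bin) of the content features is normalized by its own mean and standard deviation over time, then rescaled with the style features' per-channel statistics. This is plain AdaIN in NumPy, not CAIN; the frame-level statistic reconstruction that CAIN adds is not reproduced here. The function name `adain`, the `(n_mels, frames)` layout, and `eps` are illustrative assumptions.

```python
import numpy as np


def adain(content: np.ndarray, style: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Standard AdaIN (not CAIN) on arrays shaped (n_mels, frames).

    Assumption: statistics are computed per channel over the time axis,
    i.e. instance normalization applied to mel-spectrogram-like features.
    """
    # Per-channel mean/std of the content over time.
    c_mean = content.mean(axis=1, keepdims=True)
    c_std = np.sqrt(content.var(axis=1, keepdims=True) + eps)
    # Per-channel mean/std of the style reference over time.
    s_mean = style.mean(axis=1, keepdims=True)
    s_std = np.sqrt(style.var(axis=1, keepdims=True) + eps)
    # Remove content statistics, then inject style statistics.
    normalized = (content - c_mean) / c_std
    return normalized * s_std + s_mean
```

Content and style may have different numbers of frames, since only per-channel statistics of the style are used; the output keeps the content's shape.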
Python 3.7+
You can install the dependencies with:
pip install -r requirements.txt

A HiFi-GAN vocoder is used to convert log mel-spectrograms to waveforms. This vocoder (13.93M parameters) is trained on universal datasets. Please set the path to the HiFi-GAN model in "./hifivoice/inference_e2e.py".
You can download the pretrained model, then edit "./Modu/infer/infer_config.yaml" accordingly. Test samples should be organized as "wav22050/*.wav".
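The folder name "wav22050" suggests the test audio is expected at a 22,050 Hz sampling rate; this is an assumption inferred from the name, not stated explicitly. Before running batch inference, a small standard-library check like the following can flag files recorded at other rates. `find_bad_sample_rates` is a hypothetical helper, not part of the repository.

```python
import glob
import os
import wave

# Assumption: inferred from the "wav22050" folder name.
EXPECTED_RATE = 22050


def find_bad_sample_rates(test_dir: str, expected_rate: int = EXPECTED_RATE) -> list:
    """Return (path, rate) pairs for top-level *.wav files whose rate differs."""
    bad = []
    for path in sorted(glob.glob(os.path.join(test_dir, "*.wav"))):
        # Read only the header; no audio frames are decoded.
        with wave.open(path, "rb") as wf:
            rate = wf.getframerate()
        if rate != expected_rate:
            bad.append((path, rate))
    return bad
```

For example, `find_bad_sample_rates("wav22050")` returns an empty list when every file already matches the expected rate.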
python ./Modu/infer/infer_base_batch.py

Alternatively, use "./Modu/infer_samples.py" to convert source and target speech of your own choosing.
For training, organize the corpus as "VCTK22050/$figure$/*.wav", then set "train_wav_dir" and "out_dir" in "./Modu/predata/robust_mels.py". The resulting "figure_label_mel_map.pkl" is used for training.
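Before running the preprocessing script, it may help to confirm that the corpus matches the expected layout. This sketch assumes each "$figure$" folder holds one speaker's recordings (as in VCTK) and simply groups WAV paths by folder name. `index_corpus` is a hypothetical helper; it does not reproduce the internal format of "figure_label_mel_map.pkl", which is not documented here.

```python
import glob
import os
from collections import defaultdict


def index_corpus(train_wav_dir: str) -> dict:
    """Map each subfolder name (assumed: one per speaker) to its sorted WAV paths.

    Only files exactly one level deep (<dir>/<folder>/*.wav) are counted;
    stray top-level WAV files are ignored.
    """
    index = defaultdict(list)
    for path in glob.glob(os.path.join(train_wav_dir, "*", "*.wav")):
        folder = os.path.basename(os.path.dirname(path))
        index[folder].append(path)
    # Sort folders and files so the result is deterministic.
    return {name: sorted(paths) for name, paths in sorted(index.items())}
```

Printing `{k: len(v) for k, v in index_corpus("VCTK22050").items()}` gives a quick per-folder file count, which makes empty or misplaced folders easy to spot.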
python Modu/predata/robust_mels.py

In the config file "./Modu/config.yaml", set "test_wav_dir" and "label_clip_mel_pkl" to the evaluation data and the training corpus, respectively. Then start training:
python Modu/solver.py