
Problems of Generating tr_label_phn during Inference #15

@LyWangPX


In my own inference experiments, I noticed that the score is determined mainly by the phn labels rather than by the .wav audio.
There was an extreme pattern across multiple sound files of the same word:

Word A 4 4 4 4 4 4
Word B 5 5 5 5 5 5

Even after deliberately corrupting the .wav files, the results remained the same.
Then I found a potential reason:

In gen_seq_data_phn.py, tr_label_phn and te_label_phn are generated from a phn_dict that is specific to whichever dataset is being processed. However, the pretrained model is based on SpeechOcean762. When running inference on any other dataset, the model receives phone labels indexed by the inference dataset's phn_dict rather than by the SpeechOcean762 one, so the same label ID may refer to a different phone, causing inconsistent inference results.

The correct method is to always reuse the phn_dict that was generated when training on SpeechOcean762, instead of building a new one from the inference dataset.
I will update the inference tutorial if you think it is necessary.
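
To illustrate the mismatch and the proposed fix, here is a minimal sketch. It assumes that phn_dict maps each phone to an integer ID in order of first appearance; I have not copied the actual code from gen_seq_data_phn.py, so `build_phn_dict`, `encode`, the JSON file name, and the toy phone lists are all hypothetical.

```python
import json
import os
import tempfile


def build_phn_dict(phones):
    """Assign integer IDs in order of first appearance (assumed behaviour)."""
    phn_dict = {}
    for p in phones:
        if p not in phn_dict:
            phn_dict[p] = len(phn_dict)
    return phn_dict


def encode(phones, phn_dict):
    """Map phones to IDs, failing loudly on phones unseen at training time."""
    missing = sorted(set(phones) - set(phn_dict))
    if missing:
        raise KeyError(f"phones not in training phn_dict: {missing}")
    return [phn_dict[p] for p in phones]


# Toy phone sequences, made up for illustration (not real dataset content).
train_phones = ["AH", "B", "K", "AE", "T"]  # stands in for SpeechOcean762
infer_phones = ["K", "AE", "T"]             # stands in for the new dataset

train_dict = build_phn_dict(train_phones)
infer_dict = build_phn_dict(infer_phones)

# Problem: a dict rebuilt from the inference data gives different IDs.
mismatched = encode(infer_phones, infer_dict)

# Fix: encode the inference data with the training-time dict.
consistent = encode(infer_phones, train_dict)

# Persist the training dict once, then reload it at inference time.
path = os.path.join(tempfile.mkdtemp(), "phn_dict.json")  # hypothetical file
with open(path, "w") as f:
    json.dump(train_dict, f)
with open(path) as f:
    loaded_dict = json.load(f)
reloaded = encode(infer_phones, loaded_dict)

print("rebuilt dict IDs: ", mismatched)
print("training dict IDs:", consistent)
print("reloaded dict IDs:", reloaded)
```

In this toy case the rebuilt dict labels the phones 0, 1, 2 while the training dict labels them 2, 3, 4, so a model trained with the second mapping would see the wrong phones. Raising an error on unseen phones is a design choice here; silently assigning new IDs would hide the same kind of mismatch.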


Labels: bug (Something isn't working)
