Description
In my own inference experiment, I noticed that the score is determined mainly by the phoneme labels (phn), not by the .wav audio.
An extreme pattern appeared across multiple recordings of the same word:
Word A 4 4 4 4 4 4
Word B 5 5 5 5 5 5
Even after corrupting the .wav files, the results remained the same.
Then I found a potential reason:
In gen_seq_data_phn.py, tr_label_phn and te_label_phn are generated from a phn_dict that is built from whichever dataset is being processed. However, the pretrained model was trained on SpeechOcean762. When running inference on any other dataset, the model therefore receives phoneme labels indexed by the inference dataset's dictionary rather than by SpeechOcean762's, which makes the inference results inconsistent.
The correct fix is to always reuse the phn_dict produced when training on SpeechOcean762, instead of regenerating it from the inference dataset.
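A minimal sketch of the mismatch and the fix. The phoneme lists, file name, and dict-building logic below are hypothetical, for illustration only; the real mapping lives in gen_seq_data_phn.py:

```python
import json

# Hypothetical example: two datasets yield different phoneme-to-index maps
# when each one builds its own dict from the phonemes it happens to contain.
speechocean_phones = ["AA", "AE", "AH", "B", "D"]  # order seen during training
other_phones = ["AH", "B", "AA", "D", "AE"]        # order seen at inference

train_dict = {p: i for i, p in enumerate(speechocean_phones)}
infer_dict = {p: i for i, p in enumerate(other_phones)}  # dataset-specific: wrong

# The same phoneme gets different integer labels, so the model's learned
# phoneme representations are indexed inconsistently at inference time.
assert train_dict["AH"] != infer_dict["AH"]

# Fix: persist the dict built on SpeechOcean762 and always reload it
# when preparing labels for any other dataset.
with open("phn_dict_speechocean762.json", "w") as f:
    json.dump(train_dict, f)
with open("phn_dict_speechocean762.json") as f:
    reused = json.load(f)
assert reused == train_dict
```

With the reused dictionary, every phoneme maps to the same index the model saw during training, so the score reflects the audio rather than an arbitrary relabeling.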
I will update the inference tutorial if you think it is necessary.