Add WMT translation example. #133

Merged
copybara-service[bot] merged 4 commits into
google:master from
levskaya:wmt
Apr 17, 2020

Conversation

@levskaya

Collaborator

This is the first public version of a full encoder-decoder transformer model.

NB: At the moment this uses a simple dynamic bucketing approach to batching sentences, which trains much more slowly than a multiple-sentence-packing method. The model here already supports packed examples and segmented attention; we just need to add a preprocessing pass to the tfds-based pipeline.
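
For readers unfamiliar with sentence packing, here is a minimal pure-Python sketch of the idea. It is not the preprocessing pass planned for this example. The helper names (`pack_sequences`, `segment_mask`), the greedy first-fit strategy, and the use of 0 as both padding token and padding segment id are all assumptions made for illustration. Several short sentences share one fixed-length row, and per-token segment ids let attention be restricted to tokens of the same sentence.

```python
from typing import List, Sequence, Tuple

PAD = 0  # assumed padding token id, also used as the padding segment id

Row = Tuple[List[int], List[int], List[int]]  # (tokens, segment_ids, positions)


def _pad_row(tokens: List[int], segments: List[int], positions: List[int],
             max_len: int) -> Row:
  """Right-pads the three parallel lists to exactly max_len entries."""
  pad = [PAD] * (max_len - len(tokens))
  return tokens + pad, segments + pad, positions + pad


def pack_sequences(seqs: Sequence[Sequence[int]], max_len: int) -> List[Row]:
  """Greedily packs token sequences, in order, into rows of length max_len.

  Segment ids start at 1 within each row and positions restart at 0 for
  every segment. Sequences longer than max_len are truncated.
  """
  rows: List[Row] = []
  tokens: List[int] = []
  segments: List[int] = []
  positions: List[int] = []
  for seq in seqs:
    seq = list(seq)[:max_len]
    if not seq:
      continue
    if len(tokens) + len(seq) > max_len:
      # Current row is full: emit it and start a new one.
      rows.append(_pad_row(tokens, segments, positions, max_len))
      tokens, segments, positions = [], [], []
    segment_id = segments[-1] + 1 if segments else 1
    tokens.extend(seq)
    segments.extend([segment_id] * len(seq))
    positions.extend(range(len(seq)))
  if tokens:
    rows.append(_pad_row(tokens, segments, positions, max_len))
  return rows


def segment_mask(segment_ids: Sequence[int]) -> List[List[bool]]:
  """mask[q][k] is True when query q may attend to key k (same real segment)."""
  return [[q != PAD and q == k for k in segment_ids] for q in segment_ids]


rows = pack_sequences([[5, 6, 7], [8, 9], [10, 11, 12, 13]], max_len=6)
first_tokens, first_segments, first_positions = rows[0]
# first_tokens    == [5, 6, 7, 8, 9, 0]
# first_segments  == [1, 1, 1, 2, 2, 0]
# first_positions == [0, 1, 2, 0, 1, 0]
mask = segment_mask(first_segments)
```

With bucketing, each row holds a single sentence plus padding; with packing, padding only appears at the tail of a row, so a larger share of each batch is real tokens.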

@levskaya requested review from AlexeyG and marcvanzee and removed request for AlexeyG March 27, 2020 17:49
Comment thread examples/wmt/README.md Outdated

@marcvanzee left a comment

Contributor

Some comments on the preprocessing. I haven't looked at the rest yet.

Comment thread examples/wmt/bleu.py Outdated
Comment thread examples/wmt/bleu.py Outdated
Comment thread examples/wmt/README.md Outdated
Comment thread examples/wmt/bleu.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/input_pipeline.py Outdated
Comment thread examples/wmt/input_pipeline.py Outdated
Comment thread examples/wmt/input_pipeline.py Outdated
Comment thread examples/wmt/input_pipeline.py Outdated
Comment thread examples/wmt/input_pipeline.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/models.py
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py Outdated
Co-Authored-By: Marc van Zee <marcvanzee@gmail.com>

@marcvanzee left a comment

Contributor

I left some more comments on train.py. I didn't get all the details of the fast decoding, but I can dig into this after this PR as well. I'll now take a look at the remaining files, models.py and decode.py.

Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/train.py Outdated
Comment thread examples/wmt/models.py Outdated
Comment thread examples/wmt/models.py
Comment thread examples/wmt/decode.py Outdated
Comment thread examples/wmt/models.py
Comment thread examples/wmt/models.py
Comment thread examples/wmt/models.py
Comment thread examples/wmt/models.py Outdated
Comment thread examples/wmt/models.py Outdated
Comment thread examples/wmt/models.py
Comment thread examples/wmt/models.py
levskaya and others added 2 commits April 15, 2020 14:39
Co-Authored-By: Marc van Zee <marcvanzee@gmail.com>
@levskaya requested a review from marcvanzee April 15, 2020 07:25
Comment thread examples/wmt/train.py
Comment thread examples/wmt/models.py
lambda x: {'inputs': x[features_info.supervised_keys[0]],
           'targets': x[features_info.supervised_keys[1]]})
train_data = train_data.map(to_features_dict)
eval_data = eval_data.map(to_features_dict)

Contributor

You can add num_parallel_calls to these map calls (~2x speedup).

Collaborator Author

Yeah, this input pipeline is terrible. I'll have a new one coming soon in another PR.
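
For reference, the num_parallel_calls suggestion in this thread could look roughly like the sketch below. It is a standalone illustration, not the code that was merged: the tiny in-memory dataset and the `SUPERVISED_KEYS` tuple are hypothetical stand-ins for the tfds WMT data and `features_info.supervised_keys`, and `tf.data.AUTOTUNE` assumes a recent TensorFlow release (older releases expose it as `tf.data.experimental.AUTOTUNE`).

```python
import tensorflow as tf

# Hypothetical stand-in for features_info.supervised_keys (source, target).
SUPERVISED_KEYS = ('de', 'en')


def to_features_dict(x):
  """Renames the language-keyed features to 'inputs' and 'targets'."""
  return {'inputs': x[SUPERVISED_KEYS[0]],
          'targets': x[SUPERVISED_KEYS[1]]}


# Tiny in-memory dataset standing in for the tfds WMT split.
raw_data = tf.data.Dataset.from_tensor_slices({
    'de': ['Hallo Welt', 'Guten Morgen'],
    'en': ['Hello world', 'Good morning'],
})

# Let tf.data choose how many elements to map in parallel.
train_data = raw_data.map(
    to_features_dict, num_parallel_calls=tf.data.AUTOTUNE)

first_example = next(train_data.as_numpy_iterator())
```

By default `map` keeps element order even when run in parallel, so this change should not alter which examples end up in which batch. Any speedup depends on the machine and on how expensive the mapped function is.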

@copybara-service Bot merged commit ac737b4 into google:master Apr 17, 2020

5 participants