Cross-Lingual Training of Neural Models for Document Ranking

Peng Shi, He Bai, Jimmy Lin


Abstract
We tackle the challenge of cross-lingual training of neural document ranking models for mono-lingual retrieval, specifically leveraging relevance judgments in English to improve search in non-English languages. Our work successfully applies multi-lingual BERT (mBERT) to document ranking and additionally compares against a number of alternatives: translating the training data, translating documents, multi-stage hybrids, and ensembles. Experiments on test collections in six different languages from diverse language families reveal many interesting findings: model-based relevance transfer using mBERT can significantly improve search quality in (non-English) mono-lingual retrieval, but other “low resource” approaches are competitive as well.
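To make the idea of "model-based relevance transfer" concrete, here is a minimal, illustrative sketch (not taken from the paper or its code release): a multilingual BERT cross-encoder is assumed to have been fine-tuned on English relevance judgments (e.g., MS MARCO) with a standard Hugging Face sequence-classification head, and is then applied unchanged to score query-document pairs in a non-English target language. The checkpoint name, two-label head, and softmax scoring below are assumptions for illustration, not the authors' exact setup.

# Illustrative sketch: zero-shot relevance transfer with an mBERT cross-encoder.
# Fine-tuned on English judgments, used as-is to re-rank non-English candidates.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-multilingual-cased"  # hypothetical checkpoint; swap in a fine-tuned ranker
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
model.eval()

def relevance_score(query: str, document: str) -> float:
    """Score one query-document pair; higher means more relevant."""
    inputs = tokenizer(query, document, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Use the probability of the "relevant" class as the ranking score.
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Re-rank an initial candidate list (e.g., from a BM25 first stage) for a
# query in the target language.
query = "..."         # non-English query (placeholder)
candidates = ["..."]  # candidate documents from the first-stage retriever (placeholder)
ranked = sorted(candidates, key=lambda d: relevance_score(query, d), reverse=True)

In this framing, the other approaches the paper compares against (translating the training data, translating the documents, multi-stage hybrids, and ensembles) change what goes into such a ranker rather than the scoring interface itself.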
Anthology ID: 2020.findings-emnlp.249
Volume: Findings of the Association for Computational Linguistics: EMNLP 2020
Month: November
Year: 2020
Address: Online
Editors: Trevor Cohn, Yulan He, Yang Liu
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2768–2773
URL: https://aclanthology.org/2020.findings-emnlp.249
DOI: 10.18653/v1/2020.findings-emnlp.249
Cite (ACL):
Peng Shi, He Bai, and Jimmy Lin. 2020. Cross-Lingual Training of Neural Models for Document Ranking. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2768–2773, Online. Association for Computational Linguistics.
Cite (Informal):
Cross-Lingual Training of Neural Models for Document Ranking (Shi et al., Findings 2020)
PDF: https://aclanthology.org/2020.findings-emnlp.249.pdf