This repo includes:
- A Gazetteer of tokens and NE tags annotated by 3 domain experts
- A Corpus of 475,085 job titles crawled from LinkedIn, with NE tags prefixed using the BIOES scheme
- Title2Vec, a pre-trained job title embedding fine-tuned from ELMo. A checkpoint is available for download.
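
As a rough illustration of how BIOES-prefixed NE tags can be read, the sketch below groups the tokens of one job title into labelled spans. It is not part of this repo: the function name `bioes_to_spans`, the `PREFIX-LABEL` tag spelling (e.g. `B-FUN`), and the example labels `RES` and `FUN` are assumptions made for the example, so check them against the actual corpus files before relying on it.

```python
from typing import List, Tuple


def bioes_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (label, text) spans from BIOES tags.

    Assumes each tag looks like "B-FUN", "I-FUN", "E-FUN", "S-RES" or "O"
    (hypothetical spelling; the corpus may use a different separator).
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # Single-token entity: emit immediately.
            spans.append((label, token))
            current_tokens, current_label = [], None
        elif prefix == "B":
            # Begin a new multi-token entity.
            current_tokens, current_label = [token], label
        elif prefix in ("I", "E") and current_tokens and label == current_label:
            # Continue the open entity; "E" closes and emits it.
            current_tokens.append(token)
            if prefix == "E":
                spans.append((label, " ".join(current_tokens)))
                current_tokens, current_label = [], None
        else:
            # "O" or an inconsistent tag: drop any unfinished entity.
            current_tokens, current_label = [], None

    return spans


if __name__ == "__main__":
    # Illustrative title and tags, not taken from the corpus.
    print(bioes_to_spans(["senior", "software", "engineer"],
                         ["S-RES", "B-FUN", "E-FUN"]))
```
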

Please cite the following paper when using IPOD:
@article{liu2019ipod,
title={IPOD: An Industrial and Professional Occupations Dataset and its Applications to Occupational Data Mining and Analysis},
author={Junhua Liu and Yung Chuen Ng and Kristin L. Wood and Kwan Hui Lim},
year={2019},
journal={arXiv preprint arXiv:1910.10495}
}