
gracecarrillo/IPOD

 
 


Industrial and Professional Occupations Dataset (IPOD)

License: CC BY 4.0

This repo includes:

  • A gazetteer of tokens and NE tags annotated by 3 domain experts
  • A corpus of 475,085 job titles crawled from LinkedIn, with NE tags prefixed using the BIOES scheme
  • Title2Vec, a pre-trained job title embedding fine-tuned from ELMo. A checkpoint is available for download.
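
To illustrate what BIOES prefixing means for a job title, here is a minimal sketch rather than the repo's own tooling. In BIOES, a single-token entity gets `S-`, a multi-token entity gets `B-` on its first token, `I-` on inner tokens, and `E-` on its last token, and tokens outside any entity get `O`. The function name `spans_to_bioes`, the span format, and the labels `RES` and `FUN` are assumptions made for this example; check the gazetteer for the actual tag set.

```python
from typing import List, Tuple


def spans_to_bioes(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, label) spans into BIOES tags.

    Hypothetical helper for illustration; not part of the IPOD repo.
    Spans are assumed to be non-overlapping and within range.
    """
    tags = ["O"] * n_tokens  # tokens outside any entity stay "O"
    for start, end, label in spans:
        if end - start == 1:
            # Single-token entity
            tags[start] = f"S-{label}"
        else:
            # Multi-token entity: Begin, Inside..., End
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


# Example job title with assumed labels (RES and FUN are placeholders here).
tokens = ["Senior", "Software", "Engineer"]
spans = [(0, 1, "RES"), (1, 3, "FUN")]
tags = spans_to_bioes(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Each token is printed next to its tag, one pair per line, which is a common layout for sequence-labelling corpora; the actual file layout of the IPOD corpus may differ.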

Citing IPOD

Please cite the following paper when using IPOD:

@article{liu2019ipod,
    title={IPOD: An Industrial and Professional Occupations Dataset and its Applications to Occupational Data Mining and Analysis},
    author={Junhua Liu and Yung Chuen Ng and Kristin L. Wood and Kwan Hui Lim},
    year={2019},
    journal={arXiv preprint arXiv:1910.10495}
}

About

A Corpus of 475,000 Industrial Occupations
