ziqizhang/wop_matching

Code and information for replicating the experiments presented at http://webdatacommons.org/largescaleproductcorpus/v2/index.html (WDC Large-Scale Product Corpus, version 2).

Prerequisites

Anaconda (or a similar distribution that provides the standard scientific Python packages)
py_entitymatching
xgboost
deepmatcher
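A quick way to confirm that the Python prerequisites above are installed is sketched below. It assumes that py_entitymatching, xgboost and deepmatcher are also the import names of these packages. Anaconda itself is a distribution, not a module, so it is not checked.

```python
import importlib.util

# Assumed import names of the Python prerequisites listed above.
REQUIRED_MODULES = ["py_entitymatching", "xgboost", "deepmatcher"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

importlib.util.find_spec only locates a top-level module and does not import it, so the check stays fast even for heavy packages such as deepmatcher.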

Data Preparation

Download the normalized WDC LSPC v2 data files and unzip them into the corresponding folder under data/raw/wdc-lspc/.
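The unzip step can also be scripted with the standard library, as in the minimal sketch below. The helper name unzip_into, the archive path and the subfolder name are hypothetical placeholders: substitute the actual WDC LSPC v2 file names and the matching folder under data/raw/wdc-lspc/.

```python
import zipfile
from pathlib import Path

# Root folder for the raw corpus files, as given in the instructions above.
RAW_DIR = Path("data/raw/wdc-lspc")


def unzip_into(archive, subfolder, raw_dir=RAW_DIR):
    """Extract a downloaded zip archive into raw_dir/subfolder and return that folder.

    Hypothetical helper: the subfolder argument must be the folder that
    corresponds to the downloaded file.
    """
    target = Path(raw_dir) / subfolder
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

For example, unzip_into("downloads/<archive>.zip", "<subfolder>") extracts one archive; the angle-bracket names are placeholders, not real file names.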

Run the noise-training-sets notebook.
Then run the process-to-magellan and process-to-wordcooc notebooks.
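The preparation notebooks can be executed non-interactively with jupyter nbconvert. The sketch below assumes they live directly in the notebooks/ folder as <name>.ipynb files, which may not match the repository layout exactly; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: notebooks are stored as notebooks/<name>.ipynb.
NOTEBOOK_DIR = Path("notebooks")

# Order of the data preparation steps described above.
PREPARATION_ORDER = [
    "noise-training-sets",
    "process-to-magellan",
    "process-to-wordcooc",
]


def nbconvert_command(notebook_path):
    """Build a jupyter nbconvert command that executes a notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", "--inplace",
        str(notebook_path),
    ]


def run_preparation(notebook_dir=NOTEBOOK_DIR):
    """Execute the preparation notebooks in order, stopping at the first failure."""
    for name in PREPARATION_ORDER:
        subprocess.run(nbconvert_command(Path(notebook_dir) / f"{name}.ipynb"), check=True)


if __name__ == "__main__":
    run_preparation()
```

Running each notebook in a separate process with check=True means a failing step raises CalledProcessError before any later step runs on incomplete data.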

Model Learning

Run the run-wordcooc, run-magellan or run-deepmatcher notebook to replicate the learning-curve and label-noise experiments.
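For readers unfamiliar with label-noise experiments, the sketch below shows one common way to simulate label noise in a binary matching task: flipping the match/non-match label of a randomly chosen fraction of training pairs. It is an illustration only and is not taken from the noise-training-sets notebook; the function name and the 0/1 label encoding are assumptions.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of 0/1 labels with round(noise_rate * n) of them flipped.

    Illustrative sketch, not the repository's method. The seed makes the
    choice of flipped positions reproducible.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be between 0 and 1")
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    # Sample distinct positions so exactly n_flip labels change.
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy
```

Sampling positions without replacement guarantees that the realized noise rate equals the requested one, rather than only matching it in expectation.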

Best deepmatcher parameters found by optimization on computers xlarge

The best parameter combinations are listed in the file optimized-parameters.txt.

Deepmatcher end-to-end training

To allow gradient updates of the embedding layer, change the line embed.weight.requires_grad = False to embed.weight.requires_grad = True in models/core.py of the installed deepmatcher package.
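The edit can be applied programmatically, as in the minimal sketch below. It assumes the line appears verbatim in models/core.py of the installed deepmatcher version; the helper names are hypothetical. Because it rewrites a file inside the installed package, consider keeping a backup copy of core.py.

```python
import importlib.util
from pathlib import Path

FROZEN = "embed.weight.requires_grad = False"
TRAINABLE = "embed.weight.requires_grad = True"


def unfreeze_embeddings(source):
    """Return the source text with the embedding-freezing line switched to trainable."""
    if FROZEN not in source:
        raise ValueError("expected line not found; the deepmatcher version may differ")
    return source.replace(FROZEN, TRAINABLE)


def patch_installed_deepmatcher():
    """Rewrite models/core.py inside the installed deepmatcher package in place."""
    spec = importlib.util.find_spec("deepmatcher")
    if spec is None or spec.origin is None:
        raise ImportError("deepmatcher is not installed")
    # spec.origin points to the package's __init__.py.
    core_py = Path(spec.origin).parent / "models" / "core.py"
    core_py.write_text(unfreeze_embeddings(core_py.read_text()))
    return core_py
```

Raising an error when the expected line is missing avoids silently leaving the embeddings frozen if a different deepmatcher version is installed.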

Code for building the small, medium, large and xlarge training sets

Additional requirement: textdistance

The sample-training-sets notebook contains the code used to build the four training sets (small, medium, large and xlarge) for each product category.

Acknowledgements

Project structure based on Cookiecutter Data Science: https://drivendata.github.io/cookiecutter-data-science/
