This project seeks to build a Python software package providing a comprehensive and scalable set of string tokenizers (such as alphabetical and whitespace tokenizers) and string similarity measures (such as edit distance, Jaccard, and TF/IDF). The package is free, open source, and BSD-licensed.
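A minimal usage sketch is shown below, combining a tokenizer with two similarity measures. The class and method names follow the package's documented API; see the Tutorial linked below for complete and authoritative examples.

```python
import py_stringmatching as sm

# A whitespace tokenizer splits the input string on whitespace.
ws_tok = sm.WhitespaceTokenizer()
tokens1 = ws_tok.tokenize('data science rocks')      # ['data', 'science', 'rocks']
tokens2 = ws_tok.tokenize('rocks of data science')

# Jaccard is a set-based measure computed over the two token lists.
jac = sm.Jaccard()
print(jac.get_sim_score(tokens1, tokens2))            # similarity score in [0, 1]

# Levenshtein works directly on strings: get_raw_score returns the edit
# distance, get_sim_score a normalized similarity.
lev = sm.Levenshtein()
print(lev.get_raw_score('string matching', 'string matchng'))
print(lev.get_sim_score('string matching', 'string matchng'))
```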
- Project Homepage: https://sites.google.com/site/anhaidgroup/projects/magellan/py_stringmatching
- Code repository: https://github.com/anhaidgroup/py_stringmatching
- User Manual: https://anhaidgroup.github.io/py_stringmatching/v0.4.2/index.html
- Tutorial: https://anhaidgroup.github.io/py_stringmatching/v0.4.2/Tutorial.html
- How to Contribute: https://anhaidgroup.github.io/py_stringmatching/v0.4.2/Contributing.html
- Developer Manual: http://pages.cs.wisc.edu/~anhai/py_stringmatching/v0.2.0/dev-manual-v0.2.0.pdf
- Issue Tracker: https://github.com/anhaidgroup/py_stringmatching/issues
- Mailing List: https://groups.google.com/forum/#!forum/py_stringmatching
py_stringmatching has been tested with each Python version from 3.7 through 3.12, inclusive.
The required dependencies to build the package are NumPy (1.7.0 or higher, but lower than 2.0) and a C or C++ compiler. For the development version, you will also need Cython.
py_stringmatching has been tested on Linux, OS X, and Windows. At this time, we have tested only on the x86 architecture.