Stars
Check for multiple patterns in a single string at the same time: a fast Aho-Corasick algorithm for Python
Python/Flask-based website for text analysis workflow. Previous (stable) release is live at:
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Bayesian network models for inferring core-periphery structure
A Python library of similarity measures which allow measuring the perceptual similarity between set embeddings corpora.
An interactive data visualization tool which brings matplotlib graphics to the browser using D3.
ruptures: change point detection in Python
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
Lightweight and extensible compatibility layer between dataframe libraries!
Rapid fuzzy string matching in Python using various string metrics
Medieval Manuscripts in Oxford Libraries: TEI catalogue descriptions
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line applica…
Data and code behind the articles and graphics at FiveThirtyEight
A database of early modern printers and sellers culled from the eMOP source documents
Recipes for using Python's pandas library
A utility and library for imposition -- arranging pages on a sheet of paper for optimal printing
Data and code to support "On Classification with Large Language Models in Cultural Analytics"