Stars
Modeling, training, eval, and inference code for OLMo
A codebase for "Local Model-Agnostic Explanations for Ranking Model Interpretability"
Hebrew PHI identification and redaction toolkit
Neural Modeling for Named Entities and Morphology (Hebrew NER)
Named Entity (NER) annotations of the Hebrew Treebank (Haaretz newspaper) corpus, including morpheme- and token-level NER labels, nested mentions, and more.
A neural network layer that enables training of deep neural networks directly from crowdsourced labels (e.g. from Amazon Mechanical Turk) or, more generally, labels from multiple annotators with di…
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary.
Tool for parsing and converting various span encoding schemes.
Pandas Network Analysis by UrbanSim: fast accessibility metrics and shortest paths, using contraction hierarchies 🗺️
A tool for GTFS transit and OSM pedestrian network accessibility analysis by UrbanSim
Tools for the extraction of OpenStreetMap street network data
Human annotations for the "Inherent Disagreements in Human Textual Inferences" paper
A field-tested Hebrew tokenizer for dirty texts (ben-yehuda project, bible, cc100, mc4, opensubs, oscar, twitter) focused on multi-word expression extraction.
Analyze and extract Wikipedia article text and attributes, and store them in an Elasticsearch index or in JSON files (multilingual support)
A Node.js port to the JavaScriptCore engine and iOS
An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
peartree: A library for converting transit data into a directed graph for sketch network analysis.
Collects and parses price data from Israeli supermarkets.
A fast, forgiving GTFS reader built on pandas DataFrames