Stars
Code for collecting, processing, and preparing datasets for the Common Pile
Modeling, training, eval, and inference code for OLMo
A collection of scripts that build docker images for various use-cases.
Code for the paper SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts (AKBC 2021). https://openreview.net/forum?id=OFLbgUP04nC
Apache PDFBox extension for precisely extracting character/symbol locations and identities from born-digital PDF files.
Replication code for "With Little Power Comes Great Responsibility"
We evaluate many models used for biomedical and clinical NLP tasks and train new models that perform much better.
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
A large (>5k) collection of search questions asked about Coronavirus 🦠
TensorFlow code and pre-trained models for BERT
Unsupervised text tokenizer for Neural Network-based text generation.
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
Code for Defending Against Neural Fake News, https://rowanzellers.com/grover/
Code for the paper "Language Models are Unsupervised Multitask Learners"
A full spaCy pipeline and models for scientific/biomedical documents.
An Interactive Tool for Scalable and Reproducible Error Analysis.
Debugging, monitoring and visualization for Python Machine Learning and Data Science
Acceptance rates for the major AI conferences
Library to scrape and clean web pages to create massive datasets.