- Seattle, WA
- kyleclo.com
- @kylelostat
- @kylelo.bsky.social
Stars
Unsupervised text tokenizer for Neural Network-based text generation.
Debugging, monitoring and visualization for Python Machine Learning and Data Science
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
Code for collecting, processing, and preparing datasets for the Common Pile
A full spaCy pipeline and models for scientific/biomedical documents.
Modeling, training, eval, and inference code for OLMo
Acceptance rates for the major AI conferences
Apache PDFBox extension for precisely extracting character/symbol locations and identities from born-digital PDF files.
A collection of scripts that build docker images for various use-cases.
Code for the paper "Language Models are Unsupervised Multitask Learners"
TensorFlow code and pre-trained models for BERT
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
Code for Defending Against Neural Fake News, https://rowanzellers.com/grover/
Code for the paper SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts (AKBC 2021). https://openreview.net/forum?id=OFLbgUP04nC
We evaluate many models used for biomedical and clinical NLP tasks, and train new models that perform much better.
An Interactive Tool for Scalable and Reproducible Error Analysis.
Library to scrape and clean web pages to create massive datasets.
Replication code for "With Little Power Comes Great Responsibility"
A large (>5k) collection of search questions asked about Coronavirus 🦠