Stars
AudioSample is an optimized numpy-like audio manipulation library, created for researchers, used by developers.
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
Code for the model presented in the paper: "code2seq: Generating Sequences from Structured Representations of Code"
Curated list of tools, projects and productions using program-aided reasoning
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
Code for the paper "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL'2023)
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
Training language models to make programs faster
CodeBERTScore: an automatic metric for code generation, based on BERTScore
JEMMA: An Extensible Java dataset for Many ML4Code Applications
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
A set of utilities for running few-shot prompting experiments on large language models
It is my belief that you, the postgraduate students and job-seekers for whom the book is primarily meant, will benefit from reading it; however, it is my hope that even the most experienced research…
PaL: Program-Aided Language Models (ICML 2023)
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022)
Collected solutions from Google Code Jam programming competition (2008-2021).
DIRTY: Augmenting Decompiler Output with Learned Variable Names and Types
The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences".
SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.
The official repository for the paper "Efficient Long-Text Understanding Using Short-Text Models" (Ivgi et al., 2022)
PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022)
📰 My little corner of the internet for writing notes about papers I read
Implementation of Memorizing Transformers (ICLR 2022), an attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in PyTorch
Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023
PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
Code for the paper "Symmetric Machine Theory of Mind", presented at ICML 2022.