Stars
Rich is a Python library for rich text and beautiful formatting in the terminal.
Style guides for Google-originated open-source projects
A framework for graph-based dependency parsing.
Speech recognition with word-level timestamps, optimized for batch inference.
The Enhanced Edition versions of Baldur's Gate, Baldur's Gate II, Planescape: Torment and Icewind Dale ship without some of the dependencies they need on Linux. Here are the missing files and instructions.
Fast, correct Python JSON library supporting dataclasses, datetimes, and NumPy
A library for open-source data processing tools to create language model training datasets
Bringing BERT into modernity via both architecture changes and scaling
This repository contains demos I made with the Transformers library by HuggingFace.
Unofficial implementation of the Ask-LLM paper 'How to Train Data-Efficient LLMs', arXiv:2402.09668.
Efficient Triton Kernels for LLM Training
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
A PyTorch native platform for training generative AI models
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Library supporting NLP and CV research on scientific papers
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Swedish parliamentary proceedings - Riksdagens protokoll 1867-today
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Scaling Data-Constrained Language Models
A fast implementation of T5/UL2 in PyTorch using Flash Attention
Minimalistic large language model 3D-parallelism training
Machine Learning Engineering Open Book