Stars
Meta-learning with BERT as a learner
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
A collection of benchmarks and datasets for evaluating LLMs.
A collection of awesome-prompt-datasets and awesome-instruction-dataset resources for training ChatLLMs such as ChatGPT. It gathers a wide variety of instruction datasets for training ChatLLM models.
A curated list of awesome instruction tuning datasets, models, papers and repositories.
Robust recipes to align language models with human and AI preferences
A collection of links, tutorials, and best practices on how to collect data and build an end-to-end RLHF system to fine-tune generative AI models
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
Code associated with the article "Multi-annotator Deep Learning: A Probabilistic Framework for Classification"
Friends don't let friends make certain types of data visualization: what are they and why are they bad?
This repository contains a JAX implementation of conformal training, corresponding to the ICLR'22 paper "Learning Optimal Conformal Classifiers".
Computational Linguistics and Social Networks Group @ IIT Gandhinagar
Retrieves Parquet files from Hugging Face, then uses pandas to identify and quantify junky data, duplication, contamination, and biased content in datasets
Code for Orlikowski et al. (2023): "The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics"
umanlp / SocioAdapt
Forked from chiachienhung/SocioAdapt
Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers
Build, evaluate, understand, and fix LLM-based apps
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Awesome-LLM: a curated list of Large Language Model resources
Open source annotation tool for machine learning practitioners.
MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification
Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales"
multi_task_NLP is a utility toolkit enabling NLP developers to easily train and infer a single model for multiple tasks.
"Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper)