Stars
A library for efficient similarity search and clustering of dense vectors.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Cramming the training of a (BERT-type) language model into limited compute.
[CCS'24] A dataset of 15,140 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance, fast inference,…
Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
pytorch-tpu / fairseq
Forked from facebookresearch/fairseq. Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023)
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
This is the repository of the EMNLP 2021 paper "BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation".
A guidance language for controlling large language models.
Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For latest Indic-BERT v2, check: https://github.com/AI4Bharat/IndicBERT
JARVIS, a system that connects LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
Code for the EMNLP 2020 paper "Accurate Word Alignment Induction from Neural Machine Translation"
Source code for Twitter's Recommendation Algorithm
FinGPT: Open-Source Financial Large Language Models. 🔥 The trained models are released on HuggingFace.
Korean BERT pre-trained cased (KoBERT)