RuanVisser


Babysit your preemptible TPUs

Python 84 14 Updated Dec 3, 2022

A library for efficient similarity search and clustering of dense vectors.

C++ 31,184 3,626 Updated Oct 29, 2024

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Python 9,661 746 Updated Oct 23, 2024

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Python 2,335 354 Updated Mar 23, 2024

Cramming the training of a (BERT-type) language model into limited compute.

Python 1,290 100 Updated Jun 13, 2024

[CCS'24] A dataset consisting of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).

Jupyter Notebook 2,654 244 Updated Oct 8, 2024

Pretraining Efficiently on S2ORC!

Python 136 3 Updated Oct 23, 2024

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference,…

Python 12,595 858 Updated Oct 21, 2024

BERT-related papers

2,032 282 Updated Aug 12, 2023

Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

Python 553 93 Updated Jun 25, 2020

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Python 22 7 Updated Jan 25, 2023

[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

Python 118 14 Updated Jul 25, 2023

Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023)

Python 22 3 Updated Nov 1, 2023

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.

Python 714 41 Updated Oct 27, 2024

🌸 A command-line fuzzy finder

Go 65,143 2,397 Updated Oct 30, 2024

the only cheat sheet you need

Python 38,359 1,791 Updated Jun 22, 2024

This is the repository of the EMNLP 2021 paper "BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation".

Python 31 6 Updated Nov 28, 2022

A guidance language for controlling large language models.

Jupyter Notebook 18,970 1,042 Updated Oct 29, 2024

Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For latest Indic-BERT v2, check: https://github.com/AI4Bharat/IndicBERT

Python 276 41 Updated May 11, 2023

LTG-Bert

Python 27 4 Updated Jan 8, 2024

A visualization grammar.

JavaScript 11,212 1,505 Updated Oct 29, 2024

JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf

Python 23,654 1,966 Updated Sep 26, 2024

Code for EMNLP 2020 paper Accurate Word Alignment Induction from Neural Machine Translation

Python 2 Updated Apr 4, 2023

Source code for Twitter's Recommendation Algorithm

Scala 62,257 12,151 Updated Jul 10, 2024

Source code for Twitter's Recommendation Algorithm

Python 10,091 2,206 Updated Jul 10, 2024

FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

Jupyter Notebook 13,915 1,930 Updated Oct 1, 2024

Tools and Modeling Code for the MASSIVE dataset

Python 538 57 Updated Nov 28, 2022

The implementation of DeBERTa

Python 1,984 224 Updated Sep 29, 2023

Korean BERT pre-trained cased (KoBERT)

Jupyter Notebook 1,295 368 Updated Oct 3, 2024