Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training (usage sketch after this list).
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Code and documentation to train Stanford's Alpaca models, and generate the data.
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Simple reinforcement learning tutorials; Chinese-language AI teaching materials from 莫烦Python.
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support (minimal training-loop sketch after this list).
Quantitative stock trading through the 同花顺 (Tonghuashun) client, miniqmt, and 雪球 (Xueqiu); supports tracking JoinQuant/RiceQuant simulated (paper) trading and live Xueqiu portfolios.
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Accessible large language models via k-bit quantization for PyTorch (quantized-loading sketch after this list).
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.
Example models using DeepSpeed
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
Model parallel transformers in JAX and Haiku
XLNet: Generalized Autoregressive Pretraining for Language Understanding
An Open-Source Package for Neural Relation Extraction (NRE)
An Open-Source Package for Knowledge Embedding (KE)
🚀 Efficient implementations of state-of-the-art linear attention models
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821 (sentence-embedding sketch after this list)
Code for "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks"
Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)
A Heterogeneous Benchmark for Information Retrieval. Easy to use; evaluate your models across 15+ diverse IR datasets.
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
Shared repository for open-sourced projects from the Google AI Language team.
This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
A powerful and flexible machine learning platform for drug discovery
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Source code and dataset for ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities"
A Data Streaming Library for Efficient Neural Network Training
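
For the 🤗 Transformers entry above, a minimal sketch of the pipeline API, which wraps tokenization, model loading, and inference in a single call; the default checkpoint and exact output labels below are illustrative, not guaranteed.

```python
# Minimal 🤗 Transformers sketch: a sentiment-analysis pipeline.
# Requires `pip install transformers` plus a backend such as PyTorch.
from transformers import pipeline

# Downloads a default sentiment-analysis checkpoint on first use.
classifier = pipeline("sentiment-analysis")

result = classifier("This library makes state-of-the-art models easy to use.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```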
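For 🤗 Accelerate, a sketch of the usual pattern: wrap model, optimizer, and dataloader with `Accelerator.prepare` and replace `loss.backward()` with `accelerator.backward(loss)`. The tiny linear model and random data are placeholders added here to make the loop self-contained, not part of the library.

```python
# Minimal 🤗 Accelerate sketch: the same loop runs on CPU, a single GPU,
# or multiple GPUs (e.g. when launched with `accelerate launch script.py`).
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # picks up device/distributed config automatically

# Placeholder model and data, only to make the sketch runnable.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8)

# Accelerate moves everything to the right device(s) and wraps as needed.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for inputs, labels in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # instead of loss.backward()
    optimizer.step()
```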
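For bitsandbytes, a common entry point is the 🤗 Transformers integration: pass a `BitsAndBytesConfig` so linear layers are loaded in 8-bit (or 4-bit). The checkpoint name below is a stand-in, and a CUDA GPU is assumed.

```python
# Sketch of k-bit loading through the bitsandbytes + transformers integration.
# Assumes a CUDA GPU and `pip install transformers bitsandbytes accelerate`.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-1.3b"  # stand-in checkpoint; any causal LM on the Hub

bnb_config = BitsAndBytesConfig(load_in_8bit=True)  # or load_in_4bit=True

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the quantized weights
)

inputs = tokenizer("Quantization lets large models fit on", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```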
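For SimCSE, the released checkpoints can also be loaded directly with 🤗 Transformers; the checkpoint name and the [CLS]-token pooling below are assumptions based on the paper's supervised setup, not the repository's own wrapper API.

```python
# Sketch: sentence embeddings from a SimCSE checkpoint via transformers.
# The checkpoint name is an assumption; see the repo for released models.
import torch
from transformers import AutoModel, AutoTokenizer

name = "princeton-nlp/sup-simcse-bert-base-uncased"  # assumed released checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

sentences = ["A man is playing a guitar.", "Someone plays an instrument."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # Use the [CLS] token representation as the sentence embedding.
    embeddings = model(**batch).last_hidden_state[:, 0]

sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {sim.item():.3f}")
```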