48 stars written in Python

Robust Speech Recognition via Large-Scale Weak Supervision

Python 90,444 11,325 Updated Sep 8, 2025

TensorFlow code and pre-trained models for BERT

Python 39,629 9,704 Updated Jul 23, 2024

💫 Industrial-strength Natural Language Processing (NLP) in Python

Python 32,769 4,617 Updated Nov 6, 2025

Deezer source separation library including pretrained models.

Python 27,713 3,045 Updated Apr 2, 2025

Faker is a Python package that generates fake data for you.

Python 18,866 2,023 Updated Nov 5, 2025

Bringing Old Photo Back to Life (CVPR 2020 oral)

Python 15,630 2,082 Updated Oct 26, 2023

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

Python 7,741 847 Updated Jun 1, 2025

A data augmentations library for audio, image, text, and video.

Python 5,056 310 Updated Oct 31, 2025

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,732 273 Updated Jul 18, 2025

Free Motion Capture for Everyone 💀✨

Python 3,918 306 Updated Nov 6, 2025

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Python 3,273 573 Updated Apr 14, 2023

Beautiful visualizations of how language differs among document types.

Python 2,321 290 Updated Apr 29, 2025

Large Concept Models: Language modeling in a sentence representation space

Python 2,301 202 Updated Jan 29, 2025

A python tool for evaluating the quality of sentence embeddings.

Python 2,107 308 Updated Mar 19, 2024

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.

Python 1,186 75 Updated Oct 8, 2025

💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy

Python 738 62 Updated Aug 15, 2024

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text

Python 681 196 Updated Sep 19, 2021

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx

Python 640 119 Updated Mar 22, 2021

Mnemosyne: efficient learning with powerful digital flash-cards.

Python 554 82 Updated Jun 4, 2025

Concurrently detect the minimum Python versions needed to run code

Python 504 28 Updated Nov 3, 2025

An evolving list of electronic media data sets used to model mental-health status.

Python 448 77 Updated Sep 3, 2021

ACRONYM (Acronym CReatiON for You and Me)

Python 417 30 Updated Dec 1, 2022

Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities.

Python 365 51 Updated Dec 8, 2022

TweetNLP for all the NLP enthusiasts working on Twitter! The Python library tweetnlp provides a collection of useful tools to analyze/understand tweets such as sentiment analysis, emoji prediction,…

Python 364 35 Updated Apr 2, 2025

TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24)

Python 361 61 Updated Mar 15, 2025

analyze text with empath

Python 338 59 Updated Apr 22, 2017

data⎰describe: Pythonic EDA Accelerator for Data Science

Python 302 18 Updated Feb 22, 2023

Code for collecting, processing, and preparing datasets for the Common Pile

Python 238 23 Updated Sep 10, 2025

Python library for Representational Similarity Analysis

Python 223 47 Updated Oct 27, 2025

Nostril: Nonsense String Evaluator

Python 199 35 Updated May 7, 2022