yg37
  • San Francisco


A curated list of product management advice for technical people.

4,267 stars · 833 forks · Updated Jul 1, 2024

Kepler.gl is a powerful open source geospatial analysis tool for large-scale data sets.

TypeScript · 11,745 stars · 1,932 forks · Updated Apr 8, 2026

Documentation for the General Bikeshare Feed Specification, a standardized data feed for shared mobility system availability. Maintained by MobilityData.

892 stars · 304 forks · Updated Apr 7, 2026

A data specification to enable right-of-way regulation, digital policy, geofencing, and two-way communication between mobility companies and public agencies worldwide for any regulated, shared vehi…

731 stars · 248 forks · Updated Apr 9, 2026

A sample online store built with Rails. Video of progress: https://goo.gl/NYGrTq

Ruby · 2 stars · Updated Aug 19, 2016

Source code for the book Kafka Streams in Action

Java · 269 stars · 179 forks · Updated Jul 11, 2021

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python · 159,152 stars · 32,820 forks · Updated Apr 10, 2026

Human-readable reference marks for scales.

JavaScript · 208 stars · 107 forks · Updated Oct 6, 2023

Transform the DOM by selecting elements and joining to data.

JavaScript · 569 stars · 287 forks · Updated Jan 3, 2025

Natural Language Processing Best Practices & Examples

Python · 6,443 stars · 912 forks · Updated Aug 30, 2022

Tacotron 2: PyTorch implementation with faster-than-realtime inference

Jupyter Notebook · 5,303 stars · 1,425 forks · Updated Jun 12, 2024

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Python · 6,505 stars · 794 forks · Updated Jan 14, 2026

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages

Python · 7,767 stars · 942 forks · Updated Apr 9, 2026

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

C++ · 26,748 stars · 4,102 forks · Updated Jun 19, 2025

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON.

Go · 9,835 stars · 233 forks · Updated Apr 10, 2026

🎓 Path to a free self-taught education in Computer Science!

HTML · 203,023 stars · 25,254 forks · Updated Mar 27, 2026

Unsupervised text tokenizer for neural-network-based text generation.

C++ · 11,748 stars · 1,335 forks · Updated Apr 8, 2026

Data ingestion library for Amundsen, used to build its graph and search index

Python · 204 stars · 205 forks · Updated Mar 13, 2024

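Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing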

Python · 36,235 stars · 10,898 forks · Updated Nov 15, 2025

SparkOnHBase

Scala · 278 stars · 174 forks · Updated Mar 30, 2021

Synthetic Patient Population Simulator

Java · 3,083 stars · 855 forks · Updated Mar 19, 2026

NLP, before and after spaCy

Python · 2,239 stars · 249 forks · Updated Sep 22, 2023

System design interview for IT companies

23,085 stars · 5,210 forks · Updated Apr 3, 2023

Apache Flink

Java · 25,936 stars · 13,913 forks · Updated Apr 10, 2026

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Python · 342,119 stars · 55,272 forks · Updated Mar 20, 2026

Numeric fused-head identification and resolution

Jupyter Notebook · 33 stars · 3 forks · Updated Oct 16, 2019

BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences

Jupyter Notebook · 611 stars · 99 forks · Updated Aug 15, 2023

A BERT model for scientific text.

Python · 1,687 stars · 232 forks · Updated Feb 22, 2022

Definition and DDLs for the OMOP Common Data Model (CDM)

HTML · 1,033 stars · 492 forks · Updated Nov 5, 2025

Super easy library for BERT-based NLP models

Python · 1,920 stars · 340 forks · Updated Aug 19, 2024