Stars
A curated list of product management advice for technical people.
Kepler.gl is a powerful open source geospatial analysis tool for large-scale data sets.
Documentation for the General Bikeshare Feed Specification, a standardized data feed for shared mobility system availability. Maintained by MobilityData.
A data specification to enable right-of-way regulation, digital policy, geofencing, and two-way communication between mobility companies and public agencies worldwide for any regulated, shared vehi…
A sample online store using Rails. Video of progress: https://goo.gl/NYGrTq
Source code for the Kafka Streams in Action Book
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Transform the DOM by selecting elements and joining to data.
Natural Language Processing Best Practices & Examples
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
🎓 Path to a free self-taught education in Computer Science!
Unsupervised text tokenizer for Neural Network-based text generation.
Data ingestion library for Amundsen to build graph and search index
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
Synthetic Patient Population Simulator
System design interview for IT companies
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Numeric fused-head identification and resolution
BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
Definition and DDLs for the OMOP Common Data Model (CDM)
Super easy library for BERT-based NLP models