Salesforce Research
- https://tingofurro.github.io/
Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
A high-throughput and memory-efficient inference and serving engine for LLMs
💫 Industrial-strength Natural Language Processing (NLP) in Python
Automated Machine Learning with scikit-learn
CodeGen is a family of open-source models for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
HTML Content / Article Extractor, web scraping lib in Python
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and…
GPU Accelerated t-SNE for CUDA with Python bindings
Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search
Bringing BERT into modernity via both architecture changes and scaling
Textbook on reinforcement learning from human feedback
Scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
CPU and GPU-accelerated Machine Learning Library
A library for prompt engineering and optimization (SAMMO = Structure-aware Multi-Objective Metaprompt Optimization)
UnifiedQA: Crossing Format Boundaries With a Single QA System
A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.
A Python script to decode Google News article URLs.
[ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".
Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120)
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024]
The repo containing the Critical Role Dungeons and Dragons Dataset.