- Berlin
- in/romangrebennikov
Stars
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Code and documentation to train Stanford's Alpaca models, and generate the data.
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
MTEB: Massive Text Embedding Benchmark
dstack is an open-source control plane for running development, training, and inference jobs on GPUs—across hyperscalers, neoclouds, or on-prem.
pytest fixture for benchmarking code
Metric learning and retrieval pipelines, models and zoo.
Finetuning Large Language Models on One Consumer GPU in 2 Bits
Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search
Full text search that feels like a numpy array
Library for automatic retraining and continual learning
Pure-Python Server-Sent Events (SSE) client
An efficient PyTorch implementation of the evaluation metrics in recommender systems.
Experimental code for our paper on informative and diverse sampling of negative examples for dense retrieval