Yury Malkov yurymalkov

PhD in Laser physics, now doing Computer Science. Author of HNSW (approximate nearest neighbor search).

234 followers · 8 following

OpenAI
San Francisco Bay Area
https://scholar.google.com/citations?user=KvAyakQAAAAJ

Stars

KellerJordan / modded-nanogpt

NanoGPT (124M) in 3 minutes

Python 3,970 520 Updated Dec 17, 2025

xhluca / bm25s

Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy

Python 1,429 83 Updated Dec 1, 2025

ItzCrazyKns / Perplexica

Perplexica is an AI-powered answering engine. It is an Open source alternative to Perplexity AI

TypeScript 27,749 2,897 Updated Dec 19, 2025

marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

Python 5,000 225 Updated Dec 19, 2025

getao / icae

The repo for In-context Autoencoder

Jupyter Notebook 157 19 Updated May 11, 2024

jbarrow / tinyhnsw

build your own vector database -- the littlest hnsw

Python 67 2 Updated Jan 7, 2025

hauntsaninja / boostedblob

Command line tool and async library to perform basic file operations on local paths, Google Cloud Storage paths and Azure Blob Storage paths.

Python 35 29 Updated Dec 19, 2025

spotify / voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.

C++ 1,526 80 Updated Sep 25, 2025

vec2text / vec2text

utilities for decoding deep representations (like sentence embeddings) back to text

Python 1,025 112 Updated Aug 5, 2025

zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

Python 7,878 567 Updated Jul 11, 2025

datastax / jvector

JVector: the most advanced embedded vector search engine

Java 1,662 143 Updated Dec 11, 2025

madaan / minimal-text-diffusion

A minimal implementation of diffusion models for text generation

Python 407 37 Updated May 11, 2023

zilliztech / VectorDBBench

Benchmark for vector databases.

Python 968 305 Updated Dec 19, 2025

yandex-research / tabular-dl-tabr

The implementation of "TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning"

Python 312 34 Updated Nov 17, 2025

iitmdinesh / image2text

Image captioning from scratch (or pre-trained vision/language models) using transformers

Python 7 Updated Feb 13, 2025

hora-search / hora

🚀 efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀 .

Rust 2,654 77 Updated Jan 31, 2024

neondatabase / pg_embedding

Hierarchical Navigable Small World (HNSW) algorithm for vector similarity search in PostgreSQL

C 574 27 Updated Dec 14, 2023

zhao-lang / redis_hnsw

HNSW module for Redis

Rust 59 3 Updated Sep 1, 2020

unum-cloud / USearch

Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

C++ 3,479 249 Updated Nov 30, 2025

openai / prm800k

800,000 step-level correctness labels on LLM solutions to MATH problems

Python 2,081 122 Updated Jun 1, 2023

NVlabs / ODISE

Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]

Python 929 53 Updated Jul 6, 2024

ray-project / llm-numbers

Numbers every LLM developer should know

4,275 140 Updated Jan 16, 2024

zilliztech / pyglass

Graph Library for Approximate Similarity Search

C++ 136 24 Updated Sep 9, 2025

ShravanSunder / hnswlib-wasm

hnswlib-wasm attempts to create a browser friendly version of hnswlib

C++ 60 15 Updated Jul 21, 2023

yoshoku / hnswlib-node

hnswlib-node provides Node.js bindings for Hnswlib

C++ 126 12 Updated Dec 12, 2025

LudwigStumpp / llm-leaderboard

A joint community effort to create one central leaderboard for LLMs.

Python 308 36 Updated Aug 23, 2024

FrancescoSaverioZuppichini / LinkedInGPT

Skynet

Python 87 16 Updated Jun 20, 2023

facebookresearch / ImageBind

ImageBind One Embedding Space to Bind Them All

Python 8,905 835 Updated Nov 21, 2025

hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible

Python 41,297 4,545 Updated Dec 8, 2025

mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation

Python 21,764 1,889 Updated Dec 11, 2025