⚡ World’s Fastest Vector Database for AI & RAG
Updated Dec 7, 2025 - Python
vector db built by someone with no idea how to build a vector db
Multilingual toolkit for evaluating LLMs using embeddings
An open-source project for crawling RSS feeds and websites, extracting news content, and storing it with vector embeddings for semantic search, clustering, and visualization.
Learning project: modular RAG pipeline for legal document search & Q&A using SBERT, Pinecone, and FastAPI.
RAG Mini Project — Retrieval‑Augmented Generation chatbot with FastAPI backend (Docker on Hugging Face Spaces) and Streamlit frontend (Render), featuring document ingestion, vector search, and LLM‑powered answers
A command-line tool to index and perform hybrid semantic & lexical search over text files
Demonstrating RAG with Streamlit.
Experimenting with Pinecone as vector data continues to take center stage in AI-native systems. The purpose of this project is to explore the core capabilities, benchmark performance across different embedding models, and better understand what is possible with vector search in production environments.
A Python dictionary that uses semantic similarity for key matching instead of exact matches. This library allows you to retrieve values using keys that are semantically similar to the ones stored, making it ideal for natural language interfaces, etc.
A Python-based semantic search system to find relevant transcript chunks based on user queries. Supports TF-IDF and Hugging Face LLM (llm2) search methods, with a Streamlit web interface and CLI for interactive querying. Outputs results in the format [timestamp], <chunk> and logs them to output/output.txt.
A Streamlit app to evaluate the accuracy of automatic speech recognition (ASR) transcription services.
Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine.
Contextual Code Exploration for Developers
A Streamlit app to visualize text similarity using embeddings and cosine distance. Compare and analyze texts interactively!
Localume is a powerful desktop application that enables semantic search across your documents using advanced vector embeddings and retrieval technology. The application monitors specified directories in real-time, automatically indexing new and modified files to maintain an up-to-date searchable database.
An Essentia-based tool for extracting features from a collection of audio files, with two simple user interfaces for creating playlists and exploring track similarities based on the extracted audio features and embeddings.
Dockerized application that embeds text in a pgvecto.rs database and retrieves data with a similarity search to generate a response with an LLM from Ollama.
An event retrieval system for visual data, built for the Ho Chi Minh City AI Challenge 2024.
A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System for Long Texts.
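Several of the projects listed here rest on the same idea: compare embedding vectors by cosine similarity and treat the closest stored item as a match. The sketch below illustrates that idea with a dictionary that looks up keys by similarity rather than exact equality. It is not the code of any listed repository. `SemanticDict`, `toy_embed`, `TOY_VECTORS`, and the `0.8` threshold are hypothetical names and values chosen for illustration, and the hand-written vectors stand in for a real embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical stand-in for an embedding model: a fixed lookup table.
TOY_VECTORS = {
    "hello": (1.0, 0.1, 0.0),
    "hi": (0.9, 0.2, 0.0),
    "goodbye": (0.0, 0.1, 1.0),
}


def toy_embed(text):
    """Return the toy vector for a known word (assumption: no real model)."""
    return TOY_VECTORS[text]


class SemanticDict:
    """Toy mapping that resolves a query to the most similar stored key."""

    def __init__(self, embed, threshold=0.8):
        self._embed = embed          # callable: text -> vector
        self._threshold = threshold  # minimum similarity to count as a match
        self._items = []             # list of (key, vector, value)

    def __setitem__(self, key, value):
        # Replace any existing entry stored under exactly the same key.
        self._items = [item for item in self._items if item[0] != key]
        self._items.append((key, self._embed(key), value))

    def __getitem__(self, query):
        if not self._items:
            raise KeyError(query)
        query_vec = self._embed(query)
        # Linear scan; a vector database would use an index instead.
        _, best_vec, best_value = max(
            self._items,
            key=lambda item: cosine_similarity(query_vec, item[1]),
        )
        if cosine_similarity(query_vec, best_vec) < self._threshold:
            raise KeyError(query)
        return best_value


if __name__ == "__main__":
    d = SemanticDict(toy_embed)
    d["hello"] = "greeting"
    print(d["hi"])  # "hi" is close to "hello", so this prints: greeting
```

The linear scan keeps the sketch short. With more than a few thousand keys, the usual choice is an approximate nearest-neighbour index or a vector database, which is the problem several of the listed projects address.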