Stars
A GPT-empowered penetration testing tool
The open-source RAG platform: built-in citations, deep research, 22+ file formats, partitions, MCP server, and more.
A collection of (mostly) technical things every software developer should know about
This repository contains the source code for the examples and illustrations discussed in the book A Simple Introduction to Retrieval Augmented Generation
Demystify RAG by building it from scratch. Local LLMs, no black boxes - real understanding of embeddings, vector search, retrieval, and context-augmented generation.
The 500 AI Agents Projects is a curated collection of AI agent use cases across various industries. It showcases practical applications and provides links to open-source projects for implementation…
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Text preprocessing, representation and visualization from zero to hero.
Data Science at the Command Line
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
A fast type checker and language server for Python
MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
🏥🤖 Query MIMIC-IV medical data using natural language through Model Context Protocol (MCP). Transform healthcare research with AI-powered database interactions - supports both local MIMIC-IV SQLite…
verl: Volcano Engine Reinforcement Learning for LLMs
Place a phone call from an AI agent in an API call. Or call the bot directly from the configured phone number!
SQL Native Memory Layer for LLMs, AI Agents & Multi-Agent Systems
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Private Evolution: Generating DP Synthetic Data without Training [ICLR 2024, ICML 2024 Spotlight]
Summaries and resources for Designing Machine Learning Systems book (Chip Huyen, O'Reilly 2022)
This repository focuses on comparing different synthetic data generators on social sciences data.
Context-optimized MCP server for web scraping. Reduces LLM token usage by 70-90% through server-side CSS filtering and HTML-to-markdown conversion.
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using the RAG paradigm.