Stars
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
Curated list of data science interview questions and answers
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
Fully local web research and report writing assistant
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.
A curated list of recent diffusion models for video generation, editing, and various other applications.
Tongyi Deep Research, the Leading Open-source Deep Research Agent
A Python module to repair invalid JSON from LLMs
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
Get started with building Fullstack Agents using Gemini 2.5 and LangGraph
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Train transformer language models with reinforcement learning.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A Python API for retrieving the transcript/subtitles of a given YouTube video. It also works for automatically generated subtitles and does not require an API key or a headles…
The Unofficial TikTok API Wrapper In Python
DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Wan: Open and Advanced Large-Scale Video Generative Models
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Solve Visual Understanding with Reinforced VLMs
VideoGen-Eval: Agent-based System for Video Generation Evaluation
Janus-Series: Unified Multimodal Understanding and Generation Models
Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, together with good general video understanding capability.
An extremely fast Python package and project manager, written in Rust.
A fast Rust-based tool to serialize text-based files in a repository or directory for LLM consumption