Stars
Official codebase for the paper Latent Visual Reasoning
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
🤖 The analysis of Claude Code
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
Dataflow-MM, multi-media operators for Dataflow. We aim to prepare data for Multimodal Large Language Models.
[CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"
BiomedCLIP data pipeline
Unofficial implementation of the Dreamer 4 world model in PyTorch.
Easy Data Preparation with latest LLMs-based Operators and Pipelines.
Code and data for "Medical Dialogue Generation via Dual Flow Modeling" (ACL 2023 Findings)
This is the official repository for our paper "Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning", published at ICLR 2026.
Source-code walkthrough of the lightweight large language model MiniMind, covering the complete pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more
[AAAI'26 Oral] DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
Proxy setup adapted for AutoDL platform servers, using Clash as the proxy tool
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
A Windows application that brings the MacBook "maximize to a new virtual desktop" feature to Windows laptops.
Python module for manipulating Windows 10/11 windows and virtual desktops.
Curated embodied AI list: surveys, VLA models, datasets, simulators, humanoids, robot learning, and safety resources.
"A Guide to Building White-Box Large Models": a fully hand-built Tiny-Universe
Training Sparse Autoencoders on Language Models
A library for mechanistic interpretability of GPT-style language models
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
Tensors and Dynamic neural networks in Python with strong GPU acceleration