
A low-latency & high-throughput serving engine for LLMs

Python 223 29 Updated Sep 12, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 6,626 363 Updated Jul 11, 2024

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,481 191 Updated Oct 16, 2024

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,235 144 Updated Jul 12, 2024

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 311 14 Updated Oct 30, 2024

KAG is a knowledge-enhanced generation framework based on OpenSPG engine, which is used to build knowledge-enhanced rigorous decision-making and information retrieval knowledge services

Python 331 18 Updated Oct 30, 2024

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

Python 2,621 274 Updated Aug 14, 2024

This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Python 1,806 167 Updated May 25, 2024

[EMNLP'23, ACL'24] To speed up LLM inference and improve LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.

Python 4,571 253 Updated Aug 22, 2024

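Huazhong University of Science and Technology, School of Computer Science, Class of 2019, System Capability Training, DBMS track (team 扌四去队)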

C++ 4 Updated Oct 27, 2022

Huazhong University of Science and Technology System Capability Training: DBMS

C++ 7 Updated Dec 18, 2023

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

4,919 273 Updated Oct 23, 2024

Jupyter Notebook 2,549 768 Updated Jul 9, 2024

Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

1,218 86 Updated Aug 20, 2024

Awesome-LLM-RAG: a curated list of advanced retrieval augmented generation (RAG) in Large Language Models

934 61 Updated Sep 27, 2024

🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda 1,345 149 Updated Oct 30, 2024

Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search

C++ 1,114 218 Updated Oct 24, 2024

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 598 53 Updated Apr 7, 2024

An LLM Based Diagnosis System (https://arxiv.org/pdf/2312.01454.pdf)

Python 553 77 Updated Sep 13, 2024

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook 1,529 95 Updated Feb 16, 2024

🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transf…

C++ 24,273 1,861 Updated Oct 30, 2024

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…

TypeScript 50,106 7,170 Updated Oct 30, 2024

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Python 21,329 2,087 Updated Oct 30, 2024

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…

TypeScript 31,793 5,547 Updated Oct 15, 2024

LlamaIndex is a data framework for your LLM applications

Python 36,378 5,193 Updated Oct 30, 2024

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook 94,128 15,209 Updated Oct 30, 2024

A Survey on Benchmarks of Multimodal Large Language Models

56 2 Updated Oct 12, 2024

Ceph is a distributed object, block, and file storage platform

C++ 14,128 6,008 Updated Oct 30, 2024

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

C++ 260 111 Updated Oct 28, 2024