ydli-ai
🎯 Focusing

Organizations

@jenkins-zh @CLUEbenchmark @CVI-SZU

Python 4 Updated Dec 1, 2025

Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"

PostScript 20,962 2,517 Updated Jun 30, 2025

Generative Agents: Interactive Simulacra of Human Behavior

20,168 2,786 Updated Aug 5, 2024

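ReLE Evaluation: capability evaluation of Chinese AI large models (continuously updated). Currently covers 335 large models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.5, Wenxin ERNIE-X1.1, ERNIE-5.0-Thinking, qwen3-max, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as kimi-k2, ernie4.5, minimax-M2, deepseek-…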

5,280 211 Updated Dec 15, 2025

Analysis of the Chinese cognitive abilities of language models

Python 236 24 Updated Sep 9, 2023

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Python 6,090 527 Updated Jul 1, 2025

🎉 Repo for LaWGPT, Chinese-Llama tuned with Chinese legal knowledge. A large language model based on Chinese legal knowledge.

Python 6,026 553 Updated Jun 11, 2024

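MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, targeting the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, and even text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, script lines, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.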

4,058 285 Updated Nov 26, 2025

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,899 368 Updated Dec 7, 2024

GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.

Python 697 52 Updated Jan 7, 2025

BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)

HTML 8,279 768 Updated Oct 16, 2024

We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tuning) together for easy use. We welcome open-source enthusiasts…

Jupyter Notebook 2,790 253 Updated Dec 12, 2023

llama inference for tencentpretrain

Python 99 11 Updated Jun 8, 2023

Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks

Python 210 19 Updated Jan 13, 2024

LLM inference in C/C++

C++ 91,470 14,138 Updated Dec 17, 2025

4 bits quantization of LLaMA using GPTQ

Python 3,080 460 Updated Jul 13, 2024

pCLUE: a multi-task prompt learning dataset with 1,000,000+ examples

Jupyter Notebook 504 59 Updated Oct 4, 2022

A collection of libraries to optimise AI model performances

Python 8,363 633 Updated Jul 22, 2024

A High-Performance Pytorch Implementation of face detection models, including RetinaFace and DSFD

Python 233 61 Updated Jun 18, 2025

An implementation of the EDA paper for Chinese corpora. An EDA data augmentation tool for Chinese text. NLP data augmentation. Paper reading notes.

Python 1,383 239 Updated May 31, 2022

FaRL for Facial Representation Learning [Official, CVPR 2022]

Python 440 25 Updated Jun 9, 2023

Synthetic Faces High Quality (SFHQ) Dataset. 425,000 curated 1024x1024 synthetic face images

Python 244 7 Updated Oct 14, 2024

State-of-the-Art Text Embeddings

Python 18,011 2,718 Updated Dec 15, 2025

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo

Python 3,104 526 Updated May 9, 2024

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Python 4,242 365 Updated Oct 19, 2025

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Python 5,631 648 Updated Feb 17, 2024

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

1,090 44 Updated Sep 27, 2024

[CVPR 2021] Multi-Modal-CelebA-HQ: A Large-Scale Text-Driven Face Generation and Understanding Dataset

Python 252 21 Updated Jun 1, 2024

Free English to Chinese Dictionary Database

Python 7,205 1,217 Updated Mar 28, 2025