ydli-ai
🎯 Focusing

Organizations

@jenkins-zh @CLUEbenchmark @CVI-SZU

29 stars written in Python

TensorFlow code and pre-trained models for BERT

Python 39,746 9,708 Updated Jul 23, 2024

Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

Python 35,996 10,881 Updated Nov 15, 2025

State-of-the-Art Text Embeddings

Python 18,011 2,718 Updated Dec 15, 2025

A collection of libraries to optimise AI model performances

Python 8,363 633 Updated Jul 22, 2024

Free English to Chinese Dictionary Database

Python 7,205 1,217 Updated Mar 28, 2025

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Python 6,090 527 Updated Jul 1, 2025

🎉 Repo for LaWGPT, Chinese-Llama tuned with Chinese legal knowledge. A large language model based on Chinese legal knowledge.

Python 6,026 553 Updated Jun 11, 2024

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Python 5,631 648 Updated Feb 17, 2024

Keras implementation of transformers for humans

Python 5,423 927 Updated Nov 11, 2024

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,899 368 Updated Dec 7, 2024

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Python 4,242 365 Updated Oct 19, 2025

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo

Python 3,104 526 Updated May 9, 2024

4 bits quantization of LLaMA using GPTQ

Python 3,080 460 Updated Jul 13, 2024

Chinese Data Competitions' Solutions

Python 1,772 396 Updated Apr 5, 2019

An implementation of the EDA paper for Chinese corpora. An EDA data augmentation tool for Chinese text. NLP data augmentation. Paper reading notes.

Python 1,383 239 Updated May 31, 2022

Chinese-language resources for Google Summer of Code

Python 942 76 Updated Sep 5, 2024

ERNIE Pytorch Version

Python 931 121 Updated Jul 26, 2023

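ChineseSemanticKB, a Chinese semantic knowledge base: 12 categories of million-scale common semantic dictionaries for Chinese language processing, including an abstract-semantics library of 340,000 entries, an antonym library of 340,000 entries, and a synonym library of 430,000 entries. Supports applications such as sentence expansion, rewriting, and event abstraction and generalization.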

Python 776 162 Updated Mar 17, 2023

GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.

Python 697 52 Updated Jan 7, 2025

My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.

Python 522 186 Updated Jun 5, 2017

FaRL for Facial Representation Learning [Official, CVPR 2022]

Python 440 25 Updated Jun 9, 2023

[CVPR 2021] Multi-Modal-CelebA-HQ: A Large-Scale Text-Driven Face Generation and Understanding Dataset

Python 252 21 Updated Jun 1, 2024

Synthetic Faces High Quality (SFHQ) Dataset. 425,000 curated 1024x1024 synthetic face images

Python 244 7 Updated Oct 14, 2024

Analysis of the Chinese cognitive abilities of language models

Python 236 24 Updated Sep 9, 2023

A High-Performance Pytorch Implementation of face detection models, including RetinaFace and DSFD

Python 233 61 Updated Jun 18, 2025

Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks

Python 210 19 Updated Jan 13, 2024

llama inference for tencentpretrain

Python 99 11 Updated Jun 8, 2023

The homework of DNS Security Course

Python 26 15 Updated Dec 9, 2018
Python 4 Updated Dec 1, 2025