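WanJuan3.0 (“万卷·丝路”, WanJuan Silk Road) is a comprehensive plain-text corpus that collects publicly available web content, literature, patents, and other materials from multiple countries and regions. The total data size exceeds 1.2 TB and the total token count exceeds 300B, which the project describes as internationally leading. The first open-source release consists mainly of five subsets (Thai, Russian, Arabic, Korean, and Vietnamese), each exceeding 150 GB.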

42 1 Updated Feb 13, 2025

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Python 1,887 144 Updated Apr 14, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.

Python 9,646 749 Updated Sep 22, 2025

A Next-Generation Training Engine Built for Ultra-Large MoE Models

Python 5,030 395 Updated Dec 25, 2025

Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

Python 7,131 501 Updated Oct 30, 2025

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

JavaScript 6,711 673 Updated Jul 4, 2025

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python 51,004 4,236 Updated Dec 24, 2025

A Collection of Foundation Driving Models by OpenDriveLab

Python 783 33 Updated Jul 2, 2025

WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.

14 Updated Apr 18, 2024

New ways of breaking app-integrated LLMs

Jupyter Notebook 2,028 140 Updated Jul 17, 2025

MLLM-DataEngine: An Iterative Refinement Approach for MLLM

Python 48 5 Updated May 24, 2024

AAAI 2024: Visual Instruction Generation and Correction

Python 95 3 Updated Feb 4, 2024

WanJuan 1.0 multimodal corpus

569 28 Updated Oct 20, 2023

Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).

Python 2,296 143 Updated Jun 7, 2023

An open-source codebase for exploring autonomous driving pre-training

Python 567 66 Updated Jan 19, 2024

OpenPCSeg: Open Source Point Cloud Segmentation Toolbox and Benchmark

Python 474 46 Updated Apr 27, 2025

Data annotation toolbox that supports image, audio, and video data.

Python 1,448 159 Updated Oct 1, 2025
Jupyter Notebook 13 6 Updated May 29, 2024

Data annotation component library, provided as NPM packages

TypeScript 141 46 Updated Nov 19, 2025

datasets resource

127 13 Updated Jul 1, 2025

Data Set Description Language Specification (DSDL, a next-generation description language for AI datasets)

HTML 47 6 Updated May 29, 2024

SDK of OpenDataLab - https://opendatalab.org.cn

Python 58 5 Updated Jul 31, 2025