mei233

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 81,423 12,176 Updated Dec 21, 2025

[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning

Python 2,108 82 Updated Dec 12, 2025

A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS: large-scale Chinese pre-trained ALBERT models

Python 3,988 750 Updated Nov 21, 2022

Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

Python 7,128 501 Updated Oct 30, 2025

Transforms complex documents like PDFs into LLM-ready Markdown/JSON for your agentic workflows.

Python 50,789 4,211 Updated Dec 16, 2025

Streaming data platform. Real-time stream processing, low-latency serving, and Iceberg table management.

Rust 8,621 715 Updated Dec 21, 2025

My own notes on the Financial Market module from Yale University on Coursera

54 19 Updated Apr 4, 2023

Open Source AI Platform - AI Chat with advanced features that works with every LLM

Python 16,877 2,253 Updated Dec 21, 2025

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Python 5,704 602 Updated Dec 14, 2025

This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"

Python 1 Updated Aug 27, 2021

Convert PDF to HTML without losing text or format.

HTML 5,347 496 Updated Jul 17, 2025

C++ implementation of the Brown word clustering algorithm.

C++ 430 130 Updated Sep 10, 2023

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 154,115 31,501 Updated Dec 21, 2025

Exercises for the XQuery workshops at DH2017

XQuery 1 Updated May 11, 2018

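Collecting, organizing, and publishing pre-trained Chinese word and character vectors, working with like-minded people to advance Chinese natural language processing.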

147 32 Updated Jun 1, 2018

An annotation tool for Chinese natural language processing (NLP), working with like-minded people to advance Chinese natural language processing.

Python 156 33 Updated Jun 22, 2018

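Collecting, organizing, and publishing Chinese natural language processing corpora and datasets, working with like-minded people to advance Chinese natural language processing.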

Jupyter Notebook 6,420 1,423 Updated Jan 29, 2019