South China University of Technology
Stars
Official Repo for Paper "Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math"
Font Synthesis with Pixel-Space Diffusion Transformer
Official Code for ICDAR2025 poster 'DevInSight: Weaving Path Development Into Online Signature Verification'
BPE tokenizer for digital ink (online handwriting) using directional decomposition via Bresenham's line algorithm
ScribeTokens: Digital-ink tokenization via Bresenham decomposition and BPE over Freeman chain codes
This repo contains a curated list of research papers and resources focusing on Handwritten Text Generation (HTG)
[NeurIPS'25 Spotlight🔥] Official implementation of Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition
Official implementation for AAAI 2025 paper: TAMER: Tree-Aware Transformer for Handwritten Mathematical Expression Recognition
Official implementation of URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding (AAAI 2026 Oral).
Syntax-Aware Network for Handwritten Mathematical Expression Recognition
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Official PyTorch implementation for "Zero-Shot Styled Text Image Generation, but Make It Autoregressive" (CVPR25)
EDU-CHEMC: a handwritten chemical structure image dataset of 52,987 handwritten molecular structure images collected in educational scenarios.
[ICLR 2026] DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
ICDAR2025 Best Paper "Template-Guided Cascaded Diffusion for Stylized Handwritten Chinese Text-Line Generation"
Code for "Unifying Molecular and Textual Representations via Multi-task Language Modelling" @ ICML 2023
Online Handwritten Text Recognition (HTR) system implemented with PyTorch. Based on https://doi.org/10.1007/s10032-020-00350-4.
Official repository for CalliReader: Contextualizing Chinese Calligraphy via an Embedding-aligned Vision Language Model [ICCV 2025]
MambaSTR: Scene Text Recognition with Masked State Space Model
Wan: Open and Advanced Large-Scale Video Generative Models
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Text-audio foundation model from Boson AI