An end-to-end synthetic dataset generation pipeline for formula recognition tasks, with no system-level LaTeX installation required.
Updated Nov 30, 2025 - Python
Automatic machine-learning dataset processing pipelines, using an LLM
Labels images automatically for object detection and image classification tasks using zero-shot models.
A crew of AI agents that helps you collect the data you need.
SigilDERG Data Production is an enterprise-grade Rust pipeline that crawls crates, runs rigorous scans (Clippy, Geiger, license checks), and generates instruction-style JSONL shards. It features semantic chunking, configurable splits, observability, and seamless SigilDERG ecosystem integration.
Tool built to generate and edit training datasets for sport activity recognition AI.
This repository contains the dataset preprocessing and handling code.
A simple Python Tkinter image cell tagger
OCR dataset creation and image augmentations such as scan, curve, and perspective noise
Jumla is a Python package for generating Lean 4 formal verification tasks from Python specifications.
A collection of CLI tools for deep learning and computer vision.
A script that captures frames from the computer's webcam, tries to detect a face in each frame, and saves the normalized distances between the center of the detected face and 60 face points in a DataFrame that can be saved as a dataset.
Generates high-quality fine-tuning pairs for large language models (LLMs) from unstructured documents.
Simple dataset creator in COCO-format.
Wiki scraper and dataset generator for LLM fine-tuning
Simple HDF5 format NCHW image dataset creator
Dataset preparation for VikX
A project focused on constructing autonomous UAV navigation datasets for natural environments, leveraging generative AI to enhance and generate training data from images of Taiwan's roads and rivers.
MNIST dataset for detection with multiple scales.
A Tkinter-based app that lets the user visually compare images of the same name located in selected folders.