Automatic machine-learning dataset processing pipelines, using an LLM
Updated Aug 31, 2024 - Python
Labels images automatically for object detection and image classification tasks using zero-shot models.
A crew of AI agents that helps you collect the data you need.
End-to-end synthetic dataset generation pipeline for formula recognition tasks, with no system-level LaTeX required.
SigilDERG Data Production is an enterprise-grade Rust pipeline that crawls crates, runs rigorous scans (Clippy, Geiger, license checks), and generates instruction-style JSONL shards. It features semantic chunking, configurable splits, observability, and seamless SigilDERG ecosystem integration.
Tool built to generate and edit training datasets for sport activity recognition AI.
This repository contains the dataset preprocessing and handling code.
This repository scrapes the official Cricbuzz website, extracts data from it, and builds a dataset for further use.
A Python script that extracts JPG images from NIfTI data and categorizes them into different classes of dementia.
A simple Python Tkinter image cell tagger
IDC (Image Dataset Creator) Tool for Windows
Modular R web-scraping framework that crawls sitemaps, aggregates links by date range, and extracts target HTML fields using the paperboy package (German newspapers)
OCR dataset creation and image augmentations such as scan, curve, and perspective noise
Dataset of the Brazilian National Registry of Legal Entities (Cadastro Nacional de Pessoas Jurídicas, CNPJ)
Jumla is a Python package for generating Lean 4 formal verification tasks from Python specifications.
Maximum dissimilarity with small group and abnormal data filters
Creating your own dataset: a single script that scrapes multiple pages of the IMDb website to fetch metadata for the top 1000 movies.
A collection of CLI tools for deep learning and computer vision.
A package for automatic dataset collection, annotation, and semantic label generation using ROS. Supported dataset formats are VOC and KITTI.