-
gdal
GDAL bindings for Rust
-
edgefirst-cli
EdgeFirst Client Library and CLI
-
aws-sdk-dataexchange
AWS SDK for AWS Data Exchange
-
vectordata
tools for dataset.yaml
-
reinfer-cli
Command line interface for Re:infer, the conversational data intelligence platform
-
netcdf3
A pure Rust library for reading and writing NetCDF-3 files
-
rdftk_core
core RDF data model; concrete implementations for Statements and Literals, along with a Resource type that provides a builder-like experience for models
-
hf-xet
Client library and tooling for the Hugging Face Xet data storage system
-
aws-sdk-cognitosync
AWS SDK for Amazon Cognito Sync
-
imperative
Check for imperative mood in text
-
scirs2-datasets
Datasets module for SciRS2 (scirs2-datasets)
-
perspective-server
A data visualization and analytics component, especially well-suited for large and/or streaming datasets
-
slabtastic
A streamable, readable, writeable, randomly accessible file format
-
hdf5-pure
Pure-Rust HDF5 writer library (WASM-compatible, no C dependencies)
-
rustsight
fast, safe CLI tool for dataset analysis and validation. Analyzes CSV files for column types, missing values, basic statistics (min/max/mean), outliers, no-variance columns, and mixed-type…
-
edgefirst-client
EdgeFirst Client Library and CLI
-
anima-tagger-cli
Command-line interface for anima-tagger: tag, caption, and export Stable Diffusion LoRA datasets
-
wikiwho
Fast Rust reimplementation of the WikiWho algorithm for fine-grained authorship attribution on large datasets. Optimized for easy integration in multi-threaded applications.
-
cjval
Schema-validation of CityJSON/Seq datasets
-
dataset-writer
write CSV/Arrow/Parquet files concurrently
-
hdf5-io
Pure Rust HDF5 file reader supporting superblock v2 and v3
-
mariadb-mysql-kbs
An index of the MariaDB and MySQL Knowledge bases
-
z_osmf
z/OSMF Client
-
amlich-core
Vietnamese Lunar Calendar - Core calculation engine
-
hdf5-pure-rust
Pure Rust implementation of the HDF5 file format
-
rdftk_io
traits for reading and writing Statements and Graphs as well as implementations of these for common representations
-
ridal
Speeding up Ground Penetrating Radar (GPR) processing
-
rolling-median
Compute the median using a 'rolling' (online) algorithm
-
ppl
A structured parallel programming library for Rust
-
dicom-test-files
A collection of DICOM files for testing DICOM parsers
-
awful_dataset_builder
Build LLM-ready Q/A datasets from reference text-to-question mappings produced by Awful Knowledge Synthesizer
-
locustdb
Embeddable high-performance analytics database
-
rdf-compare
Fast CLI to compute the diff of two RDF files as a quad dataset with two named graphs
-
lbasedb
Low-level DBMS in Rust focusing on datasets
-
hodgepodge
Lightweight dataset crate of enums for prototyping, teaching, and experimentation
-
h5peek
A CLI tool for inspecting HDF5 file structures and metadata
-
spel-right
A fast and lightweight spell checker and suggester
-
dset
processing and managing dataset-related files, with a focus on machine learning datasets, captions, and safetensors files
-
async-hdf5
Asynchronous HDF5 metadata reader
-
spareval
SPARQL evaluator
-
anima-tagger-core
Data model, project config, and sidecar I/O for anima-tagger — a tag/caption editor for local Stable Diffusion LoRA datasets
-
perspective-client
A data visualization and analytics component, especially well-suited for large and/or streaming datasets
-
torsh-data
Data loading and preprocessing utilities for ToRSh
-
linfa-datasets
Collection of small datasets for Linfa
-
anima-tagger-captioner
Qwen3-VL ONNX image captioner used by anima-tagger, with an OpenAI-compatible HTTP backend as an alternative
-
dataverse
interacting with the Dataverse API
-
abgleich-lib
ZFS sync tool, core library
-
javelin-tui
Display and work with Lance matrices
-
hdf5-previewer
A command line previewer for HDF5 files (WIP)
-
ycbust
CLI tool for downloading and extracting the YCB Object and Model Set for 3D rendering and simulation
-
jen
CLI generation tool for creating large datasets
-
bids
Rust tools for BIDS (Brain Imaging Data Structure) datasets
-
anima-tagger-tagger
WD14-family ONNX image tagger used by anima-tagger
-
mssqlrust
Lightweight Rust library for Microsoft SQL Server using dataset and datatable
-
vantage-dataset
Dataset traits for the Vantage data framework
-
geodb-wasm
WebAssembly bindings for geodb-core with simple JS API and embedded dataset
-
deep_filter
Noise suppression using deep filtering
-
meilisearch-importer
import massive datasets into Meilisearch by sending them in batches
-
anima-tagger-booru
Danbooru-API tag fetcher used by anima-tagger to enrich sidecars with human-curated booru tags
-
neighbourhood
Super fast fixed size K-d Trees for extremely large datasets
-
perspective
A data visualization and analytics component, especially well-suited for large and/or streaming datasets
-
serrf-testdata
Synthetic metabolomics dataset generator for testing SERRF normalization
-
rosbag
reading ROS bag files
-
veks
A vector bulk data processing tool
-
imdb-async
Opinionated and unopinionated async wrappers to efficiently retrieve and parse IMDB's dataset
-
marina
A dataset manager for robotics to organize, share, and discover datasets and metadata across storage backends
-
vantage
type-safe, ergonomic database toolkit for Rust that focuses on developer productivity without compromising performance. It allows you to work with your database using Rust's strong…
-
axonml-text
Text processing utilities for the Axonml ML framework
-
axonml-audio
Audio processing utilities for the Axonml ML framework
-
lumen-dataset
A tiny ML framework
-
dataloader-rs
High-performance DataLoader for Rust with a PyTorch-like interface
-
apify-rs
client for the Apify API
-
voirs-dataset
Dataset utilities for VoiRS (LJSpeech, JVS, etc.)
-
ecitygml
processing CityGML data
-
exporter
Contrail exporter for filtered JSONL datasets
-
veks-pipeline
Pipeline execution engine for veks
-
sam-zfs-unlocker
controlling encrypted ZFS pool datasets
-
css_dataset
CSS dataset about functions, properties, etc
-
dapper
Dependency Analysis Project - identifying dependencies in C/C++ code and packages on filesystems
-
soundevents-dataset
Audio Set Ontology aims to provide a comprehensive set of categories to describe sound events
-
mobilitydata-client
API for the Mobility Database Catalog. See https://mobilitydatabase.org/. The Mobility Database API uses OAuth2 authentication. To initiate a successful API request…
-
brainwires-datasets
Training data pipelines for the Brainwires Agent Framework — JSONL I/O, tokenization, deduplication, format conversion
-
kv-extsort
External sort for key-value data
-
ghostflow-data
Data loading utilities for GhostFlow ML framework
-
sklears-datasets
Dataset utilities and generation for sklears
-
kaggle
Unofficial Rust implementation of the Kaggle API
-
kurobako
A black-box optimization benchmarking framework
-
tpctools
generating and converting TPC-H and TPC-DS data sets
-
tsai_cli
Command-line interface for tsai-rs time series deep learning
-
standing-relations
Standing relations over a shifting dataset optimized for 'feedback loop' scenarios
-
three-dcf-core
Document-to-dataset encoding library for LLM training data preparation. Converts PDFs, Markdown, HTML into structured formats optimized for machine learning.
-
santoka
Translations of 668 of Taneda Santoka's free-verse haiku
-
libmotiva
Sanctioned entities matching utilities
-
rubbl_miriad
Interfacing to MIRIAD radio astronomy data formats within the Rubbl framework
-
tsai_data
Dataset and dataloader implementations for tsai-rs time series deep learning
-
perspective-js
A data visualization and analytics component, especially well-suited for large and/or streaming datasets
-
perspective-python
A data visualization and analytics component, especially well-suited for large and/or streaming datasets
-
ferrolearn-datasets
Built-in datasets and synthetic data generators for the ferrolearn ML framework
-
wkwrap
webKNOSSOS wrapper is a file format designed for large-scale, three-dimensional voxel datasets. It was optimized for high-speed access to data subvolumes, and supports multi-channel data and dataset compression.
-
h2n5
HTTP 2 N5: Serve N5 datasets over HTTP as tiled image stacks
-
ceres-server
REST API server for Ceres harvesting, embedding, and search workflows
-
wimbd
A CLI for inspecting and analyzing large text datasets
-
undr
protocol implemented in Rust
-
disco-cli
Generate recommendations from CSV files
-
oxkart
Kart utils
-
kuliya
querying Algerian education dataset
-
kitti-dataset
Dataset loader, data parsers and writers for KITTI dataset
-
arff
ARFF file format serializer and deserializer
-
nuscenes-data
NuScenes dataset loader in Rust
-
agnes
A data wrangling library for Rust
-
ferrolearn-fetch
Network fetchers + on-disk cache for sklearn-style datasets (California housing, OpenML, 20newsgroups, etc.)
-
infernum-paimon
LLM Studio - Teaches arts, sciences, and gives good familiars
-
catclustering
Agglomerative Clustering For Categorical Data
-
concon
Collaborative knowledge graph platform for teams — manage RDF datasets, share scientific papers, and query with SPARQL
-
llm-test-bench-datasets
Dataset management and utilities for LLM Test Bench - load, validate, and manage test datasets
-
mmappet
Memory-mapped columnar dataset library
-
rubbl_casatables
Interfacing to the CASA table format within the Rubbl framework
-
field33_rdftk_core_temporary_fork
core RDF data model; concrete implementations for Statements and Literals, along with a Resource type that provides a builder-like experience for models
-
ream
Data language for building maintainable social science datasets
-
burn_dragon_sudoku
Sudoku datasets and training for burn_dragon
-
rdftk_memgraph
Graph traits from rdftk_core::graph for simple in-memory usage
-
qdplot
perform quick plots
-
idx-lib
read the IDX data format as used in, for example, the MNIST dataset
-
rats-rs
Rapid Augmentations for Time Series
-
zenu
Deep Learning library for Rust
-
burn-mnist
train a simple neural network on the MNIST dataset using Burn
-
cortenforge-burn-dataset
Dataset loading, splitting, and Burn-compatible batching utilities for CortenForge
-
delfi
Conveniently writing data to csv-files
-
idhash
Calculate a Row-Invariant ID for Tabular Data
-
dendritic-preprocessing
Package for preprocessing datasets to convert to numerical representation
-
irisdata
In-memory version of Fisher's Iris dataset
-
dataz
High-throughput generative datasets
-
pointcloud
An accessor layer for goko
-
time-func
represents a set of data points as a function of time and performs various mathematical operations on the data
-
voc-dataset
data loader for The PASCAL Visual Object Classes (VOC)
-
hot-ranking-algorithm
Algorithm that measures how relevant a given data set is, kinda like Reddit
-
mnist_reader
Download the MNIST dataset and simply read it
-
kv-par-merge-sort
External sorting algorithm for (key, value) data sets
-
vil_rlhf_data
N03 — RLHF/DPO Pipeline: preference pairs, dataset management, training format export
-
vil_eval
VIL Evaluation Framework — metrics, dataset, batch evaluation, reporting (H10)
-
dendritic-trees
Package for tree-based modeling
-
disk-dlmalloc
A fork of dlmalloc-rs backed by a memory-mapped file, enabling support for datasets that exceed available RAM
-
cocotools
Package providing functionalities to work with COCO format datasets
-
openml
interface to OpenML
-
auto-regex
Automagically finds a regex that best matches an example and a sample list
-
eo-identifiers
Parsers for naming conventions of earth observation products and datasets
-
toyai
A small collection of ai algorithms to perform some simple prediction on structured data
-
pixmux
An interactive TUI for exploring CSV-backed image datasets
-
esvc-traits
Traits for ESVC
-
hodu_utils
hodu utils
-
tiny-data
A cli tool for building computer vision datasets
-
quicklabel
A fast image labeling tool for creating text-to-image finetuning datasets
-
label_studio_yolo_datasets_converter
converting datasets from Label Studio to YOLO format
-
neardup
near-duplicate matching
-
nove_dataset
lightweight deep learning library wrapped around Candle Tensor
-
aprender-data
Data Loading, Distribution and Tooling in Pure Rust
-
se_dump
Some structs to facilitate parsing of StackExchange dumps into easy-to-use values
-
mule
Strong-headed (yet flexible) parser of columnar datasets from CSV, TSV and other delimiter-separated datasets
-
sklearn-sample-datasets-rs
Rust port of https://scikit-learn.org/stable/datasets/index.html
-
ebd
reading the eBird Basic Dataset (EBD)
-
easy_stats
package to perform basic descriptive stats on a data set
-
dendritic-datasets
Prebuilt datasets that can be imported for ML model training
-
fbw_map_parser
allows extracting height data at an arbitrary geographical coordinate from the dataset used by FlyByWire simulations for Microsoft Flight Simulator
-
rustitude-core
create and operate models for particle physics amplitude analyses
-
sanity_rs_client
client for sanity.io
-
aorist_core
Core abstractions for the aorist project
-
sbr
Recommender models
-
rs-cinic-10-index
Data index for CINIC-10 dataset
-
wfdb-rust
reading WFDB-format datasets in Rust
-
esvc-core
Core of ESVC (event sourcing version control)
-
gosh-dataset
short text for crates.io
-
iii-formosa-dataset
Se/Deserialization toolkit for Formosa dataset from Institute for the Information Industry
-
palace
mounting datasets into memory for fast loading in deep learning tasks
-
fvecs_readers
Quick and dirty .fvecs file reader
-
flowrider
High-performance PyTorch-compatible streaming dataset with distributed caching for on-the-fly remote dataset fetching
-
bamcensus-tiger
Work with geospatial data in the TIGER/Lines datasets
-
bamcensus
The Behavior and Advanced Mobility Census Dataset Aggregator
-
esvc-wasm
WASM engine for ESVC
-
r2rs-datasets
Statistical datasets for Rust based on R's datasets package
-
OpenDataSH_twitter_notifier
A twitter bot that posts a message for new datasets on the OpenData platform of Schleswig-Holstein
-
outliers
Identify outliers in a data set
-
axe
split a Reddit dataset by various keys
-
ergast_rust
Collection of utilities to fetch data from the amazing F1 dataset, the Ergast API
-
bamcensus-lehd
Work with Longitudinal Employer-Household Dynamics (LEHD) datasets
-
kit-ais-dataset
Data types and loader for KIT AIS data set
-
aocdata
gRPC server interface to database that serves AOC puzzle dataset requests
-
routrs_railways_dataset
Railways dataset for routrs, the geograph-based shortest distance calculator for Rust
-
routrs_maritime_dataset
Maritime dataset for routrs, the geograph-based shortest distance calculator for Rust
-
routrs_highways_dataset
Highways dataset for routrs, the geograph-based shortest distance calculator for Rust
-
bonitox
parsing input/output of Bonito LLM
-
simple-abns
Simplify the ABR's Australian Business Number dataset for easier analysis
-
swc-plugin-add-logging-dataset
swc plugin add dataset for logging
-
hot-ranking-algorithm-rust
Algorithm that measures how relevant a given data set is, kinda like Reddit
-
vision
Computer vision benchmarking datasets
-
mnist-extractor
extract MNIST dataset
-
plotter
A package that plots a dataset to an HTML Canvas