#dataset

  1. gdal

    GDAL bindings for Rust

    v0.19.0 280K #dataset #api-bindings
  2. edgefirst-cli

    EdgeFirst Client Library and CLI

    v2.9.5 #dataset #arrow #annotations #client #studio #profiling #arrow-format #authentication #upload-download #3d
  3. aws-sdk-dataexchange

    AWS SDK for AWS Data Exchange

    v1.102.0 3.3K #aws-sdk #dataset #data-exchange #amazon-s3 #amazon-s3-data #exchange-data #lake-formation #data-access #permissions #api-gateway
  4. vectordata

    tools for dataset.yaml

    v1.1.1 #dataset #automation #ann #testing #testing-automation
  5. reinfer-cli

    Command line interface for Re:infer, the conversational data intelligence platform

    v0.39.1 #command-line-interface #dataset #api-client #context #source #progress-bar #api-token #data-platform #command-context #conversational
  6. netcdf3

    A pure Rust library for reading and writing NetCDF-3 files

    v0.6.1 1.5K #dataset #net-cdf
  7. rdftk_core

    core RDF data model; concrete implementations for Statements and Literals, along with a Resource type that provides a builder-like experience for models

    v0.5.6 #rdf-data #dataset #data-model #blank-node #resources #rdf-star #graph-traits
  8. hf-xet

    Client library and tooling for the Hugging Face Xet data storage system

    v1.5.2 46K #hugging-face #cloud-storage #dataset #large-files
  9. aws-sdk-cognitosync

    AWS SDK for Amazon Cognito Sync

    v1.98.0 950 #aws-sdk #cognito #sync-service #dataset #credentials
  10. imperative

    Check for imperative mood in text

    v1.0.7 104K #mood #mood-of-word #text #dataset #word-list
  11. scirs2-datasets

    Datasets module for SciRS2 (scirs2-datasets)

    v0.4.4 #dataset #scientific #machine-learning #scipy #machine-learning-data
  12. perspective-server

    A data visualization and analytics component, especially well-suited for large and/or streaming datasets

    v4.4.1 2.5K #perspective #analytics #visualization #dataset #streaming #javascript #wasm #data-streaming #visualization-and-analytics #data-model
  13. slabtastic

    A streamable, readable, writeable, randomly accessible file format

    v1.0.1 #dataset #automation #ann #vector #testing #testing-automation
  14. hdf5-pure

    Pure-Rust HDF5 writer library (WASM-compatible, no C dependencies)

    v0.5.0 410 #hdf5 #dataset #matlab #wasm-compatible #ascii #zfp #deflate #mat #compact-storage #compound
  15. rustsight

    fast, safe CLI tool for dataset analysis and validation. Analyzes CSV files for column types, missing values, basic statistics (min/max/mean), outliers, no-variance columns, and mixed-type…

    v1.2.4 #csv #dataset #data-analysis #data-science #cli
  16. edgefirst-client

    EdgeFirst Client Library and CLI

    v2.9.5 #edge-first #dataset #model-training #annotations #arrow-format #rest-api-client #ci-cd #metrics #ml-ops #android
  17. anima-tagger-cli

    Command-line interface for anima-tagger: tag, caption, and export Stable Diffusion LoRA datasets

    v0.2.1 #stable-diffusion #lora #dataset #tagging #captioning
  18. wikiwho

    Fast Rust reimplementation of the WikiWho algorithm for fine-grained authorship attribution on large datasets. Optimized for easy integration in multi-threaded applications.

    v0.3.1 #diff-algorithm #dataset #authorship #python #page #multi-threading #attribution #revision #wikimedia #xml
  19. cjval

    Schema-validation of CityJSON/Seq datasets

    v0.9.0 170 #schema-validation #city-json #dataset #city-json-seq #file #warnings #error-warnings
  20. dataset-writer

    write CSV/Arrow/Parquet files concurrently

    v2.0.0 450 #csv #dataset #parquet #arrow-ipc #feather
  21. hdf5-io

    Pure Rust HDF5 file reader supporting superblock v2 and v3

    v0.1.3 #hdf5 #file-reader #dataset #link #superblock #system-zlib #wasm
  22. mariadb-mysql-kbs

    An index of the MariaDB and MySQL Knowledge bases

    v1.3.1-rc2 460 #knowledge-base #maria-db #mysql #dataset #kb
  23. z_osmf

    z/OSMF Client

    v0.13.5 1.3K #osmf #dataset #user-name #client #list #lower-case-letter #ibm
  24. amlich-core

    Vietnamese Lunar Calendar - Core calculation engine

    v0.1.4 #lunar-calendar #vietnamese #calculations #dataset #contract #recommendations
  25. hdf5-pure-rust

    Pure Rust implementation of the HDF5 file format

    v0.3.2 130 #dataset #hdf5 #symbol-table #b-tree #attributes #compound #datatype #deflate #shuffle #dense
  26. rdftk_io

    traits for reading and writing Statements and Graphs as well as implementations of these for common representations

    v0.3.3 1.1K #rdf #file-extension #dataset #n-triples #traits-for-reading #mime-types #rdftk #json-ld #rdf-xml #turtle
  27. ridal

    Speeding up Ground Penetrating Radar (GPR) processing

    v0.5.1 #processing #radar #gpr #profile #dataset #rad #mhz #speeding #modes #gdal
  28. rolling-median

    Compute the median using a 'rolling' (online) algorithm

    v1.5.5 4.3K #dataset #compute #median #rolling #online
  29. ppl

    A structured parallel programming library for Rust

    v0.1.6 410 #thread-pool #pipeline #data-processing #framework #dataset #word-counter #concurrency #farm #data-parallelism
  30. dicom-test-files

    A collection of DICOM files for testing DICOM parsers

    v0.4.0 950 #dataset #dicom #image #medical
  31. awful_dataset_builder

    Build LLM-ready Q/A datasets from reference text-to-question mappings produced by Awful Knowledge Synthesizer

    v0.1.3 #dataset #llm #cli #finetune
  32. locustdb

    Embeddable high-performance analytics database

    v0.5.6 1.4K #analytics-database #data-analytics #csv #column #dataset #in-memory-database #table-name #database-query #db-path #low-latency
  33. rdf-compare

    Fast CLI to compute the diff of two RDF files as a quad dataset with two named graphs

    v1.2.0 #rdf #dataset #diff #graphs #blank-node #n-quads #format-file #ci #n-triples #trig
  34. lbasedb

    Low level DBMS in Rust focusing on datasets

    v0.1.11 #database #dataset #low-level #integer #column #supposed #rust-focusing #conn
  35. hodgepodge

    Lightweight dataset crate of enums for prototyping, teaching, and experimentation

    v0.2.0 #enums #dataset #teaching #prototype #education
  36. h5peek

    A CLI tool for inspecting HDF5 file structures and metadata

    v0.3.0 #hdf5 #dataset #inspect
  37. spel-right

    A fast and lightweight spell checker and suggester

    v0.5.3 #spell-check #dataset #blob #utf-8 #input-file #cli-and-lib #spelling-correction #matching-algorithm
  38. dset

    processing and managing dataset-related files, with a focus on machine learning datasets, captions, and safetensors files

    v0.1.12 750 #json #safetensors #caption #text-content #dataset #text-processing #deduplicate #json-processing #json-output #file-extension
  39. async-hdf5

    Asynchronous HDF5 metadata reader

    v0.1.0 #hdf5 #dataset #b-tree #object-store #async #byte-offset #file-reader #block-cache #dtype #datatype
  40. spareval

    SPARQL evaluator

    v0.2.6 13K #sparql #dataset
  41. anima-tagger-core

    Data model, project config, and sidecar I/O for anima-tagger — a tag/caption editor for local Stable Diffusion LoRA datasets

    v0.2.1 #stable-diffusion #lora #dataset #tagging #captioning
  42. perspective-client

    A data visualization and analytics component, especially well-suited for large and/or streaming datasets

    v4.4.1 2.6K #perspective #visualization #dataset #analytics #streaming #javascript #websocket #visualization-and-analytics #data-streaming
  43. torsh-data

    Data loading and preprocessing utilities for ToRSh

    v0.1.2 #deep-learning #data-loading #data-loader #dataset
  44. linfa-datasets

    Collection of small datasets for Linfa

    v0.8.1 1.6K #linfa #dataset #testing #collection
  45. anima-tagger-captioner

    Qwen3-VL ONNX image captioner used by anima-tagger, with an OpenAI-compatible HTTP backend as an alternative

    v0.2.1 #stable-diffusion #lora #dataset #tagging #captioning #computer-vision
  46. dataverse

    interacting with the Dataverse API

    v0.1.0 #file-upload #file-metadata #dataset #env-vars #search-query
  47. abgleich-lib

    zfs sync tool, core library

    v0.2.0-alpha6 #zfs #transfer #abgleich #backup-tools #dataset
  48. javelin-tui

    Display and work with Lance matrices

    v0.10.0 #lance #tui #dataset #display #sparse-matrix #dense-matrix #coo #sparse-vector #1d #embedding
  49. hdf5-previewer

    A command line previewer for HDF5 files (WIP)

    v0.0.5 #command-line #preview #hdf5 #content #dataset
  50. ycbust

    CLI tool for downloading and extracting the YCB Object and Model Set for 3D rendering and simulation

    v0.5.0 #dataset #3d #robotics #ycb #simulation
  51. jen

    CLI generation tool for creating large datasets

    v1.7.1 #dataset #generator #json-template #json #template
  52. bids

    Rust tools for BIDS (Brain Imaging Data Structure) datasets

    v0.0.2 #safetensors #dataset #data-structures #mmap #brain #header-parser
  53. anima-tagger-tagger

    WD14-family ONNX image tagger used by anima-tagger

    v0.2.1 #stable-diffusion #lora #dataset #tagging #captioning #computer-vision
  54. mssqlrust

    Lightweight Rust library for Microsoft SQL Server using dataset and datatable

    v1.0.2 #sql-server #dataset #data-table #database
  55. vantage-dataset

    Dataset traits for the Vantage data framework

    v0.4.2 #dataset #vantage #entity #csv #framework #orm #database #change-tracking #deserialize #storage-back-end
  56. geodb-wasm

    WebAssembly bindings for geodb-core with simple JS API and embedded dataset

    v0.1.6 #dataset #countries #cities #wasm-bindings #name #cache #city #js-bindings #data-cache #embedded
  57. deep_filter

    Noise suppression using deep filtering

    v0.2.5 450K #speech #audio #speech-enhancement #dataset #filtering #pytorch #audio-processing #embedded-devices #real-time-audio-processing #audio-samples
  58. meilisearch-importer

    import massive datasets into Meilisearch by sending them in batches

    v0.2.4 130 #importer #meilisearch #dataset #batch #document #csv
  59. anima-tagger-booru

    Danbooru-API tag fetcher used by anima-tagger to enrich sidecars with human-curated booru tags

    v0.2.1 #stable-diffusion #lora #dataset #tagging #captioning
  60. neighbourhood

    Super fast fixed size K-d Trees for extremely large datasets

    v0.2.0 #fixed-size #kd-tree #dataset #extremely-large #query
  61. perspective

    A data visualization and analytics component, especially well-suited for large and/or streaming datasets

    v4.4.1 #visualization #dataset #visualization-and-analytics #real-time-analytics #python #javascript #wasm #apache-arrow #local-client
  62. serrf-testdata

    Synthetic metabolomics dataset generator for testing SERRF normalization

    v0.1.0 #random-forest #dataset #validation #serrf #metabolomics #synthetic #systematic #compound #removal #drift
  63. rosbag

    reading ROS bag files

    v0.6.3 550 #ros #robotics #dataset
  64. veks

    A vector bulk data processing tool

    v1.1.2 #dataset #ann #automation #testing #testing-automation
  65. imdb-async

    Opinionated and unopinionated async wrappers to efficiently retrieve and parse IMDB's dataset

    v0.12.2 #imdb #dataset #client #opinionated #parser #cache #local-cache
  66. marina

    A dataset manager for robotics to organize, share, and discover datasets and metadata across storage backends

    v0.2.8 120 #robotics #dataset #metadata
  67. vantage

    type-safe, ergonomic database toolkit for Rust that focuses on developer productivity without compromising performance. It allows you to work with your database using Rust's strong…

    v0.2.0 320 #sql #dataset #entity #business-logic #orm #type-system #query-builder #entity-framework #database #set-operations
  68. axonml-text

    Text processing utilities for the Axonml ML framework

    v0.6.2 #tokenize #vocab #axonml #dataset #utilities #ngrams #synthetic #language-modeling #pad #vocabulary
  69. axonml-audio

    Audio processing utilities for the Axonml ML framework

    v0.6.2 #audio #axonml #dataset #synthetic #pitch #ml #audio-waveform #augmentation #resample #mfcc
  70. lumen-dataset

    A tiny ML framework

    v0.5.0 #machine-learning #dataset #auto-grad
  71. dataloader-rs

    High-performance DataLoader for Rust with a PyTorch-like interface

    v0.1.0 #pytorch #sampler #dataset #collator #builder #multidimensional-array #collate
  72. apify-rs

    client for the Apify API

    v0.1.0 #api-client #actor #strong-typing #key-value-store #dataset #instagram #web-scraping #serverless #mirror #retries
  73. voirs-dataset

    Dataset utilities for VoiRS (LJSpeech, JVS, etc.)

    v0.1.0-rc.1 #audio-data #dataset #jvs #voirs #ljspeech
  74. ecitygml

    processing CityGML data

    v0.0.2-alpha.1 #city-gml-data #processing #dataset
  75. exporter

    Contrail exporter for filtered JSONL datasets

    v0.1.3 #contrail #jsonl #dataset #filtered #codex #claude #artificial-intelligence #recorder
  76. veks-pipeline

    Pipeline execution engine for veks

    v1.1.2 #execution-engine #pipeline #progress-bar #compute #dataset #veks #execution-pipeline #logging #fingerprinting #faiss
  77. sam-zfs-unlocker

    controlling encrypted ZFS pool datasets

    v0.2.0 #encryption #zfs #dataset #load-key #sudo #unmount
  78. css_dataset

    CSS dataset about functions, properties, etc

    v0.4.1 37K #css #dataset #web
  79. dapper

    Dependency Analysis Project - identifying dependencies in C/C++ code and packages on filesystems

    v0.0.0-pre.3 #identifying #dataset #system #package #user #source-file
  80. soundevents-dataset

    Audio Set Ontology aims to provide a comprehensive set of categories to describe sound events

    v0.2.0 #dataset #ontology #no-std #sound-events #audioset
  81. mobilitydata-client

    API for the Mobility Database Catalog. See https://mobilitydatabase.org/. The Mobility Database API uses OAuth2 authentication. To initiate a successful API request…

    v1.0.0 #oauth2 #feed #api-client #dataset #mobility #database #gtfs #search-api #web-api #authentication
  82. brainwires-datasets

    Training data pipelines for the Brainwires Agent Framework — JSONL I/O, tokenization, deduplication, format conversion

    v0.8.0 #training-data #dataset #jsonl
  83. kv-extsort

    External sort for key-value data

    v0.1.2 #key-value #sorting #dataset
  84. ghostflow-data

    Data loading utilities for GhostFlow ML framework

    v1.0.0 #machine-learning #data-loading #dataset
  85. sklears-datasets

    Dataset utilities and generation for sklears

    v0.1.1 #dataset #data-generation #machine-learning
  86. kaggle

    Unofficial Rust implementation of the Kaggle API

    v2.0.0 370 #dataset #data-science
  87. kurobako

    A black-box optimization benchmarking framework

    v0.2.10 #black-box-optimization #benchmark #random #dataset #plot #uniform-distribution #optimization-problem #markdown #studies #logging
  88. tpctools

    generating and converting TPC-H and TPC-DS data sets

    v0.7.0 240 #generator #convert #tpc-h #dataset #data-generator #parquet #data-fusion #apache-arrow
  89. tsai_cli

    Command-line interface for tsai-rs time series deep learning

    v0.1.2 #deep-learning #forecasting #time-series-transformer #dataset #transformer-model #classification #wgpu #mlx #deep-learning-framework #python-bindings
  90. standing-relations

    Standing relations over a shifting dataset optimized for 'feedback loop' scenarios

    v0.1.2 #relation #dataset #standing #feedback #creation-context #execution-context
  91. three-dcf-core

    Document-to-dataset encoding library for LLM training data preparation. Converts PDFs, Markdown, HTML into structured formats optimized for machine learning.

    v0.2.0 #pdf #llm #dataset #document
  92. santoka

    Translations of 668 of Taneda Santoka's free-verse haiku

    v1.0.2 #haiku-poetry #haiku #dataset #poetry #literature #japan
  93. libmotiva

    Sanctioned entities matching utilities

    v0.1.1 #dataset #elasticsearch #matching #entities #sanctions
  94. rubbl_miriad

    Interfacing to MIRIAD radio astronomy data formats within the Rubbl framework

    v0.3.2 330 #radio-astronomy #data-format #miriad #rubbl #dataset #astronomy-data
  95. tsai_data

    Dataset and dataloader implementations for tsai-rs time series deep learning

    v0.1.2 #deep-learning #dataset #forecasting #tsai #classification #data-loader #time-series-transformer
  96. perspective-js

    A data visualization and analytics component, especially well-suited for large and/or streaming datasets

    v4.4.1 #experimental #perspective #dataset
  97. perspective-python

    A data visualization and analytics component, especially well-suited for large and/or streaming datasets

    v4.4.1 #perspective #visualization #python-bindings #dataset #visualization-and-analytics
  98. ferrolearn-datasets

    Built-in datasets and synthetic data generators for the ferrolearn ML framework

    v0.3.0 #synthetic-data #benchmark #iris #dataset #machine-learning
  99. wkwrap

    webKNOSSOS wrapper is a file format designed for large-scale, three-dimensional voxel datasets. It was optimized for high-speed access to data subvolumes, and supports multi-channel data and dataset compression.

    v1.6.0 #voxel #dataset #web-knossos #format #multi-channel #three-dimensional #data-access #large-scale #subvolumes #compression
  100. h2n5

    HTTP 2 N5: Serve N5 datasets over HTTP as tiled image stacks

    v0.1.9 #n5 #dataset
  101. ceres-server

    REST API server for Ceres harvesting, embedding, and search workflows

    v0.3.5 #rest #portal #ceres #dataset #harvest #data-export #statistics #openai #harvesting #web-api
  102. wimbd

    A CLI for inspecting and analyzing large text datasets

    v0.3.0 320 #statistics #dataset #big-data #ngrams #search #counting-bloom-filter #corpus #text-data
  103. undr

    protocol implemented in Rust

    v0.2.0 #dataset #install #index #brotli #properties #index-file #decompression
  104. disco-cli

    Generate recommendations from CSV files

    v0.1.2 #recommendations #matrix-factorization #csv #generate #dataset
  105. oxkart

    Kart utils

    v0.0.2 #kart #dataset
  106. kuliya

    querying Algerian education dataset

    v0.1.3 170 #dataset #api #education #path #querying #college
  107. kitti-dataset

    Dataset loader, data parsers and writers for KITTI dataset

    v0.3.0 110 #dataset #deserialize #kitti #loader #sample #object-detection
  108. arff

    ARFF file format serializer and deserializer

    v0.3.0 #dataset #serializer-deserializer #weka #parser
  109. nuscenes-data

    NuScenes dataset loader in Rust

    v0.4.0 #dataset #nu-scenes #loader
  110. agnes

    A data wrangling library for Rust

    v0.3.2 #data-source #wrangling #serialization #data-processing #dataset #csv #data-manipulation #tabular-data #data-access #table-definition
  111. ferrolearn-fetch

    Network fetchers + on-disk cache for sklearn-style datasets (California housing, OpenML, 20newsgroups, etc.)

    v0.3.0 #dataset #fetch #machine-learning #cache #openml
  112. infernum-paimon

    LLM Studio - Teaches arts, sciences, and gives good familiars

    v0.2.0-rc.2 #training #llm #experiment #dataset #model #paimon #sciences #arts #teaches #metrics
  113. catclustering

    Agglomerative Clustering For Categorical Data

    v0.2.1 130 #cluster-analysis #complete-linkage #categorical-data #agglomerative #dataset #random-matrix #random-data
  114. concon

    Collaborative knowledge graph platform for teams — manage RDF datasets, share scientific papers, and query with SPARQL

    v0.1.0 #rdf #sparql #sparql-query #dataset #team #knowledge-graph #rdf-graph #collaborative
  115. llm-test-bench-datasets

    Dataset management and utilities for LLM Test Bench - load, validate, and manage test datasets

    v0.1.0 #dataset #testing #benchmark #llm #validation
  116. mmappet

    Memory-mapped columnar dataset library

    v0.1.0 #memory-map #dataset #memmaped-df #binary
  117. rubbl_casatables

    Interfacing to the CASA table format within the Rubbl framework

    v0.9.0 #table-format #casa #dataset #rubbl #interfacing #astronomy
  118. field33_rdftk_core_temporary_fork

    core RDF data model; concrete implementations for Statements and Literals, along with a Resource type that provides a builder-like experience for models

    v0.3.1 #rdf-data #dataset #blank-node #temporary-fork #resources #rdftk #graph-data #rdf-star #field33 #data-model
  119. ream

    Data language for building maintainable social science datasets

    v0.4.2 #dataset #language #csv #science #social #template-engine #data-points
  120. burn_dragon_sudoku

    Sudoku datasets and training for burn_dragon

    v0.4.0 #burn-dragon #training #sudoku #dataset #inference #metrics #hugging-face #positional #cache #alt-text
  121. rdftk_memgraph

    Graph traits from rdftk_core::graph for simple in-memory usage

    v0.1.12 #graph-traits #rdftk #dataset #in-memory #factory
  122. qdplot

    perform quick plots

    v0.1.1 #plot #csv #dataset #tool #quick-and-dirty #guff #nan #first-letter
  123. idx-lib

    read the IDX data format as used in, for example, the MNIST dataset

    v0.0.2 #data-format #dataset #mnist #read #idx
  124. rats-rs

    Rapid Augmentations for Time Series

    v0.1.0-alpha #time-series-data #frequency-domain #dataset #rapid #batch #univariate #augmentation #python-packages
  125. zenu

    Deep Learning library for Rust

    v0.1.2 120 #deep-learning #neural-network #dataset #mnist #artificial-intelligence #hpc
  126. burn-mnist

    train a simple neural network on mnist dataset using burn

    v0.1.2 210 #mnist #train #neural-network #burn #dataset #deep-learning #classification
  127. cortenforge-burn-dataset

    Dataset loading, splitting, and Burn-compatible batching utilities for CortenForge

    v0.6.0 #computer-vision #dataset #burn-dataset #burn
  128. delfi

    Conveniently writing data to csv-files

    v0.1.0 #csv #dataset #data
  129. idhash

    Calculate a Row-Invariant ID for Tabular Data

    v0.3.0 #dataset #row-invariant #calculate #hash #column #unf #tabular
  130. dendritic-preprocessing

    Package for preprocessing datasets to convert to numerical representation

    v1.5.0 150 #pre-processor #dendritic #scalar #min-max #dataset
  132. irisdata

    In-memory version of Fisher's Iris dataset

    v0.1.2 #iris #dataset #data
  133. dataz

    High-throughput generative datasets

    v0.3.0 #dataset #data-generation #generation
  134. pointcloud

    An accessor layer for goko

    v0.5.5 #dataset #goko
  135. time-func

    represents a set of data points as a function of time and performs various mathematical operations on the data

    v0.1.4 #math-operations #dataset #points-of-time #perform
  136. voc-dataset

    data loader for The PASCAL Visual Object Classes (VOC)

    v0.3.0 #dataset #voc
  137. hot-ranking-algorithm

    Algorithm that measures how relevant a given data set is, kinda like Reddit

    v2.0.0 #dataset #reddit #measure #site #news-site
  138. mnist_reader

    Download the MNIST dataset and simply read it

    v0.1.1 #mnist #dataset #ml #reader #computer-vision
  139. kv-par-merge-sort

    External sorting algorithm for (key, value) data sets

    v0.1.0 #key-value #merge-sorting #file-sorting #dataset #stream
  140. vil_rlhf_data

    N03 — RLHF/DPO Pipeline: preference pairs, dataset management, training format export

    v0.4.0 #format #vil #training #dataset #dpo #preferences #distributed-systems #language-framework #zero-copy
  141. vil_eval

    VIL Evaluation Framework — metrics, dataset, batch evaluation, reporting (H10)

    v0.4.0 #vil #metrics #dataset #framework #batch #h10 #distributed-systems #language-framework #llm #zero-copy
  142. dendritic-trees

    Package for tree-based modeling

    v1.5.0 #decision-tree #random-forest #dendritic #dataset #classification #min-max #machine-learning #bootstrap #multi-dimensional-array
  143. disk-dlmalloc

    A fork of [dlmalloc-rs] backed by a memory-mapped file, enabling support for datasets exceeding available RAM

    v0.2.0 #memory-mapped-file #dlmalloc #ram #fork #dataset #memory-map #allocator #exceeding
  144. cocotools

    Package providing functionalities to work with COCO format datasets

    v0.0.7 #coco #dataset #format #visualize #convert
  145. openml

    interface to OpenML

    v0.1.2 #dataset #machine-learning #machine-learning-data
  146. auto-regex

    Automagically finds a regex that best matches an example and a sample list

    v0.1.3 #regex #string #dataset #filter #text
  147. eo-identifiers

    Parsers for naming conventions of earth observation products and datasets

    v0.1.1 #naming-conventions #product #dataset #date-time #observation #earth #satellite
  148. toyai

    A small collection of ai algorithms to perform some simple prediction on structured data

    v0.2.1 #structured-data #artificial-intelligence #iris #dataset #collection #cm #pattern-recognition #plant #petal #literature
  149. pixmux

    An interactive TUI for exploring CSV-backed image datasets

    v0.1.0 #dataset #interactive #tui #image #exploring #ratatui
  150. esvc-traits

    Traits for ESVC

    v0.1.0 #events #version-control #esvc #dataset #branch #file-memory #blockchain #event-streaming #slow #imagine
  151. hodu_utils

    hodu utils

    v0.2.4 #hodu #data-loader #sampler #dataset #deep-learning #machine-learning
  152. tiny-data

    A cli tool for building computer vision datasets

    v0.1.2 #computer-vision #tool-for-building #dataset #command-line-tool #directory #alt-text
  153. quicklabel

    A fast image labeling tool for creating text-to-image finetuning datasets

    v1.0.0 #image #labeling #dataset #machine-learning #dreambooth
  154. label_studio_yolo_datasets_converter

    converting datasets from Label Studio to YOLO format

    v0.1.2 #label-studio #yolo #dataset #converter
  155. neardup

    near-duplicate matching

    v0.1.0 #near-duplicate #query #document-analysis #dataset #path
  156. nove_dataset

    lightweight deep learning library wrapped around Candle Tensor

    v0.1.2 #dataset #deep-learning #traits #wrapped #generic #candle
  157. aprender-data

    Data Loading, Distribution and Tooling in Pure Rust

    v0.33.0 290 #dataset #parquet #ml #arrow
  158. se_dump

    Some structs to facilitate parsing of StackExchange dumps into easy-to-use values

    v0.1.0 #parser #stack-exchange #dataset
  159. mule

    Strong-headed (yet flexible) parser of columnar datasets from CSV, TSV and other delimiter-separated datasets

    v0.5.0 #csv #dataset #tsv #columnar
  160. sklearn-sample-datasets-rs

    Rust port of https://scikit-learn.org/stable/datasets/index.html

    v0.1.0 #dataset #port #stable #org #learn
  161. ebd

    reading the eBird Basic Dataset (EBD)

    v0.1.0 #dataset #ebird #reading #latitude #record
  162. easy_stats

    package to perform basic descriptive stats on a data set

    v0.1.1 #statistics #dataset #descriptive #perform #mean
  163. dendritic-datasets

    Prebuilt datasets that can be imported for ML model training

    v1.5.0 #dataset #dendritic #student #iris #training #parquet #ml-model #purchase #cancer #airfoil
  164. fbw_map_parser

    allows extracting height data at an arbitrary geographical coordinate from the dataset used by FlyByWire simulations for Microsoft Flight Simulator

    v1.0.0 #map-parser #dataset #simulation #flight #geographical
  165. rustitude-core

    create and operate models for particle physics amplitude analyses

    v9.0.0 1.6K #particle-physics #parameters #dataset #model #analyse #pyo3 #amplitudes #spherical #precalculated
  166. sanity_rs_client

    client for sanity.io

    v0.1.0 #io-client #upload-image #sanity #dataset #query
  167. aorist_core

    Core abstractions for the aorist project

    v0.0.14 #dataset #assets #task #python #template #universe #repetitive #dag #machine-learning
  168. sbr

    Recommender models

    v0.4.0 #model #dataset #sequence #user #recommender #recommender-system
  169. rs-cinic-10-index

    Data index for CINIC-10 dataset

    v0.2.0 750 #dataset #computer-vision #cinic-10
  170. wfdb-rust

    reading WFDB-format datasets in Rust

    v0.3.0 #dataset #wfdb-format #parser #information #time #rust-projects #motivation
  171. esvc-core

    Core of ESVC (event sourcing version control)

    v0.1.0 #event-sourcing #version-control #dataset #blockchain #branch #slow #memory-file #event-streaming #imagine #file-memory
  172. gosh-dataset

    short text for crates.io

    v0.2.1 160 #dataset #trajectory #chemistry #file
  173. iii-formosa-dataset

    Se/Deserialization toolkit for Formosa dataset from Institute for the Information Industry

    v0.2.0 #dataset #industry #deserialize
  174. palace

    mounting datasets into memory for fast loading in deep learning tasks

    v0.1.0 #dataset #memory #deep-learning #mounting #task
  175. fvecs_readers

    Quick and dirty .fvecs file reader

    v0.1.0 #file-reader #dirty #fvecs #quick-and-dirty #dataset
  176. flowrider

    High-performance PyTorch-compatible streaming dataset with distributed caching for on-the-fly remote dataset fetching

    v0.1.1 #stream #distributed #pytorch #cache #dataset #fetching #stream-data #on-the-fly #cache-data #cloud-storage
  177. bamcensus-tiger

    Work with geospatial data in the TIGER/Lines datasets

    v0.1.0 #tiger #geospatial #dataset #line #shape-file #geospatial-data
  178. bamcensus

    The Behavior and Advanced Mobility Census Dataset Aggregator

    v0.1.0 #dataset #census #advanced-mobility #aggregator #behavior
  179. esvc-wasm

    WASM engine for ESVC

    v0.1.0 #wasm-engine #version-control #events #esvc #dataset
  180. r2rs-datasets

    Statistical datasets for Rust based on R's datasets package

    v0.1.1 #dataset #data-model #plot #statistics #plot-data
  181. OpenDataSH_twitter_notifier

    A twitter bot that posts a message for new datasets on the OpenData platform of Schleswig-Holstein

    v0.1.2 #opendatash_twitter_notifier #twitter-bot #open-data #post-message #dataset #notifier
  182. outliers

    Identify outliers in a data set

    v0.5.0 #dataset #identifying #outliers-in-data-set
  183. axe

    split a Reddit dataset by various keys

    v0.1.0 #json #reddit #dataset #key #split #author
  184. ergast_rust

    Collection of utilities to fetch data from the Ergast API, an F1 dataset

    v0.1.0 #dataset #utilities #fetch #json-response #api #amazing #fetch-data #data-api #f1
  185. bamcensus-lehd

    Work with Longitudinal Employer-Household Dynamics (LEHD) datasets

    v0.1.0 #dataset #dynamics #employer-household #longitudinal #census #data-aggregation #geospatial
  186. kit-ais-dataset

    Data types and loader for KIT AIS data set

    v0.1.0 #dataset #voc
  187. aocdata

    gRPC server interface to database that serves AOC puzzle dataset requests

    v0.1.0 #grpc #dataset #grpc-server #database-server #aoc #puzzle #database-interface
  188. routrs_railways_dataset

    Railways dataset for routrs, the geograph-based shortest distance calculator for Rust

    v0.1.0 #shortest-distance #routrs #geograph-based #dataset #railways
  189. routrs_maritime_dataset

    Maritime dataset for routrs, the geograph-based shortest distance calculator for Rust

    v0.1.0 #shortest-distance #routrs #geograph-based #dataset #maritime
  190. routrs_highways_dataset

    Highways dataset for routrs, the geograph-based shortest distance calculator for Rust

    v0.1.0 #shortest-distance #routrs #geograph-based #dataset
  191. bonitox

    parsing input/output of Bonito LLM

    v0.1.1 #llm #dataset #synthetic-data #generation
  192. simple-abns

    Simplify the ABR's Australian Business Number dataset for easier analysis

    v0.1.0 #business-numbers #abn #dataset
  193. swc-plugin-add-logging-dataset

    swc plugin add dataset for logging

    v0.1.2 #swc-plugin #dataset #logging
  194. hot-ranking-algorithm-rust

    Algorithm that measures how relevant a given data set is, kinda like Reddit

    v1.0.1 #dataset #reddit #measure #rank #news-site
  195. vision

    Computer vision benchmarking datasets

    v0.0.2 #computer-vision #machine-learning #dataset #mnist
  196. mnist-extractor

    extract MNIST dataset

    v0.1.1 #dataset #mnist #extract #download #image #datas #array2
  197. plotter

    A package that plots a dataset to a HTML Canvas

    v0.1.0 #canvas #html #dataset #plot