Low-code framework for building custom LLMs, neural networks, and other AI models
Updated Oct 28, 2024 - Python
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
A curated, but incomplete, list of data-centric AI resources.
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
DataCLUE: A data-centric NLP benchmark and toolkit
Rust implementation of the Data Distribution Service (DDS)
[ICLR'23] Implementation of "Empowering Graph Representation Learning with Test-Time Graph Transformation"
A Data Centric NER annotation tool for your Named Entity Recognition projects
Simulator framework for analysis of performance, energy consumption, area and cost of multi-node multi-chiplet tile-based manycore designs
Vue Form with Laravel-Inspired Validation and a Simply Enjoyable Error Messages API. (Form API, Validator API, Rules API, Error Messages API)
An observer is a wrapper over JSON data that provides an interface for knowing when the data changes, with a focus on performance and memory efficiency.
Code for a top 5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI
From local functions to cloud deployed pipelines
Jaehyung Kim et al.'s ACL 2023 paper on "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information"
Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)
Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)
Quickly set up an image labelling web application for manually tagging images for machine learning tasks.
Open-source Data Backend written in Java and based on PostgreSQL & GraphQL.
The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈