High-efficiency floating-point neural network inference operators for mobile, server, and Web
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
Useful sample code for building and running TensorRT engines from ONNX models
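For context, the typical ONNX-to-TensorRT build these samples revolve around can be sketched with the TensorRT Python API; the file names and the FP16 flag below are illustrative assumptions, not taken from the repo:

```python
# Minimal sketch: parse an ONNX file and build a serialized TensorRT engine.
# "model.onnx" / "model.engine" are placeholder paths.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # reduced precision where supported

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```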
Research into optimizing training and inference of AI models on CPUs using simulated quantum annealing algorithms
Faster YOLOv8 inference: optimize and export YOLOv8 models using OpenVINO and NumPy 🔢
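The export-and-run flow can be sketched with the ultralytics package; the checkpoint name and test image below are illustrative assumptions:

```python
# Minimal sketch of exporting YOLOv8 to OpenVINO IR and running inference.
from ultralytics import YOLO

# Export the PyTorch checkpoint (writes a *_openvino_model directory).
model = YOLO("yolov8n.pt")
export_dir = model.export(format="openvino")

# Reload the exported model; ultralytics dispatches to the OpenVINO runtime.
ov_model = YOLO(export_dir)
results = ov_model("bus.jpg")   # any test image path
print(results[0].boxes.xyxy)    # detected boxes in xyxy format
```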
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing multi-label classification models for inference. They are based on my experience developing a custom chatbot; I'm sharing them in the hope that they help others quickly fine-tune and use models in their projects! 😊
This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing token classification models for inference. They are based on my experience developing a custom chatbot; I'm sharing them in the hope that they help others quickly fine-tune and use models in their projects! 😊
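The workflow both repos describe can be sketched with the Transformers API; the checkpoint name, label count, and threshold below are placeholders, not the repos' actual configuration:

```python
# Minimal sketch: load a Transformer head for multi-label classification
# and serve it through a text-classification pipeline.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          pipeline)

model_name = "distilbert-base-uncased"  # placeholder; this head is untrained
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=5,
    problem_type="multi_label_classification",  # sigmoid scores, BCE loss
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# top_k=None returns a score for every label; apply your own threshold.
clf = pipeline("text-classification", model=model, tokenizer=tokenizer,
               top_k=None)
scores = clf(["The battery dies fast but the screen is great."])
predicted = [s["label"] for s in scores[0] if s["score"] > 0.5]
```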
Blog posts, reading reports, and code examples on AGI/LLM-related topics.
ONNX Runtime-based inference optimization of a RoBERTa model trained for sentiment analysis on a Twitter dataset
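A minimal sketch of what such an optimized session can look like with the onnxruntime API; the file name, input names, and sequence length are assumptions that depend on how the model was exported:

```python
# Minimal sketch: ONNX Runtime session with full graph optimizations.
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
# Apply all graph-level optimizations (constant folding, operator fusions).
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
opts.intra_op_num_threads = 4  # tune to the machine

sess = ort.InferenceSession("roberta-sentiment.onnx", opts,
                            providers=["CPUExecutionProvider"])

# Dummy token IDs; a real call would come from the matching tokenizer.
ids = np.ones((1, 128), dtype=np.int64)
mask = np.ones((1, 128), dtype=np.int64)
logits = sess.run(None, {"input_ids": ids, "attention_mask": mask})[0]
```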
[WIP] A template for getting started writing code using GGML
MLP-Rank: a graph-theoretical approach to structured pruning of deep neural networks based on weighted PageRank centrality, as introduced in the related thesis.
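A toy sketch of the underlying idea (not the thesis's exact formulation), assuming a single square linear layer whose absolute weights define the graph:

```python
# Toy sketch: score one layer's neurons with weighted PageRank on |W|,
# then zero out the lowest-scoring output neurons (structured pruning).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))  # square layer keeps the graph simple

# Column-stochastic transition matrix from absolute weights.
A = np.abs(W)
P = A / A.sum(axis=0, keepdims=True)

# Power iteration for PageRank with damping factor d.
d, n = 0.85, P.shape[0]
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = (1 - d) / n + d * P @ r

# Remove the k least central neurons by zeroing their outgoing rows.
k = 16
pruned = np.argsort(r)[:k]
W_pruned = W.copy()
W_pruned[pruned, :] = 0.0
```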
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
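Of the techniques mentioned, KV caching is the easiest to illustrate in isolation; a minimal NumPy sketch (single head, no batching, toy dimensions) of why decoding only needs to project the newest token:

```python
# Minimal sketch of KV caching during greedy decoding: keys/values for past
# tokens are computed once and stored, so each step projects only one token.
import numpy as np

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention of one query over all cached keys/values.
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

for step in range(5):
    x = rng.normal(size=d)  # embedding of the newest token
    # Only the new token is projected; past K/V come from the cache.
    K_cache = np.vstack([K_cache, x @ Wk])
    V_cache = np.vstack([V_cache, x @ Wv])
    out = attend(x @ Wq, K_cache, V_cache)
```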
YOLOv8 object detection
A compilation of various ML and DL models and ways to optimize their inference.
Improving Natural Language Processing tasks using BERT-based models
Cross-platform modular neural network inference library, small and efficient
The Tensor Algebra SuperOptimizer for Deep Learning
Batch Partitioning for Multi-PE Inference with TVM (2020)
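The idea is straightforward to illustrate without TVM; a conceptual sketch in which a thread pool stands in for the processing elements and a fixed linear layer stands in for the compiled model:

```python
# Conceptual sketch of batch partitioning (not the project's TVM code):
# split one inference batch into sub-batches, run each on a separate worker,
# and concatenate the results in input order.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 10))
batch = rng.normal(size=(64, 32))

def pe_infer(x):
    # Stand-in for a compiled per-PE model.
    return np.tanh(x @ W)

num_pes = 4
parts = np.array_split(batch, num_pes)  # one sub-batch per PE
with ThreadPoolExecutor(max_workers=num_pes) as pool:
    outputs = list(pool.map(pe_infer, parts))
result = np.vstack(outputs)             # same order as the input batch
```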
Modified inference engine for quantized convolution using product quantization
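Product quantization itself is compact enough to sketch; this toy (random codebooks standing in for k-means-trained ones, dot products instead of convolutions) shows the encode/lookup mechanics, not the repo's engine:

```python
# Toy sketch of product quantization: split vectors into sub-vectors, encode
# each as its nearest codeword's index, and approximate dot products via
# per-subspace lookup tables.
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 32, 4, 16   # dimension, subspaces, codewords per subspace
Ds = D // M

# Random codebooks stand in for k-means-trained ones.
codebooks = rng.normal(size=(M, K, Ds))

def encode(x):
    # Nearest codeword index in each subspace.
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        sub = x[m * Ds:(m + 1) * Ds]
        dists = ((codebooks[m] - sub) ** 2).sum(axis=1)
        codes[m] = dists.argmin()
    return codes

def approx_dot(codes, q):
    # Precompute q's dot product with every codeword per subspace; the
    # approximate dot product is then M table lookups and additions.
    total = 0.0
    for m in range(M):
        table = codebooks[m] @ q[m * Ds:(m + 1) * Ds]  # shape (K,)
        total += table[codes[m]]
    return total

x, q = rng.normal(size=D), rng.normal(size=D)
print("approx:", approx_dot(encode(x), q), "exact:", x @ q)
```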