High-efficiency floating-point neural network inference operators for mobile, server, and Web
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
Useful sample code for building and running TensorRT engines from ONNX models
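For context, the typical ONNX-to-TensorRT build these samples revolve around can be sketched with the TensorRT Python API; the file names and the FP16 flag below are illustrative assumptions, not taken from the repo:

```python
# Minimal sketch: parse an ONNX file and build a serialized TensorRT engine.
# "model.onnx" / "model.engine" are placeholder paths.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # reduced precision where supported

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```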
Research into optimizing training and inference of AI models on CPUs using simulated quantum annealing algorithms
Faster YOLOv8 inference: optimize and export YOLOv8 models using OpenVINO and NumPy 🔢
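The export-and-run flow can be sketched with the ultralytics package; the checkpoint name and test image below are illustrative assumptions:

```python
# Minimal sketch of exporting YOLOv8 to OpenVINO IR and running inference.
from ultralytics import YOLO

# Export the PyTorch checkpoint (writes a *_openvino_model directory).
model = YOLO("yolov8n.pt")
export_dir = model.export(format="openvino")

# Reload the exported model; ultralytics dispatches to the OpenVINO runtime.
ov_model = YOLO(export_dir)
results = ov_model("bus.jpg")   # any test image path
print(results[0].boxes.xyxy)    # detected boxes in xyxy format
```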
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing multi-label classification models for inference. They are based on my experience developing a custom chatbot; I'm sharing them in the hope that they help others quickly fine-tune and use models in their projects! 😊
This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing token classification models for inference. They are based on my experience developing a custom chatbot; I'm sharing them in the hope that they help others quickly fine-tune and use models in their projects! 😊
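The workflow both repos describe can be sketched with the Transformers API; the checkpoint name, label count, and threshold below are placeholders, not the repos' actual configuration:

```python
# Minimal sketch: load a Transformer head for multi-label classification
# and serve it through a text-classification pipeline.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          pipeline)

model_name = "distilbert-base-uncased"  # placeholder; this head is untrained
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=5,
    problem_type="multi_label_classification",  # sigmoid scores, BCE loss
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# top_k=None returns a score for every label; apply your own threshold.
clf = pipeline("text-classification", model=model, tokenizer=tokenizer,
               top_k=None)
scores = clf(["The battery dies fast but the screen is great."])
predicted = [s["label"] for s in scores[0] if s["score"] > 0.5]
```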
Blog posts, reading reports, and code examples on AGI/LLM-related topics.
ONNX Runtime-based inference optimization of a RoBERTa model trained for sentiment analysis on a Twitter dataset
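A minimal sketch of what such an optimized session can look like with the onnxruntime API; the file name, input names, and sequence length are assumptions that depend on how the model was exported:

```python
# Minimal sketch: ONNX Runtime session with full graph optimizations.
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
# Apply all graph-level optimizations (constant folding, operator fusions).
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
opts.intra_op_num_threads = 4  # tune to the machine

sess = ort.InferenceSession("roberta-sentiment.onnx", opts,
                            providers=["CPUExecutionProvider"])

# Dummy token IDs; a real call would come from the matching tokenizer.
ids = np.ones((1, 128), dtype=np.int64)
mask = np.ones((1, 128), dtype=np.int64)
logits = sess.run(None, {"input_ids": ids, "attention_mask": mask})[0]
```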
[WIP] A template for getting started writing code using GGML
MLP-Rank: a graph-theoretical approach to structured pruning of deep neural networks based on weighted PageRank centrality, as introduced in the related thesis.
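A toy sketch of the underlying idea (not the thesis's exact formulation), assuming a single square linear layer whose absolute weights define the graph:

```python
# Toy sketch: score one layer's neurons with weighted PageRank on |W|,
# then zero out the lowest-scoring output neurons (structured pruning).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))  # square layer keeps the graph simple

# Column-stochastic transition matrix from absolute weights.
A = np.abs(W)
P = A / A.sum(axis=0, keepdims=True)

# Power iteration for PageRank with damping factor d.
d, n = 0.85, P.shape[0]
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = (1 - d) / n + d * P @ r

# Remove the k least central neurons by zeroing their outgoing rows.
k = 16
pruned = np.argsort(r)[:k]
W_pruned = W.copy()
W_pruned[pruned, :] = 0.0
```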
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
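Of the techniques mentioned, KV caching is the easiest to illustrate in isolation; a minimal NumPy sketch (single head, no batching, toy dimensions) of why decoding only needs to project the newest token:

```python
# Minimal sketch of KV caching during greedy decoding: keys/values for past
# tokens are computed once and stored, so each step projects only one token.
import numpy as np

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention of one query over all cached keys/values.
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

for step in range(5):
    x = rng.normal(size=d)  # embedding of the newest token
    # Only the new token is projected; past K/V come from the cache.
    K_cache = np.vstack([K_cache, x @ Wk])
    V_cache = np.vstack([V_cache, x @ Wv])
    out = attend(x @ Wq, K_cache, V_cache)
```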
YOLOv8 object detection
A compilation of various ML and DL models and ways to optimize their inference.
Improving Natural Language Processing tasks using BERT-based models
Cross-platform modular neural network inference library, small and efficient
The Tensor Algebra SuperOptimizer for Deep Learning
Batch Partitioning for Multi-PE Inference with TVM (2020)
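The idea is straightforward to illustrate without TVM; a conceptual sketch in which a thread pool stands in for the processing elements and a fixed linear layer stands in for the compiled model:

```python
# Conceptual sketch of batch partitioning (not the project's TVM code):
# split one inference batch into sub-batches, run each on a separate worker,
# and concatenate the results in input order.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 10))
batch = rng.normal(size=(64, 32))

def pe_infer(x):
    # Stand-in for a compiled per-PE model.
    return np.tanh(x @ W)

num_pes = 4
parts = np.array_split(batch, num_pes)  # one sub-batch per PE
with ThreadPoolExecutor(max_workers=num_pes) as pool:
    outputs = list(pool.map(pe_infer, parts))
result = np.vstack(outputs)             # same order as the input batch
```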
Modified inference engine for quantized convolution using product quantization
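Product quantization itself is compact enough to sketch; this toy (random codebooks standing in for k-means-trained ones, dot products instead of convolutions) shows the encode/lookup mechanics, not the repo's engine:

```python
# Toy sketch of product quantization: split vectors into sub-vectors, encode
# each as its nearest codeword's index, and approximate dot products via
# per-subspace lookup tables.
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 32, 4, 16   # dimension, subspaces, codewords per subspace
Ds = D // M

# Random codebooks stand in for k-means-trained ones.
codebooks = rng.normal(size=(M, K, Ds))

def encode(x):
    # Nearest codeword index in each subspace.
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        sub = x[m * Ds:(m + 1) * Ds]
        dists = ((codebooks[m] - sub) ** 2).sum(axis=1)
        codes[m] = dists.argmin()
    return codes

def approx_dot(codes, q):
    # Precompute q's dot product with every codeword per subspace; the
    # approximate dot product is then M table lookups and additions.
    total = 0.0
    for m in range(M):
        table = codebooks[m] @ q[m * Ds:(m + 1) * Ds]  # shape (K,)
        total += table[codes[m]]
    return total

x, q = rng.normal(size=D), rng.normal(size=D)
print("approx:", approx_dot(encode(x), q), "exact:", x @ q)
```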