Semantic model router with parallel LLM classification, prompt caching, and vision short-circuiting. Optimizes request routing with sub-100ms overhead for Open WebUI.
Updated Feb 13, 2026 - Python
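Vision short-circuiting means requests carrying images bypass the classification step entirely and go straight to a vision-capable model. A minimal sketch of that control flow, where the model names and the keyword fallback are illustrative stand-ins rather than the repository's actual classifier:

```python
# Sketch of a router that short-circuits image-bearing requests to a
# vision model before any text classification runs. Model ids and the
# keyword heuristic are hypothetical; the real project classifies with
# parallel LLM calls.

def route_request(message: dict) -> str:
    """Pick a model id for an Open WebUI-style chat message."""
    # Vision short-circuit: any image attachment skips classification.
    if message.get("images"):
        return "vision-model"

    # Fallback heuristic classifier (stand-in for the LLM classifier).
    text = message.get("text", "").lower()
    if any(kw in text for kw in ("def ", "import ", "traceback")):
        return "code-model"
    return "general-model"

print(route_request({"images": ["cat.png"], "text": "what is this?"}))  # vision-model
print(route_request({"text": "import numpy as np"}))                    # code-model
```

Because the image check is a constant-time dictionary lookup, the short-circuit path adds effectively no routing overhead.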
This repository presents an efficient approach for fine-tuning large language models for the medical domain using 4-bit quantization and LoRA techniques.
Tools and experiments for converting Human Activity Recognition (HAR) models to TensorFlow Lite for efficient on-device inference on mobile and wearable devices.
Learning and Knowledge Extraction.
Predicts telecom customer churn with machine learning and an interactive Streamlit app. Features include single/batch predictions, dashboards, and actionable insights for improved retention.
A minimal, high-performance starter kit for running AI model inference on NVIDIA GPUs using CUDA. Includes environment setup, sample kernels, and guidance for integrating ONNX/TensorRT pipelines for fast, optimized inference on modern GPU hardware.
Minimal reproducibility study of https://arxiv.org/abs/1911.05248: experiments with compression of deep neural networks.
An ML journey exploring concepts and frameworks through code and math. It serves as a personal log of learning experiences, revisiting foundational topics and delving into new areas within the field.
An end-to-end project and API deployment for Spain electricity shortfall prediction.
Practical experience with hyperparameter tuning techniques using the Keras Tuner library. Hyperparameter tuning plays a crucial role in optimizing machine learning models, and this project offers hands-on learning by exploring different tuning methods, including random search, grid search, and Bayesian optimization.
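The difference between grid search and random search can be shown on a toy objective. Keras Tuner automates this over real model hyperparameters; in this sketch the "model" is just a quadratic with a known optimum at lr = 0.1, and all values are illustrative:

```python
# Toy comparison of grid search vs. random search on a 1-D objective.
import random

def validation_loss(lr: float) -> float:
    # Stand-in for "train a model with this learning rate, return val loss".
    return (lr - 0.1) ** 2

# Grid search: evaluate every candidate in a fixed set.
grid = [0.001, 0.01, 0.1, 1.0]
best_grid = min(grid, key=validation_loss)

# Random search: sample candidates from a continuous range.
random.seed(0)
samples = [random.uniform(0.0, 1.0) for _ in range(20)]
best_random = min(samples, key=validation_loss)

print(best_grid)  # 0.1
```

Random search tends to win when only a few hyperparameters matter, since it does not waste trials on redundant grid points; Bayesian optimization goes further by modeling the objective to pick the next candidate.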
This project is built to detect spam messages using a Long Short-Term Memory (LSTM) model combined with Word2Vec as the word embedding technique. The model has been optimized using Grid Search, achieving a best accuracy of 95.65%.
NU Bootcamp Module 21
The "Predicting Startup Outcomes with XGBoost and Machine Learning" project uses machine learning algorithms, particularly XGBoost, to predict the success or failure of startups based on historical data. It leverages feature engineering and model optimization to enhance prediction accuracy.
An advanced study on optimizing transfer learning pipelines (VGG16 & ResNet50) for the CIFAR-10 dataset. Implements fine-tuning, L2 regularization, dropout, and learning rate scheduling to combat overfitting and boost classification accuracy.
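Learning rate scheduling, one of the techniques listed above, can be sketched as a simple step decay: the rate is dropped by a fixed factor every few epochs so fine-tuning takes smaller steps as training progresses. The initial rate and decay factor below are illustrative, not the study's actual settings:

```python
# Step-decay learning-rate schedule of the kind commonly used when
# fine-tuning pretrained backbones. All hyperparameters are hypothetical.
def step_decay(epoch: int, initial_lr: float = 1e-3,
               drop: float = 0.5, epochs_per_drop: int = 10) -> float:
    """Multiply the rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

print(step_decay(0))   # 0.001
print(step_decay(10))  # 0.0005
print(step_decay(25))  # 0.00025
```

A function like this plugs directly into a Keras `LearningRateScheduler` callback, which calls it once per epoch.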
Comprehensive performance analysis of DeepSeek V3 quantization levels (FP16, Q8_0, Q4_0) on 16GB GPU environments.
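The trade-off between those quantization levels comes down to how coarsely each weight is rounded. A simplified symmetric round-to-nearest scheme (a stand-in for llama.cpp's block-wise Q8_0/Q4_0 formats, not DeepSeek's actual kernels) shows why 8-bit reconstructs weights more faithfully than 4-bit:

```python
# Illustrative symmetric quantize/dequantize round trip; the weight
# values are made up for the example.
def quantize_dequantize(weights, bits):
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

weights = [0.31, -0.87, 0.05, 0.44, -0.12]
for bits in (8, 4):
    recon = quantize_dequantize(weights, bits)
    err = max(abs(a - b) for a, b in zip(weights, recon))
    print(f"{bits}-bit max reconstruction error: {err:.4f}")
```

Halving the bit width halves memory but roughly sixteen-fold coarsens the grid (127 levels per side vs. 7), which is why Q4_0 fits in 16 GB where FP16 cannot, at a measurable quality cost.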
Optimized IDKL Model for Visible-Infrared Person Re-Identification focusing on efficiency for resource-constrained hardware.
Vision-language model example code.
NLP pipeline with parameter-efficient LoRA fine-tuning on FLAN-T5-XXL (11B params). Achieves +2.6 ROUGE-1 improvement with <1% trainable parameters and 8-bit quantization for scientific paper summarization.
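The "<1% trainable parameters" figure follows directly from LoRA's low-rank decomposition: each frozen weight matrix W (d_out × d_in) gets an additive update B·A of rank r, so only r·(d_in + d_out) parameters train per adapted matrix. A back-of-the-envelope check with illustrative dimensions (not FLAN-T5-XXL's actual shapes):

```python
# Parameter-count arithmetic behind LoRA's efficiency claim.
# Dimensions are hypothetical round numbers for illustration.
d_in, d_out, r = 4096, 4096, 8

full_params = d_in * d_out            # frozen weight matrix W
lora_params = r * (d_in + d_out)      # trainable low-rank factors B and A

print(lora_params / full_params)      # ~0.0039, well under 1%
```

Because r is tiny relative to the matrix dimensions, the ratio shrinks as models grow, which is why an 11B-parameter model remains fine-tunable on modest hardware, especially combined with 8-bit quantization of the frozen weights.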
Training and fine-tuning pipeline for a custom GPT-style language model built exclusively for Amharic. Pretrained on a 12+ GB corpus and adapted on curated datasets, with support for SentencePiece tokenization, LoRA fine-tuning, and efficient inference tools.
The nonprofit foundation Alphabet Soup wants a tool that can help it select the funding applicants with the best chance of success in their ventures. Using machine learning and neural networks, the features in the provided dataset are used to create a binary classifier that predicts whether applicants will be successful if funded.