AI-powered Quantitative Investment Research Platform.
A project that demonstrates how to deploy AI models in containerized environments using Cog. Ideal for reproducible, scalable, and hardware-efficient inference.
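Cog pairs a cog.yaml build spec with a Python predictor class; below is a minimal, illustrative predictor sketch, where the model choice and parameter names are assumptions, not taken from the project itself.

```python
# predict.py: a minimal, illustrative Cog predictor (model choice is an assumption)
from cog import BasePredictor, Input
from transformers import pipeline


class Predictor(BasePredictor):
    def setup(self):
        # Runs once per container start, so the model loads before any request.
        self.generator = pipeline("text-generation", model="distilgpt2")

    def predict(self, prompt: str = Input(description="Prompt to complete")) -> str:
        # Each call handles one inference request inside the container.
        return self.generator(prompt, max_new_tokens=50)[0]["generated_text"]
```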
This repository contains an HR Policy Query Resolution system using Retrieval-Augmented Generation (RAG). It leverages a 4-bit quantized Mistral-7B-Instruct-v0.2 LLM and JP Morgan Chase’s publicly available Code of Conduct documents to generate accurate, contextually relevant responses for HR policy queries.
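Loading a 4-bit quantized Mistral-7B-Instruct-v0.2 is typically done through transformers with a bitsandbytes config; the following is a hedged sketch of that pattern, where the exact quantization settings and the sample prompt are assumptions rather than the repository's verified configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit weights with bfloat16 compute is a common choice; the repo's
# exact settings are an assumption.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# In the RAG flow, retrieved policy passages would be prepended to the query.
prompt = "[INST] Based on the code of conduct, what is the gifts policy? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=128)[0]))
```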
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
PyTorch native quantization and sparsity for training and inference
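With torchao, post-training quantization is a single in-place call; a minimal sketch assuming the int8 weight-only path (the toy model is illustrative):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Any float model works; a small Linear stack stands in for a real network.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)

# Swap Linear weights to int8 in place; activations stay in floating point.
quantize_(model, int8_weight_only())

x = torch.randn(4, 128)
print(model(x).shape)  # torch.Size([4, 10])
```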
Trustworthy onboard satellite AI in PyTorch→ONNX→INT8 with calibration, telemetry, and a PhiSat-2 EO tile-filter demo.
🤖 Build AI agents that combine OpenAI's orchestration and Claude's execution for effective production solutions.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
📊 Transform documents into a smart knowledge base using Neo4j and Azure AI for efficient, intelligent searching and answer generation.
🌐 Run GGUF models directly in your web browser using JavaScript and WebAssembly for a seamless and flexible AI experience.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
🚀 Simplify running, sharing, and shipping Hugging Face models with autopack; it quantizes and exports to multiple formats effortlessly.
🔍 Optimize RAG systems by exploring Lexical, Semantic, and Hybrid Search methods for better context retrieval and improved LLM responses.
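Hybrid retrieval commonly fuses a lexical score (e.g. BM25) with a semantic similarity via a normalized weighted sum; a self-contained sketch of that fusion, with the alpha weight and toy scores as assumptions:

```python
import numpy as np

def hybrid_rank(lexical: np.ndarray, semantic: np.ndarray, alpha: float = 0.5):
    """Blend per-document lexical (e.g. BM25) and semantic (cosine) scores."""
    def minmax(s):
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    fused = alpha * minmax(lexical) + (1 - alpha) * minmax(semantic)
    return np.argsort(-fused)  # document indices, best first

# Toy scores for 4 documents; real systems would compute these per query.
lex = np.array([12.1, 3.4, 8.7, 0.2])
sem = np.array([0.31, 0.88, 0.45, 0.10])
print(hybrid_rank(lex, sem, alpha=0.4))
```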
Open-source quant finance foundation unites trading tools and protocols, funds community projects, and boosts cross-project interoperability for collaboration 🐙
Neural Network Compression Framework for enhanced OpenVINO™ inference
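NNCF's post-training flow centers on nncf.quantize fed by a calibration nncf.Dataset; a minimal sketch, assuming random placeholder calibration batches and an off-the-shelf ResNet-18:

```python
import nncf
import torch
import torchvision

# Any float model; ResNet-18 is just an example.
model = torchvision.models.resnet18(weights=None).eval()

# Calibration data: NNCF wraps an iterable plus a transform into nncf.Dataset.
loader = [torch.randn(1, 3, 224, 224) for _ in range(8)]  # placeholder batches
calibration_dataset = nncf.Dataset(loader, lambda batch: batch)

# Insert fake-quantize ops calibrated on the samples; export then targets OpenVINO.
quantized_model = nncf.quantize(model, calibration_dataset)
```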
Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.
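Weight quantization of an already-converted Core ML package can go through coremltools' linear_quantize_weights; a sketch under the assumption that a converted .mlpackage exists on disk (the file names are placeholders):

```python
import coremltools as ct
from coremltools.optimize.coreml import (
    OpLinearQuantizerConfig, OptimizationConfig, linear_quantize_weights,
)

# "TinyLlama.mlpackage" is a placeholder for an already-converted model.
mlmodel = ct.models.MLModel("TinyLlama.mlpackage")

# Symmetric int8 weight quantization; 4-bit variants exist in newer
# coremltools releases via separate configs.
op_config = OpLinearQuantizerConfig(mode="linear_symmetric")
config = OptimizationConfig(global_config=op_config)

quantized = linear_quantize_weights(mlmodel, config=config)
quantized.save("TinyLlama-int8.mlpackage")
```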
Wrapture lets you go from a Python-trained model to deployable JavaScript with a single command. It generates TypeScript bindings and a Web/Node-compatible wrapper, using WebGPU/WASM-ready ONNX runtimes.
This project implements a complete pipeline for 3D mesh preprocessing, normalization, quantization, and error analysis. The work simulates the data preparation phase for AI systems like SeamGPT that work with 3D meshes.
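The quantization step of such a pipeline amounts to normalizing vertices into a unit cube and snapping them onto an N-bit integer grid; a generic sketch, with the bit width and RMS error metric chosen for illustration:

```python
import numpy as np

def quantize_mesh(vertices: np.ndarray, bits: int = 10):
    """Normalize vertices to [0, 1]^3, snap to a 2^bits grid, report RMS error."""
    vmin, vmax = vertices.min(axis=0), vertices.max(axis=0)
    scale = (vmax - vmin).max()            # uniform scale keeps aspect ratio
    normalized = (vertices - vmin) / scale
    levels = (1 << bits) - 1
    quantized = np.round(normalized * levels).astype(np.int32)
    reconstructed = quantized / levels * scale + vmin
    rms = np.sqrt(np.mean((vertices - reconstructed) ** 2))
    return quantized, rms

verts = np.random.rand(1000, 3) * 10.0     # stand-in for a loaded mesh
q, err = quantize_mesh(verts, bits=10)
print(f"10-bit RMS error: {err:.6f}")
```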
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU.
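AutoRound's published entry point wraps a model and tokenizer and tunes weight rounding before export; the sketch below follows that documented pattern, with the model choice, hyperparameters, and output path as assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

# Model choice is illustrative; AutoRound targets LLMs and VLMs generally.
model_id = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tune the weight rounding, then export 4-bit grouped weights.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./opt-125m-int4")
```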