- Lahore, Pakistan
- https://llamatelemetry.github.io/
- in/mohammad-waqas-3a1384270
- @waqasm86
floci Public
Forked from floci-io/floci
Light, fluffy, and always free - The AWS Local Emulator alternative
Java MIT License Updated May 16, 2026
skills-github-pages Public
Exercise: Create a site or blog from your GitHub repositories with GitHub Pages
MIT License Updated Apr 30, 2026
llm-observability-stack Public
An opinionated umbrella Helm chart for a local single-node **k3s + NVIDIA GPU + Ollama + Open WebUI + LangChain/LangSmith** setup.
Jupyter Notebook Updated Mar 24, 2026
ClaudeHistoryMCP Public
Forked from jhammant/ClaudeHistoryMCP
MCP server for searching and surfacing Claude Code conversation history
TypeScript Updated Feb 24, 2026
llamatelemetry Public
CUDA-first OpenTelemetry Python SDK for LLM inference observability and explainability.
Python MIT License Updated Feb 23, 2026
hive Public
Forked from aden-hive/hive
Outcome-driven agent development framework that evolves
Python Apache License 2.0 Updated Feb 12, 2026
mito Public
Forked from mito-ds/mito
Jupyter extensions that help you write code faster: context-aware AI chat, autocomplete, and spreadsheet
Jupyter Notebook Other Updated Feb 6, 2026
grafana-com-public-clients Public
Forked from grafana/grafana-com-public-clients
grafana.com API Clients
Shell Apache License 2.0 Updated Feb 6, 2026
nbdev Public
Forked from AnswerDotAI/nbdev
Create delightful software with Jupyter Notebooks
Jupyter Notebook Apache License 2.0 Updated Feb 4, 2026
lon-mirror Public
Forked from Tuttotorna/lon-mirror
MB-X.01 · Logical Origin Node (L.O.N.) — TruthΩ → Co⁺ → Score⁺. Verifiable demo and specs. https://massimiliano.neocities.org/
Python MIT License Updated Feb 3, 2026
llcuda Public
Forked from llcuda/llcuda
CUDA 12-first backend inference for Unsloth on Kaggle — optimized for small GGUF models (1B-5B) on dual Tesla T4 GPUs (15GB each, SM 7.5)
Jupyter Notebook MIT License Updated Feb 1, 2026
GRIT Public
Forked from eric-ai-lab/GRIT
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
Python MIT License Updated Jan 16, 2026
notebooks Public
Forked from roboflow/notebooks
A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM …
Jupyter Notebook Updated Jan 15, 2026
Pre-built llama.cpp CUDA binary for Ubuntu 22.04. No compilation required - download, extract, and run! Works with llcuda Python package for JupyterLab integration. Tested on GeForce 940M to RTX 4090.
cuda-nvidia-systems-engg Public
Production-grade C++20/CUDA distributed LLM inference system with TCP networking, MPI scheduling, and content-addressed storage. Features comprehensive benchmarking (p50/p95/p99 latencies), epoll a…
C++ MIT License Updated Dec 27, 2025
local-llama-cuda Public
Custom CUDA implementation for LLM inference with MPI-based distributed computing. Memory-efficient layer offloading, multi-rank coordination, and GPU optimization for constrained hardware (1GB VRAM).
C++ MIT License Updated Dec 25, 2025
cuda-tcp-llama.cpp Public
High-performance TCP inference gateway with epoll async I/O for CUDA-accelerated LLM serving. Binary protocol, connection pooling, streaming responses. Zero dependencies beyond POSIX and CUDA.
C++ Updated Dec 23, 2025
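The gateway above is built on readiness-based async I/O. A minimal sketch of that pattern, using Python's stdlib `selectors` module (epoll-backed on Linux) with a `socketpair` standing in for a real TCP client; the repo's binary protocol, connection pooling, and CUDA serving are not modeled:

```python
import selectors
import socket

# Readiness-based I/O: register sockets with the selector and only
# touch the ones the kernel reports as readable, instead of blocking
# on each connection in its own thread.
sel = selectors.DefaultSelector()
client, server = socket.socketpair()  # stand-in for a TCP connection
client.setblocking(False)
server.setblocking(False)

sel.register(server, selectors.EVENT_READ)

client.sendall(b"ping")
for key, _events in sel.select(timeout=1.0):
    request = key.fileobj.recv(4096)          # guaranteed not to block
    key.fileobj.sendall(b"pong:" + request)   # echo-style response

reply = client.recv(4096)
print(reply)
sel.close()
```

A real gateway would keep the `sel.select()` call in a loop, registering a listening socket for accepts and each accepted connection for reads.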
cuda-openmpi Public
CUDA-aware OpenMPI integration for GPU-accelerated distributed computing. Multi-GPU LLM inference with MPI communication, performance benchmarking, and collective operations testing.
Cuda MIT License Updated Dec 23, 2025
cuda-llm-storage-pipeline Public
Content-addressed LLM model distribution with SHA256 verification and SeaweedFS integration. Distributed storage, manifest management, LRU caching, and integrity checking for GGUF models.
C++ Updated Dec 23, 2025
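A minimal sketch of the content-addressing idea above: each blob is keyed by the SHA256 of its bytes, so the key doubles as an integrity check on read. The in-memory `store` dict is a stand-in for illustration; SeaweedFS, manifests, and LRU eviction are not modeled:

```python
import hashlib

# Content-addressed store: the address of a blob is the hash of its
# contents, so identical content always maps to the same key.
store = {}

def put(blob: bytes) -> str:
    digest = hashlib.sha256(blob).hexdigest()
    store[digest] = blob  # idempotent: same content, same key
    return digest

def get(digest: str) -> bytes:
    blob = store[digest]
    # Re-hash on read; any corruption changes the digest.
    if hashlib.sha256(blob).hexdigest() != digest:
        raise ValueError("integrity check failed")
    return blob

key = put(b"GGUF model shard")
assert get(key) == b"GGUF model shard"
```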
cuda-mpi-llama-scheduler Public
Distributed MPI scheduler with work-stealing algorithm for LLM inference. Percentile latency analysis (p50/p95/p99), throughput benchmarking, multi-rank load balancing, and empirical performance me…
Cuda Updated Dec 23, 2025
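The p50/p95/p99 figures such benchmarks report can be sketched with the nearest-rank percentile method; the sample latencies below are invented for illustration:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least
    p% of all samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical per-request latencies in milliseconds; the tail
# (90, 250) is what p95/p99 are designed to surface.
latencies_ms = [12, 15, 11, 90, 14, 13, 250, 16, 12, 13]
summary = {p: percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)  # {50: 13, 95: 250, 99: 250}
```

Note how the median (p50) hides the two slow requests while p95/p99 expose them, which is why tail percentiles matter for load-balanced inference.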
cmake-superbuild-toolkit Public
Qt-style CMake superbuild demo: FetchContent deps, feature flags, install/export targets, CI matrix, tests, and CPack packaging.
CMake Other Updated Dec 16, 2025
MCP stdio server for Windsurf that routes tool calls to a local llama.cpp llama-server (GGUF), optimized for low-VRAM GPUs.
Python Updated Dec 13, 2025
Wolfram-llama.cpp Public
A sample project using Wolfram with llama.cpp
Updated Nov 18, 2025