- http://www.uncomplicate.org
- http://dragan.rocks
- @draganrocks
Stars
An uber-fast parallelized Java classpath scanner and module scanner.
FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)
On-device AI across mobile, embedded and edge for PyTorch
Py4J enables Python programs to dynamically access arbitrary Java objects
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
Matplot++: A C++ Graphics Library for Data Visualization 📊🗾
⚡ Datoviz: high-performance GPU rendering for scientific data visualization
BLAS-like Extensions for Neanderthal, Fast Clojure Matrix Library
Lightweight, standalone C++ inference engine for Google's Gemma models.
Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming
Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and m…
onnxruntime-extensions: A specialized pre- and post- processing library for ONNX Runtime
Generative AI extensions for onnxruntime
Awesome LLM Books: Curated list of books on Large Language Models
Unsupervised text tokenizer for Neural Network-based text generation.
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Practice Clojure using Interactive Programming in your editor
Flexible and powerful tensor operations for readable and reliable code (for PyTorch, JAX, TF and others)
TensorDict is a PyTorch-dedicated tensor container.
A repository of all code from Introduction to System Programming in Linux, by Stewart Weiss
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
A k-means implementation in Clojure that supports clustering on datasets larger than memory but smaller than storage.
Dear ImGui: Bloat-free Graphical User interface for C++ with minimal dependencies