HPC
Mars is a tensor-based unified framework for large-scale data computation that scales NumPy, pandas, scikit-learn, and Python functions.
A low-impact profiler that measures how much memory each Dask task uses
Monitor memory usage of Python code
Julia wrapper for the performance monitoring and benchmarking suite LIKWID.
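The wrapper here is LIKWID.jl; a minimal sketch of counting hardware events with its `@perfmon` macro (requires Linux and a local installation of the LIKWID suite):

```julia
# Assumes the wrapper is LIKWID.jl; needs Linux with the LIKWID suite installed.
using LIKWID, LinearAlgebra

A = rand(128, 64); B = rand(64, 128); C = zeros(128, 128)

# Measure the wrapped code with the FLOPS_DP performance group
metrics, events = @perfmon "FLOPS_DP" mul!(C, A, B)
```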
Readily pin Julia threads to CPU threads
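A minimal sketch, assuming this entry is ThreadPinning.jl (whose tagline it matches):

```julia
# Assumes the package is ThreadPinning.jl; start Julia with threads, e.g. `julia -t 4`.
using ThreadPinning

pinthreads(:cores)   # pin the Julia threads to distinct CPU cores, in order
threadinfo()         # show which CPU-thread each Julia thread landed on
```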
Measuring memory bandwidth using TheBandwidthBenchmark
Allocate arrays with malloc or calloc, or on NUMA nodes
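A minimal sketch, assuming this entry is ArrayAllocators.jl:

```julia
# Assumes the package is ArrayAllocators.jl.
using ArrayAllocators

A = Array{UInt8}(malloc, 1024, 1024)   # malloc-backed; contents uninitialized
B = Array{Float64}(calloc, 10^6)       # calloc-backed; the OS zeroes pages lazily,
                                       # often cheaper than zeros() for large arrays
# NUMA-node placement is provided by the companion NumaAllocators.jl package.
```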
Query the CPU for cache sizes, SIMD feature support, the presence of a hypervisor, and more.
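A minimal sketch, assuming this entry is CpuId.jl:

```julia
# Assumes the package is CpuId.jl.
using CpuId

cachesize()    # data-cache sizes in bytes, one entry per level
simdbytes()    # width of the largest supported SIMD register, in bytes
hypervised()   # true when running under a hypervisor/VM
cpuinfo()      # summary table of what the cpuid instruction reports
```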
A framework for out-of-core and parallel execution
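A minimal sketch of the task-graph style this describes, assuming the framework is Dagger.jl (whose tagline it matches):

```julia
# Assumes the framework is Dagger.jl.
using Dagger

a = Dagger.@spawn rand(1000, 1000)   # each @spawn returns a lazy task
b = Dagger.@spawn rand(1000, 1000)
c = Dagger.@spawn a * b              # depends on a and b; runs when both are ready
fetch(c)                             # block and materialize the result
```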
Fast sequential, threaded, and distributed for-loops for Julia—fold for humans™
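A minimal sketch, assuming this entry is FLoops.jl (the "fold for humans™" tagline matches):

```julia
# Assumes the package is FLoops.jl.
using FLoops

# The executor picks the backend: SequentialEx(), ThreadedEx(), or DistributedEx()
@floop ThreadedEx() for x in 1:1_000_000
    @reduce(s += x)   # race-free parallel reduction
end
@assert s == 500_000_500_000
```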
Run jobs on different job queue systems (schedulers) commonly used on HPC compute clusters
A Julia package for strided array views and efficient manipulations thereof
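A minimal sketch, assuming this entry is Strided.jl:

```julia
# Assumes the package is Strided.jl; start Julia with threads to benefit.
using Strided

A = randn(4000, 4000)
B = similar(A)

# @strided rewrites the broadcast over strided views and multithreads it
@strided B .= cos.(A) .+ 2 .* A
```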
Estimate the absolute performance of a piece of Julia code
Tools for building non-allocating pre-cached functions in Julia, allowing for GC-free usage of automatic differentiation in complex codes
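A minimal sketch, assuming this entry is PreallocationTools.jl and its `DiffCache`/`get_tmp` API:

```julia
# Assumes the package is PreallocationTools.jl.
using PreallocationTools, ForwardDiff

cache = DiffCache(zeros(5))      # one buffer usable with Float64 and Dual numbers

function f(x)
    tmp = get_tmp(cache, x)      # buffer with eltype matching x; no fresh allocation
    tmp .= 2 .* x
    return sum(tmp)
end

ForwardDiff.gradient(f, ones(5)) # AD passes reuse the same preallocated storage
```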
Experimental Distributed Arrays package
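If this entry is DistributedArrays.jl (an assumption; the entry does not name the package), a minimal sketch:

```julia
# Assumes the package is DistributedArrays.jl.
using Distributed
addprocs(4)
@everywhere using DistributedArrays

d = dzeros(10_000)          # distributed vector; chunks live on the workers
x = distribute(rand(100))   # ship a local array out across the workers
localpart(d)                # the chunk (possibly empty) owned by this process
```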
Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡
Flexible and performant GEMM kernels in Julia
Heterogeneous programming in Julia
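A minimal sketch of a backend-agnostic kernel, assuming this entry is KernelAbstractions.jl and its v0.9-era API:

```julia
# Assumes the package is KernelAbstractions.jl (v0.9-style API).
using KernelAbstractions

@kernel function scale_add!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] += a * x[i]
end

x = rand(Float32, 1024); y = zeros(Float32, 1024)
backend = CPU()                                   # a GPU backend works identically
scale_add!(backend, 64)(y, 2f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)
```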
A package for writing high-level code for parallel, high-performance stencil computations deployable on both GPUs and CPUs
Improved thread management for background and nonuniform tasks in Julia. Docs at https://tro3.github.io/ThreadPools.jl
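The docs link confirms this is ThreadPools.jl; a minimal sketch of keeping the primary thread free, assuming the `bmap` background-map API:

```julia
# ThreadPools.jl: bmap runs jobs on background threads, keeping thread 1 free.
using ThreadPools

results = bmap(1:8) do i
    sleep(rand())   # stand-in for a nonuniform workload
    i^2
end
```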
Large-scale, distributed, sparse linear algebra in Julia.