nmtools

A generic, composable multidimensional array library.

(Work In Progress)

The project is still a work in progress; expect missing docs, features, and benchmarks.

What is nmtools?

  • Array computation library
  • Array/Tensor computational graph
  • Tile-based kernel embedded DSL

nmtools is written in C++ and provides NumPy-like array computation. It can also capture the computational graph at compile time.

Array Library

    auto x = nm::Array::arange(15, /*dtype=*/nm::float32).reshape(array{3,5});
    auto t1 = (x - x.max(/*axis=*/1,/*dtype=*/nm::None,/*initial=*/nm::None,/*keepdims=*/true)).exp();
    auto t2 = t1.sum(-1,nm::None,nm::None,true);
    auto softmax = t1 / t2;

    print(x);
    print(t1);
    print(t2);
    print(softmax);

Sample output:

shape: [        3,      5]
[[      0.000000,       1.000000,       2.000000,       3.000000,       4.000000],
[       5.000000,       6.000000,       7.000000,       8.000000,       9.000000],
[       10.000000,      11.000000,      12.000000,      13.000000,      14.000000]]
shape: [        3,      5]
[[      0.018316,       0.049787,       0.135335,       0.367879,       1.000000],
[       0.018316,       0.049787,       0.135335,       0.367879,       1.000000],
[       0.018316,       0.049787,       0.135335,       0.367879,       1.000000]]
shape: [        3,      1]
[[      1.571317],
[       1.571317],
[       1.571317]]
shape: [        3,      5]
[[      0.011656,       0.031685,       0.086129,       0.234122,       0.636409],
[       0.011656,       0.031685,       0.086129,       0.234122,       0.636409],
[       0.011656,       0.031685,       0.086129,       0.234122,       0.636409]]
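The arithmetic behind the sample output can be checked without nmtools. The following is a minimal sketch in plain standard C++ (the function name `softmax_row` is made up for this illustration and is not part of nmtools): it subtracts the row maximum, exponentiates, and normalizes by the row sum, which are the same three steps as `t1`, `t2`, and `softmax` above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over a single row (illustrative helper, not nmtools API).
// Subtracting the row maximum first keeps every exponent <= 0, so exp() cannot overflow.
std::vector<float> softmax_row(const std::vector<float>& row)
{
    assert(!row.empty() && "softmax of an empty row is undefined");

    const float row_max = *std::max_element(row.begin(), row.end());

    std::vector<float> out(row.size());
    float total = 0.0f;
    for (std::size_t i = 0; i < row.size(); ++i) {
        out[i] = std::exp(row[i] - row_max); // corresponds to t1
        total += out[i];                     // accumulates t2
    }
    for (float& value : out) {
        value /= total;                      // corresponds to t1 / t2
    }
    return out;
}
```

Calling `softmax_row` on `{0, 1, 2, 3, 4}` reproduces the first row of the sample output. The other two rows are identical because softmax does not change when a constant is added to every element of a row.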

Computational Graph

    auto input = nm::random(array{3,4},dtype,gen);

    auto axis = -1;
    auto res = view::softmax(input,axis);

    /* Static computational graph: */
    auto graph = fn::get_computational_graph(res);

    /* Compile-time computational graph: */
    constexpr auto graph_v = nm::to_value_v<decltype(unwrap(graph))>;
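To give a feel for what a compile-time graph means, here is a toy sketch in standard C++17. It is not how nmtools represents graphs; every type name in it (`input_t`, `node_t`, `num_ops`, and the operation tags) is hypothetical. The idea it illustrates is that when an expression is encoded in a type, properties of the graph can be computed and checked by the compiler.

```cpp
#include <cstddef>

// Hypothetical building blocks, for illustration only (not nmtools types).
struct input_t {};                    // a leaf: some input array

template <typename op_t, typename... operands_t>
struct node_t {};                     // an operation applied to its operands

struct exp_t {};
struct sum_t {};
struct divide_t {};

// Count operation nodes purely from the type; leaves contribute 0.
template <typename graph_t>
struct num_ops
{
    static constexpr std::size_t value = 0;
};

template <typename op_t, typename... operands_t>
struct num_ops<node_t<op_t, operands_t...>>
{
    // One for this node, plus the count of every operand subtree.
    static constexpr std::size_t value = 1 + (num_ops<operands_t>::value + ... + 0);
};

// exp(x) / sum(exp(x)) written as a type.
using exp_x     = node_t<exp_t, input_t>;
using softmax_g = node_t<divide_t, exp_x, node_t<sum_t, exp_x>>;

// divide + exp + sum + exp: exp(x) is counted once per occurrence in the tree.
static_assert(num_ops<softmax_g>::value == 4, "unexpected node count");
```

Because the whole structure lives in the type system, the `static_assert` is evaluated during compilation and leaves no runtime cost.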

Tilekit

Check full code: examples/tilekit/add.cpp

/* includes */

/* Multicore + SIMD */
using v128_mt = tk::thread_pool<tk::vector::context_t>;

struct add_kernel_t
{
    template <typename tile_shape_t=tuple<nm::ct<2>,nm::ct<4>>, typename context_t, typename out_t, typename a_t, typename b_t>
    auto operator()(context_t ctx, out_t& out, const a_t& a, const b_t& b, const tile_shape_t t_shape=tile_shape_t{})
    {
        auto [t_id] = tk::worker_id(ctx);
        auto [t_size] = tk::worker_size(ctx);

        auto a_shape = shape(a);
        auto offset  = tk::ndoffset(a_shape,t_shape);
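        // t_size is the number of workers; each worker handles n_iter tiles
        // (integer division: leftover tiles are skipped if the tile count is not a multiple of t_size)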
        auto n_iter = (offset.size()/t_size);
        for (nm_size_t i=0; i<n_iter; i++) {
            auto tile_offset = offset[(t_id*n_iter)+i];
            auto block_a = tk::load(ctx,a,tile_offset,t_shape);
            auto block_b = tk::load(ctx,b,tile_offset,t_shape);
            auto result  = block_a + block_b;

            tk::store(ctx,out,tile_offset,result);
        }
    }
};
inline auto add_kernel = add_kernel_t{};

int main(int argc, char** argv)
{
    /* setup a,b,c*/

    auto tile_shape  = tuple{2_ct,16_ct};
    auto num_threads = 8;
    auto ctx         = v128_mt(num_threads);
    auto worker_size = num_threads;

    ctx.eval(worker_size,add_kernel,c,a,b,tile_shape);

    /* check or use result */
    
    return 0;
}

perf-script.png
All 8 worker threads are saturated with work.

perf-report.png
The add is vectorized using SIMD instructions.

tracy-add-kernel.png
Optionally, instrumented profiling with Tracy is also supported. It provides a timeline view as well as assembly and source views.
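The work split used by the kernel above (worker `t_id` processes tiles `t_id*n_iter` through `t_id*n_iter + n_iter - 1`) can be sketched in plain C++ without Tilekit. The function `add_tiles` and its parameters are hypothetical and only mirror the indexing logic; the sketch assumes row-major 2-D data, tile sizes that divide the array shape, and a tile count that divides evenly among workers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Elementwise add over the tiles owned by one worker (illustrative, not Tilekit API).
// worker_id / num_workers play the role of t_id / t_size in the kernel above.
void add_tiles(std::vector<float>& out,
               const std::vector<float>& a, const std::vector<float>& b,
               std::size_t rows, std::size_t cols,
               std::size_t tile_rows, std::size_t tile_cols,
               std::size_t worker_id, std::size_t num_workers)
{
    assert(rows % tile_rows == 0 && cols % tile_cols == 0);

    const std::size_t tiles_per_row = cols / tile_cols;
    const std::size_t num_tiles     = (rows / tile_rows) * tiles_per_row;
    assert(num_tiles % num_workers == 0); // assumption: no leftover tiles

    const std::size_t n_iter = num_tiles / num_workers;
    for (std::size_t i = 0; i < n_iter; ++i) {
        // Contiguous block of tiles per worker, as in offset[(t_id*n_iter)+i].
        const std::size_t tile = worker_id * n_iter + i;
        const std::size_t row0 = (tile / tiles_per_row) * tile_rows;
        const std::size_t col0 = (tile % tiles_per_row) * tile_cols;

        for (std::size_t r = 0; r < tile_rows; ++r) {
            for (std::size_t c = 0; c < tile_cols; ++c) {
                const std::size_t idx = (row0 + r) * cols + (col0 + c);
                out[idx] = a[idx] + b[idx];
            }
        }
    }
}
```

Calling `add_tiles` once per `worker_id` from `0` to `num_workers - 1` covers every tile exactly once. Since the workers write disjoint regions of `out`, the calls could run on separate threads without synchronization.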

GPU Support

    auto gen = nm::random_engine();
    auto dtype = nm::float32;

    auto input = nm::random(array{4096,4},dtype,gen);

    // Switch hip to cuda for NVIDIA GPUs, or to sycl for a vendor-neutral backend
    // auto ctx  = nm::cuda::default_context();
    // auto ctx  = nm::sycl::default_context();
    auto ctx  = nm::hip::default_context();
    auto gpu_res = nm::tanh(input,ctx);
    auto cpu_res = nm::tanh(input);

Sample output:

./a.out                                 
[nmtools hip] driver version: 60443484
[nmtools hip] runtime version: 60443484
[nmtools hip] number of hip devices: 1
- compute capability: major: 11 minor: 0
- device name: Radeon RX 7900 XTX
- total mem (bytes): 25753026560
- ecc enabled: 0
- async engine count: 8
- can map to host memory: 1
- can use host pointer for registered memory: 1
- maximum clock (kHz): 2482000
- compute mode: 0
- compute preemption supported: 0
- concurrent kernel execution supported: 1
- coherent access managed memory concurrently with cpu: 1
- cooperative launch supported: 1
- cooperative device launch supported: 1
- caching globals in L1 supported: 1
- host device operation is native atomic: 1
- is integrated GPU: 0
- is multi GPUs: 0
- kernel execution timeout limit: 0
- L2 cache size (bytes): 6291456
- caching locals in L1 supported: 1
- supports allocating managed memory: 1
- maximum block size per multiprocessors: 2
- maximum block size in width (x): 1024
- maximum block size in height (y): 1024
- maximum block size in depth (z): 1024
- maximum grid size in width (x): 2147483647
- maximum grid size in height (y): 65536
- maximum grid size in depth (z): 65536
- maximum threads per block: 1024
- maximum threads per multiprocessor: 2048
- global memory bus width (bits): 384
- maximum memory clock frequency (kHz): 1249000
- multiprocessor count: 48
isclose: true
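The final `isclose: true` line reports that the GPU and CPU results agree within a tolerance. The snippet above does not show that comparison, so here is a minimal sketch of such a check in plain C++. The helper `all_close` is hypothetical; it uses the same formula and default tolerances as NumPy's `isclose` (`rtol=1e-5`, `atol=1e-8`), and whether nmtools uses the same defaults is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// True when every element satisfies |actual - expected| <= atol + rtol * |expected|.
// Illustrative helper, not nmtools API; tolerances follow NumPy's isclose defaults.
bool all_close(const std::vector<float>& actual, const std::vector<float>& expected,
               float rtol = 1e-5f, float atol = 1e-8f)
{
    assert(actual.size() == expected.size());

    for (std::size_t i = 0; i < actual.size(); ++i) {
        const float diff  = std::fabs(actual[i] - expected[i]);
        const float limit = atol + rtol * std::fabs(expected[i]);
        if (diff > limit) {
            return false;
        }
    }
    return true;
}
```

A tolerance-based comparison is used instead of exact equality because GPU and CPU math libraries may round transcendental functions such as `tanh` slightly differently.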

Other features

  • Compile-time shape inference
  • constexpr compile-time evaluation
  • CUDA, HIP, SYCL support
  • Support for zero dynamic allocation
  • Bare-metal Microcontrollers
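As an illustration of the first two items, the sketch below infers a broadcast result shape entirely at compile time using standard C++17 `constexpr`. It is not nmtools code; `broadcast_shape` is a hypothetical helper that handles only equal-rank shapes and follows the NumPy rule that two dimensions are compatible when they are equal or one of them is 1.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Broadcast two equal-rank shapes (illustrative helper, not nmtools API).
template <std::size_t N>
constexpr std::array<std::size_t, N> broadcast_shape(
    const std::array<std::size_t, N>& lhs, const std::array<std::size_t, N>& rhs)
{
    std::array<std::size_t, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        if (lhs[i] == rhs[i] || rhs[i] == 1) {
            result[i] = lhs[i];
        } else if (lhs[i] == 1) {
            result[i] = rhs[i];
        } else {
            // Reaching this branch inside a constant expression is a compile error.
            throw std::invalid_argument("shapes are not broadcastable");
        }
    }
    return result;
}

// Shapes from the softmax example: x is (3,5), its keepdims max is (3,1).
constexpr std::array<std::size_t, 2> x_shape{3, 5};
constexpr std::array<std::size_t, 2> max_shape{3, 1};
constexpr auto out_shape = broadcast_shape(x_shape, max_shape);

static_assert(out_shape[0] == 3 && out_shape[1] == 5, "expected a (3,5) result");
```

Because the result is a `constexpr` value, it can size a fixed-length buffer such as `std::array`, which is one way to avoid dynamic allocation.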

Requirements

  • C++17 (full language features)

Supported compilers:

  • gcc 9+
  • clang 10+

Getting Started

Clone the repository somewhere:

git clone https://github.com/alifahrri/nmtools.git

Write some code:

// file array.cpp
#include <iostream>

#include "nmtools/nmtools.hpp"
#include "nmtools/ndarray.hpp"

namespace nm = nmtools;
namespace utils = nmtools::utils;

using namespace nm::literals;
using nmtools_array;
using nmtools_tuple;

template <typename array_t>
auto print(const array_t& x)
{
    std::cout << "shape: " << utils::to_string(nm::shape(x))
        << std::endl
        << utils::to_string(x)
        << std::endl;
}

int main(int argc, char** argv)
{
    auto gen = nm::random_engine();
    auto dtype = nm::float32;
    auto a = nm::Array::random(array{2,3,2},dtype,gen);
    print(a);

    // similar to a[1:,1:2,...]
    auto sa = a.slice("1:"_ct,"1:2"_ct,"..."_ct);
    print(sa);

    std::cout << "a:\n";
    a.slice("1:"_ct,"1:2"_ct,"..."_ct) = nm::ones(array{1,1,2});
    print(a);

    std::cout << "b:\n";
    auto b = nm::Array::arange(2,dtype);
    print(b);

    std::cout << "c=dot(a,b):\n";
    auto c = nm::dot(a,b);
    print(c);

    return 0;
}

nmtools is a header-only library, so you only need to pass its include path to the compiler:

# adjust the path as necessary
export NMTOOLS_INCLUDE_PATH=${HOME}/projects/nmtools/include
# -std=c++17 is needed on compilers that default to an older standard (e.g. gcc 9 and 10)
g++ -std=c++17 -I$NMTOOLS_INCLUDE_PATH array.cpp

Then run it:

./a.out

Sample output:

shape: [        2,      3,      2]
[[[     0.846539,       0.547375],
[       0.150028,       0.481849],
[       0.761298,       0.949123]],

[[      0.382625,       0.582128],
[       0.845037,       0.914821],
[       0.660163,       0.464962]]]
shape: [        1,      1,      2]
[[[     0.845037,       0.914821]]]
a:
shape: [        2,      3,      2]
[[[     0.846539,       0.547375],
[       0.150028,       0.481849],
[       0.761298,       0.949123]],

[[      0.382625,       0.582128],
[       1.000000,       1.000000],
[       0.660163,       0.464962]]]
b:
shape: [        2]
[       0.000000,       1.000000]
c=dot(a,b):
shape: [        2,      3]
[[      0.547375,       0.481849,       0.949123],
[       0.582128,       1.000000,       0.464962]]
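The last result has shape `[2, 3]` because the dot product of an N-dimensional array with a 1-D vector contracts the last axis, as in NumPy. A minimal plain C++ sketch of that contraction over a flat row-major buffer follows; `dot_last_axis` is a hypothetical name used only for this illustration and is not the nmtools implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Contract the last axis of a row-major array `a` with the 1-D vector `b`.
// For a of shape (2,3,2) and b of shape (2), the result has 2*3 = 6 elements.
std::vector<float> dot_last_axis(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(!b.empty() && a.size() % b.size() == 0);

    const std::size_t k = b.size();          // length of the contracted axis
    std::vector<float> c(a.size() / k, 0.0f);
    for (std::size_t row = 0; row < c.size(); ++row) {
        for (std::size_t j = 0; j < k; ++j) {
            c[row] += a[row * k + j] * b[j];
        }
    }
    return c;
}
```

With `b = [0, 1]`, each output element is simply the second entry of the corresponding innermost pair of `a`, which matches the values printed under `c=dot(a,b)`.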

Other Examples
