nmtools

A generic, composable multidimensional array library.

(Work In Progress)

The project is still a work in progress; expect missing docs, features, and benchmarks.

What is nmtools?

Array computation library
Array/Tensor computational graph
Tile-based kernel embedded DSL

nmtools is written in C++ and provides NumPy-like array computation in C++. It can also capture the computational graph at compile time.

Array Library

    auto x = nm::array.arange(15, /*dtype=*/nm::float32).reshape(array{3,5});
    auto t1 = (x - x.max(/*axis=*/1,/*dtype=*/nm::None,/*initial=*/nm::None,/*keepdims=*/true)).exp();
    auto t2 = t1.sum(-1,nm::None,nm::None,true);
    auto softmax = t1 / t2;

    print(x);
    print(t1);
    print(t2);
    print(softmax);

Sample output:

shape: [        3,      5]
[[      0.000000,       1.000000,       2.000000,       3.000000,       4.000000],
[       5.000000,       6.000000,       7.000000,       8.000000,       9.000000],
[       10.000000,      11.000000,      12.000000,      13.000000,      14.000000]]
shape: [        3,      5]
[[      0.018316,       0.049787,       0.135335,       0.367879,       1.000000],
[       0.018316,       0.049787,       0.135335,       0.367879,       1.000000],
[       0.018316,       0.049787,       0.135335,       0.367879,       1.000000]]
shape: [        3,      1]
[[      1.571317],
[       1.571317],
[       1.571317]]
shape: [        3,      5]
[[      0.011656,       0.031685,       0.086129,       0.234122,       0.636409],
[       0.011656,       0.031685,       0.086129,       0.234122,       0.636409],
[       0.011656,       0.031685,       0.086129,       0.234122,       0.636409]]
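To make the arithmetic behind the sample output easier to follow, here is a minimal sketch of the same numerically stable softmax for a single row, written in plain standard C++. It does not use nmtools; the name `softmax_row` is hypothetical and chosen only for this illustration. Subtracting the row maximum before exponentiating mirrors the `x - x.max(...)` step above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over one row.
// Plain C++ for illustration; softmax_row is a hypothetical helper,
// not part of nmtools.
std::vector<double> softmax_row(const std::vector<double>& row)
{
    assert(!row.empty());
    const double row_max = *std::max_element(row.begin(), row.end());

    std::vector<double> out(row.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < row.size(); ++i) {
        out[i] = std::exp(row[i] - row_max); // plays the role of t1
        sum += out[i];                       // accumulates t2
    }
    for (double& v : out) {
        v /= sum;                            // t1 / t2
    }
    return out;
}
```

Every row of `x` differs from the others only by a constant shift, and subtracting the maximum removes that shift. That is why all three rows of `t1`, `t2`, and `softmax` in the sample output are identical.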

Computational Graph

    auto input = nm::random(array{3,4},dtype,gen);

    auto axis = -1;
    auto res = view::softmax(input,axis);

    /* Static computational graph: */
    auto graph = fn::get_computational_graph(res);

    /* Compile-time computational graph: */
    constexpr auto graph_v = nm::to_value_v<decltype(unwrap(graph))>;
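The last line above recovers the graph from a type (`decltype(...)`) rather than from a runtime value. As a rough illustration of that general idea, the sketch below encodes a tiny expression graph in the types of its nodes and inspects it at compile time. It is a generic expression-template toy, not nmtools' implementation: `input_t`, `add_t`, `mul_t`, `add`, `mul`, `op_count`, and `eval` are all hypothetical names introduced here.

```cpp
#include <cstddef>

// Hypothetical node types, for illustration only (not nmtools types).
struct input_t { float value; };

template <typename lhs_t, typename rhs_t>
struct add_t { lhs_t lhs; rhs_t rhs; };

template <typename lhs_t, typename rhs_t>
struct mul_t { lhs_t lhs; rhs_t rhs; };

// Builders: the returned type records which operation was applied to what.
template <typename lhs_t, typename rhs_t>
constexpr add_t<lhs_t, rhs_t> add(lhs_t lhs, rhs_t rhs) { return {lhs, rhs}; }

template <typename lhs_t, typename rhs_t>
constexpr mul_t<lhs_t, rhs_t> mul(lhs_t lhs, rhs_t rhs) { return {lhs, rhs}; }

// Number of operation nodes, computed from the type alone.
template <typename node_t>
struct op_count { static constexpr std::size_t value = 0; };

// decltype of a constexpr variable is const-qualified; look through it.
template <typename node_t>
struct op_count<const node_t> : op_count<node_t> {};

template <typename L, typename R>
struct op_count<add_t<L, R>> {
    static constexpr std::size_t value = 1 + op_count<L>::value + op_count<R>::value;
};

template <typename L, typename R>
struct op_count<mul_t<L, R>> {
    static constexpr std::size_t value = 1 + op_count<L>::value + op_count<R>::value;
};

// Evaluation walks the same structure, at compile time or at run time.
constexpr float eval(input_t node) { return node.value; }

template <typename L, typename R> constexpr float eval(add_t<L, R> node);
template <typename L, typename R> constexpr float eval(mul_t<L, R> node);

template <typename L, typename R>
constexpr float eval(add_t<L, R> node) { return eval(node.lhs) + eval(node.rhs); }

template <typename L, typename R>
constexpr float eval(mul_t<L, R> node) { return eval(node.lhs) * eval(node.rhs); }
```

With these definitions, `add(input_t{2.0f}, mul(input_t{3.0f}, input_t{4.0f}))` has a type that spells out the whole graph, so properties such as the number of operations can be checked with `static_assert` before any data exists.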

Tilekit

See the full code in examples/tilekit/add.cpp.

/* includes */

/* Multicore + SIMD */
using v128_mt = tk::thread_pool<tk::vector::context_t>;

struct add_kernel_t
{
    template <typename tile_shape_t=tuple<nm::ct<2>,nm::ct<4>>, typename context_t, typename out_t, typename a_t, typename b_t>
    auto operator()(context_t ctx, out_t& out, const a_t& a, const b_t& b, const tile_shape_t t_shape=tile_shape_t{})
    {
        auto [t_id] = tk::worker_id(ctx);
        auto [t_size] = tk::worker_size(ctx);

        auto a_shape = shape(a);
        auto offset  = tk::ndoffset(a_shape,t_shape);
        // t_size num workers
        auto n_iter = (offset.size()/t_size);
        for (nm_size_t i=0; i<n_iter; i++) {
            auto tile_offset = offset[(t_id*n_iter)+i];
            auto block_a = tk::load(ctx,a,tile_offset,t_shape);
            auto block_b = tk::load(ctx,b,tile_offset,t_shape);
            auto result  = block_a + block_b;

            tk::store(ctx,out,tile_offset,result);
        }
    }
};
inline auto add_kernel = add_kernel_t{};

int main(int argc, char** argv)
{
    /* setup a,b,c*/

    auto tile_shape  = tuple{2_ct,16_ct};
    auto num_threads = 8;
    auto ctx         = v128_mt(num_threads);
    auto worker_size = num_threads;

    ctx.eval(worker_size,add_kernel,c,a,b,tile_shape);

    /* check or use result */
    
    return 0;
}

With this configuration, all 8 worker threads are saturated with work.

The addition is vectorized using SIMD instructions.

Optionally, instrumented profiling with Tracy is also supported. It provides a timeline view together with assembly and source views.
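The indexing `offset[(t_id*n_iter)+i]` in `add_kernel_t` gives each worker a contiguous chunk of tiles. The following plain C++ sketch reproduces only that partitioning logic so it can be inspected in isolation. It rests on assumptions: `tile_offsets` is a hypothetical stand-in for what `tk::ndoffset` is assumed to return (tile origins in row-major order), `worker_tiles` is a hypothetical helper, and the 8x64 array shape used in the note below is made up, since the setup code is elided in the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using offset_t = std::array<std::size_t, 2>;

// Hypothetical stand-in for the assumed behaviour of tk::ndoffset:
// the origin of every tile of a 2D array, in row-major order.
// Assumes the array shape is an exact multiple of the tile shape.
std::vector<offset_t> tile_offsets(offset_t shape, offset_t tile)
{
    assert(tile[0] > 0 && tile[1] > 0);
    std::vector<offset_t> offsets;
    for (std::size_t r = 0; r < shape[0]; r += tile[0]) {
        for (std::size_t c = 0; c < shape[1]; c += tile[1]) {
            offsets.push_back(offset_t{r, c});
        }
    }
    return offsets;
}

// Tiles handled by worker `id` out of `num_workers`, using the same
// contiguous chunking as the kernel: offsets[(id * n_iter) + i].
std::vector<offset_t> worker_tiles(const std::vector<offset_t>& offsets,
                                   std::size_t id, std::size_t num_workers)
{
    assert(num_workers > 0 && id < num_workers);
    const std::size_t n_iter = offsets.size() / num_workers; // integer division
    std::vector<offset_t> mine;
    for (std::size_t i = 0; i < n_iter; ++i) {
        mine.push_back(offsets[(id * n_iter) + i]);
    }
    return mine;
}
```

For an assumed 8x64 array with the 2x16 tile shape from `main`, there are 16 tiles, so each of the 8 workers processes 2 of them. Because `n_iter` comes from integer division, the loop as excerpted covers every tile only when the tile count is a multiple of the worker count.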

GPU Support

    auto gen = nm::random_engine();
    auto dtype = nm::float32;

    auto input = nm::random(array{4096,4},dtype,gen);

    // Switch hip to cuda for NVIDIA GPUs, or to sycl for a vendor-neutral backend
    // auto ctx  = nm::cuda::default_context();
    // auto ctx  = nm::sycl::default_context();
    auto ctx  = nm::hip::default_context();
    auto gpu_res = nm::tanh(input,ctx);
    auto cpu_res = nm::tanh(input);

Sample output:

./a.out
[nmtools hip] driver version: 60443484
[nmtools hip] runtime version: 60443484
[nmtools hip] number of hip devices: 1
- compute capability: major: 11 minor: 0
- device name: Radeon RX 7900 XTX
- total mem (bytes): 25753026560
- ecc enabled: 0
- async engine count: 8
- can map to host memory: 1
- can use host pointer for registered memory: 1
- maximum clock (kHz): 2482000
- compute mode: 0
- compute preemption supported: 0
- concurrent kernel execution supported: 1
- coherent access managed memory concurrently with cpu: 1
- cooperative launch supported: 1
- cooperative device launch supported: 1
- caching globals in L1 supported: 1
- host device operation is native atomic: 1
- is integrated GPU: 0
- is multi GPUs: 0
- kernel execution timeout limit: 0
- L2 cache size (bytes): 6291456
- caching locals in L1 supported: 1
- supports allocating managed memory: 1
- maximum block size per multiprocessors: 2
- maximum block size in width (x): 1024
- maximum block size in height (y): 1024
- maximum block size in depth (z): 1024
- maximum grid size in width (x): 2147483647
- maximum grid size in height (y): 65536
- maximum grid size in depth (z): 65536
- maximum threads per block: 1024
- maximum threads per multiprocessor: 2048
- global memory bus width (bits): 384
- maximum memory clock frequency (kHz): 1249000
- multiprocessor count: 48
isclose: true
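The final `isclose: true` line reports that the GPU and CPU results agree within a tolerance. The snippet above does not show how that check is made, so the following is only a sketch of a NumPy-style closeness test in plain C++. The name `all_close` is hypothetical, and the default tolerances are NumPy's (`rtol=1e-5`, `atol=1e-8`); whether nmtools uses the same defaults is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// NumPy-style closeness test applied elementwise:
//   |a - b| <= atol + rtol * |b|
// all_close is a hypothetical helper, not an nmtools function.
bool all_close(const std::vector<float>& a, const std::vector<float>& b,
               double rtol = 1e-5, double atol = 1e-8)
{
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double lhs = static_cast<double>(a[i]);
        const double rhs = static_cast<double>(b[i]);
        if (std::fabs(lhs - rhs) > atol + rtol * std::fabs(rhs)) {
            return false;
        }
    }
    return true;
}
```

A relative tolerance is the usual choice when comparing device and host results, because different math library implementations may round functions such as `tanh` slightly differently.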

Other features

Compile-time shape inference
constexpr compile-time evaluation
CUDA, HIP, SYCL support
Support for zero dynamic allocation
Bare-metal Microcontrollers

Requirement

C++17 (full language features)

Supported compilers:

gcc 9+
clang 10+

Getting Started

Clone the repository somewhere:

git clone https://github.com/alifahrri/nmtools.git

Write some code:

// file array.cpp
#include <iostream>

#include "nmtools/nmtools.hpp"
#include "nmtools/ndarray.hpp"

namespace nm = nmtools;
namespace utils = nmtools::utils;

using namespace nm::literals;
using nmtools_array;
using nmtools_tuple;

template <typename array_t>
auto print(const array_t& x)
{
    std::cout << "shape: " << utils::to_string(nm::shape(x))
        << std::endl
        << utils::to_string(x)
        << std::endl;
}

int main(int argc, char** argv)
{
    auto gen = nm::random_engine();
    auto dtype = nm::float32;
    auto a = nm::Array::random(array{2,3,2},dtype,gen);
    print(a);

    // similar to a[1:,1:2,...]
    auto sa = a.slice("1:"_ct,"1:2"_ct,"..."_ct);
    print(sa);

    std::cout << "a:\n";
    a.slice("1:"_ct,"1:2"_ct,"..."_ct) = nm::ones(array{1,1,2});
    print(a);

    std::cout << "b:\n";
    auto b = nm::Array::arange(2,dtype);
    print(b);

    std::cout << "c=dot(a,b):\n";
    auto c = nm::dot(a,b);
    print(c);

    return 0;
}

nmtools is a header-only library, so using it only requires passing the include path to the compiler when you compile:

# adjust the path as necessary
export NMTOOLS_INCLUDE_PATH=${HOME}/projects/nmtools/include
g++ -std=c++17 -I$NMTOOLS_INCLUDE_PATH array.cpp

Then run it:

./a.out

Sample output:

shape: [        2,      3,      2]
[[[     0.846539,       0.547375],
[       0.150028,       0.481849],
[       0.761298,       0.949123]],

[[      0.382625,       0.582128],
[       0.845037,       0.914821],
[       0.660163,       0.464962]]]
shape: [        1,      1,      2]
[[[     0.845037,       0.914821]]]
a:
shape: [        2,      3,      2]
[[[     0.846539,       0.547375],
[       0.150028,       0.481849],
[       0.761298,       0.949123]],

[[      0.382625,       0.582128],
[       1.000000,       1.000000],
[       0.660163,       0.464962]]]
b:
shape: [        2]
[       0.000000,       1.000000]
c=dot(a,b):
shape: [        2,      3]
[[      0.547375,       0.481849,       0.949123],
[       0.582128,       1.000000,       0.464962]]
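The shape of `c` follows from contracting the last axis of `a` (length 2) with `b` (length 2), which leaves the leading shape [2, 3]. As a sketch of that contraction in plain C++, the hypothetical helper `dot_last_axis` below works on a flattened row-major buffer. It illustrates the arithmetic only and is not how nmtools implements `nm::dot`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Contract the last axis of a row-major buffer `a` with vector `b`:
//   c[i] = sum over k of a[i * K + k] * b[k],  where K = b.size()
// The result has one value per position of the leading axes.
// dot_last_axis is a hypothetical helper, not an nmtools function.
std::vector<double> dot_last_axis(const std::vector<double>& a,
                                  const std::vector<double>& b)
{
    assert(!b.empty() && a.size() % b.size() == 0);
    const std::size_t k = b.size();
    std::vector<double> c(a.size() / k, 0.0);
    for (std::size_t i = 0; i < c.size(); ++i) {
        for (std::size_t j = 0; j < k; ++j) {
            c[i] += a[(i * k) + j] * b[j];
        }
    }
    return c;
}
```

With `b = [0, 1]`, each output element is simply the second entry along the last axis of `a`, which is why `c` in the sample output matches the right-hand column of the printed `a`, including the `1.000000` written by the slice assignment.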
