Cactus

Cross-platform & energy-efficient kernels, runtime and AI inference engine for mobile devices.

Cactus Graph

Cactus Graph is a general-purpose numerical computing framework for implementing arbitrary models; think of it as a PyTorch for mobile devices.

#include "cactus.h"

CactusGraph graph;

// Declare input tensors with their shapes and storage precisions.
auto a = graph.input({2, 3}, Precision::FP16);
auto b = graph.input({3, 4}, Precision::INT8);

// Compose the computation; it runs when execute() is called.
auto x1 = graph.matmul(a, b, false);
auto x2 = graph.transpose(x1);
auto result = graph.matmul(b, x2, true);

// Bind concrete data to the declared inputs.
float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
graph.set_input(a, a_data, Precision::FP16);
graph.set_input(b, b_data, Precision::INT8);

// Run the graph and read back the result buffer.
graph.execute();
void* output_data = graph.get_output(result);

// Reset the graph's state so it can be rebuilt or reused.
graph.hard_reset();
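
get_output returns an untyped pointer. A minimal sketch of reading it back, meant to sit before the hard_reset() call above; treating the output as FP32 is an assumption, since the element type follows from the graph's precision rules:

#include <cstdio>

// Sketch only: interpret the opaque pointer from get_output(result).
// Assumption: this graph materializes the result as FP32; adjust the
// cast if a different precision is produced. Read the buffer before
// hard_reset() releases it.
const float* out = static_cast<const float*>(output_data);
std::printf("result[0] = %f\n", out[0]);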

Cactus Engine

Cactus Engine is an AI inference engine with OpenAI-compatible APIs, built on top of Cactus Graph.

#include "cactus.h"

// Load weights from a folder and allocate a 2048-token context.
cactus_model_t model = cactus_init("path/to/weight/folder", 2048);

// Chat transcript, OpenAI-style, as a JSON string.
const char* messages = R"([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My name is Henry Ndubuaku"}
])";

// Generation options, also JSON.
const char* options = R"({
    "max_tokens": 50,
    "stop_sequences": ["<|im_end|>"]
})";

// Run a completion; the JSON result is written into `response`.
// The trailing parameters are optional extras and are unused here.
char response[1024];
int result = cactus_complete(model, messages, response, sizeof(response), options, nullptr, nullptr, nullptr);
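
A short sketch of consuming the call's outputs; treating a negative return value as failure is an assumption, since the return convention is not documented here:

#include <cstdio>

// Assumption: cactus_complete returns a negative value on failure.
if (result < 0) {
    std::fprintf(stderr, "completion failed: %d\n", result);
} else {
    std::printf("%s\n", response);  // JSON payload, like the example below
}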

Example response from Gemma3-270m-INT8

{
    "success": true,
    "response": "Hi there! I'm just a friendly assistant.",
    "time_to_first_token_ms": 45.23,
    "total_time_ms": 163.67,
    "tokens_per_second": 168.42,
    "prefill_tokens": 28,
    "decode_tokens": 50,
    "total_tokens": 78
}
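
Since the response is a JSON string, any JSON parser can extract the fields; a minimal sketch using nlohmann/json (a third-party library, not part of Cactus), with field names taken from the example above:

#include <nlohmann/json.hpp>  // third-party parser, not bundled with Cactus
#include <cstdio>
#include <string>

// Pull fields out of the JSON that cactus_complete wrote into `response`.
void print_completion(const char* response) {
    auto j = nlohmann::json::parse(response);
    if (j.value("success", false)) {
        std::string text = j["response"];
        double tps = j["tokens_per_second"];
        std::printf("%s (%.1f tok/s)\n", text.c_str(), tps);
    }
}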

INT8 Performance

  • Models: LFM2-VL-450m & Whisper-Small
  • Decode = tokens/sec; P/D = prefill/decode speeds; VLM rows use a 256×256 image input; STT rows use 30s of audio
  • Cactus Pro: uses the NPU for realtime and large-context workloads (Apple for now); its scores are marked with *
  • Cactus Pro only requires special weights; email founders@cactuscompute.com
  • INT4 coming: 1.8x faster, with 1.9x smaller files
| Device           | Short Decode | 1k-P/D   | 4k-P/D   | 4k-P Pro | 4k-RAM | VLM-TTFT   | VLM-Dec | VLM-RAM | STT-TTFT   | STT-Dec | STT-RAM |
|------------------|--------------|----------|----------|----------|--------|------------|---------|---------|------------|---------|---------|
| Mac M4 Pro       | 173          | 1574/115 | 1089/100 | -        | 122MB  | 0.4s/0.1s* | 168     | 112MB   | 1.7s/0.2s* | 83      | 142MB   |
| Mac M3 Pro       | 150          | 1540/109 | 890/93   | -        | 121MB  | 0.5s/0.1s* | 149     | 113MB   | 2.9s/0.4s* | 78      | 140MB   |
| iPad/Mac M4      | 129          | 793/82   | 507/64   | -        | 80MB   | 0.5s/0.1s* | 113     | 145MB   | 2.4s/0.3s* | 60      | 131MB   |
| iPad/Mac M3      | 112          | 786/78   | 446/60   | -        | 81MB   | 0.6s/0.1s* | 111     | 154MB   | 4.2s/0.7s* | 58      | 142MB   |
| iPhone 17 Pro    | 136          | 810/105  | 628/84   | -        | -      | 1.1s/0.1s* | 120     | -       | 3.0s/0.6s* | -       | -       |
| iPhone 16 Pro    | 114          | 716/98   | 580/81   | -        | -      | 1.3s/0.2s* | 101     | -       | 3.5s/0.7s* | 75      | -       |
| iPhone 15 Pro    | 99           | 549/86   | 530/75   | -        | -      | 1.5s/0.3s* | 92      | -       | 3.8s/0.8s* | 70      | -       |
| Galaxy S25 Ultra | 91           | 230/63   | 173/57   | -        | 128MB  | 1.4s       | 58      | -       | -          | -       | -       |
| Nothing 3        | 56           | 167/49   | 160/46   | -        | -      | 1.7s       | 54      | -       | 8.5s       | 55      | -       |
| Nothing 3a       | 31           | 114/26   | 108/24   | -        | -      | 2.4s       | 29      | -       | -          | -       | -       |
| Raspberry Pi 5   | 24           | 192/28   | -        | -        | -      | 2.3s       | 23      | -       | 21s        | 16      | -       |

Supported models (INT8)

| Model                               | Compressed Size | Completion | Tool Call | Vision | Embed | Speech | Pro   |
|-------------------------------------|-----------------|------------|-----------|--------|-------|--------|-------|
| google/gemma-3-270m-it              | 172MB           | ✓          |           |        |       |        | Apple |
| openai/whisper-small                | 282MB           |            |           |        |       | ✓      | Apple |
| LiquidAI/LFM2-350M                  | 233MB           | ✓          |           |        |       |        |       |
| HuggingFaceTB/SmolLM2-360m-Instruct | 227MB           | ✓          |           |        |       |        |       |
| LiquidAI/LFM2-VL-450M               | 420MB           | ✓          |           | ✓      |       |        | Apple |
| Qwen/Qwen3-0.6B                     | 394MB           | ✓          |           |        |       |        | Apple |
| Qwen/Qwen3-Embedding-0.6B           | 394MB           |            |           |        | ✓     |        |       |
| LiquidAI/LFM2-700M                  | 467MB           | ✓          |           |        |       |        |       |
| nomic-ai/nomic-embed-text-v2-moe    | 533MB           |            |           |        | ✓     |        |       |
| google/gemma-3-1b-it                | 642MB           | ✓          |           |        |       |        | Apple |
| openai/whisper-medium               | 646MB           |            |           |        |       | ✓      | Apple |
| LiquidAI/LFM2-1.2B                  | 722MB           | ✓          |           |        |       |        |       |
| LiquidAI/LFM2-1.2B-RAG              | 722MB           | ✓          |           |        |       |        |       |
| LiquidAI/LFM2-1.2B-Tool             | 722MB           | ✓          | ✓         |        |       |        |       |
| LiquidAI/LFM2-VL-1.6B               | 1440MB          | ✓          |           | ✓      |       |        | Apple |
| Qwen/Qwen3-1.7B                     | 1161MB          | ✓          |           |        |       |        | Apple |
| HuggingFaceTB/SmolLM2-1.7B-Instruct | 1161MB          | ✓          |           |        |       |        |       |

Using this repo on Mac

  • Clone the repo and run source ./setup.
  • Setup is automatic, and usage instructions are printed afterwards.
  • Run cactus --help to see the guides at any time.
  • Remember to run source ./setup in each new terminal session.

Using in your apps

Try the demo apps.
