QVAC-20556 feat[api]: enable Android GPU for Parakeet (overlay; CI validation) [DO-NOT-MERGE] by pratiknarola-t · Pull Request #2577 · tetherto/qvac

pratiknarola-t · 2026-06-12T18:44:22Z

⚠️ DO-NOT-MERGE — measurement vehicle

Overlay-only PR (ticket QVAC-20556) to get an empirical AWS Device Farm signal on whether the latest speech stack drives Parakeet on Android GPUs (Pixel 9 / Mali + S25 Ultra / Adreno 830). This is the inverse of the CPU-only workaround in #2525 — please don't merge over it.

Add the verified label to fire the device-farm leg.

What this changes

packages/transcription-parakeet/:

- `ParakeetModel::load`: remove the `#ifdef __ANDROID__` guard that forced `useGPU=false` (keeps the `n_gpu_layers` logic and the GPU-init→CPU fallback warning).
- `CMakeLists.txt`: widen the Android backend-staging glob from `libqvac-speech-ggml-cpu-*.so` to `libqvac-speech-ggml-*.so` so the Vulkan/OpenCL MODULE libs ship in the prebuild (reverses the [0.7.2] CPU-only packaging); refresh the now-stale "intentionally CPU-only" comments.
- `gpu-smoke.test.js`: drop the four Android early-pass skips so the strict `assertGpuBackend` (`backendDevice=1`, `backendId` Vulkan/OpenCL) runs on device.
- In-package vcpkg overlay ports: ggml-speech@44fd4817 (speech HEAD) + parakeet-cpp@ed749556 (whisper.cpp master), wired via `overlay-ports` in `vcpkg-configuration.json`. Registry baseline and registry `version>=` pins are unchanged; the registry PR is deferred until the device-farm result is understood.
- `vcpkg.json`: bump the parakeet-cpp `version>=` to the overlay version-date.
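
To make the effect of the widened staging glob concrete, here is a minimal sketch that approximates the glob with a simple `*` wildcard matcher. The file names in `staged` are hypothetical placeholders (the real backend library names may differ), and `globToRegExp` / `matching` are illustrative helpers, not part of the build.

```javascript
// Convert a simple glob (only `*` wildcards) into an anchored RegExp.
function globToRegExp(glob) {
  // Escape regex metacharacters except `*`, then expand `*` to `.*`.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Return the names that a glob would pick up.
function matching(glob, names) {
  const re = globToRegExp(glob);
  return names.filter((name) => re.test(name));
}

// Hypothetical backend library names, for illustration only.
const staged = [
  'libqvac-speech-ggml-cpu-armv8.so',
  'libqvac-speech-ggml-vulkan.so',
  'libqvac-speech-ggml-opencl.so',
];

// Old glob: only the CPU variants are staged.
const cpuOnly = matching('libqvac-speech-ggml-cpu-*.so', staged);
// New glob: the GPU backend modules are staged as well.
const widened = matching('libqvac-speech-ggml-*.so', staged);

console.log(cpuOnly);
console.log(widened);
```

Under these assumed names, the old pattern stages only the CPU library, while the widened pattern also picks up the Vulkan and OpenCL modules.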

Local device finding (Adreno 740 / iQOO 11, TDT q4_0)

Run directly against this branch's prebuild on a physically-attached Adreno 740:

Path	Backend	Result
CPU (`useGPU=false`)	CPU (id 0)	✅ correct transcript
GPU, engine default	OpenCL (id 4, auto-selected on Adreno>700)	❌ SIGABRT — `ggml_backend_opencl_graph_compute: op not supported joint.token_argmax (ARGMAX)` → `GGML_ASSERT`
GPU, OpenCL withheld	Vulkan (id 3)	⚠️ runs, but transcript degraded vs CPU (dropped words) and ~2× slower

So on the Adreno the engine picks OpenCL, whose backend lacks ARGMAX and aborts in graph-compute instead of falling back to CPU. The Vulkan path (the one ggml-speech@8bf760f4 reported byte-identical on this exact device) is not what the engine selects, and even when forced it no longer reproduces the byte-identical result on the current 44fd4817/ed749556 stack.

Expectation for the device-farm run: the Adreno (S25) leg likely hits the same OpenCL ARGMAX abort (which can SIGABRT the Bare worklet and take down subsequent tests, cf. #2525); the Mali (Pixel 9) leg exercises the Vulkan path.
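
The selection behavior observed in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's code: `pickBackend`, `adrenoModel`, and the module names `'opencl'` / `'vulkan'` are hypothetical. Only the backend ids (CPU 0, Vulkan 3, OpenCL 4) and the "Adreno>700 → OpenCL" policy come from the findings above.

```javascript
// Backend ids as reported in the local device findings.
const BACKEND = { CPU: 0, VULKAN: 3, OPENCL: 4 };

// Extract the Adreno model number from a GPU name, or null if not an Adreno.
// (The name format is an assumption.)
function adrenoModel(gpuName) {
  const m = /Adreno\D*(\d+)/i.exec(gpuName);
  return m ? Number(m[1]) : null;
}

// Hypothetical selection policy: OpenCL on Adreno > 700 when its module is
// present, otherwise Vulkan when available, otherwise CPU.
function pickBackend(gpuName, modules, useGPU = true) {
  if (!useGPU) return BACKEND.CPU;
  const model = adrenoModel(gpuName);
  if (model !== null && model > 700 && modules.includes('opencl')) {
    return BACKEND.OPENCL;
  }
  if (modules.includes('vulkan')) return BACKEND.VULKAN;
  return BACKEND.CPU;
}

// Engine default on the Adreno 740: OpenCL is auto-selected.
console.log(pickBackend('Adreno (TM) 740', ['opencl', 'vulkan']));
// Withholding the OpenCL module forces the Vulkan path.
console.log(pickBackend('Adreno (TM) 740', ['vulkan']));
```

Withholding the OpenCL module is how the Vulkan row in the table was obtained; the sketch models that as the module simply being absent from the list.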

Note (pre-existing, out of scope)

While bringing this up on a local device, I found that the addon's BACKENDS_SUBDIR compile definition is PRIVATE on the bare-module target, but ParakeetModel.cpp compiles into parakeet_model_core, which therefore never sees the definition, so the subdir isn't appended to a host-provided default backendsDir. The device-farm APK passes an explicit flat nativeLibraryDir, so CI is unaffected, but a host relying on the __dirname/prebuilds default would not find the backend .so. Noted as a follow-up; not touched here.

Refs

ggml-speech 44fd4817 (qvac-ext-ggml@speech HEAD)
parakeet-cpp ed749556 (qvac-ext-lib-whisper.cpp@master HEAD)
Related: #2525 (fix[notask]: ship Parakeet CPU-only on Android to stop Adreno Vulkan SIGABRT), #2506 (QVAC-19255 feat[api]: reintroduce Supertonic GPU support; desktop/iOS GPU, Android CPU-only)

…lidation)

DO-NOT-MERGE: overlay-only PR to get an empirical AWS Device Farm signal on whether the latest speech stack drives Parakeet on Android GPUs (Pixel 9/Mali + S25/Adreno 830). This is the inverse of the CPU-only workaround in #2525.

Changes (packages/transcription-parakeet):
- ParakeetModel::load: remove the __ANDROID__ guard that forced useGPU=false.
- CMakeLists: widen the Android backend-staging glob from libqvac-speech-ggml-cpu-*.so to libqvac-speech-ggml-*.so so the Vulkan/OpenCL MODULE libs ship in the prebuild (reverses the [0.7.2] CPU-only packaging); refresh the now-stale "intentionally CPU-only" comments.
- gpu-smoke.test.js: drop the four Android early-pass skips so the strict assertGpuBackend (backendDevice=1, backendId Vulkan/OpenCL) runs on device.
- vcpkg overlay ports (in-package): ggml-speech@44fd4817 (speech HEAD) + parakeet-cpp@ed749556 (whisper.cpp master), wired via the overlay-ports entry in vcpkg-configuration.json. Registry baseline and registry version>= pins are unchanged; the registry PR is deferred.
- vcpkg.json: bump parakeet-cpp version>= to the overlay version-date.

Local device finding (Adreno 740 / iQOO 11), TDT q4_0, recorded for reviewers:
- CPU: correct transcript, backendDevice=0.
- GPU OpenCL (engine auto-selects this on Adreno>700): aborts in graph-compute with "op not supported joint.token_argmax (ARGMAX)" -> GGML_ASSERT (SIGABRT).
- GPU Vulkan (forced by withholding the OpenCL module): runs (backendId=3) but output is degraded vs CPU (dropped words) and ~2x slower; NOT the byte-identical result ggml-speech 8bf760f4 reported.

Expect the Device Farm Adreno (S25) leg to hit the OpenCL ARGMAX abort and the Mali leg to exercise the Vulkan path. Do not merge; this is a measurement vehicle.

pratiknarola-t · 2026-06-12T18:47:59Z

Local Adreno 740 (iQOO 11) matrix — refined

Ran each model type directly against this branch's prebuild on a physically-attached Adreno 740. On Adreno the engine auto-selects OpenCL (policy: Adreno>700 → OpenCL). Results:

Model	CPU	OpenCL (GPU, auto)	Vulkan (GPU, OpenCL withheld)
TDT (q4_0)	✅ correct	❌ SIGABRT — `ggml_backend_opencl_graph_compute: op not supported joint.token_argmax (ARGMAX)` → `GGML_ASSERT`	⚠️ runs (`backendId=3`) but transcript degraded vs CPU + ~2× slower
EOU (q4_0)	✅	✅ correct (95 tokens)	—
Sortformer (q8_0)	—	✅ correct (speaker labels)	—
CTC	n/a on mobile	n/a	—

Takeaway: the GPU blocker is narrow — TDT's joint.token_argmax (ARGMAX) is not implemented in the ggml OpenCL backend, and supports_op/graph-compute aborts instead of falling back to CPU. EOU and Sortformer run fine on OpenCL. The Vulkan path supports the op (no crash) but is degraded/slower on this device, and is not what the engine selects on Adreno anyway.

Implications for the Device Farm run:

- Adreno (S25/830) leg: EOU + Sortformer GPU should pass; the TDT GPU smoke will likely SIGABRT (and a Bare-worklet abort can cascade to later tests).
- Mali (Pixel 9) leg: exercises the Vulkan path (no OpenCL on non-Adreno), which is a separate unknown.

Fix directions (follow-up, not in this PR): implement ARGMAX in ggml-opencl, OR make ggml-opencl supports_op return false for ARGMAX so it routes to CPU, OR have parakeet-cpp keep the TDT joint argmax on CPU.
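
The second fix direction (report ARGMAX as unsupported so it routes to CPU) can be illustrated with a toy model of per-op routing. This is a sketch under assumptions, not the ggml API: `makeBackend` and `runGraph` are hypothetical helpers, the `joint.logits` node is invented for the example, and only the `joint.token_argmax (ARGMAX)` node name and the abort message shape come from the log above.

```javascript
// Toy backend: `implemented` is what it can really compute,
// `claimed` is what its supports-op query reports.
function makeBackend(name, implemented, claimed) {
  return {
    name,
    supportsOp: (op) => claimed.has(op),
    compute(node) {
      if (!implemented.has(node.op)) {
        // Stand-in for the GGML_ASSERT abort seen on device.
        throw new Error(`${name}: op not supported ${node.name} (${node.op})`);
      }
      return `${node.name}@${name}`;
    },
  };
}

// Route each node to the GPU backend if it claims support, else to CPU.
function runGraph(graph, gpu, cpu) {
  return graph.map((node) => (gpu.supportsOp(node.op) ? gpu : cpu).compute(node));
}

const allOps = new Set(['MUL_MAT', 'ARGMAX']);
const gpuOps = new Set(['MUL_MAT']); // ARGMAX is missing on the GPU backend
const cpu = makeBackend('CPU', allOps, allOps);

// Backend that claims ARGMAX without implementing it.
const overclaiming = makeBackend('OpenCL', gpuOps, allOps);
// Backend whose supports-op query is honest about ARGMAX.
const honest = makeBackend('OpenCL', gpuOps, gpuOps);

const graph = [
  { name: 'joint.logits', op: 'MUL_MAT' }, // hypothetical node
  { name: 'joint.token_argmax', op: 'ARGMAX' },
];

let aborted = false;
try {
  runGraph(graph, overclaiming, cpu);
} catch (err) {
  aborted = true;
}
const routed = runGraph(graph, honest, cpu);
console.log(aborted, routed);
```

In this toy model the over-claiming backend fails at compute time, while the honest one lets the router place only the ARGMAX node on CPU and keep the rest on the GPU.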

Separately, a pre-existing latent bug surfaced during bring-up: the addon's BACKENDS_SUBDIR compile-def is PRIVATE on the bare-module target while ParakeetModel.cpp compiles into parakeet_model_core, so the subdir isn't appended to a host-provided default backendsDir (__dirname/prebuilds). The device-farm APK passes an explicit flat nativeLibraryDir, so CI is unaffected — but a host relying on the default would not find the backend .so.

github-actions · 2026-06-12T18:48:26Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ❌ PENDING

**Requirements:**
- 1 Team Member approval ❌ (0/1)
- 1 Team Lead OR Management approval ❌ (0/1)



---
*This comment is automatically updated when reviews change.*

github-actions · 2026-06-12T19:29:44Z

Mobile integration tests — @qvac/transcription-parakeet (Android)

Result: passed

metric	value
Devices passed	2
Devices failed	0
Test cases total	6
Test cases passed	6
Test cases failed	0
Test cases skipped	0

github-actions · 2026-06-12T19:36:39Z

Mobile integration tests — @qvac/transcription-parakeet (iOS)

Result: failed

metric	value
Devices passed	1
Devices failed	1
Test cases total	6
Test cases passed	4
Test cases failed	0
Test cases skipped	0

Bump the parakeet-cpp overlay to 06cef8e7 (off ed749556).

The TDT transducer's per-step GPU graphs do an in-place read-and-write of the LSTM persistent state; Adreno OpenCL drops those aliased ggml_cpy writes, so the prediction state never advances and the decode emits one constant token per frame. The fix routes the TDT decode to the host scalar path on OpenCL while the encoder still runs on the GPU (stats.backendId stays OpenCL). enc_proj is write-only so it's fine; EOU/Sortformer don't use this persistent-state pattern, so they already ran correctly on OpenCL.

Verified on-device (Adreno 740 / iQOO 11): TDT-OpenCL now matches the CPU baseline byte-for-byte; TDT-Vulkan/CPU and EOU/Sortformer-OpenCL unchanged.

- vcpkg-overlay-ports/parakeet-cpp: REF/SHA512 -> 06cef8e7, version-date 2026-06-15
- vcpkg.json: parakeet-cpp version>= 2026-06-15
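
The symptom described in this commit (state never advances, so the decode emits one constant token per frame) can be reproduced in a toy decode loop. This is only an illustration: `step` is an arbitrary stand-in for the prediction network, not the LSTM, and `dropStateWrites` is a hypothetical switch that mimics the lost write-back of the persistent state.

```javascript
// Toy stand-in for one decode step: the emitted token depends on the
// current state, and the next state depends on the state and the frame.
function step(state, frame) {
  const token = state % 10;
  const next = (state * 31 + frame + 7) % 97;
  return { token, next };
}

// Decode a sequence of frames, optionally dropping the state write-back.
function decode(frames, { dropStateWrites }) {
  let state = 1;
  const tokens = [];
  for (const frame of frames) {
    const { token, next } = step(state, frame);
    tokens.push(token);
    // When the write-back is dropped, the state stays at its initial value.
    if (!dropStateWrites) state = next;
  }
  return tokens;
}

const frames = [3, 1, 4, 1];
const healthy = decode(frames, { dropStateWrites: false });
const broken = decode(frames, { dropStateWrites: true });
console.log(healthy, broken);
```

With the write-back dropped, every step reads the same initial state, so the token sequence collapses to a single repeated value, which matches the shape of the failure reported above.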

Bump the parakeet-cpp overlay to bb585eb1: ARM Mali (Valhall) Vulkan mis-computes every parakeet model (its narrow subgroup width breaks the ggml-vulkan shaders), so the engine guards Mali by name and routes it to CPU; Adreno OpenCL and Samsung Xclipse Vulkan are correct and run on the GPU. TDT host-decode on Adreno OpenCL is unchanged. The addon surfaces engine gpu_unsupported() as stats.gpuUnsupported, and the GPU smoke test treats a CPU backend with gpuUnsupported=1 on Android as the expected, correct result instead of a GPU regression.
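
The relaxed smoke-test rule from this commit can be sketched as a small classifier. The field names `backendDevice`, `backendId`, and `gpuUnsupported` and the backend ids (CPU 0, Vulkan 3, OpenCL 4) appear in this PR; the function name `classifyGpuSmoke`, the `platform` argument, and the result labels are hypothetical and are not the actual test code.

```javascript
// Backend ids as reported in this PR.
const CPU_ID = 0;
const GPU_IDS = new Set([3, 4]); // Vulkan, OpenCL

// Classify a smoke-test result from the addon's stats object.
function classifyGpuSmoke(stats, platform) {
  // Strict GPU case: a GPU device with a Vulkan or OpenCL backend id.
  if (stats.backendDevice === 1 && GPU_IDS.has(stats.backendId)) {
    return 'gpu-ok';
  }
  // Android-only exception: the engine declared the GPU unsupported
  // (for example a guarded Mali device) and fell back to CPU on purpose.
  if (
    platform === 'android' &&
    stats.backendId === CPU_ID &&
    stats.gpuUnsupported === 1
  ) {
    return 'cpu-expected';
  }
  // Anything else on a GPU run is treated as a regression.
  return 'gpu-regression';
}

console.log(classifyGpuSmoke({ backendDevice: 1, backendId: 4, gpuUnsupported: 0 }, 'android'));
console.log(classifyGpuSmoke({ backendDevice: 0, backendId: 0, gpuUnsupported: 1 }, 'android'));
console.log(classifyGpuSmoke({ backendDevice: 0, backendId: 0, gpuUnsupported: 0 }, 'android'));
```

Keeping the exception narrow (Android, CPU backend, and an explicit `gpuUnsupported=1`) means a silent CPU fallback without that flag would still be flagged.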

pratiknarola-t requested review from a team as code owners June 12, 2026 18:44

pratiknarola-t added the verified Authorize secrets / label-gate in PR workflows label Jun 12, 2026

pratiknarola-t temporarily deployed to release June 12, 2026 18:45 — with GitHub Actions Inactive

pratiknarola-t had a problem deploying to release June 12, 2026 18:45 — with GitHub Actions Failure

pratiknarola-t temporarily deployed to release June 12, 2026 18:45 — with GitHub Actions Inactive

pratiknarola-t temporarily deployed to release June 12, 2026 18:58 — with GitHub Actions Inactive

pratiknarola-t had a problem deploying to release June 12, 2026 18:58 — with GitHub Actions Failure

pratiknarola-t temporarily deployed to release June 12, 2026 18:58 — with GitHub Actions Inactive

pratiknarola-t had a problem deploying to release June 12, 2026 19:41 — with GitHub Actions Failure

pratiknarola-t temporarily deployed to release June 15, 2026 06:51 — with GitHub Actions Inactive

pratiknarola-t had a problem deploying to release June 15, 2026 08:33 — with GitHub Actions Failure

pratiknarola-t temporarily deployed to release June 15, 2026 08:33 — with GitHub Actions Inactive

pratiknarola-t temporarily deployed to release June 15, 2026 08:41 — with GitHub Actions Inactive

pratiknarola-t mentioned this pull request Jun 15, 2026

QVAC-20556 parakeet-cpp: TDT host-decode on Adreno OpenCL + route Mali Vulkan to CPU tetherto/qvac-ext-lib-whisper.cpp#46

Draft

pratiknarola-t force-pushed the qvac-20556-parakeet-android-gpu branch from 11c94aa to f1fa6e3 Compare June 15, 2026 10:17

pratiknarola-t had a problem deploying to release June 15, 2026 10:18 — with GitHub Actions Failure

pratiknarola-t temporarily deployed to release June 15, 2026 10:18 — with GitHub Actions Inactive

pratiknarola-t wants to merge 3 commits into main from qvac-20556-parakeet-android-gpu
