QVAC-20556 feat[api]: enable Android GPU for Parakeet (overlay; CI validation) [DO-NOT-MERGE] #2577
pratiknarola-t wants to merge 3 commits into
…lidation)

DO-NOT-MERGE: overlay-only PR to get an empirical AWS Device Farm signal on whether the latest speech stack drives Parakeet on Android GPUs (Pixel 9/Mali + S25/Adreno 830). This is the inverse of the CPU-only workaround in #2525.

Changes (packages/transcription-parakeet):
- ParakeetModel::load: remove the __ANDROID__ guard that forced useGPU=false.
- CMakeLists: widen the Android backend-staging glob from libqvac-speech-ggml-cpu-*.so to libqvac-speech-ggml-*.so so the Vulkan/OpenCL MODULE libs ship in the prebuild (reverses the [0.7.2] CPU-only packaging); refresh the now-stale "intentionally CPU-only" comments.
- gpu-smoke.test.js: drop the four Android early-pass skips so the strict assertGpuBackend (backendDevice=1, backendId Vulkan/OpenCL) runs on device.
- vcpkg overlay ports (in-package): ggml-speech@44fd4817 (speech HEAD) + parakeet-cpp@ed749556 (whisper.cpp master), wired via the overlay-ports entry in vcpkg-configuration.json. Registry baseline and registry version>= pins are unchanged; the registry PR is deferred.
- vcpkg.json: bump parakeet-cpp version>= to the overlay version-date.

Local device finding (Adreno 740 / iQOO 11), TDT q4_0, recorded for reviewers:
- CPU: correct transcript, backendDevice=0.
- GPU OpenCL (engine auto-selects this on Adreno>700): aborts in graph-compute with "op not supported joint.token_argmax (ARGMAX)" -> GGML_ASSERT (SIGABRT).
- GPU Vulkan (forced by withholding the OpenCL module): runs (backendId=3) but output is degraded vs CPU (dropped words) and ~2x slower; NOT the byte-identical result ggml-speech 8bf760f4 reported.

Expect the Device Farm Adreno (S25) leg to hit the OpenCL ARGMAX abort and the Mali leg to exercise the Vulkan path.

Do not merge; this is a measurement vehicle.
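The effect of widening the staging glob can be checked with a small filename matcher. This is a minimal sketch: `globToRegExp` is a helper written for illustration (it is not how CMake matches files), and the three library names are assumed, since the actual module names in the prebuild are not listed here.

```javascript
// Convert a simple glob (only `*` wildcards) to an anchored RegExp.
// Illustrative helper; CMake's file(GLOB) uses its own matcher.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Hypothetical library names, assumed for illustration only.
const candidateLibs = [
  "libqvac-speech-ggml-cpu-example.so",
  "libqvac-speech-ggml-vulkan.so",
  "libqvac-speech-ggml-opencl.so",
];

const cpuOnlyGlob = globToRegExp("libqvac-speech-ggml-cpu-*.so");
const widenedGlob = globToRegExp("libqvac-speech-ggml-*.so");

// Files staged under the old CPU-only pattern vs. the widened pattern.
const stagedBefore = candidateLibs.filter((name) => cpuOnlyGlob.test(name));
const stagedAfter = candidateLibs.filter((name) => widenedGlob.test(name));

console.log("before:", stagedBefore);
console.log("after:", stagedAfter);
```

Under these assumed names, only the CPU library matches the old pattern, while the widened pattern also picks up the GPU backend modules.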
Local Adreno 740 (iQOO 11) matrix, refined

Ran each model type directly against this branch's prebuild on a physically attached Adreno 740. On Adreno the engine auto-selects OpenCL (policy: Adreno>700 → OpenCL). Results:

Takeaway: the GPU blocker is narrow; TDT's …

Implications for the Device Farm run:

Fix directions (follow-up, not in this PR): implement …

Separately, a pre-existing latent bug surfaced during bring-up: the addon's …
Mobile integration tests: @qvac/transcription-parakeet (Android)
Result: passed

Mobile integration tests: @qvac/transcription-parakeet (iOS)
Result: failed
Bump the parakeet-cpp overlay to 06cef8e7 (off ed749556).

The TDT transducer's per-step GPU graphs do an in-place read-and-write of the LSTM persistent state; Adreno OpenCL drops those aliased ggml_cpy writes, so the prediction state never advances and the decode emits one constant token per frame.

The fix routes the TDT decode to the host scalar path on OpenCL while the encoder still runs on the GPU (stats.backendId stays OpenCL). enc_proj is write-only so it's fine; EOU/Sortformer don't use this persistent-state pattern, so they already ran correctly on OpenCL.

Verified on-device (Adreno 740 / iQOO 11): TDT-OpenCL now matches the CPU baseline byte-for-byte; TDT-Vulkan/CPU and EOU/Sortformer-OpenCL unchanged.

- vcpkg-overlay-ports/parakeet-cpp: REF/SHA512 -> 06cef8e7, version-date 2026-06-15
- vcpkg.json: parakeet-cpp version>= 2026-06-15
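The symptom described in this commit message (a state that never advances, so every frame emits the same token) can be reproduced in a toy model. This is only an illustration of the failure mode, not the parakeet-cpp decoder: `step`, `decode` and the arithmetic inside them are invented for the sketch.

```javascript
// Toy stand-in for one prediction step (NOT the parakeet-cpp code):
// the next state depends on the previous state, and the emitted token
// depends on the new state.
function step(state) {
  const next = (state * 31 + 7) % 101;
  return { next, token: next % 10 };
}

// Run `frames` decode steps. When `writeIsDropped` is true the write-back of
// the new state is skipped, mimicking a backend that loses the aliased copy.
function decode(frames, writeIsDropped) {
  let state = 1;
  const tokens = [];
  for (let i = 0; i < frames; i++) {
    const { next, token } = step(state);
    tokens.push(token);
    if (!writeIsDropped) state = next;
  }
  return tokens;
}

const healthyTokens = decode(5, false);
const stuckTokens = decode(5, true);

console.log("state written back:", healthyTokens);
console.log("write dropped:     ", stuckTokens);
```

With the write-back skipped, each step starts from the same state, so the token sequence collapses to a single repeated value, which matches the "one constant token per frame" behaviour reported above.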
11c94aa to f1fa6e3
Bump the parakeet-cpp overlay to bb585eb1: ARM Mali (Valhall) Vulkan mis-computes every parakeet model (its narrow subgroup width breaks the ggml-vulkan shaders), so the engine guards Mali by name and routes it to CPU; Adreno OpenCL and Samsung Xclipse Vulkan are correct and run on the GPU. TDT host-decode on Adreno OpenCL is unchanged. The addon surfaces engine gpu_unsupported() as stats.gpuUnsupported, and the GPU smoke test treats a CPU backend with gpuUnsupported=1 on Android as the expected, correct result instead of a GPU regression.
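The smoke-test rule described above can be sketched as follows. This is not the actual gpu-smoke.test.js code: `classifyGpuSmoke`, the `platform` argument and the string field `backendName` are assumptions made for the example (the real stats expose a numeric `backendId`, whose full mapping is not given here); `backendDevice` and `gpuUnsupported` are the fields named in the commit message.

```javascript
// Backends that count as "on GPU" for this sketch (names are assumed).
const GPU_BACKEND_NAMES = new Set(["vulkan", "opencl"]);

// Classify a run: returns "expected-cpu" or "gpu", throws on a regression.
function classifyGpuSmoke(stats, platform) {
  const ranOnCpu = stats.backendDevice === 0;

  // The engine declared the GPU unsupported (e.g. the Mali guard), so a CPU
  // backend on Android is the correct outcome, not a regression.
  if (platform === "android" && ranOnCpu && stats.gpuUnsupported === 1) {
    return "expected-cpu";
  }
  if (stats.backendDevice !== 1) {
    throw new Error("GPU regression: ran on CPU without gpuUnsupported=1");
  }
  if (!GPU_BACKEND_NAMES.has(stats.backendName)) {
    throw new Error(`unexpected GPU backend: ${stats.backendName}`);
  }
  return "gpu";
}

console.log(classifyGpuSmoke({ backendDevice: 0, gpuUnsupported: 1 }, "android"));
console.log(
  classifyGpuSmoke(
    { backendDevice: 1, backendName: "opencl", gpuUnsupported: 0 },
    "android"
  )
);
```

Keeping the CPU-without-flag case as a hard failure preserves the strictness of the original assertion: a silent fallback to CPU on a GPU the engine considers supported still fails the test.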
Overlay-only PR (ticket QVAC-20556) to get an empirical AWS Device Farm signal on whether the latest speech stack drives Parakeet on Android GPUs (Pixel 9 / Mali + S25 Ultra / Adreno 830). This is the inverse of the CPU-only workaround in #2525 — please don't merge over it.
Add the verified label to fire the device-farm leg.

What this changes

packages/transcription-parakeet/:
- ParakeetModel::load: remove the #ifdef __ANDROID__ guard that forced useGPU=false (kept the n_gpu_layers logic + the GPU-init→CPU fallback warning).
- CMakeLists.txt: widen the Android backend-staging glob from libqvac-speech-ggml-cpu-*.so to libqvac-speech-ggml-*.so so the vulkan/opencl MODULE libs ship in the prebuild (reverses the [0.7.2] CPU-only packaging); refresh the now-stale "intentionally CPU-only" comments.
- gpu-smoke.test.js: drop the four Android early-pass skips so the strict assertGpuBackend (backendDevice=1, backendId Vulkan/OpenCL) runs on device.
- ggml-speech@44fd4817 (speech HEAD) + parakeet-cpp@ed749556 (whisper.cpp master), wired via overlay-ports in vcpkg-configuration.json. Registry baseline and registry version>= pins are unchanged; the registry PR is deferred until the device-farm result is understood.
- vcpkg.json: bump parakeet-cpp version>= to the overlay version-date.

Local device finding (Adreno 740 / iQOO 11, TDT q4_0)
Run directly against this branch's prebuild on a physically attached Adreno 740:

useGPU=false)
ggml_backend_opencl_graph_compute: op not supported joint.token_argmax (ARGMAX) → GGML_ASSERT

So on the Adreno the engine picks OpenCL, whose backend lacks ARGMAX and aborts in graph-compute instead of falling back to CPU. The Vulkan path (the one ggml-speech@8bf760f4 reported byte-identical on this exact device) is not what the engine selects, and even when forced it no longer reproduces the byte-identical result on the current 44fd4817/ed749556 stack.

Expectation for the device-farm run: the Adreno (S25) leg likely hits the same OpenCL ARGMAX abort (which can SIGABRT the Bare worklet and take down subsequent tests, cf. #2525); the Mali (Pixel 9) leg exercises the Vulkan path.
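The selection behaviour described in this finding, and the missing fallback, can be sketched as below. Only two facts come from the PR: the policy "Adreno>700 → OpenCL" and the OpenCL backend lacking ARGMAX. Everything else is an assumption for illustration: the device-name strings, treating every other GPU as Vulkan, the `unsupportedOps` table, and the pre-flight `planBackend` check, which is one hypothetical way to fall back to CPU and is not something this PR implements.

```javascript
// Parse the numeric series from an Adreno device name,
// e.g. "Adreno (TM) 740" -> 740. Returns null for non-Adreno names.
function adrenoSeries(deviceName) {
  const match = /adreno\D*(\d+)/i.exec(deviceName);
  return match ? Number(match[1]) : null;
}

// Policy stated in the PR: Adreno > 700 -> OpenCL. Treating every other GPU
// as Vulkan is an assumption made for this sketch.
function pickBackend(deviceName) {
  const series = adrenoSeries(deviceName);
  return series !== null && series > 700 ? "opencl" : "vulkan";
}

// Hypothetical op-support table; only the missing OpenCL ARGMAX is from the PR.
const unsupportedOps = {
  opencl: new Set(["ARGMAX"]),
  vulkan: new Set(),
};

// Hypothetical pre-flight check: choose CPU when the graph uses an op the
// selected backend lacks, instead of aborting during graph-compute.
function planBackend(deviceName, graphOps) {
  const backend = pickBackend(deviceName);
  const missing = graphOps.filter((op) => unsupportedOps[backend].has(op));
  return missing.length > 0 ? { backend: "cpu", missing } : { backend, missing };
}

console.log(planBackend("Adreno (TM) 740", ["MUL_MAT", "ARGMAX"]));
console.log(planBackend("Mali-G715", ["MUL_MAT", "ARGMAX"]));
```

A check of this kind trades a hard abort for a slower but correct CPU run; whether that is preferable to implementing the missing op is a decision for the follow-up work.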
Note (pre-existing, out of scope)
While bringing this up on a local device, found that the addon's BACKENDS_SUBDIR compile-definition is PRIVATE on the bare-module target but ParakeetModel.cpp compiles into parakeet_model_core, so the subdir isn't appended to a host-provided default backendsDir. The device-farm/APK passes an explicit flat nativeLibraryDir, so CI is unaffected, but a host relying on the __dirname/prebuilds default would not find the backend .so. Filed mentally as a follow-up; not touched here.

Refs

- ggml-speech 44fd4817 (qvac-ext-ggml@speech HEAD)
- parakeet-cpp ed749556 (qvac-ext-lib-whisper.cpp@master HEAD)