Tags: gogpu/gg

v0.28.3

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
chore: update wgpu v0.16.2, gogpu v0.19.0 in examples (v0.28.3)

v0.28.2

perf: GPU buffer optimization, dependency updates v0.28.2

* perf: persistent GPU buffers, fence-free surface submit, vertex staging reuse

- Persistent buffer pool: SDF/convex vertex buffers, uniform buffers, and
  bind groups survive across frames with grow-only reallocation (2x headroom)
- Fence-free surface submit: surface mode submits without fence wait, previous
  frame's command buffer freed at start of next frame (VSync guarantees
  GPU completion). Readback mode still uses fence.
- Vertex staging reuse: CPU-side byte slices for SDF and convex vertex data
  reused across frames with grow-only strategy to reduce GC pressure
- Stencil buffers use pool-based approach for multi-path reuse
- GPU queue drain on shutdown and mode switch via no-op command buffer
- Framebuffer cache invalidation before texture view destruction (wgpu fix)
- CloseAccelerator in example OnClose handler with correct shutdown order

Reduces per-frame GPU overhead from ~14 buffer create/destroy cycles to
zero in steady-state. Eliminates 0.5-2ms/frame fence wait latency.
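
The grow-only reuse strategy described above can be illustrated with a minimal sketch. The `stagingBuffer` type and its `acquire` method are hypothetical names chosen for illustration and are not taken from the gg source; the sketch assumes the buffer contents are fully rewritten every frame, so old data need not survive a reallocation.

```go
package main

import "fmt"

// stagingBuffer is a hypothetical CPU-side byte buffer reused across frames.
// It never shrinks: capacity only grows when a frame needs more space.
type stagingBuffer struct {
	data []byte
}

// acquire returns a slice of exactly n bytes backed by the reused storage.
// If the current capacity is too small, it reallocates with 2x headroom so
// that slightly larger frames do not trigger another allocation.
// Old contents are not preserved (assumed to be rewritten each frame).
func (s *stagingBuffer) acquire(n int) []byte {
	if n > cap(s.data) {
		s.data = make([]byte, n, 2*n)
	}
	s.data = s.data[:n]
	return s.data
}

func main() {
	var buf stagingBuffer
	for frame, need := range []int{100, 80, 150, 300, 120} {
		b := buf.acquire(need)
		fmt.Printf("frame %d: len=%d cap=%d\n", frame, len(b), cap(buf.data))
	}
}
```

In steady state (frames 1, 2, and 4 in the loop above) no allocation happens, which is the property the commit attributes to both the CPU staging slices and the persistent GPU buffers.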

* docs: update CHANGELOG/ROADMAP for v0.28.2

* chore: update dependencies wgpu v0.16.1, gogpu v0.18.2 in examples

v0.28.1

fix: Porter-Duff compositing, docs for v0.28.1

* feat: Porter-Duff compositing for GPU readback, event-driven example

- Replace convertBGRAToRGBA with compositeBGRAOverRGBA (Porter-Duff over)
- Transparent GPU pixels now preserve existing pixmap content
- Fixes content loss when FlushGPU called multiple times per frame
- Update example to event-driven rendering with animation token
- Add Space key pause/resume for three-state model demo
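
As a rough illustration of the Porter-Duff "over" step listed above, the sketch below blends BGRA source pixels over an RGBA destination in place. It assumes both buffers hold premultiplied 8-bit color, which is an assumption on my part; the function name `compositeOver` is hypothetical and is not the library's `compositeBGRAOverRGBA`.

```go
package main

import "fmt"

// over computes src + dst*(1-srcAlpha) for one 8-bit premultiplied channel.
// inv is 255 minus the source alpha. The result is clamped to 255.
func over(src, dst uint8, inv uint32) uint8 {
	v := uint32(src) + (uint32(dst)*inv+127)/255
	if v > 255 {
		v = 255
	}
	return uint8(v)
}

// compositeOver blends premultiplied BGRA pixels from src over premultiplied
// RGBA pixels in dst. Fully transparent source pixels leave dst untouched,
// so earlier pixmap content is preserved.
func compositeOver(dst, src []byte) {
	n := len(src)
	if len(dst) < n {
		n = len(dst)
	}
	for i := 0; i+3 < n; i += 4 {
		sb, sg, sr, sa := src[i], src[i+1], src[i+2], src[i+3]
		if sa == 0 {
			continue // nothing drawn here by the GPU
		}
		inv := 255 - uint32(sa)
		dst[i] = over(sr, dst[i], inv)     // R
		dst[i+1] = over(sg, dst[i+1], inv) // G
		dst[i+2] = over(sb, dst[i+2], inv) // B
		dst[i+3] = over(sa, dst[i+3], inv) // A
	}
}

func main() {
	dst := []byte{0, 0, 255, 255}   // opaque blue, RGBA
	src := []byte{0, 0, 128, 128}   // half-transparent red, BGRA premultiplied
	compositeOver(dst, src)
	fmt.Println(dst)
}
```

A plain byte-swap conversion would overwrite `dst` even where the source alpha is zero, which is consistent with the content loss the commit describes when the flush ran more than once per frame.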

* docs: update documentation and example deps for v0.28.1 release

- CHANGELOG: add v0.28.1 section (Porter-Duff fix, event-driven example)
- README: mention event-driven rendering in GPU section
- ROADMAP: update current state to v0.28.1
- ARCHITECTURE: ggcanvas Draw() helper, deferred destruction, compositing
- Example: gogpu v0.18.0 → v0.18.1

v0.28.0

feat: three-tier GPU render pipeline, structured logging, dependency updates

* chore(examples): update dependencies to gg v0.27.1

* wip: time-based animation, DX12 backend for gogpu_integration example

* fix(ggcanvas): remove GPU SDF from example, add resize logging

- Remove gpu blank import from gogpu_integration example (GPU SDF
  compute dispatch caused 16+ synchronous fence waits per frame on DX12)
- Add resize and periodic frame logging for diagnostics
- Use time-based animation for consistent rotation speed

* wip: incremental canvas updates — deferred texture destruction, Draw() helper, remove auto-dirty

* fix(raster): add X-bounds clipping to analytic AA coverage computation

Edges extending beyond the canvas in X (e.g., circles partially off-screen
during window resize) caused incorrect winding accumulation, producing
full-width horizontal color bands.

Root cause: computeSegmentCoverage() processed all pixels [0, width) but
never pre-accumulated winding for edges at X < 0, so their contribution
was lost. Pixels inside the circle body showed zero coverage.

Fix: clip edges in X before the per-pixel loop:
- Entirely off-screen right: skip (no visible contribution)
- Entirely off-screen left: apply full winding to all visible pixels
- Partially off-screen left: pre-accumulate winding for off-screen portion
- Optimize loop to only process pixels near the edge's X range

Added regression tests covering 8 scenarios: fully on-screen, partially
off-screen (left/right/both), fully off-screen, and the exact narrow-window
scenario that triggered the bug.

* fix(ggcanvas): deferred texture destruction for DX12 resize stability

DX12 shader-visible sampler heap has a hard 2048-slot limit. Leaking
textures during resize exhausts it, causing CreateBindGroup to fail and
textures to disappear. Fix: defer old texture destruction until after
WriteTexture completes (GPU idle), then destroy safely in RenderToEx.

Also adds Canvas.Draw() helper, removes debug alpha pixel logging,
and cleans up example comments.

* feat(gpu): add fan tessellator for stencil-then-cover path rendering

O(n) triangle fan tessellation from path elements. Cubic and quadratic
Bezier curves adaptively flattened via de Casteljau subdivision (0.25px
tolerance). Supports multiple contours, AABB tracking, and cover quad
generation. Designed for reuse across frames via Reset().

GG-GPU-008
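
To make the O(n) fan tessellation concrete, here is a minimal sketch for a single already-flattened contour. The `point` type and `fanTriangles` function are hypothetical names for illustration; curve flattening, multiple contours, and AABB tracking from the commit are omitted.

```go
package main

import "fmt"

// point is a 2D vertex in pixel coordinates.
type point struct{ x, y float32 }

// fanTriangles turns a contour of n points into n-2 triangles that all share
// the first vertex. The output is a flat vertex list, three entries per
// triangle. For concave or self-overlapping contours the triangles overlap;
// in stencil-then-cover rendering the stencil pass resolves that overlap via
// winding counts, so no per-path triangulation logic is needed.
func fanTriangles(contour []point) []point {
	if len(contour) < 3 {
		return nil
	}
	out := make([]point, 0, 3*(len(contour)-2))
	for i := 1; i+1 < len(contour); i++ {
		out = append(out, contour[0], contour[i], contour[i+1])
	}
	return out
}

func main() {
	square := []point{{0, 0}, {10, 0}, {10, 10}, {0, 10}}
	tris := fanTriangles(square)
	fmt.Println(len(tris)/3, "triangles:", tris)
}
```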

* feat(gpu): add StencilRenderer with MSAA + stencil texture management

Manages 4x MSAA color, Depth24PlusStencil8 stencil, and 1x resolve
textures for stencil-then-cover rendering. Textures auto-resize on
dimension change with proper cleanup. RenderPassDescriptor configures
transparent clear, stencil clear to 0, and MSAA resolve.

GG-GPU-005

* feat(gpu): add stencil fill + cover render pipelines and WGSL shaders

Two-pass pipeline for stencil-then-cover:
- Stencil fill: vertex-only pass, IncrWrap/DecrWrap stencil ops,
  WriteMask=0 (no color output), transforms pixel coords to NDC
- Cover: vertex+fragment pass, NotEqual stencil test, passOp=Zero,
  premultiplied alpha blending, solid color uniform

Both use MSAA 4x, Depth24PlusStencil8, and share uniform layout.

GG-GPU-006 + GG-GPU-007

* feat(gpu): integrate stencil-then-cover into GPUAccelerator.FillPath

RenderPath() executes the full two-pass render pipeline:
- Tessellate path → fan triangles, upload vertex/uniform buffers
- Pass 1: stencil fill (IncrWrap/DecrWrap winding numbers)
- Pass 2: cover quad (NotEqual test, premultiplied alpha blend)
- MSAA resolve → staging buffer → CPU readback (BGRA→RGBA)

SDFAccelerator.FillPath now delegates general paths to
StencilRenderer instead of falling back to CPU. AccelFill
capability flag added.

GG-GPU-009

* feat(gpu): add EvenOdd fill rule support for stencil-then-cover (GG-GPU-010)

Add evenOddStencilPipeline using StencilOperationInvert for both faces.
RenderPath now accepts FillRule parameter to select pipeline variant.
Cover pipeline is shared between NonZero and EvenOdd fill rules.
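
The difference between the two stencil pipelines can be modelled on the CPU for a single pixel. This is only a sketch of the stencil arithmetic, not GPU code: it assumes front-facing triangles increment and back-facing triangles decrement (the actual face assignment in gg is not stated here), and the function names are hypothetical.

```go
package main

import "fmt"

// nonZeroStencil models IncrWrap/DecrWrap on an 8-bit stencil value for one
// pixel. Each entry in faces is one covering triangle: true means front-facing
// (assumed to increment), false means back-facing (assumed to decrement).
// Go's uint8 arithmetic wraps around, like the Wrap stencil operations.
func nonZeroStencil(faces []bool) uint8 {
	var s uint8
	for _, front := range faces {
		if front {
			s++
		} else {
			s--
		}
	}
	return s
}

// evenOddStencil models the Invert operation: every covering triangle,
// regardless of facing, flips all stencil bits.
func evenOddStencil(coverCount int) uint8 {
	var s uint8
	for i := 0; i < coverCount; i++ {
		s = ^s
	}
	return s
}

// covered is the cover-pass test: draw where the stencil value is not zero.
func covered(stencil uint8) bool { return stencil != 0 }

func main() {
	// A pixel overlapped twice by triangles of the same orientation.
	fmt.Println("NonZero:", covered(nonZeroStencil([]bool{true, true})))
	fmt.Println("EvenOdd:", covered(evenOddStencil(2)))
}
```

Because both variants leave a non-zero value exactly where the pixel should be filled, the same NotEqual cover pass works for either rule, which matches the commit's note that the cover pipeline is shared.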

* feat(gpu): add SDF render pipeline and convexity detection

- Port SDF rendering from compute shader to vertex+fragment pipeline
- New sdf_render.go with MSAA+resolve texture management
- sdf_render.wgsl shader with arithmetic-only SDF (avoids naga bugs)
- Supports circles, ellipses, rects, rounded rects (filled+stroked)
- Convexity detection O(n) algorithm with winding direction
- 56 tests total (30 SDF render + 26 convexity)
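
One common O(n) way to detect convexity together with winding direction is to check that the cross product of consecutive edges never changes sign. The sketch below shows that approach; it is not necessarily the algorithm gg uses, the names are hypothetical, and it assumes a simple (non-self-intersecting) closed polygon.

```go
package main

import "fmt"

// point is a 2D vertex.
type point struct{ x, y float64 }

// isConvex walks the closed polygon once and checks that every corner turns
// the same way. It returns whether the polygon is convex and the turning
// sign: +1 or -1 depending on winding direction, 0 if degenerate.
// Collinear corners (cross == 0) are ignored.
// Assumption: the polygon does not self-intersect; a star drawn as a
// pentagram turns consistently and would need an extra check.
func isConvex(pts []point) (bool, int) {
	n := len(pts)
	if n < 3 {
		return false, 0
	}
	sign := 0
	for i := 0; i < n; i++ {
		a, b, c := pts[i], pts[(i+1)%n], pts[(i+2)%n]
		cross := (b.x-a.x)*(c.y-b.y) - (b.y-a.y)*(c.x-b.x)
		switch {
		case cross > 0:
			if sign < 0 {
				return false, 0
			}
			sign = 1
		case cross < 0:
			if sign > 0 {
				return false, 0
			}
			sign = -1
		}
	}
	return sign != 0, sign
}

func main() {
	square := []point{{0, 0}, {1, 0}, {1, 1}, {0, 1}}
	dart := []point{{0, 0}, {2, 0}, {2, 2}, {1, 0.5}, {0, 2}}
	fmt.Println(isConvex(square))
	fmt.Println(isConvex(dart))
}
```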

* test(raster): increase coverage from 42.9% to 90.8%

- 118 tests across 8 new test files
- Cover fixed-point arithmetic, edge building, curve AET,
  alpha runs, analytic filler, path geometry
- 4 benchmarks for performance regression tracking

* refactor(gpu): delete compute pipeline legacy, keep render pipeline only

Remove ~2600 LOC of legacy compute shader SDF pipeline:
- Delete sdf_pipeline_test.go, sdf_batch/circle/rrect/test_const.wgsl
- Clean sdf_gpu.go: remove compute dispatch, batch shapes, conditional switching
- SDFAccelerator now delegates exclusively to SDFRenderPipeline
- Render pipeline is architecturally correct: shared render pass,
  hardware MSAA, direct surface rendering support

* feat(gpu): unified render pass — all tiers in single BeginRenderPass

- New GPURenderSession: shared MSAA + stencil + resolve textures
- SDF shapes + stencil-then-cover paths in one render pass
- Single submit + fence wait + readback per frame
- Shared textureSet eliminates ~90 lines of duplicated texture code
- SDFRenderPipeline gains pipelineWithStencil for unified pass
- StencilRenderer refactored to use textureSet
- SDFAccelerator dispatches both SDF and stencil paths via session
- 15 new render session tests

* feat(gpu): add convex fast-path renderer and direct surface rendering

- ConvexRenderer: Tier 2a single-draw convex polygon rendering with per-edge AA
  - Fan from centroid + 0.5px AA fringe strips, coverage ramp 1.0→0.0
  - pipelineWithStencil for unified render pass (stencil ignored: Always/Keep)
  - convex.wgsl: vertex+fragment shader with premultiplied alpha blending
  - extractConvexPolygon: auto-detect convex line-only closed paths

- Direct surface rendering mode (RenderModeSurface):
  - SetSurfaceTarget() on GPURenderSession: MSAA resolves to surface view
  - SurfaceTargetAware interface + SetAcceleratorSurfaceTarget() in gg root
  - ensureSurfaceTextures: skip resolve texture (surface IS the resolve target)
  - encodeSubmitSurface: zero-copy path, no readback for windowed rendering

- Unified session integration:
  - RenderFrame accepts convexCommands alongside SDF shapes and stencil paths
  - buildConvexResources: per-frame vertex/uniform/bind group creation
  - All three tiers render in single BeginRenderPass with pipeline switching

* fix: resolve linter warnings in raster and ggcanvas packages

- Extract offscreenLeftWinding helper to reduce cyclomatic complexity (21→18)
- Add nolint:funlen for computeSegmentCoverage (performance-critical rasterization)
- Extract deferTextureDestruction and destroyTexture to flatten nestif in canvas.go
- Extract promotePendingTexture to flatten nestif in render.go
- Simplify Close() using shared destroyTexture helper

* refactor(examples): update gpu and gogpu_integration for three-tier rendering

- examples/gpu: rewrite to use import _ gg/gpu blank import API
  instead of deleted gpu.NewBackend() — demonstrates all three tiers
  (SDF circles/rrects, convex polygons, stencil-then-cover paths)
- examples/gogpu_integration: update for three-tier animated demo
  with circle ring, rotating polygons, star, and curved shapes
- Remove hardcoded GraphicsAPI selection, use auto-detection
- Use canvas.Draw() helper for atomic draw+dirty marking

* fix(gpu): remove stray semicolons from WGSL shader struct declarations

* feat(examples): enable GPU accelerator in gogpu_integration example

Adds a blank import of the gg/gpu package to register the SDF GPU accelerator.
Shapes are now rendered with SDF (circles), MSAA 4x (polygons), and
stencil+cover (complex paths) instead of the CPU software rasterizer.

* fix(gpu): add texture layout transition before CopyTextureToBuffer

Insert explicit TransitionTextures barrier (RenderAttachment → CopySrc)
between render pass end and readback copy in all three paths:
render_session, sdf_render, stencil_renderer. Fixes Vulkan validation
error about COLOR_ATTACHMENT_OPTIMAL vs TRANSFER_SRC_OPTIMAL mismatch.

* fix: add CloseAccelerator() and GPU flush on Context.Close()

- Add CloseAccelerator() to properly destroy GPU resources on shutdown
- Context.Close() now flushes GPU accelerator before clearing state
- Prevents VkImage leak validation errors on device destruction

* feat: add RenderDirect() for zero-copy GPU surface rendering (GG-GPU-019)

- Add Canvas.RenderDirect() that renders directly to gogpu surface view
- MSAA 4x shapes resolve straight to the surface — no readback, no
  staging buffers, no texture upload
- Update gogpu_integration example to use RenderDirect
- Old RenderTo() path preserved for non-GPU fallback

* fix: keep surface target between frames, lazy GPU init

- Remove surface target reset after each RenderDirect frame to avoid
  destroying and recreating MSAA/stencil textures every frame
- Clear surface target only in Close() for proper cleanup
- Defer GPU device creation to flushLocked() (lazy init) to prevent
  standalone Vulkan device from interfering with DX12

* feat(examples): enable GPU accelerator with GLES backend

- Re-enable gpu blank import for SDF + MSAA rendering
- Use GraphicsAPIGLES for OpenGL backend testing
- Use RenderTo texture upload path (GLES lacks RenderDirect)

* feat(ggcanvas): add Draw() helper, deferred texture destruction on resize

- Add Canvas.Draw(fn) for atomic draw + mark dirty
- Defer texture destruction from Resize() to Flush() via sizeChanged flag
- Add debug frame logging for GPU pipeline development

* chore(ggcanvas): remove debug frame logging

Remove debugFrames counter and log.Printf diagnostics that printed
pixel statistics for the first 3 frames. This was development-only
instrumentation that leaked into user-visible stderr output.

* feat: add structured logging via log/slog, silent by default

Replace all hardcoded log.Printf calls with log/slog structured logging.
Libraries produce zero output by default (nopHandler). Users opt in via
gg.SetLogger(). Thread-safe via atomic.Pointer[slog.Logger].

- Debug: GPU pipeline state, buffer sizes, init details
- Info: adapter selected, accelerator initialized
- Warn: CPU fallback, resource release errors
- Propagation: SetLogger → accelerator, RegisterAccelerator → current logger
- Tests: nopHandler, Set/Get, nil, propagation, concurrency, benchmarks
- Benchmarks: 1 ns/op load, 5.5 ns/op disabled log, 0 allocs

* fix(gpu): aligned readback pitch, barrier after copy, RenderDirect example

Align BytesPerRow to 256 for CopyTextureToBuffer (DX12 requirement),
strip row padding during readback, transition resolve texture back to
RenderAttachment after copy. Update example to use RenderDirect
(zero-copy) with auto backend selection.
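
The row-pitch handling can be illustrated as follows. WebGPU requires `bytesPerRow` in texture-to-buffer copies to be a multiple of 256; the helper names below are hypothetical and the sketch assumes tightly packed pixels of `bytesPerPixel` bytes each.

```go
package main

import "fmt"

// copyRowAlignment is the required row pitch alignment for
// texture-to-buffer copies (256 bytes in WebGPU).
const copyRowAlignment = 256

// alignedBytesPerRow rounds the tight row size up to the next multiple
// of copyRowAlignment.
func alignedBytesPerRow(width, bytesPerPixel int) int {
	unpadded := width * bytesPerPixel
	return (unpadded + copyRowAlignment - 1) / copyRowAlignment * copyRowAlignment
}

// stripRowPadding copies the meaningful bytes of each padded row into a
// tightly packed image. padded must hold at least
// (height-1)*pitch + width*bytesPerPixel bytes.
func stripRowPadding(padded []byte, width, height, bytesPerPixel int) []byte {
	rowBytes := width * bytesPerPixel
	pitch := alignedBytesPerRow(width, bytesPerPixel)
	out := make([]byte, 0, rowBytes*height)
	for y := 0; y < height; y++ {
		start := y * pitch
		out = append(out, padded[start:start+rowBytes]...)
	}
	return out
}

func main() {
	// A 100-pixel-wide RGBA row is 400 bytes but must be copied with a
	// 512-byte pitch.
	fmt.Println(alignedBytesPerRow(100, 4))
}
```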

* docs: update public documentation for v0.28.0 release

- CHANGELOG.md: add v0.28.0 entry, fix tier numbering (2a/2b)
- README.md: fix Scene graph and Recording examples, fix RenderDirect docs
- ROADMAP.md: update current state to v0.28.0, mark v0.27.0/v0.28.0 released
- ARCHITECTURE.md: fix file listings, package paths, ggcanvas API examples

* fix: explicit TextureViewDescriptor fields for wgpu-native compatibility

All CreateTextureView calls now set Format, Dimension, Aspect, and
MipLevelCount explicitly instead of relying on zero-value defaults.
Native Go backends handle zero defaults gracefully, but wgpu-native
panics on MipLevelCount=0.

* fix: use App.OnClose() for canvas cleanup, prevent exit crash

Canvas.Close() now runs inside OnClose callback (before renderer
destruction) instead of after App.Run() returns. Fixes Vulkan
validation errors and crash on exit.

* chore: update deps — wgpu v0.16.0, naga v0.13.0, gogpu v0.18.0

* fix: resolve lint issues — funlen, revive var-naming exclusions

v0.27.1

fix: flush GPU accelerator before text drawing

v0.27.0

feat: SDF accelerator, GPU compute pipeline, CI nogpu build tags

- Add SDFAccelerator (CPU) and NativeSDFAccelerator (GPU via wgpu HAL)
- Shape detection: circle, ellipse, rect, rrect from Path elements
- GPU SDF compute pipeline with multi-pass dispatch (naga workaround)
- GPUAccelerator interface with opt-in registration (gpu/ blank import)
- DeviceProvider integration for gogpu GPU device sharing
- Refactor GPU dispatch in Context (eliminate dupl/nestif)
- Add //go:build !nogpu to all internal/gpu files for clean CI
- Update deps: wgpu v0.15.0, naga v0.12.0, gpucontext v0.9.0

v0.26.1

chore: update naga v0.11.0, wgpu v0.13.2, gogpu v0.15.7 (#92)

v0.26.0

docs: update ARCHITECTURE.md for v0.26.0 (#90)

Reflect new architecture: CPU raster as core, GPU as optional
accelerator. Update package structure diagram (backend/ removed,
internal/raster + internal/native), add GPUAccelerator interface
docs, add Vello tile rasterizer section.

v0.25.1

chore: update wgpu to v0.13.1 (#89)

* refactor: extract raster/ core, remove backend/rust/ and internal/raster/

ARCH-010: Remove backend/rust/ (Rust FFI backend, 5 files)
ARCH-011: Extract raster/ package from backend/native/ (10 core files)
ARCH-014: Clean go.mod (remove go-webgpu/webgpu, go-webgpu/goffi)
ARCH-015: Remove legacy internal/raster/ (supersampled AA, 14 files)

- CPU rasterization types (EdgeBuilder, AlphaRuns, CurveAwareAET,
  fixed-point math, curve edges) extracted to raster/ package
- backend/native/ imports raster/ directly (no aliases)
- scene_bridge.go bridges scene.Path to raster.PathLike interface
- Tests moved to appropriate packages (raster/ for internals)
- software.go rewritten to use raster.AnalyticFiller directly
- Backend registry simplified (Native > Software priority)

* refactor: move raster/, cache/, gpucore/ to internal/

Move implementation-detail packages behind internal/ boundary:
- raster/ → internal/raster/ (CPU rasterization types)
- cache/ → internal/cache/ (LRU cache utility)
- gpucore/ → internal/gpucore/ (GPU resource types)

Only user-facing packages remain public: scene/, surface/,
text/, render/, recording/, integration/ggcanvas, backend/.

* feat: add GPUAccelerator interface with transparent CPU fallback

- GPUAccelerator interface for optional GPU acceleration
- RegisterAccelerator() with Init/Close lifecycle management
- ErrFallbackToCPU sentinel error for transparent fallback
- AcceleratedOp bitfield for capability checking
- Benchmark: ~17ns/op, 0 allocs for nil check path
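
The transparent-fallback idea can be sketched with a sentinel error and `errors.Is`. Everything below is a simplified stand-in: the `accelerator` interface, `errFallbackToCPU`, and `fill` are hypothetical and much smaller than the real `GPUAccelerator` interface and `ErrFallbackToCPU`, whose exact shapes are not shown in this log.

```go
package main

import (
	"errors"
	"fmt"
)

// errFallbackToCPU is a sentinel meaning "not an error, just not supported
// on the GPU path; use the CPU rasterizer instead".
var errFallbackToCPU = errors.New("fallback to CPU")

// accelerator is a minimal stand-in for an optional GPU accelerator.
type accelerator interface {
	FillRect(w, h int) error
}

// smallOnlyGPU is a toy accelerator that only handles small rectangles.
type smallOnlyGPU struct{}

func (smallOnlyGPU) FillRect(w, h int) error {
	if w*h > 64 {
		return errFallbackToCPU
	}
	return nil
}

// fill tries the accelerator first and silently falls back to the CPU when
// none is registered or when it returns the sentinel. Any other error is
// reported to the caller. The returned string names the path that was used.
func fill(acc accelerator, w, h int) (string, error) {
	if acc != nil {
		err := acc.FillRect(w, h)
		if err == nil {
			return "gpu", nil
		}
		if !errors.Is(err, errFallbackToCPU) {
			return "", err
		}
	}
	// CPU rasterization would happen here.
	return "cpu", nil
}

func main() {
	fmt.Println(fill(nil, 4, 4))
	fmt.Println(fill(smallOnlyGPU{}, 4, 4))
	fmt.Println(fill(smallOnlyGPU{}, 100, 100))
}
```

The nil check at the top of `fill` is the cheap path that the benchmark line above refers to: when no accelerator is registered, the caller pays only for that comparison.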

* refactor: remove backend/ package, move native/ to internal/

Delete backend/ abstraction layer entirely:
- Remove RenderBackend interface and registry pattern
- Remove SoftwareBackend wrapper (redundant with SoftwareRenderer)
- Move backend/native/ → internal/native/ (64 files)
- Clean up NativeBackend: remove backend.RenderBackend dependency
- Update examples/gpu to use internal/native directly
- Update all imports across tests and examples

Pre-v1.0 with 0 external users — no backward compatibility needed.
CPU rendering via internal/raster, GPU via GPUAccelerator interface.

* docs: update README, CHANGELOG, ROADMAP for v0.26.0

- README: replace internal import example with GPUAccelerator pattern,
  update architecture diagram (CPU raster core + optional GPU),
  update package table
- CHANGELOG: add v0.26.0 entry (GPUAccelerator, architecture refactor,
  backend/ removal, dependency cleanup)
- ROADMAP: update current state to v0.26.0, mark v0.25.0/v0.26.0
  as released, add v0.27.0 planned section

* chore: update wgpu to v0.13.1

Fixes render pass InitialLayout for LoadOpLoad — correct ClearColor
preservation between consecutive render passes.

v0.25.0

feat(vello): add tile-based analytic anti-aliasing rasterizer

Implement Vello-style 16x16 tile rasterizer with analytic AA coverage.
Port of vello_shaders CPU fine rasterizer (fine.rs) to Go.

Key features:
- DDA-based segment binning into 16x16 tiles
- Analytic trapezoidal area coverage per pixel
- yEdge mechanism for correct winding/backdrop
- VelloLine float pipeline bypassing fixed-point quantization
- NonZero and EvenOdd fill rules

Bottom artifact at circle extrema improved from alpha=191 to alpha=248
by using original float32 coordinates instead of FDot6/FDot16 round-trip.

Includes golden test infrastructure, visual tests, and coordinate
validation tests.