Tags: ROCm/FlyDSL
[Docs] Onboarding notebooks (2/n): layout algebra (#665)

* [Docs] Onboarding notebooks (2/n): layout algebra

Picks up after #635 (1/n, expr foundation) with the layout half of the series:

- 04_layout: layout = (shape, stride) as a coord->index function; crd2idx (row- vs column-major), logical_divide tiling, and the two tensor kinds (memref vs coord, the latter via the identity layout).
- 05_tiled_copy_and_swizzle: thread-value layouts, make_tiled_copy + partitioning into per-thread views, the LDS bank swizzle, and an inline print_typst -> SVG showcase (typst optional; falls back to source).
- 01_numeric_types: split DSL programming from the MLIR mapping: the type->MLIR detail moves into an optional section that explains and demonstrates it (.ir_type), instead of an unexplained column. Fixes the unsigned-int row: MLIR integers are signless (Uint32 and Int32 are both i32; signedness is carried by the op).
- README: 04/05 in the index + a layout cheat-sheet; note the typst dep.

Run-verified on MI350X (gfx950); outputs committed cleared.

* review fixes (#665): JIT-cache robustness, pedagogy, arch-labeling

Self-review + a warm-cache re-run surfaced one real bug and several clarity/labeling fixes:

- JIT-cache trap (the important one): the layout cells print at *trace* time and print_typst writes its .typ files at trace time, so a warm JIT disk cache skipped the re-trace and the output vanished on a re-run (04 silently blanked; 05 errored on the missing .typ). 04/05 and 01 §6 now set FLYDSL_RUNTIME_ENABLE_CACHE=0 before importing flydsl so re-runs always re-trace. Verified on a warm cache.
- 04: sharpen the coord-tensor explanation (crd2idx through an identity layout returns the coordinate tuple -> that is why it has no element type); note the coordinate is fixed and the stride decides the index.
- 05: decode the thread/value layouts explicitly; reframe coord_swizzle as a deliberate internal (not "missing"); disclose the device path is AMD CDNA (rocdl.*) while the layout algebra stays portable; fix a file-handle leak in render_typst.
- 01 §6: lead with the concrete use case (reading IR dumps / type errors).
- README: label 05 as the AMD CDNA path; add the trace-time/cache gotcha.

Re-verified on MI350X (gfx950); outputs cleared.

* docs: clarity pass on the layout notebooks (beginner walkthrough)

Simulated a junior engineer fresh from 00-03 running 04/05/01 and tightened the spots that tripped a first-time reader. Wording/markdown only; all cells still run clean on gfx950:

- 04: gloss "memref" and "identity layout"; de-jargon "cosize"; explain the logical_divide result (8 tiles x 8) and how to read the coord-tensor repr (base coord + the 1E0/1E1 basis stride).
- 05: fix the thread-layout orientation ((4,1) is 4 threads down a column, not across a row); read the TV-layout stride; note the opaque per-thread view repr is just a per-thread slice of the same buffer; gloss the swizzle (mask, base, shift) knobs; say what the diagram colours mean.
- 01 §6: define "signless" plainly (no sign bit in the type; the op carries it); drop layout jargon from the vector<4xf32> closing.
- README: nbconvert runs against your active env (flydsl must import there); list torch + the build step; note 00-03 come first.

Re-verified on MI350X (gfx950); outputs cleared.
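
For readers who have not opened 04_layout yet, the "layout = (shape, stride) as a coord->index function" idea can be sketched in plain Python. This is an illustration only, not the FlyDSL API: the `crd2idx` helper below is a hypothetical stand-in that borrows the notebook's name, and the 4x8 shape and the two stride tuples are assumed values chosen for the example.

```python
from itertools import product

# Hypothetical plain-Python stand-in, NOT the FlyDSL crd2idx:
# a layout maps a coordinate to a linear index as sum(coord[i] * stride[i]).
def crd2idx(coord, stride):
    return sum(c * s for c, s in zip(coord, stride))

shape = (4, 8)             # assumed example shape: 4 rows, 8 columns
row_major_stride = (8, 1)  # stepping one column moves 1 element
col_major_stride = (1, 4)  # stepping one row moves 1 element

coord = (2, 3)             # the coordinate is fixed ...
row_idx = crd2idx(coord, row_major_stride)  # 2*8 + 3*1
col_idx = crd2idx(coord, col_major_stride)  # 2*1 + 3*4

# ... and the stride alone decides which index it lands on.
# Both layouts still visit every one of the 32 offsets exactly once.
all_coords = list(product(range(shape[0]), range(shape[1])))
row_indices = sorted(crd2idx(c, row_major_stride) for c in all_coords)
col_indices = sorted(crd2idx(c, col_major_stride) for c in all_coords)
```

The same coordinate (2, 3) lands on index 19 under the row-major strides and index 14 under the column-major ones, which is the "coordinate is fixed and the stride decides the index" point from the review fixes.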