Tags: intel/sycl-tla
Fix sub-byte pointer arithmetic and zero buffer allocation in grouped gemm (#790)

For sub-byte types (uint4_t/int4_t), sizeof(T)=1 but packed storage uses sizeof_bits/8=0.5 bytes per element. Plain pointer arithmetic (base+offset) over-advances, causing out-of-bounds access for group>0. Add packed_ptr() helper to compute correct byte offsets.

Also fix zero buffer under-allocation when scale_k < zero_elements_packed_along_k by using max(zero_elements_packed_along_k, scale_k).

---------

Co-authored-by: Jacky, Deng <jacky.deng@intel.com>
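The offset problem described in the commit message above can be illustrated with a minimal C++ sketch. Everything here is an assumption made for illustration: `int4_like`, `bits_of`, `packed_ptr_sketch`, and `zero_buffer_elems` are hypothetical names, and the actual `packed_ptr()` helper in the repository may have a different signature and implementation. The sketch only shows why element-wise pointer arithmetic over-advances for 4-bit types and how a bit-based byte offset avoids that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 4-bit element type: the C++ object occupies one
// byte (sizeof == 1), but packed storage holds two logical elements per byte.
struct int4_like { std::uint8_t raw; };

// Illustrative trait (not the library's): logical bit width of an element.
template <class T>
struct bits_of { static constexpr std::size_t value = sizeof(T) * 8; };
template <>
struct bits_of<int4_like> { static constexpr std::size_t value = 4; };

// Advance `base` by `elem_offset` logical elements of packed storage.
// `base + elem_offset` would move sizeof(T) bytes per element, which is twice
// too far for a 4-bit type; here the offset is computed in bits first.
template <class T>
T* packed_ptr_sketch(T* base, std::size_t elem_offset) {
  const std::size_t bit_offset = elem_offset * bits_of<T>::value;
  // Assumption: per-group offsets always land on a byte boundary.
  assert(bit_offset % 8 == 0 && "offset must be byte aligned");
  auto* bytes = reinterpret_cast<unsigned char*>(base);
  return reinterpret_cast<T*>(bytes + bit_offset / 8);
}

// Size the zero-point buffer by the larger of the two counts named in the
// commit message, so it is not under-allocated when scale_k is the smaller.
inline std::size_t zero_buffer_elems(std::size_t zero_elements_packed_along_k,
                                     std::size_t scale_k) {
  return std::max(zero_elements_packed_along_k, scale_k);
}
```

For a group that starts 64 elements into a 4-bit buffer, `packed_ptr_sketch(buf, 64)` advances 32 bytes, whereas `buf + 64` advances 64 bytes and lands past the packed data. For full-byte types such as `float`, the helper reduces to ordinary pointer arithmetic.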
Fix for python EVT BMG tests (#762)

## Description

Fixes the following issues:

- Detection of B60 and B70 hosts as BMG.
- Fix for the compile command that caused the test failure. The compile command now uses spirv64_gen and sets devices correctly for BMG.

```
Error Message:
icpx: error: cannot deduce implicit triple value for '-Xspirv-translator', specify triple using '-Xspirv-translator=<triple>'
icpx: error: cannot deduce implicit triple value for '-Xspirv-translator', specify triple using '-Xspirv-translator=<triple>'
```

## Type

- [x] Bug
- [ ] Feature
- [ ] Performance
- [ ] Refactor

## Testing

- [ ] Tests pass
- [ ] Xe12
- [ ] Xe20

### Testing on G31 (G21 test covered by CI)

```
python3 test/python/cutlass/evt/run_xe_evt_tests.py -j all
======================================================================
Test Report Summary
======================================================================
Suite: all
Total tests run: 65
Passed: 59
Failed: 0
Errors: 0
Skipped: 6
======================================================================
Test suite 'all' passed!
```

## Performance

| Metric | Before | After |
|--------|--------|-------|
| | | |

## References

Fixes #

## Checklist

- [ ] Copyright
- [ ] Co-pilot Review
- [ ] Deprecated APIs not used

---------

Co-authored-by: Vance, Antony <antony.vance@intel.com>