pbc/df: chunk kpt-pairs in CCGDFBuilder.outcore_auxe2 to avoid OOM #3257
Open
gauravharsha wants to merge 2 commits into
The CCDF int3c kernel was invoked for all nkpts^2 kpt-pairs at once, producing an output buffer of (nkpts_ij, max_buflen, nauxc) doubles for each of the real and imaginary parts. For a 6x6x6 k-mesh with j_only=False this is 46656 pairs and the buffer reaches hundreds of GB, causing OOM even when the formula at line 295 correctly returns buflen ~ 1: the AO-shell granularity prevents shrinking further, and the "memory usage may be N times over max_memory" warning is printed but the loop allocates anyway.

Split kikj_idx into chunks sized so each chunk's int3c output fits in max_memory, and rebuild gen_int3c_kernel per chunk with reindex_k restricted to that chunk. The shell-block granularity (sh_ranges) is kept fixed across chunks so fswap row writes remain consistent, and the pre-allocated fswap layout (indexed by global kpt-pair index) is unchanged so downstream readers see no difference.

The merge_dd path that pairs (ij_idx, ji_idx) within a kk_adapted group requires both indices to live in the same chunk, so whole kk_adapted groups are now kept together in one chunk. A local pair_pos map translates global kpt-pair indices to positions in the chunk's outR/outI arrays.

When kpts_chunk >= nkpts_ij (small problems or ample memory), nchunks == 1 and the execution path is identical to before.
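For readers following the chunking logic, here is a minimal sketch of packing whole groups into chunks and building a per-chunk position map. It is not the code in this PR: `chunk_pair_groups`, `local_positions`, and the toy grouping of (ij, ji) partners are hypothetical stand-ins, and the real kk_adapted groups in PySCF may be built differently.

```python
def chunk_pair_groups(groups, max_pairs):
    """Greedily pack whole groups of kpt-pair indices into chunks.

    groups    : list of lists of global kpt-pair indices (hypothetical
                stand-in for the kk_adapted groups)
    max_pairs : pair budget per chunk (stand-in for kpts_chunk)

    A group is never split; a group larger than max_pairs gets a chunk
    of its own and therefore exceeds the budget.
    """
    chunks, current = [], []
    for group in groups:
        # Close the current chunk if adding this group would overflow it.
        if current and len(current) + len(group) > max_pairs:
            chunks.append(current)
            current = []
        current.extend(group)
    if current:
        chunks.append(current)
    return chunks


def local_positions(chunk):
    """Map each global kpt-pair index to its row in the chunk's output arrays."""
    return {pair: pos for pos, pair in enumerate(chunk)}


# Toy example (assumption): 3 k-points, 9 pairs, pair index = ki * 3 + kj.
# Each (ij, ji) partner pair forms one group; diagonal pairs stand alone.
nkpts = 3
groups = []
for ki in range(nkpts):
    for kj in range(ki, nkpts):
        ij, ji = ki * nkpts + kj, kj * nkpts + ki
        groups.append([ij] if ij == ji else [ij, ji])

chunks = chunk_pair_groups(groups, max_pairs=4)
for chunk in chunks:
    pair_pos = local_positions(chunk)
    print(chunk, pair_pos)
```

With a budget of 4 pairs the toy problem splits into three chunks, and every (ij, ji) partner pair lands in the same chunk. With a budget of 9 or more, a single chunk holds everything, which mirrors the nchunks == 1 case.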
The chunk loop computes reindex_k_chunk inline per chunk, so the top-level reindex_k variable became dead. Remove it to fix ruff F841 (local variable assigned but never used).
Fixes #3111.
`_CCGDFBuilder.outcore_auxe2` allocates the int3c buffer for all `nkpts**2` kpt-pairs at once: `(nkpts_ij, max_buflen, nauxc) * 2` doubles. For a 6×6×6 k-mesh with j_only=False that's 46656 pairs → hundreds of GB.

The existing memory check at `gdf_builder.py:288` doesn't help: `buflen` clamps to 1 once `nkpts_ij` is large, `_guess_shell_ranges` rounds up to AO-shell granularity (`max_buflen` can be O(10³)), and the existing "may be N times over max_memory" warning only logs.

Fix: split `kikj_idx` into chunks sized so each chunk's int3c output fits in `max_memory`, calling `gen_int3c_kernel` per chunk with a restricted `reindex_k`. Whole `kk_adapted` groups are kept together so the `merge_dd` `(ij_idx, ji_idx)` pairing stays in one chunk; a local `pair_pos` map handles index translation. The `fswap` layout is unchanged so downstream readers see no difference.

Also: define `nauxc = self.fused_cell.nao` and use it in `buflen` (master uses `naux`, which underestimates per-pair workspace when CCDF's compensating basis is non-trivial).

Compatibility: when `kpts_chunk >= nkpts_ij`, `nchunks == 1` and the loop body is functionally identical to master.

Verified: all `test_gdf_builder` cases pass; forced multi-chunk runs (up to 14 chunks on 27 kpts / 729 pairs) match a single-chunk reference to roundoff; the original failing 6×6×6 CCDF build completes.
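To make the buffer-size argument concrete, the arithmetic can be sketched as below. The values of `max_buflen`, `nauxc`, and the memory budget are illustrative assumptions, not numbers from the failing run, and `int3c_buffer_bytes` and `pairs_per_chunk` are hypothetical helpers; the actual chunk-size formula in this PR may account for additional workspace.

```python
import math


def int3c_buffer_bytes(nkpts_ij, max_buflen, nauxc):
    """Size of the int3c output: real + imaginary parts, 8 bytes per double."""
    return nkpts_ij * max_buflen * nauxc * 2 * 8


def pairs_per_chunk(max_memory_mb, max_buflen, nauxc):
    """Largest number of kpt-pairs whose output fits in the budget (at least 1)."""
    per_pair = int3c_buffer_bytes(1, max_buflen, nauxc)
    return max(1, int(max_memory_mb * 1e6) // per_pair)


nkpts = 6 * 6 * 6          # 6x6x6 k-mesh
nkpts_ij = nkpts ** 2      # all pairs when j_only=False
max_buflen = 1000          # assumed: O(10^3) after rounding to AO shells
nauxc = 500                # assumed: illustrative auxiliary dimension
max_memory_mb = 4000       # assumed memory budget in MB

total = int3c_buffer_bytes(nkpts_ij, max_buflen, nauxc)
kpts_chunk = pairs_per_chunk(max_memory_mb, max_buflen, nauxc)
nchunks = math.ceil(nkpts_ij / kpts_chunk)

print(f"{nkpts_ij} pairs -> {total / 1e9:.0f} GB in one allocation")
print(f"{kpts_chunk} pairs per chunk -> {nchunks} chunks")
```

Under these assumed values the single allocation is about 373 GB, while a 4000 MB budget holds 500 pairs per chunk, giving 94 chunks.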