Segfault in `dfpt_nsteltwf` (strain DFPT post-SCF) when `npband > 1`: array slice mixes local and global band index


## Summary

In `src/72_response/m_dfpt_scfcv.F90`, subroutine `dfpt_nsteltwf` (the
non-self-consistent elastic-tensor accumulator that runs after a strain
SCF cycle converges) builds two wave-function buffers with an array
slice whose lower bound uses the *local* band index `iband_me` but whose
upper bound uses the *global* band index `iband`.  When MPI band
parallelism is active (`npband > 1`) the two indices differ, the slice
becomes wider than the destination buffer, and ABINIT segfaults inside
this routine immediately after the first strain perturbation finishes
its SCF cycle.

This bug is present in v9.8.2 (the version we hit it on) and in the
current `master` branch (v10.6.7, file content as of
[`master/src/72_response/m_dfpt_scfcv.F90`][master-link] lines 2774-2775).

[master-link]: https://github.com/abinit/abinit/blob/master/src/72_response/m_dfpt_scfcv.F90

## Buggy lines

`src/72_response/m_dfpt_scfcv.F90` (master, v10.6.7), around line 2774:

```fortran
iband_me = 0
do iband=1,nband_k
  if(mpi_enreg%proc_distrb(ikpt, iband, isppol) /= mpi_enreg%me_kpt) then
    cycle
  end if
  iband_me = iband_me + 1
!  Get ground-state and first-order wavefunctions
  cwave0(:,:)=cg(:,1+(iband_me-1)*npw_k*nspinor+icg:iband*npw_k*nspinor+icg)
  cwavef(:,:)=cg1(:,1+(iband_me-1)*npw1_k*nspinor+icg1:iband*npw1_k*nspinor+icg1)
  ...
```

The destination arrays were allocated with width `npw_k*nspinor`
(resp. `npw1_k*nspinor`):

```fortran
ABI_MALLOC(cwave0,(2,npw_k*nspinor))
ABI_MALLOC(cwavef,(2,npw1_k*nspinor))
```

For the slice width to match the destination width we need
`iband == iband_me`, which is only true when `npband == 1` (each rank
sees every band of its k-point).

When `npband > 1`, `iband_me` is the position in the local band list
while `iband` is the global band number, so the slice reads
`(iband - iband_me + 1) * npw_k * nspinor` columns of `cg` instead of
one column.  For a typical run with `nproc=32, nkpt=2`, ABINIT
auto-selects `npband=16` and the first kept band on a rank has
`iband_me=1, iband=16`, so the slice reads 16x the buffer width.  This
falls off the end of the allocated `cg` array on most ranks and crashes.

The same module already uses the local index consistently elsewhere
(line 4219-4220 in master) for a similar loop:

```fortran
do ipw=1,npw_k*nspinor
  cwave0(1,ipw)=cg(1,ipw+(iband_me-1)*npw_k*nspinor+icg)
  cwave0(2,ipw)=cg(2,ipw+(iband_me-1)*npw_k*nspinor+icg)
end do
```

confirming that `iband_me` is the intended index.

## Proposed fix (2 lines)

```diff
-  cwave0(:,:)=cg(:,1+(iband_me-1)*npw_k*nspinor+icg:iband*npw_k*nspinor+icg)
-  cwavef(:,:)=cg1(:,1+(iband_me-1)*npw1_k*nspinor+icg1:iband*npw1_k*nspinor+icg1)
+  cwave0(:,:)=cg(:,1+(iband_me-1)*npw_k*nspinor+icg:iband_me*npw_k*nspinor+icg)
+  cwavef(:,:)=cg1(:,1+(iband_me-1)*npw1_k*nspinor+icg1:iband_me*npw1_k*nspinor+icg1)
```

I am happy to open a PR if that is helpful; just say the word.  Please
let me know whether the active development branch on gitlab also wants
the fix mirrored.

## Reproducer

System: 164-atom MOF, orthorhombic, sg=19 (`P 2_1 2_1 2_1`), 760 valence
electrons, `nband=400`, `ecut=50 Ha`, `ngkpt=3 2 1`, PseudoDojo NC SR
v0.4 PBE pseudos.  The bug is independent of the cell -- any strain
DFPT run on >~30 ranks with `paral_kgb=0` reproduces it.

Minimum reproducer is a strain-only DFPT (DS4) reading a pre-existing
GS WFK + DDK files via `irdwfk=1 irdddk=1`:

```text
ndtset 1
jdtset 4

irdwfk 1
irdddk 1

paral_kgb 0
paral_rf  0
npband    1     ! IGNORED by ABINIT in paral_kgb=0 -- see "Workaround" below
npfft     1
bandpp    1
nband     400

rfstrs4   3
rfdir4    1 1 1
nqpt4     1
qpt4      0 0 0
kptopt4   2
tolvrs4   1.0d-8
nstep4    50
```

Launched with `mpirun -np 32 abinit dma-1_strain.abi > log`.

## Actual behaviour

The first strain perturbation (`ipert=natom+3, idir=1`, uniaxial xx)
converges normally in ~20 SCF iterations.  When ABINIT then proceeds to
the non-SC elastic step it crashes:

```
Backtrace:
#3  0x846228 in dfpt_nsteltwf
        at src/72_response/m_dfpt_scfcv.F90:2882
#4  0x846228 in dfpt_nselt
        at src/72_response/m_dfpt_scfcv.F90:2581
#5  0x846228 in __m_dfpt_scfcv_MOD_dfpt_scfcv
        at src/72_response/m_dfpt_scfcv.F90:1365
#6  0x512da1 in __m_dfpt_loopert_MOD_dfpt_looppert
        at src/95_drive/m_dfpt_looppert.F90:1902
...
mpirun noticed that process rank 27 with PID 0 on node n1 exited on
signal 11 (Segmentation fault).
```

The crash is at line 2882 in v9.8.2 (line 2774 in master) -- the array
slice with `iband` as the upper bound.

ABINIT's own log right above the crash shows the parallel decomposition
that triggered it:

```
-   mpi_nproc: 32, omp_nthreads: 1
 Present parallel dimensions: nkpt= 2 nsppol 1 nband per processor= 25 npband= 16
```

i.e. the user-supplied `npband=1` was overridden because `paral_kgb=0`.

## Workarounds (none satisfactory)

1. **Force `npband=1` via input.**  `paral_kgb=0` (the only mode
   supported by DFPT response calculations) silently ignores the
   `npband` input variable.  We confirmed this both by reading the log
   above and by the fact that setting `paral_rf 0 npband 1 npfft 1` in
   the input had no effect.

2. **Set `nproc == nkpt`** (i.e. NP=2 for this system).  This is the
   only way to force `npband=1` from the user side, but it makes the
   per-iter wall ~16x slower (no band parallelism).  For our 164-atom
   MOF that turns a 24 h job into a ~70 h job which no longer fits the
   batch slot.

3. **Use `kptopt 3`** (full BZ, no symmetry reduction) to increase
   `nkpt` from 2 to 6.  Same issue: choosing `nproc=6` works, anything
   higher re-introduces band parallelism.  Plus the 3x extra k-points
   triples the compute cost.

4. **Patch the source as above** -- what we ultimately did locally.
   Both DS3 (E-field) and DS4 (strain) converge cleanly afterwards.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Segfault in `dfpt_nsteltwf` (strain DFPT post-SCF) when `npband > 1`: array slice mixes local and global band index #96

Summary

Buggy lines

Proposed fix (2 lines)

Reproducer

Actual behaviour

Workarounds (none satisfactory)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Segfault in dfpt_nsteltwf (strain DFPT post-SCF) when npband > 1: array slice mixes local and global band index #96

Description

Summary

Buggy lines

Proposed fix (2 lines)

Reproducer

Actual behaviour

Workarounds (none satisfactory)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Segfault in `dfpt_nsteltwf` (strain DFPT post-SCF) when `npband > 1`: array slice mixes local and global band index #96