DopeFlow: DFT Calculation Workflow for Atomic Substitution Problem

This repository contains scripts for managing VASP calculations on a SLURM-based cluster system.

1. job_monitor.sh

A script for submitting and monitoring sequential VASP calculations (Relax → SC → ELF → Band → DOS).

Usage

nohup bash job_monitor.sh <calc_type> &

where <calc_type> can be:

Relax: Structure relaxation
SC: Self-consistent calculation
ELF: Electron Localization Function calculation
Band: Band structure calculation
DOS: Density of states calculation

Directory Structure

structure_directory/
├── Relax/
├── SC/
├── ELF/
├── DOS/
└── Band/

Required Files

INCAR_<calc_type>: INCAR file for each calculation type
sbp_<calc_type>.sh: SLURM submission script for each calculation type
POTCAR: VASP pseudopotential file
../aflow_sym/uniq_poscar_list: List of structures to process
diverge_structs: (optional) List of structures to skip
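
Because missing inputs are listed under Common Issues, a quick pre-flight check before launching job_monitor.sh may save a failed run. The following is a minimal sketch, not part of the repository: the helper name `missing_required_files` is hypothetical, and it assumes the files live in the directory from which job_monitor.sh is started, with the paths exactly as listed above.

```python
from pathlib import Path


def missing_required_files(calc_type, workdir="."):
    """Return the required files (per the list above) that are absent.

    Hypothetical helper: assumes the files sit in `workdir`, the directory
    from which job_monitor.sh is launched.
    """
    workdir = Path(workdir)
    required = [
        workdir / f"INCAR_{calc_type}",
        workdir / f"sbp_{calc_type}.sh",
        workdir / "POTCAR",
        workdir / ".." / "aflow_sym" / "uniq_poscar_list",
    ]
    # diverge_structs is optional, so it is deliberately not checked.
    return [str(path) for path in required if not path.is_file()]
```

Calling `missing_required_files("Relax")` returns an empty list when everything needed for the Relax step is in place.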

2. redo_optics.sh/restart_optics.sh

Scripts for managing optical calculations (SC → DIAG → GW0 → BSE) with automatic error checking and resubmission.

Usage

For normal execution with error checking:

nohup ./redo_optics.sh > redo_optics.log 2>&1 &

For forced restart of all calculations:

nohup ./restart_optics.sh > restart_optics.log 2>&1 &

Required Files

Direct_dir: File containing list of directories to process
INCAR files:
- INCAR_SC
- INCAR_DIAG
- INCAR_GW0
- INCAR_BSE
SLURM submission scripts:
- sbp_SC.sh
- sbp_DIAG.sh
- sbp_GW0.sh
- sbp_BSE.sh
POTCAR_GW: VASP GW pseudopotential file

`<calc_type>` can be:

SC: DFT ground-state calculation
DIAG: DFT "virtual" orbitals (empty states)
GW0: RPA quasiparticles with single-shot GW
BSE: BSE calculation

Directory Structure

structure_directory/
└── Optics/
    ├── SC/
    ├── DIAG/
    ├── GW0/
    └── BSE/

Output Logs

For redo_optics.sh:

optical_jobs.log: Detailed job submission information
job_<calc_type>.log: Job counting logs for each calculation type

For restart_optics.sh:

restart_optical_jobs.log: Detailed job submission information for restarts
job_<calc_type>_restart.log: Job counting logs for restarted calculations

Features

redo_optics.sh:

Automatic error detection and job resubmission
Sequential dependency handling
Detailed logging of job submissions
Limits concurrent jobs to at most 60 compute nodes

restart_optics.sh:

Forces restart of all calculations regardless of previous status
Maintains same workflow and dependencies
Uses separate log files to avoid confusion with original runs
Limits concurrent jobs to at most 60 compute nodes

Features

job_monitor.sh

Manages sequential job submissions
Limits concurrent jobs to 60
Handles failed calculations
Supports structure skipping via diverge_structs

redo_optics.sh/restart_optics.sh

Automatic directory creation and management
Sequential dependency handling (SC → DIAG → GW0 → BSE)
Automatic error detection and job resubmission
Detailed logging of job submissions
Limits concurrent jobs to 60
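
The job limit above amounts to polling the queue and holding back submissions while it is full. The sketch below illustrates that idea only; it is not the logic of the shell scripts themselves. It assumes the limit counts the user's queued plus running jobs as reported by `squeue -h -u <user>`, and the function names and the 60-second polling interval are hypothetical.

```python
import subprocess
import time

MAX_JOBS = 60  # limit stated in the feature list above


def count_slurm_jobs(user):
    """Count the user's queued and running jobs via `squeue -h -u <user>`."""
    out = subprocess.run(
        ["squeue", "-h", "-u", user],  # -h suppresses the header line
        capture_output=True, text=True, check=True,
    ).stdout
    return sum(1 for line in out.splitlines() if line.strip())


def wait_for_slot(count_jobs, max_jobs=MAX_JOBS, poll_seconds=60, sleep=time.sleep):
    """Block until fewer than max_jobs are queued; return the number of polls spent waiting."""
    waits = 0
    while count_jobs() >= max_jobs:
        sleep(poll_seconds)
        waits += 1
    return waits
```

A submission loop would call `wait_for_slot(lambda: count_slurm_jobs("myuser"))` before each `sbatch`. Passing the counter in as a callable keeps the waiting logic testable without a SLURM installation.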

Common Issues

Missing required files - ensure all INCAR and submission scripts are present
Directory permissions - ensure write access in all directories
SLURM queue limits - script will wait if queue is full
Failed calculations - check individual VASP output files for errors
Missing vasprun.xml - script will detect and resubmit affected calculations
Failed phonon calculations - use get_err_phon.sh to generate resubmission script

Notes

All of these scripts assume the SLURM job scheduler
Maximum concurrent jobs is set to 60
Scripts will create necessary directories if they don't exist
Error handling includes automatic resubmission of failed jobs

3. submit_phonon.sh

A script for managing phonon calculations with automatic supercell generation and job monitoring.

Usage

chmod +x submit_phonon.sh
nohup ./submit_phonon.sh > nohup.out 2>&1 &

Required Files

phonon_list: File containing list of directories to process
INCAR_PHON: INCAR file for phonon calculations
sbp_PHON.sh: SLURM submission script for phonon calculations
Supporting scripts:
- convert_kpath.sh
- generate_supercell.sh
- extract_band_conf.sh
- preprocess_high_symmetry_points.sh

Directory Structure

structure_directory/
├── Relax/
│   └── CONTCAR
└── PHON/
    ├── POSCAR-*
    ├── INCAR
    ├── POTCAR
    └── sbp.sh

Output Logs

job_PHON.log: Detailed job submission tracking
Records which phonon calculations have been submitted for each structure

Features

Automatic supercell generation using VASPKIT
Batch submission (10 jobs at a time)
Limits concurrent jobs to 50
Resumes from last submitted job if interrupted
Maintains submission history in log file

Workflow

Reads structures from phonon_list
For each structure:
- Creates PHON directory
- Copies CONTCAR from Relax directory
- Generates primitive cell using VASPKIT
- Generates supercells
- Submits jobs in batches
Monitors job queue and maintains submission limits
Tracks progress in log file
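
The batching and resume behaviour described above can be pictured as follows. This is a hedged sketch rather than the script's actual implementation: `next_batch` is a hypothetical name, the set of already submitted files is assumed to have been read from job_PHON.log (whose format is not specified here), and the batch size of 10 and limit of 50 are taken from the feature list.

```python
def next_batch(poscars, submitted, running, batch_size=10, max_jobs=50):
    """Pick the next displacement files to submit without exceeding max_jobs.

    poscars   -- ordered list of POSCAR-* file names for one structure
    submitted -- names already submitted (assumed to be recovered from job_PHON.log)
    running   -- number of jobs currently in the queue
    """
    pending = [name for name in poscars if name not in submitted]
    free_slots = max(0, max_jobs - running)
    return pending[:min(batch_size, free_slots)]
```

Because files already recorded as submitted are skipped, rerunning after an interruption continues from the first file that has not yet been submitted.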

Tips for Phonon Calculations

Check supercell size in generate_supercell.sh
Monitor convergence in individual phonon calculations
Use job_PHON.log to track submission progress
Check VASPKIT output for primitive cell generation

4. Post-Processing Scripts

post-proc_phonon.sh

A script for post-processing phonon calculations with automatic error detection and data generation.

Usage

sbatch phonon-pp-job.sh

Features

Automatic error detection in SLURM output files
Generates FORCE_SETS using phonopy
Creates phonon band plots and raw data files
Handles LaTeX formatting for band labels
Detailed logging with configurable verbosity

Required Files

phonon_list: List of structures to process
Supporting scripts:
- convert_kpath.sh
- extract_band_conf.sh
- preprocess_high_symmetry_points.sh

band_gap-pp.sh

A script for analyzing and categorizing band structures based on their electronic properties.

Usage

./band_gap-pp.sh

Features

Automatically categorizes structures as Direct, Indirect, or Metallic
Uses VASPKIT for band structure analysis
Error detection in SLURM output files
Generates categorized lists of structures

Output Files

Direct_dir: List of structures with direct band gaps
Indirect_dir: List of structures with indirect band gaps
Metallic_dir: List of metallic or semimetallic structures (no band gap)
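
The three categories follow the usual rule: no gap means metallic, and a finite gap is direct when the valence band maximum (VBM) and conduction band minimum (CBM) sit at the same k-point. The sketch below shows only that rule. It does not reproduce how band_gap-pp.sh parses VASPKIT output, and the function name and both tolerances are assumptions.

```python
def classify_band_gap(gap_ev, vbm_kpoint, cbm_kpoint, gap_tol=1e-3, k_tol=1e-4):
    """Classify a band structure as 'Metallic', 'Direct', or 'Indirect'.

    gap_ev      -- band gap in eV (0 for metals)
    vbm_kpoint  -- fractional k-point coordinates of the VBM
    cbm_kpoint  -- fractional k-point coordinates of the CBM
    gap_tol, k_tol -- illustrative tolerances, not values used by the script
    """
    if gap_ev <= gap_tol:
        return "Metallic"
    same_k = all(abs(a - b) <= k_tol for a, b in zip(vbm_kpoint, cbm_kpoint))
    return "Direct" if same_k else "Indirect"
```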

get_err_phon.sh

A utility script for handling failed phonon calculations.

Usage

./get_err_phon.sh

Features

Analyzes phonon post-processing logs for errors
Generates resubmission script for failed calculations
Handles missing or corrupted vasprun.xml files
Automatic cleanup and job resubmission

5. ELF Analysis for Electride Detection

analyze_electride.py

A Python script that analyzes Electron Localization Function (ELF) output to identify potential electrides. Electrides are materials in which electrons occupy interstitial regions rather than being associated with atoms.

The script runs the Henkelman group's Bader code on the ELF field to locate its critical points, which avoids false positives from covalent bond regions.

Usage

Single structure analysis:

cd /path/to/structure/ELF
python3 /path/to/aflow_sym/analyze_electride.py ELFCAR

Batch analysis of all structures:

cd /path/to/parent/directory
python3 /path/to/aflow_sym/analyze_electride.py --batch . -o electride_results.csv

With custom parameters:

python3 analyze_electride.py ELFCAR --threshold 0.7 --min-distance 2.0 --volume-threshold 0.5

With custom bader executable:

python3 analyze_electride.py --bader-exe /path/to/bader /path/to/structure/ELFCAR

Force regenerate BCF.dat:

rm /path/to/structure/ELF/BCF.dat
python3 analyze_electride.py /path/to/structure/ELF/ELFCAR

Required Files

ELFCAR: Output from VASP ELF calculation (generated with LELF=.TRUE. in INCAR)
bader executable: Download from Henkelman Group
- Auto-detection: Script automatically looks for bader in the same directory as ELFCAR
- Alternative: Add bader to system PATH or use --bader-exe option

Features

Bader topological analysis: Rigorous critical point detection in ELF field
Automatic electride detection based on interstitial ELF maxima
Avoids false positives from covalent bonds: distinguishes bond regions from interstitial electrons
Distance-based filtering to exclude atomic regions
Volume estimation of electron-rich interstitial regions
Batch processing for analyzing multiple structures
CSV export for systematic analysis
BCF.dat caching: Reuses existing Bader analysis results

Output

The script provides:

Potential electride classification (yes/no)
Maximum ELF value in interstitial regions
Number of interstitial electron sites
Volume and volume fraction of interstitial regions
Distance of interstitial sites from nearest atoms

Parameters

| Parameter | Default | Description |
|---|---|---|
| `--threshold` | 0.6 | Minimum ELF value for electride detection |
| `--min-distance` | 1.5 Å | Minimum distance from atoms to consider interstitial |
| `--volume-threshold` | 0.5 Å³ | Minimum volume for significant interstitial region |
| `--output` | electride_analysis.csv | Output file for batch analysis |

Tips for ELF Calculations

INCAR settings: Ensure LELF=.TRUE. is set in INCAR_ELF
Grid density: Use fine FFT grids for accurate ELF calculations
Convergence: ELF calculations should be performed on well-converged charge densities
ELFCAR vs CHGCAR:
- We analyze ELFCAR (ELF field) directly with bader ELFCAR
- The -ref CHGCAR_sum option is for charge density analysis, not needed for ELF
- VASP outputs complete ELF field in ELFCAR, no core correction needed
BCF.dat caching:
- The script reuses existing BCF.dat if present (faster)
- Delete BCF.dat to force regeneration after changes
Bader executable detection (in order of priority):
- User-specified path via --bader-exe
- bader file in same directory as ELFCAR (convenient for per-structure executables)
- bader in system PATH
Threshold tuning: Adjust --threshold based on your material system:
- Strong electrides: ELF > 0.7
- Moderate electrides: ELF 0.5-0.7
- Weak localization: ELF < 0.5
Distance parameter: --min-distance 1.5 (default) works for most cases
- Increase to 2.0 Å for systems with large atoms
- Decrease to 1.2 Å for compact structures
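
To make the interplay of the three thresholds concrete, the sketch below filters a list of ELF maxima using the documented defaults. It is an illustration under stated assumptions, not the code in analyze_electride.py: the `ElfMaximum` record, the function names, and the use of inclusive comparisons (`>=`) are hypothetical, and the maxima are assumed to have been extracted from the Bader output beforehand.

```python
from dataclasses import dataclass


@dataclass
class ElfMaximum:
    elf: float                # ELF value at the maximum
    nearest_atom_dist: float  # distance to the nearest atom, in Å
    volume: float             # volume of the associated region, in Å³


def interstitial_sites(maxima, threshold=0.6, min_distance=1.5, volume_threshold=0.5):
    """Keep maxima that pass all three filters (defaults from the Parameters table)."""
    return [
        m for m in maxima
        if m.elf >= threshold
        and m.nearest_atom_dist >= min_distance
        and m.volume >= volume_threshold
    ]


def is_potential_electride(maxima, **filters):
    """A structure is flagged when at least one interstitial site survives."""
    return len(interstitial_sites(maxima, **filters)) > 0
```

Raising `min_distance` to 2.0 for systems with large atoms, as suggested above, simply removes sites that lie closer to an atom than that.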

Workflow for ELF Analysis

# 1. Submit ELF calculations
nohup bash job_monitor.sh ELF &

# 2. Wait for calculations to complete

# 3. Analyze single structure
python3 ../aflow_sym/analyze_electride.py 1/ELF/ELFCAR

# 4. Or batch analyze all structures
python3 ../aflow_sym/analyze_electride.py --batch . -o electride_results.csv

# 5. Check results
cat electride_results.csv
grep "True" electride_results.csv  # List potential electrides

Script Dependencies

Python Dependencies

Depending on which doping script you use, you'll need different Python packages:

For general substitution WITHOUT symmetry bias:
Example: aflow_sym/rnd_SiGe_doping.py or aflow_sym/NaSiGe_doping.py

ASE

For using Fingerprint energy as symmetry bias:
Example: aflow_sym/Doping.py

ASE
libfp

For explicitly using group-subgroup splitting:
Example: aflow_sym/subgroup_doping.py

For entropy-guided MCMC with duplicate avoidance (RECOMMENDED):
Example: aflow_sym/fp_doping.py

ASE
libfp
ReformPy (for fingerprint entropy)
matplotlib (for visualization)
scipy
numba
kimpy (optional, for KIM energy filtering)

For ELF electride analysis:
Example: aflow_sym/analyze_electride.py

Pymatgen (for reading ELFCAR files)
scipy (for local maxima detection and spatial analysis)
numpy

Environment Setup

Ensure these environment variables are set:

$AFLOW_HOME: Path to AFLOW executable
$VASPKIT_HOME: Path to VASPKIT executable
$PHONOPY_HOME: Path to Phonopy executable

Common Workflow

Structure Generation (fp_doping.py)
Structure Relaxation (job_monitor.sh Relax)
Electronic Structure (job_monitor.sh SC/Band/DOS)
ELF Analysis (Optional, job_monitor.sh ELF + analyze_electride.py)
Optical Properties (redo_optics.sh)
Phonon Calculations (submit_phonon.sh)

6. Atomic Substitution with Entropy-Guided MCMC

fp_doping.py (RECOMMENDED)

A robust method for generating diverse atomic substituted structures with automatic diversity optimization using entropy-guided Markov Chain Monte Carlo (MCMC) sampling based on fingerprint entropy maximization.

Key Features

65% overall uniqueness (80-100% for 3+ substitutions, validated by AFLOW)
Entropy-guided MCMC directly maximizes atomic environment diversity
Always succeeds - no clustering failures for high substitution levels
JIT-compiled fingerprint entropy calculations (fast performance)
Optional KIM energy filtering removes unstable structures
Entropy distribution plots for interpretability
Theoretically grounded - uses ReformPy's fingerprint entropy metric

Usage

Command Line:

cd aflow_sym/
python3 fp_doping.py

You will be prompted for:

Element to substitute (e.g., Si)
New element (e.g., Ge)
Maximum number of atoms to substitute
Maximum structures per substitution level
MCMC temperature (default: 1.0, higher = more exploration)
MCMC iterations per level (default: 10000)
Whether to use KIM energy filtering (y/n)
Whether to generate entropy distribution plots (y/n)

Python API:

import ase.io
from fp_doping import POSCAR_GEN_CLUSTER

# Load structure
atoms = ase.io.read('POSCAR')

# Generate diverse structures
structures = POSCAR_GEN_CLUSTER(
    atoms_origin=atoms,
    elem_from='Si',
    elem_to='Ge',
    max_subs=5,
    max_structures=10,
    max_iter=10000,
    mcmc_temperature=1.0,
    visualize=True,
    kim_model="Tersoff_LAMMPS_Tersoff_1989_SiGe__MO_350526375143_004"
)

Algorithm Overview

MCMC Initialization: Start with random substitution pattern for each level
Metropolis-Hastings Sampling:
- Propose new substitution pattern (swap one substituted/non-substituted atom)
- Calculate fingerprint entropy: S = (1/N) Σᵢ log(N × δq_min,i)
- Accept if entropy increases; if it decreases, accept with probability exp(ΔS/T)
Burn-in and thinning: Discard the initial samples, then keep every 10th sample
Diversity Selection: Choose top entropy structures (most diverse atomic environments)
Energy Filtering (optional): Exclude high-energy structures using KIM calculator

Key Insight: Maximizing fingerprint entropy ensures atoms have maximally diverse local environments, avoiding symmetry-equivalent structures.
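
The entropy formula and acceptance rule above can be sketched in a few lines of pure Python. This is a simplified illustration, not the JIT-compiled code in fp_doping.py or ReformPy's API: it assumes δq_min,i is the Euclidean distance from atom i's fingerprint vector to its nearest neighbour in fingerprint space, and the function names, the `eps` floor (to avoid log(0) for identical environments), and the `burn_in` argument are hypothetical.

```python
import math
import random


def fingerprint_entropy(fingerprints, eps=1e-12):
    """S = (1/N) * sum_i log(N * dq_min_i) for N >= 2 fingerprint vectors."""
    n = len(fingerprints)
    total = 0.0
    for i, fi in enumerate(fingerprints):
        # Distance to the closest other atom in fingerprint space.
        dq_min = min(math.dist(fi, fj) for j, fj in enumerate(fingerprints) if j != i)
        total += math.log(n * max(dq_min, eps))
    return total / n


def metropolis_accept(delta_s, temperature=1.0, rng=random):
    """Accept entropy increases always, decreases with probability exp(dS/T)."""
    if delta_s >= 0:
        return True
    return rng.random() < math.exp(delta_s / temperature)


def burn_in_and_thin(samples, burn_in, thin=10):
    """Drop the first `burn_in` samples, then keep every `thin`-th one."""
    return samples[burn_in::thin]
```

A higher temperature makes `exp(ΔS/T)` closer to 1 for the same entropy loss, which is why raising it leads to more exploration.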

Output Files

POSCAR Files:

POSCAR_N_M where N = substitution level, M = structure index
Example: POSCAR_3_5 = 5th structure with 3 substitutions

Visualization:

entropy_distribution_N_substitutions.png - Shows entropy histogram and ranked values
Helps verify MCMC convergence and diversity of generated structures

Performance

Validated with AFLOW --compare_materials on Si₃₄ test structure:

| Substitutions | Generated | Unique (AFLOW) | Uniqueness | Status |
|---|---|---|---|---|
| 1 atom | 10 | 1 | 10% | Expected* |
| 2 atoms | 8 | 1 | 12.5% | Expected* |
| 3 atoms | 10 | 8 | 80% | Excellent |
| 4 atoms | 8 | 8 | 100% | Perfect |
| 5 atoms | 8 | 8 | 100% | Perfect |
| 6 atoms | 8 | 8 | 100% | Perfect |
| Overall | 52 | 34 | 65.4% | Good |

* Low uniqueness for 1-2 substitutions is expected: high-symmetry structures have many equivalent sites. MCMC correctly converges to globally optimal configurations.
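
The overall figure is the ratio of total unique to total generated structures, not the mean of the per-level percentages. The short check below recomputes it from the counts in the table; the variable names are only for illustration.

```python
# (generated, unique) per substitution level, copied from the table above
counts = {1: (10, 1), 2: (8, 1), 3: (10, 8), 4: (8, 8), 5: (8, 8), 6: (8, 8)}

total_generated = sum(generated for generated, _ in counts.values())  # 52
total_unique = sum(unique for _, unique in counts.values())           # 34

# Per-level and overall uniqueness in percent
per_level = {n: 100.0 * unique / generated for n, (generated, unique) in counts.items()}
overall = 100.0 * total_unique / total_generated

print(f"overall uniqueness: {overall:.1f}%")
```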

Parameters Guide

max_iter: MCMC iterations per substitution level

Default: 10000 (good for most cases)
Higher values: Better sampling, longer runtime
Suggested range: 5000-20000

mcmc_temperature: Exploration vs exploitation trade-off

Default: 1.0 (balanced)
Higher (2.0-5.0): More exploration, higher diversity (use if getting duplicates)
Lower (0.5): More exploitation, faster convergence

KIM model (optional energy filtering):

Si-Ge systems: "Tersoff_LAMMPS_Tersoff_1989_SiGe__MO_350526375143_004"
Excludes top 20% highest energy structures (default threshold)
Requires kimpy installation
Use None to disable
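
The 20% energy cut can be pictured as dropping the highest-energy fifth of the candidates. The sketch below assumes the energies have already been evaluated (for example with the KIM model above) and are passed in as a plain list; the function name, the rounding down of the number of dropped structures, and the preservation of input order are assumptions rather than documented behaviour of fp_doping.py.

```python
def drop_highest_energy(structures, energies, drop_fraction=0.2):
    """Return structures with the top `drop_fraction` by energy removed.

    The number dropped is rounded down, and the original order is preserved.
    """
    n_drop = int(len(structures) * drop_fraction)
    # Indices sorted from lowest to highest energy.
    order = sorted(range(len(structures)), key=lambda i: energies[i])
    keep = sorted(order[:len(structures) - n_drop])
    return [structures[i] for i in keep]
```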

Advantages Over Previous Methods

| Feature | Entropy-MCMC (New) | PCA+Clustering (Old) |
|---|---|---|
| Robustness | Always succeeds | Failed for 6+ substitutions |
| Scalability | Any substitution level | Limited by clustering |
| Theoretical basis | Entropy maximization | Ad-hoc PCA distance |
| Speed | Fast (JIT-compiled) | Moderate |
| Uniqueness (3-6 subs) | 80-100% | N/A (failed) |

Tips

For high-symmetry structures: Expect low uniqueness for 1-2 substitutions (this is CORRECT behavior - MCMC finds globally optimal configurations)
For more diversity: Increase temperature (2.0-5.0) or iterations (20000+)
Check convergence: Use visualization plots to verify entropy distribution
AFLOW filtering: Always use reduce_sim_struct.sh for final uniqueness verification

Duplicate Checking

To verify uniqueness of generated structures:

bash reduce_sim_struct.sh
cat uniq_poscar_list

This uses AFLOW to identify symmetrically equivalent structures. The entropy-MCMC method achieves 65% overall uniqueness and 80-100% for 3+ substitutions, which is excellent for DFT workflows.
