This repository contains a tool for calculating the dimensionality of MOFs from CIF files. Based on an earlier CSD version (created by S.B. Wiggin, CCDC, 2020-01-21), this version is updated to directly process CIF files and supports parallel execution for handling larger datasets more efficiently.
Note: Both versions of the code rely on the integrated CSD Python library, particularly the entry.crystal.polymer_expansion function. For larger MOFs, this function can consume significantly more RAM, causing memory spikes during execution. To prevent jobs from being killed by out-of-memory errors, it is strongly recommended to allocate a generous amount of RAM for both the basic and parallel versions of the code.
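To illustrate where the memory pressure comes from, here is a minimal sketch of loading a CIF file and expanding its polymeric unit with the CSD Python API. The repetitions keyword and the return type are assumptions about polymer_expansion, and the file path is hypothetical; the repository's scripts contain the actual logic.

```python
from ccdc.io import CrystalReader

# Read the first crystal from a CIF file (hypothetical path).
reader = CrystalReader('/path/to/structure.cif')
crystal = reader[0]

# Expand the polymeric bonds of the framework. Each extra repetition
# multiplies the number of atoms held in memory, which is why large
# MOFs can cause sharp RAM spikes.
expanded = crystal.polymer_expansion(repetitions=2)  # assumed signature
print(len(expanded.molecule.atoms))                  # assumed return type
```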
The MOF_Dimensions_CIF.py script is an updated version of the CCDC code written by S.B. Wiggin (https://github.com/ccdc-opensource/science-paper-mofs-2020.git), reworked to accept CIF files as input.
To run this version of the code, you need to specify two arguments:
- The absolute path of the dataset (directory containing CIF files)
- The absolute path to the output file (a .csv file)
```bash
python MOF_Dimensions_CIF.py -i /path/to/dataset/directory -o /path/to/csv/file.csv
```

This command will iterate over all of the CIF structures in the dataset directory sequentially (one structure at a time) and print the dimensionality results to both the command line and the CSV file, similar to how the original code printed them.
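The scripts presumably wire up these flags with Python's argparse; the sketch below shows the expected interface (the long option names and help text are assumptions, not the repository's exact code):

```python
import argparse

# Sketch of the expected command-line interface; the actual script may
# differ in long option names, help text, or defaults.
parser = argparse.ArgumentParser(
    description='Compute MOF dimensionality from CIF files.')
parser.add_argument('-i', '--input', required=True,
                    help='absolute path to the directory containing CIF files')
parser.add_argument('-o', '--output', required=True,
                    help='absolute path to the output .csv file')
args = parser.parse_args()
```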
The MOF_Dimensions_CIF_Parallel.py script was created to shorten the runtime by running the code on multiple CPU cores simultaneously, using Python's built-in ProcessPoolExecutor (from the concurrent.futures module in the standard library).
To run this version of the code, you need to specify three arguments:
- The absolute path of the dataset (directory containing CIF files)
- The absolute path to the output file (a .csv file)
- The number of cores to use (integer)
```bash
python MOF_Dimensions_CIF_Parallel.py -i /path/to/dataset/directory -o /path/to/csv/file.csv -n 4
```

This command will split the dataset into pools (based on the number of CPU cores provided). Each pool is computed in parallel, with the output printed to both the command line and the CSV file. Note that while this version is much faster and scales with the number of CPU cores provided, it requires more RAM, since several structures are expanded in memory at once.
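A minimal sketch of this pattern follows, assuming a hypothetical per-file worker function, process_cif, that returns a dimensionality label; the real worker in MOF_Dimensions_CIF_Parallel.py wraps the CSD Python library calls instead of this placeholder:

```python
import csv
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def process_cif(cif_path):
    """Hypothetical worker: compute dimensionality for one CIF file.
    The actual script's worker calls into the CSD Python library."""
    return cif_path.name, 'unknown'  # placeholder result

def main(dataset_dir, out_csv, n_cores):
    cif_files = sorted(Path(dataset_dir).glob('*.cif'))
    with ProcessPoolExecutor(max_workers=n_cores) as pool, \
         open(out_csv, 'w', newline='') as fh:
        writer = csv.writer(fh)
        writer.writerow(['identifier', 'dimensionality'])
        # map() distributes the CIF files across the worker processes
        # and yields results back in input order.
        for name, dim in pool.map(process_cif, cif_files):
            print(name, dim)
            writer.writerow([name, dim])

if __name__ == '__main__':
    main('/path/to/dataset/directory', '/path/to/csv/file.csv', 4)
```

Because each worker process holds its own expanded structure in memory, peak RAM usage grows roughly with the number of cores, which is the trade-off noted above.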
For reference, bash scripts are also provided in the repository to demonstrate further parallelization; for instance, the ARC-MOF dimensionality results were computed using these batch scripts. The SUBMIT.sh script splits the overall dataset into smaller batches of a user-specified size. Each batch can then be submitted independently with either MOF_Dimensions_CIF_Parallel.py or MOF_Dimensions_CIF.py, as specified in the SEND.sh script.