Update 2024: The tmQMg dataset has been extended by 13,756 transition metal complexes extracted from the Cambridge Structural Database.
This repository contains tmQMg, a graph dataset with descriptive graph representations of 74,555 transition metal complexes (TMCs), including all thirty elements from the 3d, 4d, and 5d series. These representations were derived from quantum chemistry simulation data, more precisely from Natural Bond Orbital (NBO) analysis. We provide three types of graphs as GML-formatted files: baseline, u-NatQG and d-NatQG. The graphs can be used in deep graph learning methods and can be downloaded from here. The code used to generate these representations can be found at HyDGL. A detailed discussion of the representations and machine learning methods can be found in the corresponding publication.
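Since the graphs are distributed as GML files, they can be read with any GML-capable library. The following is a minimal sketch using NetworkX, not the loading code used in this repository. The miniature graph and the attribute names `element` and `weight` are hypothetical placeholders; the actual keys stored in the tmQMg files may differ.

```python
import networkx as nx

# Hypothetical miniature graph; the attribute names "element" and "weight"
# are placeholders, not necessarily the keys used in the tmQMg files.
EXAMPLE_GML = """graph [
  node [ id 0 element "Fe" ]
  node [ id 1 element "C" ]
  node [ id 2 element "O" ]
  edge [ source 0 target 1 weight 0.8 ]
  edge [ source 1 target 2 weight 2.1 ]
]"""

# label="id" keys the nodes by their integer GML id instead of a "label" field.
graph = nx.parse_gml(EXAMPLE_GML, label="id")

# Collect node and edge attributes in a form that is easy to convert to tensors.
node_features = {node: data for node, data in graph.nodes(data=True)}
edge_list = [(u, v, data) for u, v, data in graph.edges(data=True)]

print(node_features)
print(edge_list)
```

For a file on disk, `nx.read_gml(path, label="id")` plays the same role as `nx.parse_gml` does here.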
- Overview of the different graph types and links to their storage location.
- List of all TMCs and their respective graph-level features, quantum properties and SMILES strings.
- The graph-level features are charge, molecular mass, number of atoms, and number of electrons.
- The TMC SMILES strings were computed using the xyz2mol_tm tool developed by the Jensen group. In particular, we made use of their procedure based on extended Hückel data. Details can be found in the associated publication.
- Zip file of the xyz data of all compounds in the dataset.
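The xyz data can be read directly from the zip archive without unpacking it. The sketch below assumes that the archive holds one standard `.xyz` file per compound (atom count, comment line, then one `symbol x y z` line per atom); the archive layout and the member name `EXAMPLE.xyz` are assumptions made for illustration, and the demo builds a small in-memory archive so that it runs on its own.

```python
import io
import zipfile


def parse_xyz(text):
    """Parse one structure in standard xyz format into (comment, atoms)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])          # first line: number of atoms
    comment = lines[1]               # second line: free-form comment
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return comment, atoms


def read_xyz_archive(zip_file):
    """Return {member name: (comment, atoms)} for every .xyz member."""
    structures = {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith(".xyz"):
                text = archive.read(name).decode("utf-8")
                structures[name] = parse_xyz(text)
    return structures


# Build a tiny in-memory archive with a hypothetical member for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "EXAMPLE.xyz",
        "3\nhypothetical water molecule\n"
        "O 0.000 0.000 0.117\n"
        "H 0.000 0.757 -0.469\n"
        "H 0.000 -0.757 -0.469\n",
    )
buffer.seek(0)

structures = read_xyz_archive(buffer)
comment, atoms = structures["EXAMPLE.xyz"]
print(comment, len(atoms))
```

`read_xyz_archive` also accepts a path to a zip file on disk in place of the in-memory buffer.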
Furthermore, we provide the Python code used to perform the various machine learning experiments.
- List of the IDs of about 2.5k TMCs that were deemed outliers, based on their quantum properties, for the ML experiments.
- Contains the code for the Gilmer net and a comprehensive analysis of the data.
- Consult the provided README for more info.
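To illustrate how the outlier list can be combined with the list of TMCs, the sketch below drops all rows whose ID appears in the outlier list. It is not the filtering code used in the experiments. It assumes a CSV file with an ID column and an outlier file with one ID per line; the column names `id`, `charge`, and `n_atoms` and the IDs themselves are hypothetical.

```python
import csv
import io


def load_outlier_ids(handle):
    """Read one ID per line, ignoring blank lines."""
    return {line.strip() for line in handle if line.strip()}


def filter_outliers(rows, outlier_ids, id_column="id"):
    """Keep only the rows whose ID is not in the outlier set."""
    return [row for row in rows if row[id_column] not in outlier_ids]


# Hypothetical stand-ins for the property list and the outlier list.
properties_csv = io.StringIO("id,charge,n_atoms\nAAA,0,25\nBBB,1,40\nCCC,-1,31\n")
outliers_txt = io.StringIO("BBB\n")

rows = list(csv.DictReader(properties_csv))
kept = filter_outliers(rows, load_outlier_ids(outliers_txt))
print([row["id"] for row in kept])
```

With real files, the two `io.StringIO` objects would be replaced by handles from `open(...)`.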