This repository provides the BiFuN-L dataset, which contains 2,548 ligands from the parent BiFuN dataset. BiFuN is a dataset of transition metal complexes that will be provided in the next release. All ligands were extracted from the tmQMg-L dataset or derived from it through in-house modifications based on substructure prevalence in the Cambridge Structural Database (CSD). Each ligand contains a single coordinating -NH group and two additional binding sites.
Ligand names follow this format:
tmQMg-L name -(N transformation, n)-(NNN→NNP, n)-(5-membered ring)-(6-membered ring)
Examples:
WATDUQ-subgraph-0 -20-00-00000-00000
UNIYIA-subgraph-2 -21-10-51101-61001
Key components:
- N transformation: type of transformation, which depends on the number of NH groups present in the tmQMg-L ligand: 1 (NH = 0), 2 (NH > 1), or 3 (NH > 1 and NH2 ≠ 0).
- Number of transformations (n): number of times the transformation is applied to the original ligand.
- NNN to NNP transformation: binary flag (0 = no transformation, 1 = applied).
- Ring modifications: transformation in a 5-membered ring (-5) or in a 6-membered ring (-6). The type of 5-membered heterocycle is coded 1 to 7, and the type of 6-membered heterocycle is coded 1 to 4.
- 0 = no transformation.
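The naming scheme above can be split into its fields programmatically. The following is a minimal sketch, not an official parser: `LigandName` and `parse_ligand_name` are hypothetical names, and the code assumes that the four trailing fields always have the widths seen in the two examples (2, 2, 5, and 5 digits) and that the space before them is optional. The ring fields are returned as raw strings because their internal digits are not decoded here.

```python
import re
from typing import NamedTuple


class LigandName(NamedTuple):
    """Fields of a BiFuN-L ligand name (hypothetical container)."""
    parent: str     # tmQMg-L name, e.g. "WATDUQ-subgraph-0"
    n_type: int     # first digit of the N transformation field
    n_count: int    # second digit: times the N transformation was applied
    nnp_flag: int   # first digit of the NNN -> NNP field (0 or 1)
    nnp_count: int  # second digit of the NNN -> NNP field
    ring5: str      # raw 5-membered ring field, left undecoded
    ring6: str      # raw 6-membered ring field, left undecoded


# Assumption: fixed field widths (2, 2, 5, 5 digits), as in the examples,
# with optional whitespace between the tmQMg-L name and the first field.
_PATTERN = re.compile(
    r"(?P<parent>.+?)\s*"
    r"-(?P<n>\d{2})-(?P<nnp>\d{2})-(?P<ring5>\d{5})-(?P<ring6>\d{5})"
)


def parse_ligand_name(name: str) -> LigandName:
    """Split a ligand name into its tmQMg-L parent and transformation fields."""
    match = _PATTERN.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"Not a recognised BiFuN-L ligand name: {name!r}")
    n_field = match["n"]
    nnp_field = match["nnp"]
    return LigandName(
        parent=match["parent"],
        n_type=int(n_field[0]),
        n_count=int(n_field[1]),
        nnp_flag=int(nnp_field[0]),
        nnp_count=int(nnp_field[1]),
        ring5=match["ring5"],
        ring6=match["ring6"],
    )
```

For instance, `parse_ligand_name("UNIYIA-subgraph-2 -21-10-51101-61001").ring5` gives `"51101"`, and names that do not match the assumed layout raise a `ValueError` rather than being silently mis-split.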
Contains information about IDs, SMILES, atom-coordinating patterns, and metal-coordinating atom indices.
Calculated electronic descriptors for the optimized free structures of the ligands.
Calculated RDKit descriptors for the ligands.
This release represents an initial preview of the BiFuN-L dataset. We plan to expand it with:
- Detailed documentation of design principles
- Additional computational descriptors
- Tutorials and usage examples
Lucía Morán-González - lmoranglez
David Balcells - David-Balcells
Ainara Nova