This is the implementation of the paper "NPGPT: Natural Product-Like Compound Generation with GPT-based Chemical Language Models" by Koh Sakano, Kairi Furui, and Masahito Ohue.
Install uv if you haven't already. Then, clone the repository and install the dependencies.
```shell
git clone https://github.com/ohuelab/npgpt.git
cd npgpt
git submodule update --init --recursive
uv sync
```

First, download the training dataset from the following link and place it in the `data/` directory:
This dataset contains molecules from the COCONUT natural product library converted to SMILES format.
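Chemical language models like the ones used here typically split each SMILES string into atom-level tokens before training. The snippet below sketches a common regex-based atom-wise tokenization scheme; it is illustrative only, and the tokenizer NPGPT actually uses (e.g., the one inherited from smiles-gpt or ChemGPT) may differ.

```python
import re

# Atom-wise SMILES tokenization (a common scheme for chemical language
# models; a sketch, not necessarily NPGPT's exact tokenizer).
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|@@|@|%\d{2}|[A-Za-z]|\d|[=#\-\+\(\)/\\\.])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKEN_PATTERN.findall(smiles)

# Example: acetyl chloride — "Cl" stays one token, "[C@H]" stays intact.
print(tokenize_smiles("CC(=O)Cl"))
```

Multi-character symbols (bracket atoms, two-letter elements, stereo markers) must be matched before single characters, which is why they come first in the alternation.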
To train the model, run the following command:
```shell
uv run python src/scripts/train.py
```

Fine-tuned models trained on the COCONUT dataset are available. We provide two models fine-tuned from two different pre-trained models: smiles-gpt and ChemGPT.
You can download the models from the following link:
To generate SMILES strings using a trained model, first place the model checkpoint files in the `checkpoints/smiles-gpt/` directory (for the smiles-gpt model). Then run the following command:
```shell
uv run python src/scripts/inference.py
```

We also provide a Google Colab notebook for inference without local installation:
This notebook allows you to generate SMILES strings using our pre-trained models directly in your browser.
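For intuition, GPT-style models generate a SMILES string one token at a time by sampling from the predicted next-token distribution. The sketch below illustrates that autoregressive loop with a toy bigram table and temperature-scaled sampling; the table, vocabulary, and function names are illustrative stand-ins, not NPGPT's actual model or code.

```python
import math
import random

# Toy next-token logits standing in for a trained chemical language
# model (assumption: real NPGPT models predict over a full SMILES vocab).
TOY_BIGRAM_LOGITS = {
    "<bos>": {"C": 2.0, "O": 0.5, "<eos>": -2.0},
    "C":     {"C": 1.5, "O": 1.0, "<eos>": 0.0},
    "O":     {"C": 1.0, "O": -1.0, "<eos>": 1.0},
}

def sample_next(prev: str, temperature: float, rng: random.Random) -> str:
    """Sample one token from temperature-scaled softmax over the logits."""
    logits = TOY_BIGRAM_LOGITS[prev]
    weights = [math.exp(v / temperature) for v in logits.values()]
    return rng.choices(list(logits), weights=weights)[0]

def generate(temperature: float = 1.0, max_len: int = 10, seed: int = 0) -> str:
    """Autoregressively sample tokens until <eos> or max_len."""
    rng = random.Random(seed)
    tokens, prev = [], "<bos>"
    for _ in range(max_len):
        nxt = sample_next(prev, temperature, rng)
        if nxt == "<eos>":
            break
        tokens.append(nxt)
        prev = nxt
    return "".join(tokens)

print(generate())
```

Lowering the temperature concentrates probability on the highest-logit token, trading diversity for likelihood; the real inference script exposes the same kind of knob through its sampling parameters.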