NPGPT

This repository contains the implementation of the paper "NPGPT: Natural Product-Like Compound Generation with GPT-based Chemical Language Models" by Koh Sakano, Kairi Furui, and Masahito Ohue.

Installation

Install uv if you haven't already. Then clone the repository, initialize its submodules, and install the dependencies.

git clone https://github.com/ohuelab/npgpt.git
cd npgpt
git submodule update --init --recursive
uv sync

Training

First, download the training dataset from the following link and place it in the data/ directory:

This dataset contains molecules from the COCONUT natural product library converted to SMILES format.
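Before training, it can help to sanity-check the downloaded file. The following is a minimal sketch, assuming the dataset is a plain-text file with one SMILES string per line; the file name `coconut.smi` and the helper `summarize_smiles_file` are hypothetical and not part of this repository. It only counts entries and does not validate the chemistry.

```python
from pathlib import Path


def summarize_smiles_file(path):
    """Return basic counts for a text file with one SMILES string per line.

    Assumption: one SMILES per line; blank lines are ignored.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Strip surrounding whitespace and drop empty lines.
    smiles = [line.strip() for line in lines if line.strip()]
    return {
        "total": len(smiles),
        "unique": len(set(smiles)),
        "max_length": max((len(s) for s in smiles), default=0),
    }


if __name__ == "__main__":
    # Hypothetical file name; use whatever the downloaded dataset is called.
    dataset = Path("data") / "coconut.smi"
    if dataset.exists():
        print(summarize_smiles_file(dataset))
    else:
        print(f"{dataset} not found; download the dataset first.")
```

A large gap between `total` and `unique` would indicate duplicated entries, and `max_length` gives a rough idea of the sequence lengths involved.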

To train the model, run the following command:

uv run python src/scripts/train.py

Fine-tuned Models

We provide two models fine-tuned on the COCONUT dataset, one from each of two pre-trained models: smiles-gpt and ChemGPT.

You can download the models from the following link:

Inference

To generate SMILES strings with the fine-tuned smiles-gpt model, first place its checkpoint files in the checkpoints/smiles-gpt/ directory. Then run the following command:

uv run python src/scripts/inference.py
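If inference fails at startup, a quick check that the checkpoint directory is populated can rule out a misplaced download. This is a minimal sketch using only the standard library; the helper `list_checkpoint_files` is hypothetical, and it makes no assumption about which file names the inference script expects.

```python
from pathlib import Path


def list_checkpoint_files(directory="checkpoints/smiles-gpt"):
    """Return all files under the checkpoint directory, or raise if there are none."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Checkpoint directory not found: {root}")
    # Search recursively, since checkpoints may be stored in subdirectories.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"No checkpoint files in: {root}")
    return files


if __name__ == "__main__":
    try:
        for path in list_checkpoint_files():
            print(path)
    except FileNotFoundError as err:
        print(err)
```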

Google Colab

We also provide a Google Colab notebook for inference without local installation:

This notebook allows you to generate SMILES strings using our pre-trained models directly in your browser.
