MCDHGN: Heterogeneous Network-based Cancer Driver Gene Prediction and Interpretability Analysis

Abstract

We aim to establish a novel cancer driver gene mining method based on heterogeneous network metapaths. First, we construct a heterogeneous network from several types of multi-omics data that are biologically linked to genes. We then form nine metapaths that use genes as start and end nodes. The representation vectors obtained by aggregating information within and across metapaths serve as new gene features for subsequent classification and prediction tasks. In addition, we aim to improve the biological interpretability of the predictions by analysing the contribution of each metapath.
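The idea of aggregating within and across metapaths can be illustrated with a small, dependency-free sketch. It is not the MCDHGN model: the mean aggregation, the dot-product scoring, the `query` vector, and the metapath names are all assumptions made for illustration. In a trained model the scoring parameters would be learned.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def aggregate_gene(metapath_neighbors, features, query):
    """Combine metapath-based neighbours of one gene into a single vector.

    metapath_neighbors: {metapath name: [gene ids reached via that metapath]}
                        (each list is assumed to be non-empty)
    features:           {gene id: feature vector}
    query:              hypothetical attention vector, same length as features
    """
    names = sorted(metapath_neighbors)
    # Within a metapath: average the features of the genes it reaches.
    per_path = [
        mean_vector([features[gene] for gene in metapath_neighbors[name]])
        for name in names
    ]
    # Across metapaths: score each summary, then normalise the scores.
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in per_path]
    weights = softmax(scores)
    embedding = [
        sum(weight * vec[i] for weight, vec in zip(weights, per_path))
        for i in range(len(query))
    ]
    # The weights double as a per-metapath contribution estimate.
    return embedding, dict(zip(names, weights))
```

The returned weight dictionary is what makes this kind of design attractive for interpretability: each metapath receives an explicit share of the final gene representation.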

Data

All the nodes we use to build the network and the connections between them can be found in the './data/' folder:

'./data/biological_features.csv': Multi-omics gene features extracted with the EMOGI method.

In the initial feature preprocessing for gene nodes, we calculated aberrant expression values for each gene in tumor tissue samples across 16 cancer types. Mutation rates were determined from the probability of single nucleotide variations (SNVs). Gene methylation and gene expression were represented by the logarithmic difference between tumor samples and normal gene segments. All preprocessing procedures can be found in the './preprocess_data' folder.
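As a rough illustration of the two kinds of quantities described above, the sketch below computes a log-ratio between tumor and normal measurements and a simple mutation frequency. The base-2 logarithm, the pseudocount, the use of sample means, and both function names are assumptions; the actual procedure is the one in the preprocessing notebooks.

```python
import math


def log2_fold_change(tumor_values, normal_values, pseudocount=1.0):
    """Log2 ratio of mean tumor signal to mean normal signal.

    The pseudocount (an assumption) avoids taking the log of zero.
    """
    tumor_mean = sum(tumor_values) / len(tumor_values)
    normal_mean = sum(normal_values) / len(normal_values)
    return math.log2((tumor_mean + pseudocount) / (normal_mean + pseudocount))


def mutation_frequency(mutated_flags):
    """Fraction of samples in which the gene carries at least one SNV."""
    return sum(1 for flag in mutated_flags if flag) / len(mutated_flags)
```

Computing these per gene and per cancer type would give one feature column per (omics type, cancer type) pair, which matches the shape of a per-gene feature table.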

'./data/ppi/': Protein-Protein interaction (PPI) data obtained from the CPDB database. https://toxnet.nlm.nih.gov/cpdb/

'./data/msigdb/': Various multi-omics data related to cancer from the MSigDB. https://www.gsea-msigdb.org/gsea/msigdb

The notebook './preprocess/gene_protein.ipynb' processes the gene-gene relationships derived from the PPI network, while './preprocess/Msig_preprocess.ipynb' processes the correspondence relationships between multi-omics biological nodes in the MSigDB database.

We use './preprocess/generate_network.ipynb' to build graphs in DGL input format. The graphs used for training and performance comparison are stored in './data/network/hetero/new_9nodes_graph.bin'.
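To give a feel for what building a heterogeneous graph involves, here is a minimal sketch that turns typed relations into integer node ids and per-relation edge lists. The function name, the input tuple layout, and the example node and edge names are hypothetical. The output dictionary, keyed by (source type, edge type, destination type) with a pair of id lists as value, has the shape that `dgl.heterograph` accepts as its data dictionary; the sketch itself does not import DGL.

```python
from collections import defaultdict


def build_edge_dict(triples):
    """Index typed relations for heterogeneous graph construction.

    triples: iterable of (src_type, src_name, edge_type, dst_type, dst_name)
    Returns (node_ids, edges) where
      node_ids: {node type: {node name: integer id}}
      edges:    {(src_type, edge_type, dst_type): ([src ids], [dst ids])}
    """
    node_ids = defaultdict(dict)
    edges = defaultdict(lambda: ([], []))
    for src_type, src, edge_type, dst_type, dst in triples:
        # Assign ids in order of first appearance, separately per node type.
        src_id = node_ids[src_type].setdefault(src, len(node_ids[src_type]))
        dst_id = node_ids[dst_type].setdefault(dst, len(node_ids[dst_type]))
        sources, targets = edges[(src_type, edge_type, dst_type)]
        sources.append(src_id)
        targets.append(dst_id)
    return dict(node_ids), dict(edges)


# Hypothetical toy input: two genes that interact and share one pathway.
example = [
    ("gene", "GENE_A", "interacts", "gene", "GENE_B"),
    ("gene", "GENE_A", "in_pathway", "pathway", "PATHWAY_X"),
    ("gene", "GENE_B", "in_pathway", "pathway", "PATHWAY_X"),
]
node_ids, edges = build_edge_dict(example)
```

Keeping the name-to-id mapping alongside the edges makes it possible to map predictions on integer node ids back to gene symbols later.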

Requirements

The MCDHGN code is based on Python, PyTorch, and the DGL library, so you will need the following packages to run it:

  • Python==3.9.16
  • torch==1.12.0
  • jupyter notebook==6.5.4
  • ipykernel==6.19.2
  • ipython==8.12.0
  • dgl==1.1.1+cu102
  • torch-geometric==2.3.1
  • torchvision==0.13.1a0
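Before running the notebooks, it can help to compare the installed package versions against the pins above. The following is a small sketch using only the standard library; the `PINNED` dictionary and the `check_versions` helper are illustrative and not part of the repository, and the distribution names are assumed to match the names used on the package index.

```python
from importlib import metadata

# Version pins copied from the requirements list above.
PINNED = {
    "torch": "1.12.0",
    "ipykernel": "6.19.2",
    "ipython": "8.12.0",
    "dgl": "1.1.1+cu102",
    "torch-geometric": "2.3.1",
    "torchvision": "0.13.1a0",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {name: (wanted, found, matches)}; found is None if missing."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_versions(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found} [{status}]")
```

An exact string comparison is deliberately strict here; a build tag such as `+cu102` differs between CUDA versions, so a mismatch is a prompt to check compatibility rather than proof of a problem.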

Usage

First, clone the repository or download the source code and data files:
git clone https://github.com/1160300611/MCDHGN.git

To improve running speed, we save the message flow graphs (MFGs) generated by random walks on the heterogeneous graph in './Intermediate/blocks/'. You can use these results directly to run './5fold_verification.ipynb' and view the results of five-fold cross-validation, or regenerate the message flow graphs by following the code comments.
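The pattern of sampling by random walk once and reusing the saved result can be sketched as follows. This is a plain-Python illustration, not the repository's DGL-based sampler: the adjacency layout, the function names, the edge type names, and the pickle-based cache are all assumptions.

```python
import os
import pickle
import random


def metapath_walk(adjacency, start, metapath, rng):
    """Follow one random walk along a sequence of edge types.

    adjacency: {edge type: {node: [neighbour nodes]}}
    metapath:  list of edge types to traverse in order
    Returns the visited path, or None if the walk reaches a dead end.
    """
    node = start
    path = [node]
    for edge_type in metapath:
        neighbours = adjacency.get(edge_type, {}).get(node, [])
        if not neighbours:
            return None
        node = rng.choice(neighbours)
        path.append(node)
    return path


def sample_metapath_neighbors(adjacency, start, metapath, num_walks, seed=0):
    """Collect the end nodes of repeated metapath-guided walks from start."""
    rng = random.Random(seed)
    ends = []
    for _ in range(num_walks):
        path = metapath_walk(adjacency, start, metapath, rng)
        if path is not None:
            ends.append(path[-1])
    return ends


def load_or_compute(cache_path, compute):
    """Load a pickled result if present; otherwise compute and save it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    result = compute()
    with open(cache_path, "wb") as handle:
        pickle.dump(result, handle)
    return result
```

Fixing the random seed keeps the sampled neighbourhoods reproducible, and deleting the cache file is enough to force regeneration.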

Use './test_and_pridict.ipynb' to view the training and prediction results of the model on the entire MCDHGN label set.
