MO-GCAN is a framework for cancer subtype detection that performs supervised feature learning and classification using a graph-based learning approach with an attention mechanism. The work is inspired by MOGONET [1], MoGCN [2] and omicsGAT [3], and builds on the code from https://github.com/Lifoof/MoGCN and https://github.com/compbiolabucf/omicsGAT.
[1] Wang, Tongxin, Wei Shao, Zhi Huang, Haixu Tang, Jie Zhang, Zhengming Ding, and Kun Huang. “MOGONET Integrates Multi-Omics Data Using Graph Convolutional Networks Allowing Patient Classification and Biomarker Identification.” Nature Communications 12, no. 1 (June 8, 2021). https://doi.org/10.1038/s41467-021-23774-w.
[2] Li, Xiao, Jie Ma, Ling Leng, Mingfei Han, Mansheng Li, Fuchu He, and Yunping Zhu. “MOGCN: A Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis.” Frontiers in Genetics 13 (February 2, 2022). https://doi.org/10.3389/fgene.2022.806842.
[3] Baul, Sudipto, Khandakar Tanvir Ahmed, Joseph Filipek, and Wei Zhang. “omicsGAT: Graph Attention Network for Cancer Subtype Analyses.” International Journal of Molecular Sciences 23, no. 18 (September 6, 2022): 10220. https://doi.org/10.3390/ijms231810220.
-
First, download the processed data from https://figshare.com/articles/dataset/MO-GCAN_data/25823950 and save it in the 'data' folder. The data folder should contain 8 subfolders, one per cancer type (brca, cesc, ..., ucec), and each subfolder should contain 5 files: cna_data.csv, met_data.csv, mrna_data.csv, rppa_data.csv and subtype_data.csv. Then create an empty 'result' folder to store the outputs.
To run the whole pipeline including data processing, start from the original data instead (see the instructions inside the 'original_data' folder) and run data_process.py for each cancer type. For example, to process the original lgg data: python3 data_process.py -t lgg
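Before running anything, it can help to confirm that the downloaded data matches the layout described above. The sketch below is a hypothetical helper, not part of the MO-GCAN code: it assumes the subfolder names are exactly the eight cancer-type names used with main.py, and the names `check_data_layout` and `prepare_result_dir` are made up for illustration.

```python
from pathlib import Path

# Assumed subfolder names: the eight cancer types used with main.py in this README.
CANCER_TYPES = ["brca", "cesc", "coadread", "hnsc", "lgg", "sarc", "stad", "ucec"]
# The five files each cancer-type subfolder should contain.
EXPECTED_FILES = ["cna_data.csv", "met_data.csv", "mrna_data.csv",
                  "rppa_data.csv", "subtype_data.csv"]


def check_data_layout(data_dir="data"):
    """Return {cancer_type: [missing file names]} for every incomplete subfolder.

    An empty dict means every expected file was found.
    """
    root = Path(data_dir)
    missing = {}
    for cancer in CANCER_TYPES:
        folder = root / cancer
        absent = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
        if absent:
            missing[cancer] = absent
    return missing


def prepare_result_dir(result_dir="result"):
    """Create the (initially empty) result folder if it does not exist yet."""
    path = Path(result_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Run from the repository root, `check_data_layout()` returning `{}` suggests the download is complete, and `prepare_result_dir()` creates the 'result' folder without touching it if it already exists.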
-
To run main.py, pass the cancer type with -t:
for lgg cancer (no -t flag needed): python3 main.py
for ucec cancer: python3 main.py -t ucec
for stad cancer: python3 main.py -t stad
for sarc cancer: python3 main.py -t sarc
for coadread cancer: python3 main.py -t coadread
for cesc cancer: python3 main.py -t cesc
for hnsc cancer: python3 main.py -t hnsc
for brca cancer: python3 main.py -t brca
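To run all eight cancer types in one go, the commands listed above can be looped over from Python. This is a hypothetical convenience wrapper, not part of the repository: `build_command` and `run_all` are illustrative names, and the sketch only reproduces the command lines shown above (lgg without -t, every other type with -t).

```python
import subprocess
import sys

# Cancer types in the order listed above.
CANCER_TYPES = ["lgg", "ucec", "stad", "sarc", "coadread", "cesc", "hnsc", "brca"]


def build_command(cancer, python=sys.executable):
    """Return the main.py command line for one cancer type, mirroring the list above."""
    cmd = [python, "main.py"]
    if cancer != "lgg":  # lgg is run without the -t flag in this README
        cmd += ["-t", cancer]
    return cmd


def run_all(cancers=CANCER_TYPES):
    """Run main.py once per cancer type, stopping at the first non-zero exit code."""
    for cancer in cancers:
        cmd = build_command(cancer)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root runs the types sequentially; `check=True` makes a failing run raise `subprocess.CalledProcessError` instead of silently moving on to the next cancer type.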
-
Environment (versions used):
python: 3.9.6
scikit-learn: 1.5.2
pandas: 1.5.3
numpy: 1.26.4
torch: 2.4.1
snfpy: 0.2.2
matplotlib: 3.9.2
seaborn: 0.13.2
pip3: 24.2
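A quick way to compare a local environment against the versions above is the standard-library `importlib.metadata` module (available from Python 3.8). The script below is a sketch, not part of the repository: `find_mismatches` is an illustrative name, and it assumes the PyPI distribution names shown (snfpy is looked up as the distribution "snfpy", even though it is imported as `snf`).

```python
import sys
from importlib import metadata

# Versions listed in this README, keyed by PyPI distribution name.
EXPECTED = {
    "scikit-learn": "1.5.2",
    "pandas": "1.5.3",
    "numpy": "1.26.4",
    "torch": "2.4.1",
    "snfpy": "0.2.2",
    "matplotlib": "3.9.2",
    "seaborn": "0.13.2",
}


def find_mismatches(expected=EXPECTED, get_version=metadata.version):
    """Return {distribution: installed version or None} for every entry that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed at all
        # Ignore local build tags such as "+cu121" found on some torch wheels.
        base = installed.split("+")[0] if installed else None
        if base != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(README lists 3.9.6)")
    for name, installed in find_mismatches().items():
        print(f"{name}: installed {installed}, README lists {EXPECTED[name]}")
```

A mismatch is not necessarily fatal; the listed versions are simply the ones the code was run with, so they are the first thing to try if results differ.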
-
The result folder is organized by cancer type, with one subfolder per cancer. Each subfolder contains a log file named after the cancer type (<cancer_type>.log) that captures the output of the commands run on that dataset, including the single-omics, all-omics and selected-omics predictions. For multi-omics prediction, we used the "selected omics" results, which are the final outputs used in our analyses.
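To collect the logs after several runs, a small helper can walk the result folder. This is a hypothetical sketch: it assumes each log is stored as result/<cancer_type>/<cancer_type>.log, and since the log format is not documented here, `lines_matching` only does a plain case-insensitive keyword search (for example with the keyword "selected"); the function names are illustrative.

```python
from pathlib import Path


def find_logs(result_dir="result"):
    """Map each cancer-type subfolder of result_dir to its <cancer_type>.log file, if present."""
    logs = {}
    root = Path(result_dir)
    if not root.is_dir():
        return logs
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        log_file = sub / f"{sub.name}.log"  # assumed naming: <cancer_type>.log
        if log_file.is_file():
            logs[sub.name] = log_file
    return logs


def lines_matching(log_file, keyword):
    """Return the lines of log_file that contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    with open(log_file, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if keyword in line.lower()]
```

For example, `lines_matching(find_logs()["lgg"], "selected")` would list the lgg log lines mentioning "selected", assuming the log uses that wording.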