{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,10,4]],"date-time":"2023-10-04T05:44:06Z","timestamp":1696398246731},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,5,12]],"date-time":"2022-05-12T00:00:00Z","timestamp":1652313600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,5,12]],"date-time":"2022-05-12T00:00:00Z","timestamp":1652313600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Disease detection is an important aspect of biotherapy. With the development of biotechnology and computer technology, many methods have been proposed to detect disease based on a single biomarker. However, in some cases a biomarker does not influence disease alone; it is the interaction between biomarkers that determines disease status. The existing influence measure <jats:italic>I<\/jats:italic>-score is used to evaluate the importance of an interaction in determining disease status, but <jats:italic>I<\/jats:italic>-score is biased with respect to the number of variables in the interaction. 
To solve this problem, we propose a new influence measure, Multivariate Gain Ratio (MGR), which extends the single-variable Gain Ratio (GR) to multivariate combinations of variables, called interactions.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>We propose a preprocessing verification algorithm based on a subset of the predictor variables to select an appropriate preprocessing method. We also provide an algorithm that selects key biomarker interactions and uses them to construct a disease detection model. MGR is more reliable than <jats:italic>I<\/jats:italic>-score when an interaction contains a small number of variables. On the Breast Cancer Wisconsin (Diagnostic) Dataset, our method achieves an average accuracy of <jats:inline-formula><jats:alternatives><jats:tex-math>$$93.13\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:mrow>\n                      <mml:mn>93.13<\/mml:mn>\n                      <mml:mo>%<\/mml:mo>\n                    <\/mml:mrow>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula>, compared with <jats:inline-formula><jats:alternatives><jats:tex-math>$$91.73\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:mrow>\n                      <mml:mn>91.73<\/mml:mn>\n                      <mml:mo>%<\/mml:mo>\n                    <\/mml:mrow>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula> for <jats:italic>I<\/jats:italic>-score. 
Compared with the classification accuracy of <jats:inline-formula><jats:alternatives><jats:tex-math>$$89.80\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:mrow>\n                      <mml:mn>89.80<\/mml:mn>\n                      <mml:mo>%<\/mml:mo>\n                    <\/mml:mrow>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula> obtained using all predictor variables, MGR identifies the true main biomarkers and achieves dimension reduction. On the Leukemia Dataset, MGR achieves an accuracy of <jats:inline-formula><jats:alternatives><jats:tex-math>$$97.32\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:mrow>\n                      <mml:mn>97.32<\/mml:mn>\n                      <mml:mo>%<\/mml:mo>\n                    <\/mml:mrow>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula>, compared with <jats:inline-formula><jats:alternatives><jats:tex-math>$$89.11\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:mrow>\n                      <mml:mn>89.11<\/mml:mn>\n                      <mml:mo>%<\/mml:mo>\n                    <\/mml:mrow>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula> for <jats:italic>I<\/jats:italic>-score. This result is consistent with the property of MGR and <jats:italic>I<\/jats:italic>-score noted above, because every key interaction in the Leukemia Dataset contains a small number of variables.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>MGR is effective for selecting important biomarkers and biomarker interactions even in high-dimensional feature spaces in which an interaction may contain more than two biomarkers. 
When an interaction contains a small number of variables, the interactions selected by MGR have better predictive ability than those selected by <jats:italic>I<\/jats:italic>-score. MGR is generally applicable to various types of biomarker datasets, including cell nuclei, gene, SNP and protein datasets.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-022-04699-7","type":"journal-article","created":{"date-parts":[[2022,5,12]],"date-time":"2022-05-12T13:04:40Z","timestamp":1652360680000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Biomarker interaction selection and disease detection based on multivariate gain ratio"],"prefix":"10.1186","volume":"23","author":[{"given":"Xiao","family":"Chu","sequence":"first","affiliation":[]},{"given":"Mao","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Zhuo-Jun","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,5,12]]},"reference":[{"issue":"8","key":"4699_CR1","doi-asserted-by":"publisher","first-page":"618","DOI":"10.1038\/nrg1407","volume":"5","author":"\u00d6 Carlborg","year":"2004","unstructured":"Carlborg \u00d6, Haley CS. Epistasis: too often neglected in complex trait studies? Nat Rev Genet. 2004;5(8):618\u201325.","journal-title":"Nat Rev Genet"},{"issue":"6034","key":"4699_CR2","doi-asserted-by":"publisher","first-page":"1193","DOI":"10.1126\/science.1203801","volume":"332","author":"AI Khan","year":"2011","unstructured":"Khan AI, Dinh DM, Schneider D, Lenski RE, Cooper TF. Negative epistasis between beneficial mutations in an evolving bacterial population. Science. 2011;332(6034):1193\u20136.","journal-title":"Science"},{"issue":"3","key":"4699_CR3","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1016\/j.ajhg.2009.08.006","volume":"85","author":"JH Moore","year":"2009","unstructured":"Moore JH, Williams SM. 
Epistasis and its implications for personal genetics. Am J Hum Genet. 2009;85(3):309\u201320.","journal-title":"Am J Hum Genet"},{"issue":"50","key":"4699_CR4","doi-asserted-by":"publisher","first-page":"19910","DOI":"10.1073\/pnas.0810388105","volume":"105","author":"H Shao","year":"2008","unstructured":"Shao H, Burrage LC, Sinasac DS, Hill AE, Ernest SR, O\u2019Brien W, Courtland H-W, Jepsen KJ, Kirby A, Kulbokas E, et al. Genetic architecture of complex traits: large phenotypic effects and pervasive epistasis. Proc Natl Acad Sci. 2008;105(50):19910\u20134.","journal-title":"Proc Natl Acad Sci"},{"issue":"4","key":"4699_CR5","doi-asserted-by":"publisher","first-page":"1193","DOI":"10.1073\/pnas.1119675109","volume":"109","author":"O Zuk","year":"2012","unstructured":"Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: genetic interactions create phantom heritability. Proc Natl Acad Sci. 2012;109(4):1193\u20138.","journal-title":"Proc Natl Acad Sci"},{"issue":"6","key":"4699_CR6","doi-asserted-by":"publisher","first-page":"392","DOI":"10.1038\/nrg2579","volume":"10","author":"HJ Cordell","year":"2009","unstructured":"Cordell HJ. Detecting gene\u2013gene interactions that underlie human diseases. Nat Rev Genet. 2009;10(6):392\u2013404.","journal-title":"Nat Rev Genet"},{"issue":"4","key":"4699_CR7","first-page":"472","volume":"24","author":"C Kooperberg","year":"2009","unstructured":"Kooperberg C, LeBlanc M, Dai JY, Rajapakse I. Structures and assumptions: strategies to harness gene$$\\times$$gene and gene$$\\times$$environment interactions in GWAS. Stat Sci Rev J Inst Math Stat. 2009;24(4):472.","journal-title":"Stat Sci Rev J Inst Math Stat"},{"issue":"1","key":"4699_CR8","first-page":"27","volume":"159","author":"M Emily","year":"2018","unstructured":"Emily M. A survey of statistical methods for gene-gene interaction in case-control genome-wide association studies. Journal de la soci\u00e9t\u00e9 fran\u00e7aise de statistique. 
2018;159(1):27\u201367.","journal-title":"Journal de la soci\u00e9t\u00e9 fran\u00e7aise de statistique"},{"issue":"6","key":"4699_CR9","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1111\/ahg.12324","volume":"83","author":"G Chen","year":"2019","unstructured":"Chen G, Yuan A, Cai T, Li C-M, Bentley AR, Zhou J, Shriner DN, Adeyemo AA, Rotimi CN. Measuring gene\u2013gene interaction using Kullback\u2013Leibler divergence. Ann Hum Genet. 2019;83(6):405\u201317.","journal-title":"Ann Hum Genet"},{"issue":"1","key":"4699_CR10","doi-asserted-by":"publisher","first-page":"352","DOI":"10.1214\/13-AOAS690","volume":"8","author":"Y-T Huang","year":"2014","unstructured":"Huang Y-T, VanderWeele TJ, Lin X. Joint analysis of SNP and gene expression data in genetic association studies of complex diseases. Ann Appl Stat. 2014;8(1):352.","journal-title":"Ann Appl Stat"},{"key":"4699_CR11","doi-asserted-by":"crossref","unstructured":"Chattopadhyay A, Lu T-P. Gene-gene interaction: the curse of dimensionality. Ann Transl Med. 2019;7(24).","DOI":"10.21037\/atm.2019.12.87"},{"issue":"4","key":"4699_CR12","doi-asserted-by":"publisher","first-page":"1335","DOI":"10.1214\/09-AOAS265","volume":"3","author":"H Chernoff","year":"2009","unstructured":"Chernoff H, Lo S-H, Zheng T. Discovering influential variables: a method of partitions. Ann Appl Stat. 2009;3(4):1335\u201369.","journal-title":"Ann Appl Stat"},{"issue":"2","key":"4699_CR13","doi-asserted-by":"publisher","first-page":"252","DOI":"10.1016\/j.jtbi.2005.11.036","volume":"241","author":"JH Moore","year":"2006","unstructured":"Moore JH, Gilbert JC, Tsai C-T, Chiang F-T, Holden T, Barney N, White BC. A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. J Theor Biol. 
2006;241(2):252\u201361.","journal-title":"J Theor Biol"},{"issue":"5","key":"4699_CR14","doi-asserted-by":"publisher","first-page":"939","DOI":"10.1086\/521878","volume":"81","author":"P Chanda","year":"2007","unstructured":"Chanda P, Zhang A, Brazeau D, Sucheston L, Freudenheim JL, Ambrosone C, Ramanathan M. Information-theoretic metrics for visualizing gene\u2013environment interactions. Am J Hum Genet. 2007;81(5):939\u201363.","journal-title":"Am J Hum Genet"},{"issue":"2","key":"4699_CR15","doi-asserted-by":"publisher","first-page":"362","DOI":"10.1016\/j.jtbi.2007.10.001","volume":"250","author":"G Kang","year":"2008","unstructured":"Kang G, Yue W, Zhang J, Cui Y, Zuo Y, Zhang D. An entropy-based approach for testing genetic epistasis underlying complex diseases. J Theor Biol. 2008;250(2):362\u201374.","journal-title":"J Theor Biol"},{"issue":"2","key":"4699_CR16","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1038\/sj.ejhg.5201921","volume":"16","author":"C Dong","year":"2008","unstructured":"Dong C, Chu X, Wang Y, Wang Y, Jin L, Shi T, Huang W, Li Y. Exploration of gene\u2013gene interaction effects using entropy-based methods. Eur J Hum Genet. 2008;16(2):229\u201335.","journal-title":"Eur J Hum Genet"},{"issue":"1","key":"4699_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1755-8794-7-1","volume":"7","author":"M-S Kwon","year":"2014","unstructured":"Kwon M-S, Park M, Park T. IGENT: efficient entropy based algorithm for genome-wide gene\u2013gene interaction analysis. BMC Med Genom. 2014;7(1):1\u201311.","journal-title":"BMC Med Genom"},{"key":"4699_CR18","unstructured":"Breast Cancer Wisconsin (Diagnostic) Data Set. http:\/\/archive.ics.uci.edu\/ml\/index.php. Accessed 20 Apr 2021."},{"issue":"4","key":"4699_CR19","doi-asserted-by":"publisher","first-page":"570","DOI":"10.1287\/opre.43.4.570","volume":"43","author":"OL Mangasarian","year":"1995","unstructured":"Mangasarian OL, Street WN, Wolberg WH. 
Breast cancer diagnosis and prognosis via linear programming. Oper Res. 1995;43(4):570\u20137.","journal-title":"Oper Res"},{"issue":"5439","key":"4699_CR20","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1126\/science.286.5439.531","volume":"286","author":"TR Golub","year":"1999","unstructured":"Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286(5439):531\u20137.","journal-title":"Science"},{"key":"4699_CR21","unstructured":"Gene expression dataset (Golub et al.). https:\/\/www.kaggle.com. Accessed 12 May 2021."},{"issue":"457","key":"4699_CR22","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1198\/016214502753479248","volume":"97","author":"S Dudoit","year":"2002","unstructured":"Dudoit S, Fridlyand J, Speed TP. Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc. 2002;97(457):77\u201387.","journal-title":"J Am Stat Assoc"},{"issue":"21","key":"4699_CR23","doi-asserted-by":"publisher","first-page":"2834","DOI":"10.1093\/bioinformatics\/bts531","volume":"28","author":"H Wang","year":"2012","unstructured":"Wang H, Lo S-H, Zheng T, Hu I. Interaction-based feature selection and classification for high-dimensional biological data. Bioinformatics. 2012;28(21):2834\u201342.","journal-title":"Bioinformatics"},{"key":"4699_CR24","unstructured":"Quinlan JR. C4.5: programs for machine learning. San Mateo: Morgan Kaufmann; 1993."},{"issue":"1","key":"4699_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-7-359","volume":"7","author":"IB Jeffery","year":"2006","unstructured":"Jeffery IB, Higgins DG, Culhane AC. Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinform. 
2006;7(1):1\u201316.","journal-title":"BMC Bioinform"},{"key":"4699_CR26","doi-asserted-by":"crossref","unstructured":"Yang Y, Webb GI, Wu X. Discretization methods. In: Data mining and knowledge discovery handbook. Boston: Springer; 2009. p. 101\u201316.","DOI":"10.1007\/978-0-387-09823-4_6"},{"key":"4699_CR27","volume-title":"An introduction to categorical data analysis","author":"A Agresti","year":"1996","unstructured":"Agresti A. An introduction to categorical data analysis. New York: Wiley; 1996."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-04699-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-022-04699-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-04699-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,12]],"date-time":"2022-05-12T13:04:54Z","timestamp":1652360694000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-022-04699-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,12]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["4699"],"URL":"https:\/\/doi.org\/10.1186\/s12859-022-04699-7","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,12]]},"assertion":[{"value":"11 November 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 April 
2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 May 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"In this work, we did not generate any new experimental data. The declarations of the datasets used in this work can be found in [CitationRef removed, CitationRef removed] correspondingly.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"176"}}