{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:17Z","timestamp":1772138057158,"version":"3.50.1"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2022,12,9]],"date-time":"2022-12-09T00:00:00Z","timestamp":1670544000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"China Brain Project","award":["2021ZD0200403"],"award-info":[{"award-number":["2021ZD0200403"]}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["2018AAA0100100"],"award-info":[{"award-number":["2018AAA0100100"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["2018YFA0902600"],"award-info":[{"award-number":["2018YFA0902600"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Cell-type-specific gene expression is maintained in large part by transcription factors (TFs) selectively binding to distinct sets of sites in different cell types. Recent research works have provided evidence that such cell-type-specific binding is determined by TF\u2019s intrinsic sequence preferences, cooperative interactions with co-factors, cell-type-specific chromatin landscapes and 3D chromatin interactions. 
However, computational prediction and characterization of cell-type-specific and shared binding sites is rarely studied.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this article, we propose two computational approaches for predicting and characterizing cell-type-specific and shared binding sites by integrating multiple types of features, in which one is based on XGBoost and another is based on convolutional neural network (CNN). To validate the performance of our proposed approaches, ChIP-seq datasets of 10 binding factors were collected from the GM12878 (lymphoblastoid) and K562 (erythroleukemic) human hematopoietic cell lines, each of which was further categorized into cell-type-specific (GM12878- and K562-specific) and shared binding sites. Then, multiple types of features for these binding sites were integrated to train the XGBoost- and CNN-based models. Experimental results show that our proposed approaches significantly outperform other competing methods on three classification tasks. Moreover, we identified independent feature contributions for cell-type-specific and shared sites through SHAP values and explored the ability of the CNN-based model to predict cell-type-specific and shared binding sites by excluding or including DNase signals. 
Furthermore, we investigated the generalization ability of our proposed approaches to different binding factors in the same cellular environment.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The source code is available at: https:\/\/github.com\/turningpoint1988\/CSSBS.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac798","type":"journal-article","created":{"date-parts":[[2022,12,9]],"date-time":"2022-12-09T09:33:11Z","timestamp":1670578391000},"source":"Crossref","is-referenced-by-count":9,"title":["Computational prediction and characterization of cell-type-specific and shared binding sites"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4232-7736","authenticated-orcid":false,"given":"Qinhu","family":"Zhang","sequence":"first","affiliation":[{"name":"Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University , Shanghai 200092, China"}]},{"given":"Pengrui","family":"Teng","sequence":"additional","affiliation":[{"name":"School of Information and Control Engineering, China University of Mining and Technology , Xuzhou 221116, China"}]},{"given":"Siguo","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University , Shanghai 201804, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9592-7727","authenticated-orcid":false,"given":"Ying","family":"He","sequence":"additional","affiliation":[{"name":"Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University , Shanghai 201804, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5522-6750","authenticated-orcid":false,"given":"Zhen","family":"Cui","sequence":"additional","affiliation":[{"name":"Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University , Shanghai 201804, China"}]},{"given":"Zhenghao","family":"Guo","sequence":"additional","affiliation":[{"name":"Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University , Shanghai 201804, China"}]},{"given":"Yixin","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Health Science and Engineering, University of Shanghai for Science and Technology , Shanghai 200093, China"}]},{"given":"Changan","family":"Yuan","sequence":"additional","affiliation":[{"name":"Big Data and Intelligent Computing Research Center, Guangxi Academy of Science , Nanning 530007, China"}]},{"given":"Qi","family":"Liu","sequence":"additional","affiliation":[{"name":"Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University , Shanghai 200092, China"}]},{"given":"De-Shuang","family":"Huang","sequence":"additional","affiliation":[{"name":"EIT Institute for Advanced Study , Ningbo, Zhejiang 315201, China"}]}],"member":"286","published-online":{"date-parts":[[2022,12,9]]},"reference":[{"key":"2023010805394969000_btac798-B1","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA-and RNA-binding proteins by deep 
learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023010805394969000_btac798-B2","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1038\/s41576-019-0173-8","article-title":"Determinants of enhancer and promoter activities of regulatory elements","volume":"21","author":"Andersson","year":"2020","journal-title":"Nat. Rev. Genet"},{"key":"2023010805394969000_btac798-B3","doi-asserted-by":"crossref","first-page":"1723","DOI":"10.1101\/gr.127712.111","article-title":"Sequence and chromatin determinants of cell-type\u2013specific transcription factor binding","volume":"22","author":"Arvey","year":"2012","journal-title":"Genome Res"},{"key":"2023010805394969000_btac798-B4","doi-asserted-by":"crossref","first-page":"W369","DOI":"10.1093\/nar\/gkl198","article-title":"MEME: discovering and analyzing DNA and protein sequence motifs","volume":"34","author":"Bailey","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023010805394969000_btac798-B5","doi-asserted-by":"crossref","first-page":"1429","DOI":"10.1038\/nbt1246","article-title":"Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities","volume":"24","author":"Berger","year":"2006","journal-title":"Nat. Biotechnol"},{"key":"2023010805394969000_btac798-B6","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1038\/nbt1010-1045","article-title":"The NIH roadmap epigenomics mapping consortium","volume":"28","author":"Bernstein","year":"2010","journal-title":"Nat. Biotechnol"},{"key":"2023010805394969000_btac798-B7","doi-asserted-by":"crossref","first-page":"657","DOI":"10.1109\/TCBB.2018.2868071","article-title":"Probe efficient feature representation of gapped k-mer frequency vectors from sequences using deep neural networks","volume":"17","author":"Cao","year":"2020","journal-title":"IEEE\/ACM Trans. Comput. Biol. 
Bioinform"},{"key":"2023010805394969000_btac798-B8","doi-asserted-by":"crossref","first-page":"D165","DOI":"10.1093\/nar\/gkab1113","article-title":"JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles","volume":"50","author":"Castro-Mondragon","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023010805394969000_btac798-B9","first-page":"785","author":"Chen","year":"2016"},{"key":"2023010805394969000_btac798-B10","doi-asserted-by":"crossref","first-page":"840","DOI":"10.1038\/nrg3306","article-title":"ChIP\u2013seq and beyond: new and improved methodologies to detect and characterize protein\u2013DNA interactions","volume":"13","author":"Furey","year":"2012","journal-title":"Nat. Rev. Genet"},{"key":"2023010805394969000_btac798-B11","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.molcel.2013.08.037","article-title":"Distinct properties of cell-type-specific and shared transcription factor binding sites","volume":"52","author":"Gertz","year":"2013","journal-title":"Mol. Cell"},{"key":"2023010805394969000_btac798-B12","doi-asserted-by":"crossref","first-page":"e1003711","DOI":"10.1371\/journal.pcbi.1003711","article-title":"Enhanced regulatory sequence prediction using gapped k-mer features","volume":"10","author":"Ghandi","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023010805394969000_btac798-B13","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1186\/s13059-018-1614-y","article-title":"Accurate prediction of cell type-specific transcription factor binding","volume":"20","author":"Keilwagen","year":"2019","journal-title":"Genome Biol"},{"key":"2023010805394969000_btac798-B14","doi-asserted-by":"crossref","first-page":"1351","DOI":"10.1038\/nbt.1508","article-title":"Design and analysis of ChIP-seq experiments for DNA-binding proteins","volume":"26","author":"Kharchenko","year":"2008","journal-title":"Nat. 
Biotechnol"},{"key":"2023010805394969000_btac798-B15","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1016\/j.cell.2018.01.029","article-title":"The human transcription factors","volume":"172","author":"Lambert","year":"2018","journal-title":"Cell"},{"key":"2023010805394969000_btac798-B16","doi-asserted-by":"crossref","first-page":"2196","DOI":"10.1093\/bioinformatics\/btw142","article-title":"LS-GKM: a new gkm-SVM for large-scale datasets","volume":"32","author":"Lee","year":"2016","journal-title":"Bioinformatics"},{"key":"2023010805394969000_btac798-B17","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1101\/gr.269613.120","article-title":"Fast decoding cell type-specific transcription factor binding landscape at single-nucleotide resolution","volume":"31","author":"Li","year":"2021","journal-title":"Genome Res"},{"key":"2023010805394969000_btac798-B18","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1101\/gr.237156.118","article-title":"Anchor: trans-cell type prediction of transcription factor binding sites","volume":"29","author":"Li","year":"2019","journal-title":"Genome Res"},{"key":"2023010805394969000_btac798-B19","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol"},{"key":"2023010805394969000_btac798-B20","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1038\/s41551-018-0304-0","article-title":"Explainable machine-learning predictions for the prevention of hypoxaemia during surgery","volume":"2","author":"Lundberg","year":"2018","journal-title":"Nat. Biomed. Eng"},{"key":"2023010805394969000_btac798-B21","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023010805394969000_btac798-B22","doi-asserted-by":"crossref","first-page":"e107","DOI":"10.1093\/nar\/gkw226","article-title":"DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences","volume":"44","author":"Quang","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023010805394969000_btac798-B23","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1016\/j.ymeth.2019.03.020","article-title":"FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data","volume":"166","author":"Quang","year":"2019","journal-title":"Methods"},{"key":"2023010805394969000_btac798-B24","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1016\/j.celrep.2015.02.004","article-title":"Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture","volume":"10","author":"Rudan","year":"2015","journal-title":"Cell Rep"},{"key":"2023010805394969000_btac798-B25","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1038\/s41467-020-17239-9","article-title":"A supervised learning framework for chromatin loop detection in genome-wide contact maps","volume":"11","author":"Salameh","year":"2020","journal-title":"Nat. Commun"},{"key":"2023010805394969000_btac798-B26","first-page":"3145","author":"Shrikumar","year":"2017"},{"key":"2023010805394969000_btac798-B27","doi-asserted-by":"crossref","first-page":"100903","DOI":"10.1016\/j.jbc.2021.100903","article-title":"A novel inhibitor L755507 efficiently blocks c-Myc-MAX heterodimerization and induces apoptosis in cancer cells","volume":"297","author":"Singh","year":"2021","journal-title":"J. Biol. 
Chem"},{"key":"2023010805394969000_btac798-B28","doi-asserted-by":"crossref","first-page":"194443","DOI":"10.1016\/j.bbagrm.2019.194443","article-title":"Sequence and chromatin determinants of transcription factor binding and the establishment of cell type-specific binding patterns","volume":"1863","author":"Srivastava","year":"2020","journal-title":"Biochim. Biophys. Acta. Gene Regul. Mech"},{"key":"2023010805394969000_btac798-B29","doi-asserted-by":"crossref","first-page":"D605","DOI":"10.1093\/nar\/gkaa1074","article-title":"The STRING database in 2021: customizable protein\u2013protein networks, and functional characterization of user-uploaded gene\/measurement sets","volume":"49","author":"Szklarczyk","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023010805394969000_btac798-B30","doi-asserted-by":"crossref","first-page":"1611","DOI":"10.1016\/j.cell.2015.11.024","article-title":"CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription","volume":"163","author":"Tang","year":"2015","journal-title":"Cell"},{"key":"2023010805394969000_btac798-B31","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"The ENCODE Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023010805394969000_btac798-B32","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1101\/gr.139105.112","article-title":"Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors","volume":"22","author":"Wang","year":"2012","journal-title":"Genome Res"},{"key":"2023010805394969000_btac798-B33","doi-asserted-by":"crossref","first-page":"bbaa435","DOI":"10.1093\/bib\/bbaa435","article-title":"Locating transcription factor binding sites by fully convolutional neural network","volume":"22","author":"Zhang","year":"2021","journal-title":"Brief. 
Bioinform"},{"key":"2023010805394969000_btac798-B34","doi-asserted-by":"crossref","first-page":"e1009941","DOI":"10.1371\/journal.pcbi.1009941","article-title":"Base-resolution prediction of transcription factor binding signals by a deep learning framework","volume":"18","author":"Zhang","year":"2022","journal-title":"PLoS Comput. Biol"},{"key":"2023010805394969000_btac798-B35","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1038\/nmeth.3547","article-title":"Predicting effects of noncoding variants with deep learning\u2013based sequence model","volume":"12","author":"Zhou","year":"2015","journal-title":"Nat. Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac798\/48001327\/btac798.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac798\/48520710\/btac798.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac798\/48520710\/btac798.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,8]],"date-time":"2023-01-08T00:40:25Z","timestamp":1673138425000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btac798\/6885447"}},"subtitle":[],"editor":[{"given":"Pier 
Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,12,9]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac798","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.05.06.490975","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,1,1]]},"published":{"date-parts":[[2022,12,9]]},"article-number":"btac798"}}