RISC: A New Filter Approach for Feature Selection from Proteomic Data

  • Conference paper
Medical Biometrics (ICMB 2008)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4901)

Abstract

This paper proposes a novel feature selection technique for SELDI-TOF spectrum data. The new technique, called RISC (Relevance Index by Sample Counting), measures the relevance of each feature based on each sample’s discriminating power to partition the samples in the opposite class. We also propose a heuristic search method that uses these relevance parameters to obtain the optimal feature set. Because it has low computational complexity and consists of simple counting operations, our technique is fast even for extremely high-dimensional datasets such as SELDI spectra. In experiments on three clinical datasets from the NCI/CCR and FDA/CBER Clinical Proteomics Program Databank (Ovarian 4-3-02, Ovarian 7-8-02, and Prostate), the new technique also performs comparably to conventional feature selection techniques.
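
The abstract does not give the exact definition of the relevance index, so the following Python sketch is only an illustration of what a counting-based filter score of this general kind might look like, not the authors' RISC formula. It assumes binary class labels (0 and 1) and scores a feature by letting each sample's value act as a threshold and counting how many opposite-class samples fall entirely on one side of it. The function names `counting_relevance` and `rank_features` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def counting_relevance(values, labels):
    """Hypothetical counting-based relevance score for ONE feature.

    Assumption (not from the paper): for each sample, count the
    opposite-class samples strictly below and strictly above its value,
    keep the larger fraction, and average over all samples.
    The score lies in [0, 1]; 1.0 means the classes do not overlap.
    """
    pos = sorted(v for v, y in zip(values, labels) if y == 1)
    neg = sorted(v for v, y in zip(values, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")

    total = 0.0
    for v, y in zip(values, labels):
        other = neg if y == 1 else pos
        below = bisect_left(other, v)                # opposite-class values < v
        above = len(other) - bisect_right(other, v)  # opposite-class values > v
        total += max(below, above) / len(other)
    return total / len(values)


def rank_features(X, y):
    """Score every column of X (a list of rows) and rank columns by score."""
    n_features = len(X[0])
    scores = [
        counting_relevance([row[j] for row in X], y) for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy data: column 0 separates the classes, column 1 interleaves them.
    X = [[1, 1], [2, 3], [3, 5], [10, 2], [11, 4], [12, 6]]
    y = [0, 0, 0, 1, 1, 1]
    order, scores = rank_features(X, y)
    print(order, scores)
```

After one sort per class, each sample needs only two binary searches per feature, which is in the spirit of the abstract's remark that the method consists of simple counting operations. The heuristic search for the final feature set is not sketched here, since the abstract gives no details of it.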

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vu, TN., Ohn, SY., Kim, CW. (2007). RISC: A New Filter Approach for Feature Selection from Proteomic Data. In: Zhang, D. (eds) Medical Biometrics. ICMB 2008. Lecture Notes in Computer Science, vol 4901. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77413-6_3

  • DOI: https://doi.org/10.1007/978-3-540-77413-6_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77410-5

  • Online ISBN: 978-3-540-77413-6

  • eBook Packages: Computer Science, Computer Science (R0)
