On Feature Extraction for Spam E-Mail Detection

  • Conference paper
Multimedia Content Representation, Classification and Security (MRCS 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4105)

Electronic mail is an important communication method for most computer users. Spam e-mails, however, consume bandwidth, fill up server storage, and waste the time of those who must deal with them. The general way to label an e-mail as spam or non-spam is to set up a finite set of discriminative features and use a classifier for the detection. In most cases, the selection of such features is verified empirically. In this paper, two different methods are proposed to select the most discriminative features from a set of reasonably arbitrary features for spam e-mail detection. The selection methods are developed using the Common Vector Approach (CVA), which is a subspace-based pattern classifier. Experimental results indicate that the proposed feature selection methods considerably reduce the number of features without affecting recognition rates.
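As a rough illustration of how a subspace-based classifier such as CVA can separate spam from non-spam feature vectors, the sketch below fits one common vector per class and assigns a new e-mail to the class whose common vector is nearest after the within-class variation has been projected out. It is not the feature selection procedure proposed in the paper, which the abstract does not specify. The function names, the toy four-feature vectors, and the use of an SVD to obtain the within-class difference subspace are all assumptions made for this example.

```python
import numpy as np

def fit_common_vector(samples, n_components=None, tol=1e-10):
    """Fit a CVA-style model for one class (illustrative sketch).

    Returns an orthonormal basis of the within-class difference subspace
    and the common vector, i.e. the part of the class mean that lies
    outside that subspace.
    """
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data span the difference subspace.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    if n_components is None:
        # Assumption: keep every direction with non-negligible variation.
        n_components = int(np.sum(s > tol * max(s.max(), 1.0)))
    basis = vt[:n_components].T              # shape (n_features, k)
    common = mean - basis @ (basis.T @ mean)
    return basis, common

def cva_distance(x, model):
    """Squared distance between the projected sample and the common vector."""
    basis, common = model
    x = np.asarray(x, dtype=float)
    residual = x - basis @ (basis.T @ x)     # remove within-class variation
    return float(np.sum((residual - common) ** 2))

def cva_predict(x, models):
    """Return the label whose common vector is closest to x."""
    return min(models, key=lambda label: cva_distance(x, models[label]))

# Hypothetical toy data: each row is one e-mail described by four features.
spam = [[5, 1, 0, 3], [6, 1, 1, 3], [7, 1, 0, 3]]
ham = [[0, 4, 2, 0], [1, 4, 3, 0], [0, 4, 5, 0]]
models = {"spam": fit_common_vector(spam), "ham": fit_common_vector(ham)}

print(cva_predict([9, 1, 2, 3], models))
print(cva_predict([2, 4, 9, 0.5], models))
```

In this toy set the first and third features vary within each class, so they are projected out, and the decision rests on the second and fourth features alone. With many more e-mails than features, `n_components` would have to be chosen explicitly, since the centered data would otherwise span the whole feature space.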


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Günal, S., Ergin, S., Gülmezoğlu, M.B., Gerek, Ö.N. (2006). On Feature Extraction for Spam E-Mail Detection. In: Gunsel, B., Jain, A.K., Tekalp, A.M., Sankur, B. (eds) Multimedia Content Representation, Classification and Security. MRCS 2006. Lecture Notes in Computer Science, vol 4105. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39392-4

  • Online ISBN: 978-3-540-39393-1

  • eBook Packages: Computer Science, Computer Science (R0)
