The observed features of a given phenomenon are not all equally informative: some may be noisy, others correlated or irrelevant. Feature selection aims to choose a subset of features pertinent to a given task. The process is complex, yet it is an important issue in many fields. In the neural network literature, feature selection has been studied for the last ten years, using both conventional and original methods. This paper reviews neural network approaches to feature selection. We first briefly introduce baseline statistical methods used in regression and classification. We then describe families of methods that have been developed specifically for neural networks. Finally, representative methods are compared on several test problems.
Change history
13 February 2021
A Correction to this paper has been published: https://doi.org/10.1007/s41237-020-00127-3
Additional information
The original online version of this article was revised due to the retrospective open access order.
About this article
Cite this article
Leray, P., Gallinari, P. Feature Selection With Neural Networks. Behaviormetrika 26, 145–166 (1999). https://doi.org/10.2333/bhmk.26.145
DOI: https://doi.org/10.2333/bhmk.26.145