Transductive Feature Selection Using Clustering-Based Sample Entropy for Temperature Prediction in Weather Forecasting
Abstract
1. Introduction
2. Materials and Methods
2.1. Background
2.1.1. Entropy and Information Transfer
2.1.2. Lag-Specific Information Transfer
2.1.3. Soft Kernel Spectral Clustering
2.1.4. Least Squares Support Vector Machines
2.2. Transductive Feature Selection Using Clustering-Based Sample Entropy
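- For each remaining candidate component W, a candidate set is formed by adding W to the current selected set V, and the conditional entropy of the target given this candidate set is estimated with the clustering-based sample entropy.
- The component W that minimizes this conditional entropy is added to the selected set V.
- V and the set of remaining candidates are then updated: W is appended to V and removed from the remaining candidates, after which the termination condition is checked.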
- Let the selected features of all samples be concatenated into a single data matrix. Based on the clustering information, the samples are partitioned into separate groups, one per cluster c, so that each group holds the selected features of the samples assigned to cluster c.
- Within each cluster, two match quantities are calculated for every sample i of that cluster: one in the k-dimensional space and one in the (k + 1)-dimensional space.
- As in sample entropy, the corresponding quantities for the k- and (k + 1)-dimensional spaces are defined as the averages of these per-sample quantities over all possible i.
- Finally, the clustering-based sample entropy (CluSampEnt) in the k-dimensional space, which represents the conditional entropy, is calculated from these two averages.
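Written out, the quantities in the steps above might take the following form. This is a sketch modelled on the standard sample entropy of Richman and Moorman, not a reproduction of the paper's own equations. The notation is assumed: cluster $c$ contains $N_c$ samples, $N=\sum_c N_c$, $\mathbf{x}_i^{c}\in\mathbb{R}^k$ are the selected features of sample $i$, $y_i^{c}$ is its target, $j$ runs over the other samples of the same cluster, and $d$ is a distance such as the maximum norm. Averaging over all samples of all clusters is also an assumption.

```latex
\begin{aligned}
B_i^{k}(r) &= \frac{1}{N_c - 1}\,\bigl|\{\, j \neq i : d(\mathbf{x}_i^{c}, \mathbf{x}_j^{c}) \le r \,\}\bigr|,
\qquad i = 1, \dots, N_c, \\
A_i^{k}(r) &= \frac{1}{N_c - 1}\,\bigl|\{\, j \neq i : d(\mathbf{z}_i^{c}, \mathbf{z}_j^{c}) \le r \,\}\bigr|,
\qquad \mathbf{z}_i^{c} = \bigl[\mathbf{x}_i^{c};\, y_i^{c}\bigr] \in \mathbb{R}^{k+1}, \\
B^{k}(r) &= \frac{1}{N} \sum_{c} \sum_{i=1}^{N_c} B_i^{k}(r),
\qquad
A^{k}(r) = \frac{1}{N} \sum_{c} \sum_{i=1}^{N_c} A_i^{k}(r), \\
\mathrm{CluSampEnt}^{k}(r) &= -\ln \frac{A^{k}(r)}{B^{k}(r)}.
\end{aligned}
```

With the maximum norm, every match in the $(k+1)$-dimensional space is also a match in the $k$-dimensional space, so $A^{k}(r) \le B^{k}(r)$ and the estimate is non-negative.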
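The greedy selection loop and the CluSampEnt estimate described in the list above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes hard cluster labels (for instance, obtained by assigning each sample to its most probable soft kernel spectral cluster), the Chebyshev (maximum-norm) distance, averaging the per-sample match ratios over all samples, and a stopping rule that halts when the best candidate no longer lowers the estimate. The function names `clu_samp_ent` and `greedy_clusampent_selection` are hypothetical.

```python
import numpy as np


def clu_samp_ent(V, y, labels, r):
    """Clustering-based sample entropy, a sketch of -ln(A/B).

    V      : (N, k) array with the currently considered features.
    y      : (N,) target values (e.g., temperature).
    labels : (N,) hard cluster labels (assumption; soft memberships unused).
    r      : non-negative tolerance for counting two samples as a match.
    """
    Z = np.column_stack([V, y])  # (k+1)-dimensional vectors: features + target
    B_i, A_i = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n_c = idx.size
        if n_c < 2:
            continue  # a singleton cluster has no pairs to compare
        Vc, Zc = V[idx], Z[idx]
        # Pairwise Chebyshev (max-norm) distances, within cluster c only.
        dV = np.abs(Vc[:, None, :] - Vc[None, :, :]).max(axis=2)
        dZ = np.abs(Zc[:, None, :] - Zc[None, :, :]).max(axis=2)
        # Fraction of other samples (j != i) within r; "- 1" drops the self-match.
        B_i.extend(((dV <= r).sum(axis=1) - 1) / (n_c - 1))
        A_i.extend(((dZ <= r).sum(axis=1) - 1) / (n_c - 1))
    if not B_i:
        return np.inf  # no cluster had at least two samples
    B, A = np.mean(B_i), np.mean(A_i)
    if A == 0:
        return np.inf  # no (k+1)-dimensional matches: estimate undefined
    return float(-np.log(A / B))


def greedy_clusampent_selection(X, y, labels, r, max_features=None):
    """Greedy forward selection minimizing clu_samp_ent (hypothetical helper).

    Stops when no remaining candidate lowers the entropy of the current
    selection (assumed termination condition) or when max_features is reached.
    """
    remaining = list(range(X.shape[1]))  # candidate components not yet selected
    selected = []                        # the selected set V
    best_so_far = np.inf
    while remaining and (max_features is None or len(selected) < max_features):
        # Conditional-entropy estimate for each candidate set V + [W].
        scores = {w: clu_samp_ent(X[:, selected + [w]], y, labels, r)
                  for w in remaining}
        w_best = min(scores, key=scores.get)
        if scores[w_best] >= best_so_far:
            break  # termination: adding W no longer reduces the estimate
        selected.append(w_best)   # V <- V + [W]
        remaining.remove(w_best)  # remove W from the remaining candidates
        best_so_far = scores[w_best]
    return selected
```

For example, with a standardized feature matrix `X`, a target vector `y` and cluster labels from any clustering step, `greedy_clusampent_selection(X, y, labels, r=0.2)` returns column indices in the order in which they were added. Distances are computed only within each cluster, so memory grows with the square of the largest cluster size rather than of the whole dataset.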
3. Results
3.1. Experiments on the Simulated Dataset
3.2. Weather Dataset
3.3. Weather Forecasting Experiments
4. Discussion
5. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Anand, K.; Bianconi, G. Entropy measures for networks: Toward an information theory of complex topologies. Phys. Rev. E 2009, 80, 045102.
- Sandoval, L. Structure of a global network of financial companies based on transfer entropy. Entropy 2014, 16, 4443–4482.
- Richman, J.S.; Moorman, J.R. Physiological time series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 2000, 278, H2039–H2049.
- Shuangcheng, L.; Qiaofu, Z.; Shaohong, W.; Erfu, D. Measurement of climate complexity using sample entropy. Int. J. Climatol. 2006, 26, 2131–2139.
- Balasis, G.; Donner, R.V.; Potirakis, S.M.; Runge, J.; Papadimitriou, C.; Daglis, I.A.; Eftaxias, K.; Kurths, J. Statistical mechanics and information-theoretic perspectives on complexity in the Earth system. Entropy 2013, 15, 4844–4888.
- Wang, Z.; Li, Y.; Childress, A.R.; Detre, J.A. Brain entropy mapping using fMRI. PLoS ONE 2014, 9, e89948.
- Porta, A.; Baselli, G.; Lombardi, F.; Montano, N.; Malliani, A.; Cerutti, S. Conditional entropy approach for the evaluation of the coupling strength. Biol. Cybern. 1999, 81, 119–129.
- Faes, L.; Marinazzo, D.; Montalto, A.; Nollo, G. Lag-specific transfer entropy as a tool to assess cardiovascular and cardiorespiratory information transfer. IEEE Trans. Biomed. Eng. 2014, 61, 2556–2568.
- Bauer, P.; Thorpe, A.; Brunet, G. The quiet revolution of numerical weather prediction. Nature 2015, 525, 47–55.
- Brunsdon, C.; Fotheringham, S.; Charlton, M. Geographically weighted regression. J. R. Stat. Soc. Ser. D 1998, 47, 431–443.
- Bottou, L.; Vapnik, V. Local learning algorithms. Neural Comput. 1992, 4, 888–900.
- Karevan, Z.; Suykens, J.A.K. Clustering-based feature selection for black-box weather temperature prediction. In Proceedings of the 2016 International Joint Conference on Neural Networks, Vancouver, BC, Canada, 24–29 July 2016.
- Karevan, Z.; Feng, Y.; Suykens, J.A.K. Moving Least Squares Support Vector Machines for weather temperature prediction. In Proceedings of the European Symposium on Artificial Neural Networks, Brugge, Belgium, 27–29 April 2016; pp. 611–616.
- Hmamouche, Y.; Casali, A.; Lakhal, L. Causality based feature selection approach for multivariate time series forecasting. In Proceedings of the International Conference on Advances in Databases, Knowledge, and Data Applications, Barcelona, Spain, 21–25 May 2017.
- Van Dijck, G.; Van Hulle, M.M. Speeding up the wrapper feature subset selection in regression by mutual information relevance and redundancy analysis. In Proceedings of the International Conference on Artificial Neural Networks, Athens, Greece, 10–14 September 2006; pp. 31–40.
- Ramírez-Gallego, S.; Mouriño-Talín, H.; Martínez-Rego, D.; Bolón-Canedo, V.; Benítez, J.M.; Alonso-Betanzos, A.; Herrera, F. An Information Theory-Based Feature Selection Framework for Big Data under Apache Spark. IEEE Trans. Syst. Man Cybern. Syst. 2017.
- Wang, Y.; Wang, J.; Liao, H.; Chen, H. An efficient semi-supervised representatives feature selection algorithm based on information theory. Pattern Recognit. 2017, 61, 511–523.
- Marinazzo, D.; Pellicoro, M.; Stramaglia, S. Causal information approach to partial conditioning in multivariate data sets. Comput. Math. Methods Med. 2012, 2012, 303601.
- Wang, H.; Wang, G.; Zeng, X.; Peng, S. Online Streaming Feature Selection Based on Conditional Information Entropy. In Proceedings of the 2017 IEEE International Conference on Big Knowledge (ICBK), Hefei, China, 9–10 August 2017; pp. 230–235.
- Weather Underground. Available online: www.wunderground.com (accessed on 5 April 2018).
- Shannon, C.E. A mathematical theory of communication. ACM Sigmob. Mob. Comput. Commun. Rev. 2001, 5, 3–55.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 2012.
- Xiong, W.; Faes, L.; Ivanov, P.C. Entropy measures, entropy estimators, and their performance in quantifying complex dynamics: Effects of artifacts, nonstationarity, and long-range correlations. Phys. Rev. E 2017, 95, 062114.
- Kolmogorov, A.N. Entropy per unit time as a metric invariant of automorphisms. Dokl. Akad. Nauk SSSR 1959, 124, 754–755.
- Sinai, Y.G. On the notion of entropy of a dynamical system. Dokl. Akad. Nauk SSSR 1959, 124, 768–771.
- Keller, K.; Unakafov, A.M.; Unakafova, V.A. Ordinal patterns, entropy, and EEG. Entropy 2014, 16, 6212–6239.
- Ebeling, W. Entropy, information and predictability of evolutionary systems. World Futures J. Gen. Evol. 1997, 50, 467–481.
- Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
- Runge, J.; Heitzig, J.; Petoukhov, V.; Kurths, J. Escaping the curse of dimensionality in estimating multivariate transfer entropy. Phys. Rev. Lett. 2012, 108, 258701.
- Granger, C.W. Investigating causal relations by econometric models and cross-spectral methods. Econom. J. Econom. Soc. 1969, 37, 424–438.
- Amblard, P.O.; Michel, O.J. The relation between Granger causality and directed information theory: A review. Entropy 2012, 15, 113–143.
- Faes, L.; Nollo, G.; Porta, A. Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys. Rev. E 2011, 83, 051112.
- Langone, R.; Mall, R.; Suykens, J.A.K. Soft Kernel Spectral Clustering. In Proceedings of the International Joint Conference on Neural Networks, Dallas, TX, USA, 4–9 August 2013; pp. 1–8.
- Alzate, C.; Suykens, J.A.K. Multiway spectral clustering with out-of-sample extensions through weighted kernel PCA. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 335–347.
- Mercer, J. Functions of positive and negative type, and their connection with the theory of integral equations. Philos. Trans. R. Soc. Lond. Ser. A 1909, 209, 415–446.
- Suykens, J.A.K.; Vandewalle, J. Least Squares Support Vector Machine Classifiers. Neural Process. Lett. 1999, 9, 293–300.
- Suykens, J.A.K.; Van Gestel, T.; De Brabanter, J.; De Moor, B.; Vandewalle, J. Least Squares Support Vector Machines; World Scientific: Singapore, 2002.
- Leontaritis, I.; Billings, S.A. Input-output parametric models for non-linear systems part I: Deterministic non-linear systems. Int. J. Control 1985, 41, 303–328.
- De Brabanter, K.; Karsmakers, P.; Ojeda, F.; Alzate, C.; De Brabanter, J.; Pelckmans, K.; De Moor, B.; Vandewalle, J.; Suykens, J.A.K. LS-SVMlab Toolbox User’s Guide: Version 1.8. 2011. LS-SVMlab. Available online: https://www.esat.kuleuven.be/sista/lssvmlab/ (accessed on 10 April 2018).
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
- De Brabanter, K.; De Brabanter, J.; Suykens, J.A.K.; De Moor, B. Approximate confidence and prediction intervals for least squares support vector regression. IEEE Trans. Neural Netw. 2011, 22, 110–120.

Method | Linear System | | | | | | | | | | Nonlinear System | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Global-FS | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 |
Transductive-FS | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 |
ARD [37] | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 |
MI-based [18] | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 10 | 4 | 1 | 2 | 1 | 0 | 0 | 2 | 0 | 0 |
LASSO [40] | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 10 | 3 | 0 | 1 | 2 | 0 | 0 | 6 | 0 | 0 |

Method | Linear System | | | | | | | | | | Nonlinear System | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Global-FS | 0 | 9 | 0 | 2 | 7 | 0 | 1 | 1 | 0 | 0 | 8 | 1 | 0 | 9 | 2 | 0 | 0 | 0 | 0 | 0 |
Transductive-FS | 0 | 9 | 0 | 0 | 10 | 1 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 9 | 0 | 0 | 1 | 0 | 0 |
ARD [37] | 0 | 0 | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 1 | 0 | 0 | 9 | 10 | 0 | 0 | 0 | 0 | 0 |
MI-based [18] | 0 | 5 | 0 | 7 | 3 | 0 | 5 | 0 | 0 | 0 | 1 | 0 | 0 | 9 | 0 | 7 | 0 | 1 | 0 | 2 |
LASSO [40] | 0 | 10 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 6 | 1 | 0 | 0 | 0 | 3 |

Method | Linear System | | | | | | | | | | Nonlinear System | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Global-FS | 0 | 9 | 0 | 2 | 7 | 0 | 1 | 1 | 0 | 0 | 8 | 1 | 0 | 9 | 2 | 0 | 0 | 0 | 0 | 0 |
Transductive-FS | 0 | 0 | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 |
ARD [37] | 0 | 0 | 0 | 10 | 0 | 0 | 10 | 0 | 0 | 0 | 1 | 0 | 0 | 9 | 10 | 0 | 0 | 0 | 0 | 0 |
MI-based [18] | 0 | 5 | 0 | 7 | 3 | 0 | 5 | 0 | 0 | 0 | 1 | 0 | 0 | 9 | 0 | 7 | 0 | 1 | 0 | 2 |
LASSO [40] | 0 | 10 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 6 | 1 | 0 | 0 | 0 | 3 |

Step Ahead | Temp. | Global-FS (r = 0.7) | Transductive-FS (r = 0.7) | Global-FS (r = 1) | Transductive-FS (r = 1) | Global-FS (r = 1.6) | Transductive-FS (r = 1.6) |
---|---|---|---|---|---|---|---|
1 | Min | 1.48 ± 0.001 | 1.54 ± 0.001 | 1.52 ± 0.001 | 1.45 ± 0.004 | 1.68 ± 0.001 | 1.50 ± 0.001 |
1 | Max | 1.76 ± 0.001 | 1.73 ± 0.003 | 1.42 ± 0.001 | 1.47 ± 0.003 | 1.45 ± 0.003 | 1.39 ± 0.001 |
2 | Min | 2.15 ± 0.0001 | 1.95 ± 0.004 | 1.98 ± 0.001 | 1.77 ± 0.01 | 1.76 ± 0.001 | 1.89 ± 0.001 |
2 | Max | 2.13 ± 0.001 | 1.72 ± 0.002 | 1.88 ± 0.003 | 1.80 ± 0.001 | 1.73 ± 0.003 | 1.49 ± 0.02 |
3 | Min | 2.07 ± 0.005 | 2.00 ± 0.003 | 1.90 ± 0.001 | 1.98 ± 0.01 | 2.16 ± 0.001 | 2.33 ± 0.004 |
3 | Max | 1.77 ± 0.002 | 1.88 ± 0.03 | 2.13 ± 0.001 | 2.33 ± 0.2 | 2.14 ± 0.001 | 1.90 ± 0.003 |
4 | Min | 1.59 ± 0.003 | 1.80 ± 0.002 | 2.21 ± 0.001 | 2.05 ± 0.01 | 2.22 ± 0.001 | 1.96 ± 0.02 |
4 | Max | 2.37 ± 0.001 | 2.25 ± 0.001 | 2.18 ± 0.003 | 2.15 ± 0.001 | 1.54 ± 0.002 | 2.06 ± 0.001 |
5 | Min | 2.37 ± 0.001 | 2.21 ± 0.001 | 2.20 ± 0.001 | 2.25 ± 0.001 | 2.46 ± 0.001 | 2.29 ± 0.004 |
5 | Max | 2.19 ± 0.001 | 1.94 ± 0.01 | 1.92 ± 0.001 | 2.29 ± 0.2 | 1.79 ± 0.001 | 1.89 ± 0.05 |
6 | Min | 2.40 ± 0.006 | 2.31 ± 0.005 | 1.66 ± 0.001 | 2.19 ± 0.02 | 2.17 ± 0.001 | 2.30 ± 0.1 |
6 | Max | 1.95 ± 0.001 | 1.93 ± 0.002 | 2.42 ± 0.001 | 1.82 ± 0.005 | 2.36 ± 0.004 | 1.71 ± 0.01 |

Step Ahead | Temp. | Global-FS (r = 0.7) | Transductive-FS (r = 0.7) | Global-FS (r = 1) | Transductive-FS (r = 1) | Global-FS (r = 1.6) | Transductive-FS (r = 1.6) |
---|---|---|---|---|---|---|---|
1 | Min | 1.65 ± 0.001 | 1.59 ± 0.001 | 1.74 ± 0.001 | 1.46 ± 0.001 | 1.63 ± 0.001 | 1.53 ± 0.001 |
1 | Max | 2.09 ± 0.001 | 2.04 ± 0.001 | 2.23 ± 0.001 | 2.23 ± 0.001 | 2.31 ± 0.001 | 2.18 ± 0.003 |
2 | Min | 2.01 ± 0.001 | 2.20 ± 0.002 | 2.09 ± 0.001 | 1.98 ± 0.002 | 2.06 ± 0.001 | 1.98 ± 0.002 |
2 | Max | 2.31 ± 0.001 | 2.18 ± 0.005 | 2.09 ± 0.001 | 2.29 ± 0.002 | 2.12 ± 0.001 | 2.25 ± 0.001 |
3 | Min | 2.11 ± 0.001 | 2.29 ± 0.004 | 2.27 ± 0.001 | 2.03 ± 0.002 | 2.12 ± 0.001 | 2.12 ± 0.01 |
3 | Max | 2.52 ± 0.001 | 2.48 ± 0.004 | 2.83 ± 0.002 | 2.56 ± 0.001 | 2.47 ± 0.001 | 2.40 ± 0.002 |
4 | Min | 3.01 ± 0.001 | 2.69 ± 0.001 | 2.59 ± 0.004 | 2.63 ± 0.001 | 2.01 ± 0.001 | 2.25 ± 0.003 |
4 | Max | 2.39 ± 0.004 | 2.10 ± 0.001 | 2.32 ± 0.004 | 2.42 ± 0.03 | 2.49 ± 0.001 | 2.28 ± 0.003 |
5 | Min | 2.90 ± 0.001 | 2.98 ± 0.002 | 2.50 ± 0.001 | 2.40 ± 0.002 | 2.87 ± 0.002 | 2.80 ± 0.001 |
5 | Max | 2.56 ± 0.004 | 2.39 ± 0.001 | 2.62 ± 0.005 | 2.54 ± 0.005 | 2.27 ± 0.001 | 2.37 ± 0.001 |
6 | Min | 2.74 ± 0.003 | 2.59 ± 0.001 | 2.66 ± 0.001 | 2.70 ± 0.004 | 2.80 ± 0.001 | 2.57 ± 0.001 |
6 | Max | 2.25 ± 0.02 | 2.35 ± 0.001 | 1.96 ± 0.002 | 2.64 ± 0.008 | 2.26 ± 0.005 | 1.91 ± 0.002 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Karevan, Z.; Suykens, J.A.K. Transductive Feature Selection Using Clustering-Based Sample Entropy for Temperature Prediction in Weather Forecasting. Entropy 2018, 20, 264. https://doi.org/10.3390/e20040264