Utilizing Macroeconomic Factors For Sector Rotation Based On Interpretable Machine Learning and Explainable AI
Abstract—This paper focuses on the application of explainable AI in finance, introducing the use of machine learning models such as multiple linear regression, ridge regression, and random forest. We also compare their effects through empirical analysis on the Chinese stock market. In addition, we propose three methods, namely feature selection, discretization of returns, and a signal timing strategy, to improve the utility of our model. The empirical results show that our models can effectively select industries that will perform well in the future, further demonstrating the importance and feasibility of explainable AI in the financial field.
Keywords—explainable AI, random forest, feature selection, macroeconomic factors, crowded market indicator
I. INTRODUCTION
Aristotle said: “Knowing yourself is the beginning of all wisdom”. In recent years, big-data artificial intelligence represented by deep learning has developed rapidly, and machines have gradually surpassed humans in perception capabilities such as image and speech recognition. However, the learning and prediction of machine learning is often a “black box” and lacks interpretability, which greatly reduces the credibility of the prediction results. Therefore, Explainable Artificial Intelligence (XAI) has been developed to make the AI learning process transparent so that the results and process can be better interpreted.
This paper applies some explainable AI models to financial data, constructs an industry rotation strategy based on macroeconomic data, focuses on the interpretability of the model and the interpretation of the results, and demonstrates the effectiveness of explainable AI models in the financial field.
II. BACKGROUND
A. The Impact of Macroeconomic Factors on the Stock Market
It is well known that macroeconomic data has an impact on the stock market. In 2004, Merrill Lynch Wealth Management released the Merrill Lynch investment clock model [1]. Based on an economic growth factor (Gross Domestic Product, GDP) and an inflation factor (Consumer Price Index, CPI), the business cycle was divided into four stages: recession, recovery, overheating and stagflation.
Chen et al. [2] tested seven macroeconomic factors such as industrial production and inflation, and found that inflation, industrial production, changes in the risk premium and twists in the yield curve have a significant impact on the stock market. Adam and Tweneboah [3] analyzed the short-term and long-term effects of four macroeconomic factors on stock market indexes based on Ghanaian data, and found that inflation and exchange rate have important effects on stock prices in the short term, while in the long run, interest rates and inflation have more significant effects. Singh et al. [4] tested the relationship between index returns and macroeconomic factors using Taiwanese data. The results showed that exchange rate and GDP have a significant impact on the overall economic situation, while inflation, exchange rate and money supply have an impact on large and medium-sized companies.
B. Constructing Macroeconomic Factors
Based on the macroeconomic factors in the references above [2] [3] [4], we construct five types of factors. In order to increase the frequency of modeling, we unify the data to a monthly frequency, and each type of factor is synthesized from several indicators.
• Growth factor: GDP reflects the growth of the market economy well, but it is published at a low frequency. Therefore, we use the Purchasing Managers' Index (PMI), the growth rate of infrastructure investment and the growth rate of total industrial profits to reflect the growth rate of GDP. After unified data preprocessing and removal of the seasonal trend, we use the reciprocal of volatility as the weight for weighted synthesis.
• Inflation factor: We reflect inflation in living and production costs through the oil price, pork price and thread price index, which not only reflects the CPI but also makes an extra lag of the data unnecessary.
• Rate factor: This factor focuses on bond market interest rates. Through the yield to maturity of the one-year treasury bond and the yield to maturity of the 10-year treasury bond, it reflects the short-term market structure and the long-term market structure, respectively,
Factor lags   1st period lag                            1st + 6th periods lag                     1st + 6th + 12th periods lag
Model         Annualized rate of return   Sharpe ratio  Annualized rate of return   Sharpe ratio  Annualized rate of return   Sharpe ratio
MLR           9.4%                        0.330         7.9%                        0.264         10.8%                       0.358
RR            7.2%                        0.250         14.0%                       0.467         12.2%                       0.399
RF            10.8%                       0.364         13.0%                       0.439         10.3%                       0.358
Benchmark     8.8%                        0.321         8.8%                        0.321         8.8%                        0.321
* MLR represents multiple linear regression model; RR represents ridge regression model; RF represents random forest regression model.
** Benchmark is the industry equal weight index.
*** Annualized rate of return refers to the rate of return obtained by converting investment income into one year. Sharpe ratio measures the ratio of benefits to risks. A higher annualized rate of return or a higher Sharpe ratio indicates a better portfolio performance.
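The two metrics described in the footnote above can be sketched as follows. This is a minimal illustration rather than the paper's exact computation: it assumes monthly portfolio returns, geometric compounding to annualize, a zero risk-free rate, and the sample standard deviation scaled by the square root of 12. The function names and the sample returns are hypothetical.

```python
import math
from statistics import mean, stdev

PERIODS_PER_YEAR = 12  # assumption: returns are observed monthly


def annualized_return(monthly_returns):
    """Compound the monthly returns, then rescale the growth to one year."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    years = len(monthly_returns) / PERIODS_PER_YEAR
    return growth ** (1.0 / years) - 1.0


def sharpe_ratio(monthly_returns, risk_free_monthly=0.0):
    """Mean excess return per unit of volatility, annualized.

    Needs at least two observations with non-zero dispersion.
    """
    excess = [r - risk_free_monthly for r in monthly_returns]
    return mean(excess) / stdev(excess) * math.sqrt(PERIODS_PER_YEAR)


# Hypothetical monthly returns, for illustration only.
sample = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02,
          0.04, 0.01, -0.01, 0.02, 0.00, 0.01]
print(f"annualized return: {annualized_return(sample):.2%}")
print(f"Sharpe ratio: {sharpe_ratio(sample):.3f}")
```

Other conventions (arithmetic annualization, a non-zero risk-free rate) would give somewhat different numbers, so values computed this way are not guaranteed to match the table.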
probability that the return rate is positive, that is, the number of
decision trees with positive results divided by the total number
of decision trees.
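The vote-share definition above can be written out directly. The sketch below is illustrative only: the per-tree outputs are made-up labels (1 for a positive predicted return, 0 otherwise), and `positive_probability` is a hypothetical helper, not a library function.

```python
def positive_probability(tree_votes):
    """Share of trees voting for a positive return (1 = positive, 0 = not)."""
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    return sum(1 for v in tree_votes if v == 1) / len(tree_votes)


# Hypothetical votes from a 10-tree forest for one industry.
votes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
print(positive_probability(votes))  # 7 of the 10 trees vote positive
```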
Breiman [7] found that random forests perform better in classification than other classifiers such as discriminant analysis and support vector machines. Couronné et al. [13] compared the random forest model with default parameters and the logistic regression model on multiple real data sets. In about 69% of the data sets, random forest classification performed better than logistic regression.
We consider an improved modeling method that discretizes the rate of return. First, we try to construct a portfolio of the five industries with the highest probability of positive returns. From the backtest results in Table IV, the lagged influence of macroeconomic factors is very obvious, and the prediction ability of the model with the 6th period lag and the 12th period lag is greatly improved.
Fig. 1. Trend of net value of our portfolio and benchmark
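As a minimal sketch of the first portfolio rule described above, the snippet below ranks industries by their predicted probability of a positive return and holds the top five. The industry names and probabilities are hypothetical, and equal weighting of the five selected industries is an assumption made for illustration; the text does not state how they are weighted.

```python
def top_k_portfolio(probabilities, k=5):
    """Pick the k industries with the highest positive-return probability.

    Assumption: the selected industries are equally weighted.
    """
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    chosen = ranked[:k]
    return {name: 1.0 / len(chosen) for name in chosen}


# Hypothetical model outputs for one rebalancing date.
probs = {
    "Banks": 0.62, "Coal": 0.41, "Steel": 0.55, "Media": 0.48,
    "Food": 0.71, "Autos": 0.58, "Power": 0.52, "Retail": 0.39,
}
weights = top_k_portfolio(probs)
print(sorted(weights))
```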
Secondly, we set a threshold and select the industries whose probability of a positive rate of return is above the threshold to construct a portfolio. If none of them exceeds the threshold, the portfolio at the current time point is equally weighted across industries. We choose the first period lag of macroeconomic factors as the feature combination, and choose the random forest classification model and a threshold of 0.5. Fig. 1 shows the trend of net value of our portfolio and benchmark. Fig. 2 shows the relative strength of our portfolio and benchmark. We can see that the performance of this portfolio is relatively good. However, changes to the combination of factors or slight adjustments to the threshold make the results unsatisfactory. This shows that if we only want to predict the sign of the rate of return, we can find an effective configuration by adjusting the parameters, but in this way the parameter setting is not objective and the strategy is not stable enough.
Fig. 2. Relative strength of our portfolio and benchmark
TABLE IV
BACKTEST RESULTS OF CLASSIFICATION MODELS
Model      Factors        Annualized rate of return   Sharpe ratio
Benchmark  -              8.8%                        0.321
LR         1st            7.1%                        0.245
LR         1st+6th        12.3%                       0.415
LR         1st+6th+12th   14.0%                       0.487
RF         1st            8.7%                        0.300
RF         1st+6th        14.2%                       0.495
RF         1st+6th+12th   11.9%                       0.413
* LR represents logistic regression model; RF represents random forest classification model.
** Benchmark is the industry equal weight index.
*** 1st means factors (1st period lag); 1st+6th means factors (1st period lag + 6th period lag), and 1st+6th+12th means factors (1st period lag + 6th period lag + 12th period lag).
C. Signal Timing Optimization in Pursuit of Revenue
This section analyzes whether trading is overheated on the cross-section, so as to optimize the model strategy and increase the interpretability of the strategy. When the industry index has risen over the past period and the crowdedness index exceeds the historical quantile threshold, the industry is considered to be crowded and should not be selected into the
portfolio, and those that are already in the portfolio should be cleared.
Yang and Zhou [14] explained excess returns by constructing investor sentiment indicators and stock crowding indicators. The empirical results support the view that crowding and investor sentiment significantly affect stock prices. Kinlaw et al. [15] used congestion and valuation to classify the state of the sector. The empirical results show that congestion measures can help avoid bubble periods.
TABLE VI
BACKTEST RESULTS AFTER CONGESTION OPTIMIZATION
Indicator  Optimization  Annualized rate of return   Sharpe ratio
Benchmark  -             10.8%                       0.358
CTC        monthly       10.6%                       0.355
CTC        daily         11.3%                       0.383
CVC        monthly       10.8%                       0.357
CVC        daily         10.7%                       0.359
Kurtosis   monthly       11.7%                       0.399
Kurtosis   daily         10.5%                       0.387
CF         monthly       11.4%                       0.391
CF         daily         11.6%                       0.436
* CTC is the correlation coefficient between turnover rate and closing price; CVC is the correlation coefficient between volume and closing price; Kurtosis is the kurtosis of the index return; CF is the composite indicator generated from the indicators above.
** Benchmark is the backtest result of the multiple regression model, which uses 1st+6th+12th periods lag as the feature combination.
Our method of constructing the congestion index is to select multiple indicators from six aspects: momentum, liquidity, deviation rate, volume-price correlation coefficient, volatility and distribution characteristic coefficient, and to assign different calculation window periods and quantile thresholds. We test the effectiveness of monthly optimization and daily optimization for each indicator on the basis of industry equal weight
benchmarks. The monthly optimization strategy selects all industries that are not crowded at the end of the month and constructs an equal weight portfolio. The daily optimization strategy builds on the monthly optimization strategy: the index is monitored every day, and if a held industry shows a congestion signal, it is cleared immediately and the proceeds are held as cash. Table V shows the finally selected indicators and their corresponding window periods and thresholds.
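The difference between the two strategies can be sketched as follows. This is an illustrative toy, not the paper's backtest: the industries and congestion flags are hypothetical, and it assumes that freed capital stays in cash until the next month-end and that the remaining holdings are not rebalanced in the meantime.

```python
def monthly_selection(industries, crowded_at_month_end):
    """Equal-weight every industry that is not crowded at month-end."""
    kept = [i for i in industries if i not in crowded_at_month_end]
    return {i: 1.0 / len(kept) for i in kept}


def apply_daily_clearing(weights, daily_crowded):
    """Walk through the month day by day; once a held industry signals
    congestion, move its weight to cash for the remaining days."""
    current = dict(weights)
    current["cash"] = current.get("cash", 0.0)
    path = []
    for crowded_today in daily_crowded:
        for name in crowded_today:
            if name in current and name != "cash":
                current["cash"] += current.pop(name)
        path.append(dict(current))
    return path


# Hypothetical example: four industries, one crowded at month-end.
start = monthly_selection(["A", "B", "C", "D"], crowded_at_month_end={"D"})
# Hypothetical daily congestion signals for three trading days.
path = apply_daily_clearing(start, [set(), {"B"}, set()])
print(path[-1])
```

Under the monthly strategy alone, the weights in `start` would be held unchanged for the whole month; the daily variant differs only in the move to cash on the day a signal appears.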
If any congestion indicator sends a congestion signal, we follow the signal and clear the index; that is, we combine these three congestion indicators into a composite indicator. From the results of daily optimization in Table VI, the composite indicator performs better than the single indicators.
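The composite rule above amounts to a logical OR over the individual signals. A minimal sketch, using hypothetical signal values; the indicator names follow Table V, and `composite_signal` is an illustrative helper name:

```python
def composite_signal(signals):
    """Return True if any individual congestion indicator fires."""
    return any(signals.values())


# Hypothetical signals for one industry on one day.
today = {"CTC": False, "CVC": True, "Kurtosis": False}
print(composite_signal(today))  # True: CVC fired, so the industry is cleared
```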
Finally, we try to combine multiple improvement measures. First, we discretize the rate of return, and then we use composite indicators for optimization. How does the model perform? Take an empirical example, where the feature combination is macroeconomic factors (1st period lag + 6th period lag) and the model is random forest classification. Fig. 3 shows the results. Our strategy significantly outperforms the industry equal weight benchmark, and the Sharpe ratio is also greatly improved.
Fig. 3. Trend of net value of our portfolio and benchmark
* Benchmark is the industry equal weight index; Original strategy uses random forest classification and feature combination is macroeconomic factors (1st period lag + 6th period lag); Monthly and Daily optimization strategy is based on the original strategy and optimized using the composite crowdedness indicator.
V. CONCLUSION
From the perspective of macroeconomic factors, this paper uses the CITIC industry index data of the past ten years and applies explainable AI models on a rolling basis to predict index returns and construct investment portfolios. The empirical results show that building a model to construct a portfolio can greatly increase the annualized rate of return and the Sharpe ratio compared with the industry equal weight benchmark, which demonstrates the application prospects of explainable AI models in the financial field.
ACKNOWLEDGMENT
This work is supported by China Asset Management. We also thank the reviewer for very helpful comments.
TABLE V
OUR CONGESTION INDICATORS
Index     Period  Threshold
CTC       60      1%
CVC       40      1%
Kurtosis  60      1%
* CTC is the correlation coefficient between turnover rate and closing price; CVC is the correlation coefficient between volume and closing price; Kurtosis is the kurtosis of the index return.
REFERENCES
[1] T. Greetham and M. Hartnett, “The Investment Clock, Special Report #1: Making Money from Macro,” p. 28, 2004.
[2] N.-F. Chen, R. Roll, and S. A. Ross, “Economic forces and the stock market,” Journal of business, pp. 383–403, 1986.
[3] A. M. Adam and G. Tweneboah, “Macroeconomic factors and stock market movement: Evidence from Ghana,” Munich personal RePEc archive, 2008.
[4] T. Singh, S. Mehta, and M. Varsha, “Macroeconomic factors and stock returns: Evidence from Taiwan,” Journal of economics and international finance, vol. 3, no. 4, pp. 217–227, 2011.
[5] B. Kibria and S. Banik, “Some ridge regression estimators and their performances,” Journal of Modern Applied Statistical Methods, vol. 15, no. 1, p. 12, 2016.
[6] A. M. E. Saleh, M. Arashi, and B. G. Kibria, Theory of ridge regression estimation with applications. John Wiley & Sons, 2019, vol. 285.
[7] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] P. F. Smith, S. Ganesh, and P. Liu, “A comparison of random forest regression and multiple linear regression for prediction in neuroscience,” Journal of neuroscience methods, vol. 220, no. 1, pp. 85–91, 2013.
[9] H. Abdi and L. J. Williams, “Principal component analysis,” Wiley interdisciplinary reviews: computational statistics, vol. 2, no. 4, pp. 433–459, 2010.
[10] A. Liaw, M. Wiener et al., “Classification and regression by randomForest,” R news, vol. 2, no. 3, pp. 18–22, 2002.
[11] U. Grömping, “Variable importance assessment in regression: linear regression versus random forest,” The American Statistician, vol. 63, no. 4, pp. 308–319, 2009.
[12] J. L. Speiser, M. E. Miller, J. Tooze, and E. Ip, “A comparison of random forest variable selection methods for classification prediction modeling,” Expert Systems with Applications, vol. 134, pp. 93–101, 2019.
[13] R. Couronné, P. Probst, and A.-L. Boulesteix, “Random forest versus logistic regression: a large-scale benchmark experiment,” BMC bioinformatics, vol. 19, no. 1, p. 270, 2018.
[14] C. Yang and L. Zhou, “Individual stock crowded trades, individual stock investor sentiment and excess returns,” The North American Journal of Economics and Finance, vol. 38, pp. 39–53, 2016.
[15] W. Kinlaw, M. Kritzman, and D. Turkington, “Crowded trades: Implications for sector rotation and factor timing,” The Journal of Portfolio Management, vol. 45, no. 5, pp. 46–57, 2019.