Conference Paper · December 2017 · DOI: 10.1109/PDCAT.2017.00024
A Fusion Financial Prediction Strategy Based on RNN and Representative Pattern Discovery

Lu Zhang∗†, Xiaopeng Fan∗, Chengzhong Xu∗
∗ Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
† Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, China
Email: {zhanglu, xp.fan, cz.xu}@siat.ac.cn
Abstract—To predict the future with high accuracy is a holy grail in financial markets, but the volatility of chaotic financial markets continually challenges technologies from computer science to economic science. Recently, the Recurrent Neural Network (RNN) has taken on a new role in financial market prediction. However, RNN results are restricted by the sample size of training datasets, and prediction accuracy can hardly be guaranteed in the long term. On the other hand, Representative Pattern Discovery (RPD) is effective for long-term prediction but ineffective for short-term prediction.

In this paper, we define a representative pattern for time series, and propose a fusion financial prediction strategy based on RNN and RPD. We take advantage of both RNN and RPD: the proposed strategy is stateful to keep the short-term trend, and it rectifies the prediction with a time-dependent incremental factor over the long term. Compared with RNN and pattern discovery respectively, our experimental results demonstrate that the proposed strategy performs much better. It can increase prediction accuracy by up to 6% over RNN, though at the cost of a higher Mean Squared Error.

Keywords—Financial prediction; Fusion method; Representative pattern discovery; RNNs

I. INTRODUCTION

To predict the future with high accuracy is a holy grail in financial markets. Since the birth of financial markets, financial prediction has been a permanent goal for economists and investors. Investors may gain enormous profits with a small amount of capital if they can accurately predict market fluctuations. However, the volatility of chaotic financial markets continually challenges technologies from computer science to economic science. Financial markets are complex chaotic systems, influenced by many factors including political events, macro-economic conditions, and traders' expectations. Therefore, predicting financial market movements is a challenging problem.

According to academic investigations, fluctuations of prices in markets are not completely random. They behave in a highly nonlinear dynamic manner. The random-walk assumption of prices may merely describe randomness with a noisy nonlinear process. To make the forecasting of prices more reliable, time series analysis has attracted extensive attention.

Traditional methods in time series analysis rest on a solid theoretical foundation, but their nonlinear fitting ability is still insufficient for the requirements of financial prediction. The Recurrent Neural Network (RNN) is a deep learning algorithm for time series analysis with a strong nonlinear fitting ability, able to extract deep features from datasets. However, daily financial markets offer only a few thousand records for a stock or a future, which greatly undermines the ability of RNN due to the small data size. Therefore, the prediction accuracy of RNN can hardly be guaranteed beyond several days.

Fortunately, there is another way: prediction based on pattern discovery. Almost all human actions have hidden patterns, and 93% of human actions are predictable [1]. As one type of economic activity, financial markets also contain hidden patterns, and some investors have discovered useful hidden patterns and benefit from them. The discovery of hidden patterns from time series is so important that many significant methods [2]–[4] have been proposed. In these algorithms, time-series subsequence clustering acts as one of the most important subroutines. However, applying clustering approaches to discover frequent patterns on time series subsequences has been criticized [5]: when a sliding window with a fixed size is used to discretize a long time series into subsequences, trivial-match subsequences always exist. Extracting subsequences randomly with a length-variable window can reduce trivial matches significantly.

In this paper, we define a representative pattern as a subsequence that keeps similarity with other subsequences within a range of distance. We propose a representative pattern discovery algorithm based on the density peak clustering algorithm [6] with the Dynamic Time Warping (DTW) distance. On the basis of the discovered patterns, we try to predict the movements of the S&P 500 index. Experimental results show that this method can grasp the long-term fluctuations of the index, but due to market volatility it is ineffective in short-term prediction. Based on this observation, we propose a novel financial market prediction strategy built on the fusion of RNN and pattern matching: the strategy applies pattern matching to rectify the results of RNN. Experimental results show that our fusion method performs much better on real financial market datasets.

Corresponding author: Dr. Xiaopeng Fan, xp.fan@siat.ac.cn
II. RELATED WORK

Lonardi et al. [7] apply an Approximation Distance Map (ADM) and propose a high-quality, highly effective repeated pattern discovery method. In our problem, the dataset is small and highly noisy, so it is hard to discover highly similar repeated patterns from it.

Das et al. [3] discretize time series by basic clustering methods, and then discover rules from the discretized sequences. In this paper, we discover representative patterns by density peak clustering. Petitjean et al. [8] provide a global averaging method for dynamic time warping with k-means clustering.

Fu et al. [4] discover patterns from stock time series by using self-organizing maps. Guo et al. [9] convert stock time series data into feature vectors by using the ICA algorithm, and then apply the k-means algorithm to the extracted feature vectors. Wang et al. [10] propose a method for clustering time series based on the structural characteristics of stock time series.

Di Persio et al. [11] try to predict the trend of the S&P 500 index with a Long Short-Term Memory (LSTM) neural network, a widely used variant of RNNs. But their results are not good enough, in that each prediction is almost the same as the previous day's value, and the accuracy rate was only 52.2%. We consider that the network structure in their work may be inappropriate, and in our work we show better results by improving the LSTM neural network.

Maknickiene et al. [12] apply the EVOLINO learning algorithm to select the RNNs' hyperparameters and achieve better stock prediction performance. In our work, we try a new way to improve the prediction performance of RNNs. To the best of our knowledge, our approach of combining RNNs with pattern matching has not been reported before.

A. Background

For most data mining algorithms on time series, the quality of the output depends almost exclusively on the distance metric [13]. The Dynamic Time Warping (DTW) distance is widely considered the best. In a time series with high noise, such as financial data, subsequences may vary greatly even though they share the same pattern. A local lag between peaks appears because of changes in the data, driven by traders' expectations, Fed policy, macro-economic conditions, etc. For such an out-of-sync problem, the DTW distance can capture invariants more accurately.

But in a daily financial history dataset it is difficult to find repeated patterns with high similarity, even with the DTW distance. How can we find the hidden patterns inside such a dataset? Inspired by the density peak clustering algorithm, we try to find the subsequences with the highest ability to represent the features of the dataset. In the density peak clustering algorithm, cluster centers are recognized as local density maxima that are far away from any point with higher density [6]. In our work, both the density and the distance are measured by the DTW distance.

RNN is another time series analysis algorithm. In a traditional neural network, all inputs and outputs are independent of each other; an RNN, by contrast, makes full use of the sequential information. RNNs are called "recurrent" because they carry out the same task on each element of a sequence, with the output depending on previous computations; an RNN reserves part of its memory to know what has been calculated so far. Different from traditional neural networks, the states of the hidden layers in an RNN are calculated by the following equation:

    a^(t) = W h^(t-1) + U x^(t) + b    (1)

where b is the bias vector, and U and W are the weight matrices for the input-to-hidden and hidden-to-hidden connections respectively. h^(t-1) is the output of the hidden layers at the previous step.

If we take tanh as the activation function, the outputs of the hidden layers are calculated by the following equation:

    h^(t) = tanh(a^(t))    (2)

The loss function used in an RNN can be the mean squared cost, the cross-entropy cost, etc. If we can find a differentiable function to compute the DTW distance, we can even use the DTW distance as the cost of the RNN.

III. STRATEGY

In this section, we propose a fusion strategy based on RNN and pattern matching. The idea is motivated by the following experimental observation: RNN can accurately capture short-term trends, while pattern matching performs better on long-term predictions. Our strategy consists of four parts:

• Discover representative patterns. Inspired by the density peak clustering algorithm, we select the subsequences that have high similarity to the other subsequences but very low similarity among each other. These discovered subsequences are called Representative Patterns (RPs): they may not be repeated patterns, but they are more representative than the other subsequences.

• Prediction by pattern matching. For a prediction event, we calculate the DTW distance between the front end of each RP and the historical data. If we find a matched RP, we locate the best join position by the values of the points. The segment of the RP behind the join position is the prediction of the future.

• Prediction by RNN. We apply an RNN structure [14] that can summarize a sequence and produce a fixed-size representation. In this problem, the fixed-size representation is the result we try to predict.

• Fusion of RNN with pattern matching. After we obtain the predictions from RNN and pattern matching, the fusion adjusts the results of RNN by the predictions of pattern matching.
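Both the discovery and the matching steps rely on the DTW distance. As a minimal sketch (plain Python; the paper does not specify implementation details, and the example sequences are illustrative), the classic dynamic program is:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Classic O(len(a) * len(b)) dynamic program: dp[i][j] holds the cost
    of the best warping path aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    INF = float("inf")
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # A step may advance either sequence or both (the "warp"),
            # which is what absorbs local lags between peaks.
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]

# Two out-of-sync shapes: DTW absorbs the one-step lag entirely.
x = [0.0, 1.0, 2.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
print(dtw_distance(x, y))  # → 0.0
```

A plain Euclidean distance on the same pair would be nonzero (and undefined for the unequal lengths), which is why the lagged, length-variable subsequences here are compared with DTW.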
A. Representative Patterns Discovery

Algorithm 1 Extract subsequences
Inputs: HS: history sequence; w: basic length of subsequences; r: dynamic scope (r < w/2).
Outputs: SS: the set of subsequences.
 1: function GENERATEDATA(HS, w, r)
 2:     size = len(HS) * 2 / w
 3:     for all i ∈ 1:size do
 4:         window_size = w + r * random(-1, 1)
 5:         begin_pos = random(0, len(HS) - window_size)
 6:         SS[i] = HS[begin_pos : begin_pos + window_size]
 7:     end for
 8:     return SS
 9: end function

Algorithm 2 Representative patterns discovery
Inputs: SS: the set of subsequences; k: the number of representative patterns; dc: the cut-off distance.
Outputs: RPs: the set of representative patterns.
 1: function PATTERNDISCOVERY(SS, k, dc)
 2:     D ← the all-pair DTW distance matrix of SS
 3:     n = len(SS)
 4:     for all i ∈ 1:n do
 5:         ρ[i] = Count(D(i, other subsequences) < dc)
 6:     end for
 7:     for all i ∈ 1:n do
 8:         δlist[i] ← the list of subsequences with higher density than SS[i]
 9:     end for
10:     (δlist, index) = sort(δlist, 'descend')
11:     for all j ∈ 2:n do
12:         δ[index[j]] = MinDistance(index[j], δlist[index[j]])
13:     end for
14:     δ[index[1]] = MaxDistance(index[1], other subsequences)
15:     RPs ← topk(SS, ρ ∗ δ)
16:     return RPs
17: end function
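The two algorithms can be sketched in Python as follows. This is a sketch under stated assumptions: the function and parameter names are hypothetical, `dist` is a pluggable distance (DTW in the paper; the usage below substitutes a simple mean-difference measure so the example stays short), and `dc`, the data, and the seed are illustrative.

```python
import random

def extract_subsequences(hs, w, r, seed=0):
    """Algorithm 1 sketch: randomly extract 2*len(hs)/w subsequences
    with a length-variable window w ± r to reduce trivial matches."""
    rng = random.Random(seed)
    size = len(hs) * 2 // w
    out = []
    for _ in range(size):
        win = w + rng.randint(-r, r)           # variable window length
        beg = rng.randrange(0, len(hs) - win)  # random start position
        out.append(hs[beg:beg + win])
    return out

def representative_patterns(ss, dist, dc, k):
    """Algorithm 2 sketch, density-peak style: local density rho (count of
    neighbors within dc), separation delta (distance to the nearest denser
    subsequence), then the top-k subsequences by rho * delta."""
    n = len(ss)
    d = [[dist(ss[i], ss[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest subsequence has no denser neighbor: give it the
        # maximum distance instead (line 14 of Algorithm 2).
        delta.append(min(denser) if denser else max(d[i]))
    order = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [ss[i] for i in order[:k]]
```

For example, `representative_patterns(extract_subsequences(series, w=25, r=10), dtw_distance, dc, k=16)` would mirror the hyperparameters used later in the paper.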
Finding repeated patterns with high similarity is difficult in daily financial history datasets due to their small size and high noise, so we take a heuristic way to obtain hidden patterns from the datasets. Applying clustering approaches to discover frequent patterns has been criticized as meaningless when a long time series is discretized into subsequences with a fixed window size. Therefore, we apply a window with variable length (w ± r, where r < w/2), and randomly extract (2 * length of the history sequence)/w subsequences. In this way, we can reduce trivial matches significantly. The flow is shown as Algorithm 1.

Most clustering algorithms are based on the assumption that cluster averages approximate the ideal clustering center. Conversely, this means that we can use clustering centers as approximations of cluster averages, and we consider clustering centers more likely to represent the data than other points.

Based on the rationale above, we propose a representative pattern discovery algorithm based on density peaks; the flow is shown as Algorithm 2. At line 2, we calculate the all-pair DTW distances between subsequences. In lines 4-6, ρ (the density of each subsequence) is calculated. δ (the minimum distance to any higher-density subsequence) is calculated in lines 7-13. For the special case of the highest-density subsequence, δ is calculated in line 14. After ρ and δ are calculated, we choose the top-k subsequences with the highest values of ρ ∗ δ in line 15.

B. Prediction by pattern matching

Now that we have obtained the representative patterns, the computation flow for a concrete prediction event is shown as Algorithm 3. At lines 2-5, we calculate the DTW distance between the front segment of each RP and the input sequence, and choose the index of the RP with the minimum distance at line 6. If that distance is less than the given cut-off distance, we have a successful match. Then we determine the join position based on the last few values of the input sequence at line 10, and the subsequence behind the join position is the prediction result.

C. Prediction by RNN

There are three major design patterns for recurrent neural networks. In our work, we choose the time-unfolded RNN with a single output at the end of the sequence, as illustrated in Fig. 1. This structure can summarize a sequence and produce a fixed-size representation.

Before we use the RNN model to predict the future, we first set the structure and the hyperparameters. Then we train the model on the history data. Once trained, the RNN model produces a fixed-size representation from a valid input sequence, which is exactly the prediction we need.
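The recurrence of Equations (1) and (2), unfolded with a single output at the end of the sequence, amounts to the following forward pass. This is a minimal sketch with hand-picked toy weights; a trained model (e.g. with the LSTM cells listed later in Table I) would replace both the cell and the weights.

```python
import math

def rnn_predict(x, W, U, b, V, h0):
    """Many-to-one RNN forward pass:
    a(t) = W h(t-1) + U x(t) + b,  h(t) = tanh(a(t)),
    with a single linear output o = V h(T) after the last step."""
    h = list(h0)
    for xt in x:  # unfold over the input sequence
        a = [sum(W[i][j] * h[j] for j in range(len(h))) + U[i] * xt + b[i]
             for i in range(len(h))]
        h = [math.tanh(ai) for ai in a]
    # Fixed-size summary of the whole sequence (a scalar here).
    return sum(V[i] * h[i] for i in range(len(h)))

# Toy 2-unit network on a short sequence (weights are illustrative).
W = [[0.5, 0.0], [0.0, 0.5]]
U = [1.0, -1.0]
b = [0.0, 0.0]
V = [1.0, -1.0]
y = rnn_predict([0.1, 0.2, 0.3], W, U, b, V, h0=[0.0, 0.0])
```

The point of the structure is visible in the code: however long the input `x` is, the output has a fixed size, which is what lets the network emit the prediction directly.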
   After ρ and δ are calculated, we choose the top-k subse-          According to prediction results obtained from RNN and
quences which have higher values in ρ ∗ δ in line 15.              pattern matching above, we rectify the results of RNN by the
Algorithm 3 Prediction by pattern matching                                                             Algorithm 4 Fusion Strategy
Inputs: RP : the set of representative patterns: ihs: input                                            Inputs: RS: the prediction of RNN; M S: the prediction of
    history sequence; n: The time span need prediction.                                                    pattern matching; n: the time span need prediction.
Outputs: M P R, The predictions by pattern matching.                                                   Outputs: F S: the predictions of the fusion method.
 1: function P REDICTION B Y PM(RP , ihs, n, dc )                                                       1: function F USION M ETHOD (RS, M S, n)
 2:     for all i ∈ 1:len(RP s) do                                                                      2:     F S[1] ← RS[1]
 3:         f rp ← the front end of the rp[i]                                                           3:     for all t ∈ 2:n do
 4:         d[i] = CalculateDtwDistance(f rp, ihs)                                                      4:         α = log(t+n) (t + 1)
 5:     end for                                                                                         5:         F S[t] = α ∗ ( M S[t]−M [t−1]
                                                                                                                                     M [t−1]     + 1) ∗ F S[t − 1] +
 6:     ι = argmin(d)                                                                                      (1 − α) ∗ RS(t)
 7:     M P R ← null                                                                                    6:     end for
 8:     if d[ι] < dc then                                                                               7:     return F S
 9:         mp = RP s[ι]                                                                                8: end function
10:         Determine the join position based on the last few
    values of the ihs
11:         M P R ← the subsequent of mp behind join                                                                          IV. E XPERIMENTS
    position
12:     end if                                                                                         A. Dataset
13:     return M P R                                                                                     We choose the S&P 500 as a typical example of financial
14: end function                                                                                       market time-series dataset. The Standard & Poor’s 500 Index
                                                                                                       (S&P 500) is an index of 500 stocks as a leading indicator of
                                                                                                       U.S. equities and a reflection of the performance of the large
                                                                    𝐿(()
                                                                                                       cap universe. The index is made up of companies selected
                                                                                                       by economists. In our work, we obtain the history data from
                                                            𝑦 (()          𝑜(()                        2004 to 2016 and contains 4399 records. The data between
                                                                              𝑉
                                                                                                       2004 and 2014 is extracted as the training dataset, while the
              𝑊             𝑊           𝑊             𝑊             𝑊              𝑊-                  others are considered as the testing dataset.
        …         𝐶 (#$%)       𝐶 (#)       𝐶 (#'%)       …                𝐶 (()        𝐶 -(.)
        …                                                 …
                      𝑈            𝑈            𝑈             𝑈               𝑈                        B. Evaluation Metrics
                  𝑥 (#$%)       𝑥 (#)       𝑥 (#'%)       𝑥 (…… )          𝑥 (()                          We take Mean Squared Error(MSE) and Mean Trend
                                                                                                       Accuracy(MTA) to evaluate the effectiveness of our pro-
Figure 1.    The RNN structure with a single output at the end of the                                  posed solution. MSE is an estimator metric to evaluate the
sequences.
                                                                                                       average of the squares of errors, i.e., the difference between
                                                                                                       the estimator and what is estimated. MTA is an estimator
                                                                                                       to evaluate the trend prediction ability of the models. It’s
predictions from pattern matching according to the following                                           defined as the rate of the correct times to total times. In our
equation:                                                                                              work, we evaluate the accuracy of prediction by the value
                                                                                                       between the prediction point and the start point. For instance,
             M (t) − M (t − 1)                                                                         in the prediction for the t5 , if the predicted value and the true
F (t) = α∗(                      +1)∗F (t−1)+(1−α)∗R(t)                                                value are both higher than the value of t0 , it means we have
                 M (t − 1)
                                                           (3)                                         a correct prediction. In the domain of financial prediction,
,where t = 1, . . . , T , F is the prediction of the fusion                                            the trend is more important than actual values. Thus, in our
method, M is the prediction of pattern matching, R is the                                              work, MTA is the major metric of performance.
prediction of RNN, and α is a value between 0 and 1.
                                                                                                       C. Performance Analysis
   The details of the fusion method are shown in Algorithm
4. At line 2, we assign the first day prediction of F S by the                                           The super parameter of RNN is showed in Table I, and
RS, because RNN is accurate enough within the time range.                                              each output represents the prediction of one day. The super
At lines 3-5, we calculate α by the following equation:                                                parameter of pattern discovery is showed in Table II.
                                                                                                                                    Table I
                            α = log(t+T ) (t + 1)                                                (4)                  P RIMARY SUPER PARAMETERS OF RNN.
                                                                                                           hidden layers   step nums   input size   output size    cells
and then calculate the prediction of t moment by Equation                                                        1             20          1            5         LSTM
3.
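The fusion of Algorithm 4 and Equations (3)-(4) can be sketched as follows; the toy RNN and pattern-matching predictions in the example are illustrative, and the function name is hypothetical.

```python
import math

def fuse(rs, ms, n):
    """Fusion strategy sketch: start from the RNN prediction, then blend
    in the day-over-day trend of the pattern-matching prediction with a
    weight alpha = log_{t+n}(t+1) that grows with the horizon t."""
    fs = [rs[0]]                       # F(1) = R(1): RNN is reliable on day one
    for t in range(2, n + 1):          # 1-based day index, as in Algorithm 4
        alpha = math.log(t + 1) / math.log(t + n)   # Eq. (4): log base t+n
        trend = ms[t - 1] / ms[t - 2]  # (M(t) - M(t-1)) / M(t-1) + 1
        fs.append(alpha * trend * fs[-1] + (1 - alpha) * rs[t - 1])  # Eq. (3)
    return fs

# Toy 3-day horizon: rs from the RNN, ms from pattern matching.
fused = fuse([1.0, 1.1, 1.2], [1.0, 1.05, 1.10], n=3)
```

Because `alpha` increases with t, the fused prediction leans on the RNN for the first days and progressively shifts toward the pattern-matching trend, which is exactly the intended division of labor between the two predictors.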
Figure 2. (a) Pattern matching example. (b) Prediction example.

Figure 3. (a) The MTAs of the methods. (b) The MSEs of the methods.

Table II
HYPERPARAMETERS OF THE PATTERN DISCOVERY.

subseq length | center nums | predict length | cutdown percent
      25      |     16      |       5        |       4%

The experimental results are based on the above hyperparameters. A prediction example is shown in Fig. 2; the y-axis is the S&P index value after z-score normalization. Fig. 2(a) shows that, after we find a representative pattern, our adjustment strategy brings the pattern closer to the true values. Fig. 2(b) shows that, among the three prediction results, the fusion strategy reconciles the predictions of RNN and pattern matching, and its result is closer to the true values.

To reduce the randomness of the experiments, we run several experiments and report the means and variances; the results are shown in Fig. 3. As the results show, RNN performs better on trend prediction within two days, while pattern matching performs better on the other days. The fusion strategy takes the advantages of both methods and performs well over the whole horizon. It can increase the accuracy by up to 6% over RNN. On the metric of MSE, RNN has the minimum MSE, while the fusion strategy has the largest MSE because of some abnormal values introduced during the fusion.

V. CONCLUSION

In this paper, we propose a representative pattern discovery algorithm based on the density peak clustering algorithm, using the DTW distance as the similarity metric. After representative patterns are discovered, we apply them to predict the future. On this basis, we propose a fusion strategy that rectifies the RNN's prediction by pattern matching. Experimental results show that the fusion strategy performs better on trend prediction than RNN and pattern matching alone, while it has a higher MSE on the values. In the future, we aim to improve the fusion algorithm so that it performs better on both trends and MSE values.

ACKNOWLEDGMENT

This research is partially supported by the Major Project of Chinese National Programs for Fundamental Research and Development (2015CB352400), the National Natural Science Foundation of China (61672513, 61501443, U1401258), and the Shenzhen Strategic Emerging Industry Development Funds (JSGG20160229200957727, JSGG20150512145714248).
                             R EFERENCES
    [1] A.-L. Barabási, Bursts: the hidden patterns behind everything
        we do, from your e-mail to bloody crusades. Penguin, 2010.
    [2] H. Wang, “Clustering by pattern similarity in large data sets,”
        in ACM SIGMOD International Conference on Management
        of Data, 2002, pp. 394–405.
    [3] G. Das, K.-I. Lin, H. Mannila, G. Renganathan, and P. Smyth,
        “Rule discovery from time series.” in KDD, vol. 98, no. 1,
        1998, pp. 16–22.
    [4] T. C. Fu, F. L. Chung, V. Ng, and R. Luk, “Pattern discovery
        from stock time series using self-organizing maps,” in Work-
        shop Notes of the KDD Workshop on Temporal Data Mining,
        2001, pp. 1–8.
    [5] E. Keogh, J. Lin, and W. Truppel, “Clustering of time series
        subsequences is meaningless: Implications for previous and
        future research,” in IEEE International Conference on Data
        Mining, 2003, pp. 115–122.
    [6] A. Rodriguez and A. Laio, “Clustering by fast search and find
        of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496,
        2014.
    [7] J. Lonardi and P. Patel, “Finding motifs in time series,” in
        Proc. of the 2nd Workshop on Temporal Data Mining, 2002,
        pp. 53–68.
    [8] F. Petitjean, A. Ketterlin, and P. Gançarski, “A global aver-
        aging method for dynamic time warping, with applications to
        clustering,” Pattern Recognition, vol. 44, no. 3, pp. 678–693,
        2011.
    [9] C. Guo, H. Jia, and N. Zhang, “Time series clustering based
        on ICA for stock data analysis,” in International Conference
        on Wireless Communications, Networking and Mobile
        Computing, 2008, pp. 1–4.
  [10] X. Wang, K. Smith, and R. Hyndman, “Characteristic-based
       clustering for time series data,” Data Mining & Knowledge
       Discovery, vol. 13, no. 3, pp. 335–364, 2006.
  [11] L. Di Persio and O. Honchar, “Artificial neural networks
       architectures for stock price prediction: comparisons and
       applications,” International Journal of Circuits, Systems and
       Signal Processing, vol. 10, pp. 403–413, 2016.
  [12] N. Maknickienė and A. Maknickas, “Financial market predic-
       tion system with Evolino neural network and Delphi method,”
       Journal of Business Economics and Management, vol. 14,
       no. 2, pp. 403–413, 2013.
  [13] J. Shieh and E. Keogh, “iSAX: indexing and mining terabyte
       sized time series,” in Proceedings of the 14th ACM SIGKDD
       International Conference on Knowledge Discovery and Data
       Mining. ACM, 2008, pp. 623–631.
  [14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning.
       MIT Press, 2016, http://www.deeplearningbook.org.