SSRN 4556048
Abstract
Regime-driven models are popular for addressing temporal patterns in both financial market
performance and underlying stylized factors, wherein a regime describes periods with relatively
homogeneous behavior. Recently, statistical jump models have been proposed to learn regimes
with high persistence, based on clustering temporal features while explicitly penalizing jumps
across regimes. In this article, we extend the jump model by generalizing the discrete hidden
state variable into a probability vector over all regimes. This allows us to estimate the probability
of being in each regime, providing valuable information for downstream tasks such as regime-
aware portfolio models and risk management. Our model’s smooth transition from one regime
to another enhances robustness over the original discrete model. We provide a probabilistic
interpretation of our continuous model and demonstrate its advantages through simulations and
real-world data experiments. The interpretation motivates a novel penalty term, called mode
loss, which pushes the probability estimates to the vertices of the probability simplex thereby
improving the model’s ability to identify regimes. We demonstrate through a series of simulation
and real-world tests that the approach outperforms traditional regime-switching models. This
outperformance is pronounced when the regimes are imbalanced and historical data is limited,
both common in financial markets.
Keywords: Regime Switching; Temporal Clustering; Statistical Jump Models; Probabilistic Modeling; Time Series; Unsupervised Learning
∗ This article is dedicated to Professor Bill Ziemba, who spent his long career researching and discovering systematic approaches to improve investment performance through strategies based on conditional “inefficient” pricing behavior. His research covers examples in a wide range of markets and instruments, from sporting and gambling events to traditional financial markets, and even less conventional markets such as wine and Turkish rugs. These patterns are sometimes identified as anomalies (arbitrage), obtaining an edge (such as card counting in blackjack), or systematic risk factors (traditional asset classes). With the rise of micro-level data and new data science methods, the research Bill initiated has continued to grow, marking him as a pioneering explorer.
† We emphasize that the statistical jump models we study in this article are not related to jump-diffusion models, a common class of stochastic processes.
‡ Afşar Onat Aydınhan, aydinhan@princeton.edu. Petter N. Kolm, petter.kolm@nyu.edu. John M. Mulvey, mulvey@princeton.edu. Yizhan Shu (corresponding author), yizhans@princeton.edu.
2 Related Literature
In time series analysis, it is often valuable to partition the data into consecutive time periods
that exhibit similar characteristics. This allows us to fit separate models to each segment, thereby
capturing segment- or regime-specific characteristics and their changes more effectively. This task
has been extensively studied in various contexts, including cyclical analysis (Bry and Boschan,
1971), change point detection (Aminikhanghahi and Cook, 2017), segmentation (Himberg et al.,
2001), mixture models (Picard et al., 2011), and trend filtering (Kim et al., 2009; Gu and Mulvey,
2021). These approaches provide valuable insights into understanding and modeling the underlying
dynamics of financial time series.
After the seminal work of Rydén et al. (1998), which fitted simple HMMs on daily financial
return series, researchers have proposed numerous extensions to HMMs. By using conditional
distributions with heavy tails (Bulla, 2011) and sojourn time distributions other than memoryless
geometric distributions (Bulla and Bulla, 2006), the resulting models better capture long memory,
exhibit improved persistence of the hidden state sequence, and can reproduce many stylized facts
of financial time series. Nystrup et al. (2017) fit classical HMMs, but in an adaptive way that
allows for time-varying parameters, resulting in similar improved performance. Other extensions
to classical HMMs have been explored, including the utilization of higher-order Markov chains
to model the dynamics of the hidden state sequence (Zhang et al., 2019), the incorporation of
multi-scale hierarchical structures within HMMs (Fine et al., 1998), and the use of HMMs with
distributed state representations (Ghahramani and Jordan, 1995). Classical discrete-time HMMs
suffer from a quadratic increase in the number of parameters as the number of states expands,
often limiting researchers to models with few states. Nystrup et al. (2015) suggest alleviating this
constraint by fitting HMMs in continuous time, so that the number of parameters grows only linearly with the number of states.
While maximum likelihood estimation (MLE) is commonly used to estimate HMMs, gradient-based optimization of the likelihood function is computationally difficult due to the non-concavity resulting from the integration over the hidden state variables. The Baum-Welch algorithm (Baum et al., 1970), an iterative procedure later recognized as a specialization of the expectation–maximization (EM) algorithm (Dempster et al., 1977), is typically employed but is only guaranteed to converge to a local optimum. To address the issue of local optima, Hsu et al. (2012) propose a singular
value decomposition (SVD)-based spectral algorithm with provable guarantees. This algorithm is
specifically designed for HMMs with a large number of hidden states. Bulla and Berzel (2008) pro-
pose a hybrid algorithm that combines the advantages of both direct numerical maximization and
EM algorithms. Additionally, several authors have introduced Bayesian approaches as alternative
3.1 Features
For financial return series of an individual instrument, the observation sequence consists of
scalar values x0 , . . . , xT −1 ∈ R. However, to enhance model performance, we construct a set of
additional features. Specifically, to facilitate comparison with previous work, we adopt the feature
construction method proposed by Nystrup et al. (2020a), as outlined in Algorithm 1. In turn,
these features are based on those in Zheng et al. (2021), with the distinction of allowing only
backward-looking features to enable easy adaptation for online predictions. These features include
a collection of rolling mean returns and volatilities calculated over different window sizes, which are essential for characterizing a return series. We use two window lengths, w = 6 and 14, resulting in a total of fifteen standardized features.
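As a concrete illustration, the backward-looking rolling construction can be sketched as below. This is a simplified stand-in for Algorithm 1: the actual set in Nystrup et al. (2020a) comprises fifteen standardized features, while the four feature names here are our own illustrative assumptions.

```python
# A minimal sketch of backward-looking rolling features: rolling means and
# volatilities over two window lengths, then standardization. The feature
# names and the reduced feature count are assumptions of this sketch.
import numpy as np
import pandas as pd

def rolling_features(returns: pd.Series, windows=(6, 14)) -> pd.DataFrame:
    feats = {}
    for w in windows:
        feats[f"mean_{w}"] = returns.rolling(w).mean()  # rolling mean return
        feats[f"std_{w}"] = returns.rolling(w).std()    # rolling volatility
    X = pd.DataFrame(feats).dropna()
    # Standardize each feature to zero mean and unit variance.
    return (X - X.mean()) / X.std()

rng = np.random.default_rng(0)
X = rolling_features(pd.Series(rng.normal(0.0, 0.01, 500)))
```

Because every window looks only backward, the same construction can be applied unchanged in an online setting, as noted above.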
While we continue to use the feature sets popularized by Zheng et al. (2021); Nystrup et al.
(2020a), it is crucial to highlight the flexibility inherent in choosing the feature set within the
jump model framework. Features can be derived from various sources, including exogenous time
series such as macro-economic indicators and returns of relevant assets, as demonstrated in Cortese
et al. (2023a). Moreover, features can be specifically tailored for a particular financial market, as
illustrated in Shu et al. (2024). These examples underscore the diverse range of possibilities for
constructing feature sets within the jump model framework.
Given the non-convex nature of the objective function in equation (1), we can only expect
the coordinate descent algorithm to converge to a stationary point, without any guarantee of
finding the global optimum. Wright (2015) examines the use, implementation, and convergence of
coordinate descent algorithms for convex objective functions. Notably, global convergence results
for non-convex objective functions are obtainable only in special cases (see, for example,
Bertsekas (1999, Proposition 2.7.1); Attouch et al. (2010); Bolte et al. (2014)). To address the issue
of local optima, we run the algorithm ten times with different starting points, generated by the
K-means++ algorithm (Arthur and Vassilvitskii, 2007), and retain the solution that obtains the
best value of the objective function.
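The restart strategy can be sketched generically as follows. Here `fit_from_init` is a stand-in for one run of the coordinate descent algorithm (any routine returning an objective value and a solution); it and the toy surrogate below are assumptions of this sketch, not the authors' implementation.

```python
# Multi-start pattern: run a non-convex fitting routine from several
# K-means++ initializations and keep the solution with the best objective.
import numpy as np

def kmeans_pp_init(X, K, rng):
    # K-means++ seeding: choose new centers with probability proportional
    # to squared distance from the centers chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def best_of_n_starts(X, K, fit_from_init, n_starts=10, seed=0):
    rng = np.random.default_rng(seed)
    best_obj, best_sol = np.inf, None
    for _ in range(n_starts):
        obj, sol = fit_from_init(X, kmeans_pp_init(X, K, rng))
        if obj < best_obj:
            best_obj, best_sol = obj, sol
    return best_obj, best_sol

rng_demo = np.random.default_rng(1)
X_demo = rng_demo.normal(size=(200, 2))

def one_fit(X, C):
    # Toy surrogate for coordinate descent: K-means assignment cost of
    # the initial centers C (illustrative only).
    d2 = ((X[:, None, :] - C[None]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum(), C

obj, centers = best_of_n_starts(X_demo, K=3, fit_from_init=one_fit)
```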
Until S^i = S^{i−1}, or the improvement of the value of the objective function is smaller than a specified tolerance.
Compute the empirical transition probability matrix.
Output: Model parameters and hidden state sequence
The time complexity of the DP algorithm is O(T K²), which is equivalent to one forward-backward pass in the E-step of the Baum-Welch algorithm. As discussed later in Section 4.2, fitting the state sequence with fixed model parameters in the discrete jump model can also be addressed by solving a relaxed linear programming (LP) problem. In our experience, the DP solution is typically faster than calling an LP solver, especially when the number of states K is relatively low (less than ten).

end
Compute optimal value v̂ = min_k V(T−1, k).
Compute optimal state ŝ_{T−1} = argmin_k V(T−1, k).
for t = T−1, . . . , 1 do
    ŝ_{t−1} = argmin_j { V(t−1, j) + λ 1{j ≠ ŝ_t} } .        (3)
end
Output: Optimal value v̂ and optimal state sequence {ŝ_0, . . . , ŝ_{T−1}}.
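For concreteness, the forward recursion and the backtracking step (3) can be sketched as follows; `loss` holds the precomputed values l(y_t, θ_k), and this Viterbi-style implementation is our own illustrative reading of the recursion, not the authors' code.

```python
# O(T K^2) dynamic program for fitting the discrete state sequence with
# fixed cluster centers: a jump penalty lam is paid whenever the state changes.
import numpy as np

def fit_states_dp(loss: np.ndarray, lam: float):
    T, K = loss.shape
    V = np.empty((T, K))             # V[t, k]: best cost of a path ending in state k
    back = np.zeros((T, K), dtype=int)
    V[0] = loss[0]
    for t in range(1, T):
        # trans[j, k] = V[t-1, j] + lam * 1{j != k}
        trans = V[t - 1][:, None] + lam * (1.0 - np.eye(K))
        back[t] = trans.argmin(axis=0)
        V[t] = loss[t] + trans.min(axis=0)
    # Backtrack the optimal state sequence, cf. equation (3).
    states = np.empty(T, dtype=int)
    states[-1] = V[-1].argmin()
    for t in range(T - 1, 0, -1):
        states[t - 1] = back[t, states[t]]
    return V[-1].min(), states

toy_loss = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
value, states_hat = fit_states_dp(toy_loss, lam=0.0)  # lam = 0 reduces to pointwise argmin
```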
    S ≥ 0 ,   S 1_K = 1_T ,        (4)

where Θ represents the model parameters, S is the hidden vector sequence, and the loss function L is defined as a weighted average of the losses between y_t and each cluster center, i.e.,

    L(y_t, Θ, s_t) = Σ_k s_{t,k} l(y_t, θ_k) ,        (6)

with l and s_{t,k} denoting the scaled squared ℓ₂-distance and the k-th element of s_t, respectively.
We remark that the factor of 1/4 in the second term of equation (5) ensures that λ is consistent with that of the discrete model, if we were to reduce a continuous model to a discrete one. Like the original jump model, the objective (5) fits the data to multiple models while simultaneously keeping the total number of jumps small. We emphasize that the second term of the objective function penalizes transitions between two subsequent probability vectors using the squared ℓ₁-norm of their difference. The ℓ₁-norm promotes sparsity in the difference between two consecutive estimated probability vectors, hence encouraging persistence in the inferred state process. In Section 4.2, we provide theoretical justification for the choice of this regularization term for jumps on the probability simplex.
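The 1/4 scaling can be verified directly: for one-hot ("vertex") probability vectors, the continuous penalty charges exactly λ per jump and zero otherwise, matching the discrete model. A quick check:

```python
# Verify that (lam/4) * ||s_prev - s_curr||_1^2 reduces to the discrete
# penalty at the vertices of the simplex.
import numpy as np

def jump_penalty(s_prev, s_curr, lam):
    return lam / 4 * np.sum(np.abs(s_prev - s_curr)) ** 2

e1, e2 = np.eye(3)[0], np.eye(3)[1]          # two vertex (one-hot) vectors
print(jump_penalty(e1, e2, lam=5.0))         # ||e1 - e2||_1 = 2, so cost = lam = 5.0
print(jump_penalty(e1, e1, lam=5.0))         # staying put costs 0.0
```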
Similarly, when the loss function l is convex with respect to the parameter θ_k, this optimization problem is tractable. In the case of the scaled squared ℓ₂-distance, we obtain the solution analytically.²

² To address the issue of local optima, just like for the discrete jump model, we run the algorithm ten times with different starting points, generated by the K-means++ algorithm.
selects one of the K states at each time step t, the continuous model chooses one of the N candidate probability vectors. Furthermore, instead of uniformly penalizing jumps across states with a penalty term λ, the change between two consecutive estimated probability vectors is penalized using the squared ℓ₁-norm of the vector difference. We summarize this algorithm in Algorithm 5.
The time complexity of the approximate DP algorithm is O(T N²), in contrast to the discrete model, where the time complexity is O(T K²). Consequently, the total running time of the continuous model increases by a factor on the order of hundreds. However, since the discrete model is exceedingly fast to solve, the time required to solve one continuous jump model remains very short, typically well under a second on a regular desktop computer. Nevertheless, if tasks such as rolling fits or fitting with a list of jump penalties are required, we recommend leveraging parallel processing to expedite the process.
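The grid of candidate probability vectors can be generated as below; this is a straightforward reading of a uniform grid with spacing δ on the simplex, and the exact construction used in the paper may differ. It also makes visible why N grows rapidly with K.

```python
# Enumerate all probability vectors on the simplex whose entries are
# multiples of a grid size delta; N grows roughly like (1/delta)^(K-1).
from itertools import product
import numpy as np

def simplex_grid(K: int, delta: float) -> np.ndarray:
    m = round(1 / delta)  # number of grid steps per coordinate
    cands = [c for c in product(range(m + 1), repeat=K) if sum(c) == m]
    return np.array(cands) / m

grid = simplex_grid(K=2, delta=0.05)
print(len(grid))  # 21 candidate vectors for K = 2, delta = 0.05
```

For K = 3 at the same grid size the count is already in the hundreds, consistent with the recommendation to fall back to a QP solver for larger K.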
(b) Fit state sequence S^i = argmin_S { Σ_{t=0}^{T−1} L(y_t, Θ^i, s_t) + (λ/4) Σ_{t=1}^{T−1} ||s_{t−1} − s_t||₁² }.
Until S^i = S^{i−1}, or the improvement of the value of the objective function is smaller than a specified tolerance.
Compute the empirical transition probability matrix.
Output: Model parameters and hidden state sequence
In our experience, the error introduced by the DP approximation is negligible when the grid size δ is less than approximately 0.05.⁴ Employing the approximate DP algorithm to solve the QP problem offers consistency with the discrete model and potential reductions in running time when

³ The approximate DP algorithm can decrease the running time by approximately five times with an appropriate grid size, as described below. This reduction is crucial as QP problems need to be solved tens to hundreds of times for each jump model solution. Another reason for proposing this DP approach is its adaptability to the mode loss introduced later in Section 4.4.
⁴ The exact sequence of hidden vectors solved by a QP solver differs very little from the sequence given by the approximate DP algorithm.
end
Compute optimal value v̂ = min_{i≤N−1} Ṽ(T−1, i).
Compute optimal index î_{T−1} = argmin_{i≤N−1} Ṽ(T−1, i), and optimal hidden vector ŝ_{T−1} = c_{î_{T−1}}.
for t = T−1, . . . , 1 do
    î_{t−1} = argmin_{j≤N−1} { Ṽ(t−1, j) + Λ_{j, î_t} } ,   ŝ_{t−1} = c_{î_{t−1}} .        (10)
end
Output: Optimal value v̂ and optimal hidden vector sequence {ŝ_0, . . . , ŝ_{T−1}}.

the number N of candidate vectors can be limited to around a hundred. However, since N scales exponentially with K, if the number K of states is relatively large and requires more candidate vectors, it is preferable to directly call a QP solver.
We remark that, in contrast to (8), in the formula above we apply a factor of λ/2 to the ℓ₁-norm to make it consistent with the discrete jump model. As per the fundamental theorem of linear programming (Bertsimas and Tsitsiklis, 1997, Theorem 2.7), the minimum of (12) is attained at an extreme point of the feasible set, which corresponds to a corner of the simplex. At the corners, the probability estimate for a specific state is either 0 or 1, indicating a definitive assignment to a single state. Consequently, the optimal hidden vector sequence represents a hard clustering of the observations, reducing the model to a discrete one.⁵
As a result, we choose the squared ℓ₁-norm of the vector difference as the final jump penalty on the simplex ∆^{K−1}, which transforms (12) into the QP problem (8). When K is relatively small and the number of candidate probability vectors is limited to a few hundred, we can expedite computations by approximately solving this QP using dynamic programming with Algorithm 5, yielding negligible numerical discrepancies. However, when the number of states is larger, it is preferable to directly call a QP solver.
Using other powers of the ℓ₁-norm of the vector difference, i.e., L^jump(s_{t−1}, s_t) ∝ ||s_{t−1} − s_t||₁^α, is also a valid option. This introduces a mixed-norm penalty (Kowalski, 2009), allowing for the promotion of structured sparsity. However, in our empirical experiments, we did not find significant differences when using α strictly greater than two. Therefore, we use α = 2 in our empirical work in this article to maintain the QP form.
1. (Hidden Markov process) The hidden vector sequence S follows a Markov process on the probability simplex, with initial distribution π : ∆^{K−1} → R₊ and transition kernel K : ∆^{K−1} × ∆^{K−1} → R₊, i.e.,

    p(S) = π(s_0) × Π_{t=1}^{T−1} K(s_{t−1}, s_t) .        (13)
2. (The conditional likelihood of Y) Given the hidden vector sequence S, the observation y_t

⁵ We remark that the LP problem (12) is a tight relaxation for fitting the hidden state sequence S in the discrete jump model (1) when the model parameters Θ are held fixed, a task previously addressed by the DP Algorithm 3.
To establish the correspondence of the probabilistic assumptions mentioned above with the objective function (5) of the continuous jump model, it is convenient to express the objective in its general form, following the notation of Bemporad et al. (2018):

    J(Θ, S) = Σ_{t=0}^{T−1} L(y_t, Θ, s_t) + L^init(s_0) + Σ_{t=1}^{T−1} L^jump(s_{t−1}, s_t) ,        (16)

where L(y_t, Θ, s_t) = Σ_k s_{t,k} l(y_t, θ_k) is the same as in (5), and L^init is a penalty term for the initial state s_0, which we choose to be zero in our model, indicating that we do not hold any specific view about the initial state. L^jump(s_{t−1}, s_t) represents the jump penalty, for which we have justified our choice of (λ/4) ||s_{t−1} − s_t||₁². We denote by L(S) := L^init(s_0) + Σ_{t=1}^{T−1} L^jump(s_{t−1}, s_t) the penalty on the hidden vector sequence.
The subsequent propositions extend the result in Bemporad et al. (2018) to continuous jump
models. The first proposition states that, if (Y , S) follows a GHMM, and in (16), the loss function
l, initial penalty Linit , and jump penalty Ljump correspond to the negative log-likelihood of the
conditional distribution, initial distribution, and transition kernel, respectively, then maximizing a
lower bound of the joint log-likelihood of (Y , S) is equivalent to minimizing the objective function
J in the continuous jump model.
Proposition 1. Suppose assumptions 1 through 3 above are satisfied, and define the terms in the loss function J as

    l(y, θ) := − log p(y|θ),
    L^init(s_0) := − log π(s_0),        (17)
    L^jump(s_{t−1}, s_t) := − log K(s_{t−1}, s_t).

Then maximizing a lower bound of the joint likelihood p(Y, S|Θ) is equivalent to minimizing the objective function J with respect to Θ and S.
Conversely, given an objective function J, there is a statistical interpretation of it. In particular,
the following proposition states that if certain assumptions are satisfied, including the use of prob-
ability density functions p(y|θ) and p(S), then minimizing the objective function J is equivalent
to maximizing a lower bound of the joint likelihood.
Proposition 2. Define the probability density functions

    p(y|θ) := e^{−l(y,θ)} / ν ,        (18a)
    p(S) := e^{−L(S)} / η ,            (18b)
Suppose assumptions 2 and 3 above are satisfied with (18a) being the parametric family, and (18b)
is the density function for S. Then minimizing the objective function J with respect to Θ and S is
equivalent to maximizing a lower bound of the joint likelihood p(Y , S|Θ).
We remark that the assumption that ν does not depend on θ in the proposition above is satisfied
if, for example, l(y, θ) = g(y − θ).
Then, for S to follow a Markov process, the transition kernel K(s_{t−1}, s_t) needs to be proportional to e^{−L^jump(s_{t−1}, s_t)}, with a normalization constant given by

    C(s_{t−1}) = ∫_{∆^{K−1}} e^{−L^jump(s_{t−1}, s_t)} ds_t .        (21)
Conversely, Proposition 1 suggests that the appropriate jump penalty for this transition kernel is
Following Bemporad et al. (2018), we refer to the additional penalty term as the mode loss, defined as

    L^mode(s_{t−1}) := log C(s_{t−1}) = log ∫_{∆^{K−1}} e^{−L^jump(s_{t−1}, s_t)} ds_t .        (25)
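To make (25) concrete for K = 2, write s_t = (p, 1 − p) and s_{t−1} = (q, 1 − q); the squared-ℓ₁ jump penalty (λ/4)(2|p − q|)² = λ(p − q)² turns C into a one-dimensional integral that can be evaluated numerically. The following is a sketch of ours, not the paper's computation:

```python
# Numerical illustration of the mode loss (25) for K = 2 via the
# trapezoidal rule on the segment p in [0, 1].
import numpy as np

def mode_loss(q: float, lam: float, n: int = 10001) -> float:
    p = np.linspace(0.0, 1.0, n)
    f = np.exp(-lam * (p - q) ** 2)            # e^{-L^jump} for the squared-l1 penalty
    dp = p[1] - p[0]
    C = dp * (f.sum() - 0.5 * (f[0] + f[-1]))  # trapezoidal approximation of C(q)
    return np.log(C)

# log C is larger at the simplex midpoint than at a vertex, so penalizing
# by the mode loss favors near-vertex (near-discrete) probability estimates.
print(mode_loss(0.5, lam=10.0), mode_loss(0.0, lam=10.0))
```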
5 Simulation Study
We conduct a simulation study to compare the performance of the HMM estimated using the
Baum-Welch algorithm, the discrete jump model, and the continuous jump model with and without
the mode loss. In our unsupervised problem setting, we simulate data from well- and misspecified
two- and three-state models where the true hidden state sequences and model parameters are
known, thereby allowing us to evaluate the accuracy of different approaches in identifying the state
sequences and in recovering the model parameters.
When evaluating classification accuracy in the context of financial time series, it is important to
consider the high imbalance among regimes, where the majority of periods are in the bull (normal)
regime rather than the bear regime. As a result, relying solely on overall accuracy frequently
leads to inflated scores that do not accurately reflect a model’s performance. To address this, we
employ three additional metrics: accuracy per class, balanced accuracy (BAC), and the area under
the Receiver Operating Characteristic curve (ROC-AUC). Accuracy per class measures the true
positive rate for each individual class, while BAC reflects the average accuracy across all the classes,
giving equal weight to each class irrespective of its proportion in the dataset (Brodersen et al., 2010).
Breaking down the accuracy by class allows for a more comprehensive evaluation of the model’s
performance across different regimes. While the first two metrics, accuracy per class and BAC,
are derived from the estimated label sequence, ROC-AUC is based on the estimated probability
and assesses the model’s ranking ability. Specifically, ROC-AUC represents the probability that a
classifier assigns a higher estimated probability to a randomly selected positive instance compared
to a randomly selected negative instance (Bradley, 1997). Unlike BAC, ROC-AUC can only be
computed for the sequences where all of the states are present.
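Both probability-based metrics can be implemented in a few lines; the code below is a self-contained sketch (equivalent in spirit to standard library implementations such as scikit-learn's), not the evaluation code used in the paper.

```python
# Balanced accuracy averages per-class true-positive rates; ROC-AUC is the
# probability that a positive instance receives a higher score than a
# negative one (Bradley, 1997), with ties counted as one half.
import numpy as np

def balanced_accuracy(y_true, y_pred):
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

def roc_auc(y_true, scores):
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

y = np.array([0, 0, 0, 0, 1, 1])
p = np.array([0.1, 0.2, 0.3, 0.9, 0.8, 0.7])
print(balanced_accuracy(y, (p > 0.5).astype(int)))  # (3/4 + 2/2) / 2 = 0.875
print(roc_auc(y, p))                                # 6 of 8 positive-negative pairs ranked correctly: 0.75
```

The toy example illustrates the distinction drawn above: BAC depends only on the thresholded labels, while ROC-AUC uses the scores themselves.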
where the state sequence {s_t} follows a first-order Markov chain and the state-conditional parameters and transition probability matrix are given by

    μ₁ = 0.0123,   μ₂ = −0.0157,
    σ₁ = 0.0347,   σ₂ = 0.0778,        (30)

    P = [ 0.9629   0.0371 ]
        [ 0.2101   0.7899 ] .
In this two-state HMM, the first state can be interpreted as a bullish market regime, while the second
state represents a bearish market regime. We note that Hardy (2001) estimated these parameters
from the monthly returns of an equity market index. We choose the starting distribution of s0 as
the stationary distribution of the Markov chain.
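A minimal simulation of this two-state Gaussian HMM, drawing s_0 from the stationary distribution as described, can be sketched as follows (our own illustrative code, not the study's simulation harness):

```python
# Simulate a two-state Gaussian HMM with the monthly parameters of (30).
import numpy as np

def simulate_hmm(T, mu, sigma, P, seed=0):
    rng = np.random.default_rng(seed)
    # Stationary distribution: the left eigenvector of P for eigenvalue 1.
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(len(mu), p=pi)
    for t in range(1, T):
        states[t] = rng.choice(len(mu), p=P[states[t - 1]])
    returns = rng.normal(mu[states], sigma[states])  # state-conditional Gaussians
    return states, returns

mu = np.array([0.0123, -0.0157])
sigma = np.array([0.0347, 0.0778])
P = np.array([[0.9629, 0.0371], [0.2101, 0.7899]])
states, returns = simulate_hmm(1000, mu, sigma, P)
```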
To simulate observation sequences with different state persistence, we construct daily and weekly versions of the model above. Assuming a geometric Brownian motion for the return process and a continuous-time Markov chain for the hidden state process, the following relationships between the parameters for two time scales t₁ and t₂ hold:

    μ_s(t₁)/t₁ = μ_s(t₂)/t₂ ,   σ_s²(t₁)/t₁ = σ_s²(t₂)/t₂ ,
    P(t₁)^{1/t₁} = P(t₂)^{1/t₂} ,        (31)

where the last equation is expressed in terms of matrix exponentials. In our empirical work, we use t = 1, 5, 20 for the daily, weekly, and monthly time scales, respectively. We remark that simulated daily data exhibits the highest level of persistence but also has the lowest signal-to-noise ratio.⁶
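Equation (31) can be applied in code as follows; the eigendecomposition route to the matrix power is one of several equivalent implementations, and assumes P is diagonalizable with positive eigenvalues (as it is here).

```python
# Time-scale conversion (31): drift scales linearly in time, volatility with
# its square root, and the transition matrix via a matrix power.
import numpy as np

def rescale(mu, sigma, P, t_from, t_to):
    ratio = t_to / t_from
    evals, evecs = np.linalg.eig(P)
    P_new = np.real(evecs @ np.diag(evals ** ratio) @ np.linalg.inv(evecs))
    return mu * ratio, sigma * np.sqrt(ratio), P_new

mu_m = np.array([0.0123, -0.0157])
sigma_m = np.array([0.0347, 0.0778])
P_m = np.array([[0.9629, 0.0371], [0.2101, 0.7899]])

# Monthly (t = 20) parameters of (30) converted to the daily scale (t = 1).
mu_d, sigma_d, P_d = rescale(mu_m, sigma_m, P_m, t_from=20, t_to=1)
```

The resulting daily transition probability out of the bull state is roughly 0.002, an order of magnitude smaller than the monthly 0.0371, which is the higher persistence noted above.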
5.1.1 Selecting λ
Figure 1 displays the average BAC of the three jump models as a function of the penalty parameter for 1024 simulated sequences, each of length 1000, using the daily parameters in (30). Consistent with previous findings (Nystrup et al., 2020a,b), the discrete model achieves the highest BAC at λ = 10². Interestingly, the continuous model, although it can be reduced to a discrete model with the same λ, achieves its best performance at λ = 10³ and demonstrates better robustness to

⁶ Here, the signal-to-noise ratio measures the separability among clusters, and is calculated as the ratio of the distance between two cluster centers to the volatility (Balakrishnan et al., 2017).
Figure 1: Balanced accuracy of the jump models as a function of the penalty parameter for 1024 simulated sequences, each of length 1000. discrete, cont, and contM denote the discrete jump model and the continuous jump models without and with mode loss, respectively.
         μ₁              μ₂               σ₁              σ₂              γ₁₂             γ₂₁             Acc. (state 1)  Acc. (state 2)  BAC             ROC-AUC
HMM 0.0006 (0.0006) -0.0004 (0.0039) 0.0077 (0.0006) 0.0155 (0.0043) 0.0491 (0.1486) 0.1341 (0.2739) 0.9656 (0.1060) 0.8905 (0.2006) 0.9280 (0.1249) 0.9616 (0.1414)
discrete 0.0006 (0.0003) -0.0007 (0.0023) 0.0079 (0.0003) 0.0163 (0.0031) 0.0019 (0.0013) 0.0154 (0.0162) 0.9744 (0.0606) 0.8566 (0.1840) 0.9155 (0.1003) 0.9212 (0.0970)
cont 0.0006 (0.0003) -0.0005 (0.0019) 0.0079 (0.0003) 0.0156 (0.0033) 0.0023 (0.0016) 0.0137 (0.0108) 0.9539 (0.0969) 0.8816 (0.1662) 0.9177 (0.1059) 0.9710 (0.1011)
contM 0.0006 (0.0003) -0.0006 (0.0019) 0.0079 (0.0003) 0.0157 (0.0033) 0.0021 (0.0014) 0.0141 (0.0120) 0.9636 (0.0783) 0.8763 (0.1719) 0.9199 (0.1037) 0.9674 (0.0938)
2000
true 0.0006 (0.0000) -0.0008 (0.0000) 0.0078 (0.0000) 0.0174 (0.0000) 0.0021 (0.0000) 0.0120 (0.0000) 0.9965 (0.0061) 0.9212 (0.1230) 0.9588 (0.0612) 0.9975 (0.0157)
HMM 0.0006 (0.0003) -0.0010 (0.0037) 0.0078 (0.0004) 0.0166 (0.0028) 0.0266 (0.1040) 0.0738 (0.1971) 0.9819 (0.0733) 0.8946 (0.1773) 0.9382 (0.1096) 0.9693 (0.1185)
discrete 0.0006 (0.0002) -0.0008 (0.0016) 0.0079 (0.0002) 0.0171 (0.0018) 0.0017 (0.0009) 0.0136 (0.0111) 0.9899 (0.0190) 0.8717 (0.1388) 0.9308 (0.0707) 0.9314 (0.0702)
cont 0.0006 (0.0002) -0.0007 (0.0014) 0.0079 (0.0002) 0.0168 (0.0020) 0.0021 (0.0015) 0.0140 (0.0091) 0.9795 (0.0607) 0.8926 (0.1143) 0.9360 (0.0688) 0.9768 (0.0628)
contM 0.0006 (0.0002) -0.0007 (0.0014) 0.0079 (0.0002) 0.0168 (0.0019) 0.0020 (0.0011) 0.0141 (0.0095) 0.9846 (0.0391) 0.8885 (0.1219) 0.9366 (0.0685) 0.9712 (0.0641)
Table 1: Mean and standard deviation (in parentheses) of parameter estimates and accuracy scores of 1024 simulations of length T = 250, 500, 1000, 2000 from the daily two-state HMM estimated with HMM, and the discrete and continuous jump models with and without mode loss (discrete, cont, and contM). The parameters can be transformed to different scales using the formulas in equation (31).
         μ₁              μ₂               σ₁              σ₂              γ₁₂             γ₂₁             Acc. (state 1)  Acc. (state 2)  BAC             ROC-AUC
HMM 0.0031 (0.0024) -0.0035 (0.0176) 0.0175 (0.0021) 0.0310 (0.0129) 0.0964 (0.2054) 0.2943 (0.3431) 0.9268 (0.1371) 0.6379 (0.3107) 0.7824 (0.1809) 0.8403 (0.2386)
discrete 0.0032 (0.0017) -0.0033 (0.0083) 0.0178 (0.0016) 0.0336 (0.0088) 0.0125 (0.0086) 0.0532 (0.0289) 0.8987 (0.1238) 0.7144 (0.2243) 0.8065 (0.1317) 0.8143 (0.1281)
cont 0.0031 (0.0016) -0.0026 (0.0073) 0.0181 (0.0016) 0.0327 (0.0086) 0.0119 (0.0083) 0.0457 (0.0235) 0.8886 (0.1330) 0.7259 (0.2281) 0.8073 (0.1377) 0.8627 (0.1452)
contM 0.0032 (0.0017) -0.0028 (0.0074) 0.0180 (0.0016) 0.0330 (0.0087) 0.0130 (0.0091) 0.0503 (0.0255) 0.8900 (0.1308) 0.7261 (0.2200) 0.8080 (0.1316) 0.8445 (0.1321)
500
true 0.0031 (0.0000) -0.0039 (0.0000) 0.0174 (0.0000) 0.0389 (0.0000) 0.0103 (0.0000) 0.0582 (0.0000) 0.9879 (0.0165) 0.7549 (0.2009) 0.8714 (0.0986) 0.9777 (0.0434)
HMM 0.0031 (0.0013) -0.0045 (0.0127) 0.0174 (0.0012) 0.0360 (0.0090) 0.0416 (0.1287) 0.1640 (0.2468) 0.9665 (0.0888) 0.7150 (0.2421) 0.8408 (0.1351) 0.9209 (0.1638)
discrete 0.0032 (0.0011) -0.0042 (0.0064) 0.0179 (0.0010) 0.0364 (0.0060) 0.0096 (0.0056) 0.0529 (0.0220) 0.9419 (0.0785) 0.7222 (0.1618) 0.8321 (0.0912) 0.8324 (0.0910)
cont 0.0031 (0.0011) -0.0034 (0.0057) 0.0181 (0.0010) 0.0354 (0.0058) 0.0090 (0.0054) 0.0457 (0.0175) 0.9357 (0.0846) 0.7394 (0.1594) 0.8375 (0.0935) 0.8899 (0.0969)
contM 0.0030 (0.0010) -0.0036 (0.0058) 0.0182 (0.0010) 0.0357 (0.0057) 0.0082 (0.0044) 0.0455 (0.0187) 0.9428 (0.0742) 0.7289 (0.1664) 0.8359 (0.0944) 0.8739 (0.0984)
Table 2: Mean and standard deviation (in parentheses) of parameter estimates and accuracy scores of 1024 simulations of length T = 50, 100, 250, 500 from the weekly two-state HMM estimated with HMM, and the discrete and continuous jump models with and without mode loss (discrete, cont, and contM).
         μ₁              μ₂               σ₁              σ₂              γ₁₂             γ₂₁             Acc. (state 1)  Acc. (state 2)  BAC             ROC-AUC
HMM 0.0125 (0.0033) -0.0258 (0.0365) 0.0343 (0.0032) 0.0694 (0.0205) 0.0665 (0.1098) 0.3350 (0.2599) 0.9539 (0.0886) 0.5080 (0.2182) 0.7310 (0.1084) 0.8483 (0.1473)
discrete 0.0136 (0.0043) -0.0122 (0.0139) 0.0349 (0.0031) 0.0650 (0.0129) 0.0520 (0.0312) 0.1570 (0.0433) 0.8368 (0.1041) 0.6391 (0.1515) 0.7380 (0.0789) 0.7380 (0.0789)
cont 0.0136 (0.0042) -0.0121 (0.0138) 0.0349 (0.0031) 0.0650 (0.0130) 0.0523 (0.0313) 0.1574 (0.0437) 0.8360 (0.1044) 0.6405 (0.1509) 0.7382 (0.0795) 0.7382 (0.0794)
contM 0.0136 (0.0044) -0.0121 (0.0138) 0.0349 (0.0031) 0.0650 (0.0130) 0.0525 (0.0315) 0.1576 (0.0437) 0.8356 (0.1050) 0.6403 (0.1508) 0.7380 (0.0795) 0.7380 (0.0795)
500
true 0.0123 (0.0000) -0.0157 (0.0000) 0.0347 (0.0000) 0.0778 (0.0000) 0.0371 (0.0000) 0.2101 (0.0000) 0.9842 (0.0135) 0.5194 (0.1291) 0.7518 (0.0627) 0.9135 (0.0377)
HMM 0.0123 (0.0022) -0.0151 (0.0254) 0.0346 (0.0025) 0.0739 (0.0154) 0.0554 (0.1050) 0.2785 (0.1928) 0.9678 (0.0670) 0.5136 (0.1810) 0.7407 (0.0895) 0.8810 (0.1108)
discrete 0.0138 (0.0029) -0.0134 (0.0095) 0.0351 (0.0022) 0.0663 (0.0092) 0.0473 (0.0234) 0.1668 (0.0356) 0.8597 (0.0710) 0.6301 (0.1047) 0.7449 (0.0523) 0.7449 (0.0523)
cont 0.0137 (0.0028) -0.0134 (0.0096) 0.0351 (0.0022) 0.0663 (0.0091) 0.0471 (0.0231) 0.1666 (0.0352) 0.8600 (0.0702) 0.6302 (0.1053) 0.7451 (0.0517) 0.7451 (0.0517)
contM 0.0138 (0.0028) -0.0134 (0.0096) 0.0351 (0.0021) 0.0663 (0.0092) 0.0471 (0.0233) 0.1664 (0.0356) 0.8600 (0.0707) 0.6294 (0.1057) 0.7447 (0.0524) 0.7447 (0.0524)
Table 3: Mean and standard deviation (in parentheses) of parameter estimates and accuracy scores of 1024 simulations of length T = 60, 120, 250, 500 from the monthly two-state HMM estimated with HMM, and the discrete and continuous jump models with and without mode loss (discrete, cont, and contM).
true 0.0037 (0.0000) 0.0058 (0.0000) 0.0062 (0.0000) 0.9557 (0.1685) 0.7223 (0.3969) 0.7654 (0.3480) 0.8145 (0.1590) 0.9667 (0.0662)
HMM 0.2695 (0.3320) 0.3199 (0.3735) 0.2922 (0.3513) 0.6284 (0.3051) 0.4472 (0.3521) 0.5452 (0.3660) 0.5403 (0.1077) 0.6626 (0.1611)
discrete 0.0059 (0.0150) 0.0091 (0.0168) 0.0091 (0.0171) 0.8446 (0.2650) 0.6677 (0.3842) 0.6558 (0.3201) 0.7227 (0.1463) 0.7560 (0.1189)
cont 0.0059 (0.0327) 0.0071 (0.0130) 0.0084 (0.0297) 0.8671 (0.2477) 0.7046 (0.3786) 0.6480 (0.3651) 0.7399 (0.1467) 0.7968 (0.1274)
contM 0.0060 (0.0440) 0.0075 (0.0171) 0.0072 (0.0182) 0.8645 (0.2560) 0.7072 (0.3722) 0.6564 (0.3577) 0.7427 (0.1497) 0.7909 (0.1298)
1000
true 0.0037 (0.0000) 0.0058 (0.0000) 0.0062 (0.0000) 0.9703 (0.1017) 0.7465 (0.3560) 0.8129 (0.2843) 0.8432 (0.1325) 0.9828 (0.0328)
HMM 0.2671 (0.3474) 0.2474 (0.3617) 0.2516 (0.3265) 0.7114 (0.3099) 0.4029 (0.3858) 0.5460 (0.4026) 0.5534 (0.1224) 0.6812 (0.1968)
discrete 0.0078 (0.0179) 0.0090 (0.0136) 0.0081 (0.0138) 0.8866 (0.2370) 0.6296 (0.3796) 0.6941 (0.2814) 0.7368 (0.1495) 0.7932 (0.1180)
cont 0.0063 (0.0362) 0.0069 (0.0116) 0.0073 (0.0172) 0.8909 (0.2215) 0.6553 (0.3721) 0.7101 (0.2920) 0.7521 (0.1452) 0.8303 (0.1195)
contM 0.0064 (0.0243) 0.0072 (0.0115) 0.0070 (0.0145) 0.8878 (0.2294) 0.6576 (0.3748) 0.7123 (0.2923) 0.7526 (0.1455) 0.8291 (0.1205)
2000
true 0.0037 (0.0000) 0.0058 (0.0000) 0.0062 (0.0000) 0.9841 (0.0439) 0.8013 (0.2734) 0.8568 (0.2133) 0.8807 (0.1066) 0.9867 (0.0284)
HMM 0.2954 (0.3698) 0.2051 (0.3571) 0.2474 (0.3331) 0.8185 (0.2668) 0.3810 (0.4022) 0.5275 (0.4033) 0.5756 (0.1413) 0.7150 (0.2064)
Table 4: Mean and standard deviation (in parentheses) of parameter estimates and accuracy scores of 1024 simulations of length T = 500, 1000, 2000 from the daily three-state HMM estimated with HMM, and the discrete and continuous jump models with and without mode loss (discrete, cont, and contM). The bold values represent the best parameter estimates and highest accuracy scores.
a 4% degradation for the discrete jump model, and only a 2% degradation for the continuous jump
model. Consequently, the continuous jump model outperforms all other models in terms of both
BAC and ROC-AUC for time series of at least 500 observations.
         μ₁              μ₂               σ₁              σ₂              γ₁₂             γ₂₁             Acc. (state 1)  Acc. (state 2)  BAC             ROC-AUC
HMM 0.0006 (0.0003) -0.0004 (0.0171) 0.0067 (0.0006) 0.0174 (0.0037) 0.0622 (0.0830) 0.2857 (0.3090) 0.9587 (0.0471) 0.7425 (0.2672) 0.8506 (0.1380) 0.9303 (0.0994)
discrete 0.0006 (0.0003) -0.0006 (0.0062) 0.0079 (0.0006) 0.0189 (0.0093) 0.0018 (0.0011) 0.0278 (0.0404) 0.9827 (0.0354) 0.7635 (0.2735) 0.8731 (0.1370) 0.8748 (0.1362)
cont 0.0006 (0.0003) -0.0006 (0.0028) 0.0079 (0.0005) 0.0168 (0.0061) 0.0022 (0.0014) 0.0169 (0.0175) 0.9615 (0.0811) 0.8201 (0.2282) 0.8908 (0.1254) 0.9513 (0.1123)
contM 0.0006 (0.0003) -0.0006 (0.0030) 0.0079 (0.0005) 0.0170 (0.0062) 0.0020 (0.0012) 0.0179 (0.0195) 0.9699 (0.0631) 0.8127 (0.2349) 0.8913 (0.1238) 0.9398 (0.1178)
2000
true 0.0006 (0.0000) -0.0008 (0.0000) 0.0078 (0.0000) 0.0174 (0.0000) 0.0021 (0.0000) 0.0120 (0.0000) 0.9885 (0.0098) 0.8902 (0.1230) 0.9394 (0.0608) 0.9911 (0.0297)
HMM 0.0006 (0.0002) -0.0003 (0.0153) 0.0068 (0.0004) 0.0178 (0.0025) 0.0418 (0.0449) 0.2227 (0.2352) 0.9697 (0.0199) 0.7491 (0.2334) 0.8594 (0.1195) 0.9421 (0.0734)
discrete 0.0006 (0.0002) -0.0009 (0.0047) 0.0079 (0.0004) 0.0189 (0.0082) 0.0016 (0.0008) 0.0185 (0.0241) 0.9894 (0.0124) 0.7955 (0.2112) 0.8925 (0.1045) 0.8926 (0.1044)
cont 0.0006 (0.0002) -0.0008 (0.0026) 0.0079 (0.0003) 0.0176 (0.0055) 0.0018 (0.0009) 0.0144 (0.0126) 0.9842 (0.0320) 0.8434 (0.1639) 0.9138 (0.0837) 0.9655 (0.0731)
contM 0.0006 (0.0002) -0.0009 (0.0026) 0.0079 (0.0003) 0.0177 (0.0053) 0.0017 (0.0009) 0.0145 (0.0126) 0.9858 (0.0287) 0.8386 (0.1664) 0.9122 (0.0851) 0.9550 (0.0791)
Table 5: Mean and standard deviation (in parentheses) of parameter estimates and accuracy scores of 1024 simulations of length T = 250, 500, 1000, 2000 from the daily two-state t-HMM estimated with HMM, and the discrete and continuous jump models with and without mode loss (discrete, cont, and contM).
µ0 µ1 σ0 σ1 γ01 γ10 Acc0 Acc1 BAC ROC-AUC
T = 1000
HMM 0.0006 (0.0051) 0.0001 (0.0060) 0.0083 (0.0024) 0.0126 (0.0055) 0.1949 (0.2892) 0.3448 (0.3766) 0.8482 (0.2642) 0.7361 (0.3499) 0.7921 (0.2199) 0.8112 (0.2782)
discrete 0.0005 (0.0011) -0.0004 (0.0022) 0.0082 (0.0022) 0.0143 (0.0043) 0.0019 (0.0048) 0.0157 (0.0262) 0.9230 (0.1761) 0.7653 (0.3454) 0.8442 (0.2011) 0.8465 (0.2046)
cont 0.0005 (0.0009) -0.0003 (0.0017) 0.0084 (0.0022) 0.0133 (0.0044) 0.0020 (0.0042) 0.0106 (0.0161) 0.8939 (0.1772) 0.7704 (0.3471) 0.8322 (0.2104) 0.8713 (0.2201)
contM 0.0005 (0.0009) -0.0003 (0.0018) 0.0084 (0.0022) 0.0134 (0.0045) 0.0023 (0.0038) 0.0124 (0.0179) 0.8898 (0.1869) 0.7803 (0.3329) 0.8350 (0.2119) 0.8657 (0.2229)
T = 2000
true 0.0006 (0.0000) -0.0008 (0.0000) 0.0078 (0.0000) 0.0174 (0.0000) 0.0021 (0.0000) 0.0120 (0.0000) 0.9894 (0.0717) 0.7689 (0.3660) 0.8792 (0.1831) 0.9588 (0.1250)
HMM 0.0002 (0.0040) -0.0005 (0.0044) 0.0081 (0.0018) 0.0136 (0.0051) 0.1429 (0.2447) 0.2602 (0.3489) 0.8950 (0.2141) 0.7505 (0.3509) 0.8227 (0.2212) 0.8319 (0.2754)
discrete 0.0006 (0.0007) -0.0004 (0.0020) 0.0080 (0.0013) 0.0149 (0.0040) 0.0012 (0.0028) 0.0126 (0.0185) 0.9617 (0.1017) 0.7751 (0.3412) 0.8684 (0.1870) 0.8714 (0.1868)
cont 0.0006 (0.0005) -0.0004 (0.0015) 0.0080 (0.0013) 0.0143 (0.0042) 0.0019 (0.0022) 0.0096 (0.0116) 0.9082 (0.1538) 0.8128 (0.3048) 0.8605 (0.2017) 0.8958 (0.2092)
contM 0.0006 (0.0006) -0.0004 (0.0016) 0.0080 (0.0014) 0.0144 (0.0042) 0.0023 (0.0030) 0.0125 (0.0144) 0.9176 (0.1496) 0.8178 (0.2939) 0.8677 (0.1885) 0.8964 (0.1922)
Table 6: Mean and standard deviation (in parentheses) of parameter estimates and accuracy scores of 1024 simulations of length T = 250, 500, 1000, 2000 from the daily two-state HSMM estimated with HMM, and the discrete and continuous jump models with and without mode loss (discrete, cont, and contM).
µ0 µ1 σ0 σ1 γ01 γ10
discrete 0.1148% -0.1528% 1.2160% 2.6879% 0.0012 0.0025
cont 0.1029% -0.1879% 1.2891% 2.8616% 0.0021 0.0063
contM 0.0995% -0.1750% 1.2862% 2.8549% 0.0021 0.0062
Table 7: Estimated mean return and volatility under each of the two regimes and the transition
probabilities on the daily log-returns of the Nasdaq Index from 1996 to 2005.
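The transition probabilities in Table 7 translate directly into expected regime durations: in a two-state Markov chain, the sojourn time in a state is geometric with mean 1/γ. A quick sketch using the discrete jump model's estimates from the table:

```python
# Expected regime durations implied by Table 7's transition probabilities.
# For a two-state Markov chain, the time spent in state i before switching
# is geometrically distributed with mean 1 / gamma_ij.  The values are the
# discrete jump model's daily estimates from the table.

gamma_01 = 0.0012  # P(bull -> bear) per day
gamma_10 = 0.0025  # P(bear -> bull) per day

bull_days = 1 / gamma_01
bear_days = 1 / gamma_10
print(f"expected bull spell: {bull_days:.0f} days")  # 833
print(f"expected bear spell: {bear_days:.0f} days")  # 400
```

The implied multi-year bull spells reflect the jump penalty's preference for highly persistent regimes.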
Figure 2 displays the regimes estimated by the three jump models. Notably, the discrete jump
model identifies the dot-com crash as a single bear period, while the continuous jump models are
able to detect two rebound periods. The ability to capture rebounds is a result of the smooth
transition within the probability simplex in the continuous models. We underscore that the mode
loss penalty, when included, effectively smooths the probability curve, thereby eliminating minor
fluctuations with amplitudes below approximately 30%. This smoothing results in more persistent state probabilities. In certain applications, keeping the probability estimates unchanged removes the need for portfolio rebalancing, ultimately lowering turnover and transaction costs.
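The turnover claim can be made concrete with a toy calculation: map the bull-regime probability directly to an equity weight and compare total turnover for a fluctuating versus a persistent probability path. Both paths below are hypothetical illustrations, not output of the paper's models:

```python
# Toy turnover comparison for a regime-aware portfolio whose equity weight
# equals the estimated bull-regime probability.  The "raw" path wiggles;
# the "smoothed" path (mimicking the persistence induced by the mode loss)
# holds probabilities constant until a large regime move occurs.

def turnover(weights):
    """Total one-way turnover: sum of absolute weight changes."""
    return sum(abs(w1 - w0) for w0, w1 in zip(weights, weights[1:]))

raw      = [1.0, 0.9, 1.0, 0.85, 1.0, 0.2, 0.1, 0.2, 0.1, 0.0]
smoothed = [1.0, 1.0, 1.0, 1.00, 1.0, 0.1, 0.1, 0.1, 0.1, 0.0]

print(round(turnover(raw), 2))       # 1.7 -- trades on every wiggle
print(round(turnover(smoothed), 2))  # 1.0 -- one switch in, one exit
```

Transaction costs scale with turnover, so suppressing the small fluctuations while preserving the genuine regime switch is precisely the behavior one wants from the mode loss.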
µ0 µ1 σ0 σ1 γ01 γ10
discrete 0.0829% -0.1336% 0.9828% 2.3885% 0.0014 0.0051
cont 0.0903% -0.1383% 0.9531% 2.3065% 0.0029 0.0117
contM 0.0868% -0.1386% 0.9686% 2.3490% 0.0019 0.0077
Table 8: Estimated mean return and volatility under each of the two regimes and the transition
probabilities on the daily log-returns of the Nasdaq Index from 2013 to 2022.
Figure 3 depicts the behavior of the three jump models during the relatively calm market
period from 2013 to 2022. The discrete jump model successfully identifies the three major bear
7 Conclusions
In this article, we extended the statistical jump model of Nystrup et al. (2020b,a) by generalizing
the discrete hidden state variable into a probability vector over all regimes. The continuous jump
model, in contrast to its discrete counterpart, allows for smooth regime transitions and provides a
more robust framework for estimating regime probabilities. We provided a probabilistic interpre-
tation of the new model and demonstrated its advantages through simulations and real-world data
experiments. Specifically, our simulation studies and an application to the Nasdaq Index demon-
strate that the continuous jump model offers advantages over competing methods such as hidden
Markov models and discrete jump models. The advantages of the new approach become particu-
larly pronounced when dealing with relatively short, imbalanced, and highly persistent time series,
making it well-suited for a broad range of applications in finance such as regime-aware portfolio
construction and risk management.
It would be beneficial for future research to integrate continuous jump models into portfolio
strategies, allowing investors to align their allocations with regime-specific insights and performance
expectations.
Thus
\[
\log p(Y, S \mid \Theta) = \sum_{t=0}^{T-1} \log p(y_t \mid \Theta, s_t) + \log \pi(s_0) + \sum_{t=1}^{T-1} \log K(s_{t-1}, s_t). \tag{35}
\]
\[
\log p(Y, S \mid \Theta) \ge \sum_{t,k} s_{tk} \log p(y_t \mid \theta_k) + \log \pi(s_0) + \sum_{t=1}^{T-1} \log K(s_{t-1}, s_t). \tag{38}
\]
From equation (17), it follows that the above right hand side is precisely the negation of the
objective function J. Therefore minimizing J is equivalent to maximizing a lower bound of the
joint log-likelihood function.
\[
= -J + \mathrm{const}. \tag{42}
\]
Thus, minimizing J is equivalent to maximizing a lower bound of the log-likelihood p(Y , S|Θ).
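The bound in (38) is consistent with Jensen's inequality for the concave logarithm: for any probability vector s over regimes and per-regime densities p_k(y), log Σ_k s_k p_k(y) ≥ Σ_k s_k log p_k(y). The check below verifies this pointwise inequality numerically; the Gaussian parameters are the "true" values from the simulation tables, used purely as an illustration.

```python
import math
import random

# Numerical check of the Jensen step behind the lower bound in (38):
#   log sum_k s_k * p_k(y)  >=  sum_k s_k * log p_k(y)
# for any point s in the probability simplex and positive densities p_k.

def gauss_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

random.seed(0)
mus, sigmas = [0.0006, -0.0008], [0.0078, 0.0174]  # per-regime mean / vol

for _ in range(1000):
    y = random.gauss(0, 0.02)         # arbitrary daily return
    a = random.random()
    s = [a, 1 - a]                    # point in the probability simplex
    dens = [gauss_pdf(y, m, v) for m, v in zip(mus, sigmas)]
    lhs = math.log(sum(sk * pk for sk, pk in zip(s, dens)))
    rhs = sum(sk * math.log(pk) for sk, pk in zip(s, dens))
    assert lhs >= rhs - 1e-12         # Jensen's inequality holds pointwise

print("Jensen lower bound holds on all sampled points")
```

Equality is attained exactly at the simplex vertices, i.e. for hard regime assignments, which is why the discrete model's objective coincides with the bound.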
Andersson, S., Rydén, T., and Johansson, R. (2003). Linear optimal prediction and innovations representations of hidden Markov models. Stochastic Processes and their Applications, 108(1):131–149.
Ang, A. and Timmermann, A. (2012). Regime changes and financial markets. Annual Review of
Financial Economics, 4(1):313–337.
Arthur, D. and Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding. In Pro-
ceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07,
pages 1027–1035, USA. Society for Industrial and Applied Mathematics.
Attouch, H., Bolte, J., Redont, P., and Soubeyran, A. (2010). Proximal alternating minimization
and projection methods for nonconvex problems: An approach based on the Kurdyka-Lojasiewicz
inequality. Mathematics of Operations Research, 35(2):438–457.
Bae, G. I., Kim, W. C., and Mulvey, J. M. (2014). Dynamic asset allocation for varied finan-
cial markets under regime switching framework. European Journal of Operational Research,
234(2):450–458. 60 years following Harry Markowitz’s contribution to portfolio theory and op-
erations research.
Balakrishnan, S., Wainwright, M. J., and Yu, B. (2017). Statistical guarantees for the EM algorithm: From population to sample-based analysis. The Annals of Statistics, 45(1):77–120.
Barberis, N. and Thaler, R. (2003). Chapter 18 A survey of behavioral finance. In Financial Markets
and Asset Pricing, volume 1 of Handbook of the Economics of Finance, pages 1053–1128. Elsevier.
Baum, L., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in
the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical
Statistics, pages 164–171.
Bazzi, M., Blasques, F., Koopman, S. J., and Lucas, A. (2017). Time-varying transition probabilities for Markov regime switching models. Journal of Time Series Analysis, 38(3):458–478.
Bemporad, A., Breschi, V., Piga, D., and Boyd, S. P. (2018). Fitting jump models. Automatica,
96:11–21.
Bickel, P. J., Ritov, Y., and Rydén, T. (1998). Asymptotic normality of the maximum-likelihood estimator for general hidden Markov models. The Annals of Statistics, 26(4):1614–1635.
Bolte, J., Sabach, S., and Teboulle, M. (2014). Proximal alternating linearized minimization for
nonconvex and nonsmooth problems. Mathematical Programming, 146(1-2):459–494.
Boswijk, H. P., Hommes, C. H., and Manzan, S. (2007). Behavioral heterogeneity in stock prices.
Journal of Economic Dynamics and Control, 31(6):1938–1970. Tenth Workshop on Economic
Heterogeneous Interacting Agents.
Brodersen, K. H., Ong, C. S., Stephan, K. E., and Buhmann, J. M. (2010). The balanced accuracy
and its posterior distribution. In 2010 20th International Conference on Pattern Recognition,
pages 3121–3124.
Bry, G. and Boschan, C. (1971). Cyclical Analysis of Time Series: Selected Procedures and Com-
puter Programs. NBER.
Bulla, J. (2011). Hidden Markov models with t components: Increased persistence and other
aspects. Quantitative Finance, 11(3):459–475.
Bulla, J. and Berzel, A. (2008). Computational issues in parameter estimation for stationary hidden
Markov models. Computational Statistics, 23(1):1–18.
Bulla, J. and Bulla, I. (2006). Stylized facts of financial time series and hidden semi-Markov models.
Computational Statistics & Data Analysis, 51(4):2192–2209. Nonlinear Modelling and Financial
Econometrics.
Bulla, J., Mergner, S., Bulla, I., Sesboüé, A., and Chesneau, C. (2011). Markov-switching asset
allocation: Do profitable strategies exist? Journal of Asset Management, 12(4):310–321.
Cartea, A. and Jaimungal, S. (2013). Modelling asset prices for algorithmic and high-frequency
trading. Applied Mathematical Finance, 20(6):512–547.
Cortese, F., Kolm, P., and Lindström, E. (2023a). What drives cryptocurrency returns? A sparse statistical jump model approach. Digital Finance.
Cortese, F. P., Kolm, P. N., and Lindström, E. (2023b). Generalized information criteria for sparse
statistical jump models. In Linde, P., editor, Symposium i anvendt statistik, Vol 44, Copenhagen.
Copenhagen Business School.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete
data via the EM algorithm. J. R. Stat. Soc. B, pages 1–38.
Dias, J. G., Vermunt, J. K., and Ramos, S. (2015). Clustering financial time series: New insights
from an extended hidden Markov model. European Journal of Operational Research, 243(3):852–
864.
Ebbers, J., Heymann, J., Drude, L., Glarner, T., Haeb-Umbach, R., and Raj, B. (2017). Hidden
Markov model variational autoencoder for acoustic unit discovery. In InterSpeech, pages 488–492.
Elliott, R. J., Siu, T. K., and Badescu, A. (2010). On mean-variance portfolio selection under a
hidden Markovian regime-switching model. Economic Modelling, 27(3):678–686.
Fine, S., Singer, Y., and Tishby, N. (1998). The hierarchical hidden Markov model: Analysis and
applications. Machine Learning, 32(1):41–62.
Ghahramani, Z. and Jordan, M. (1995). Factorial hidden Markov models. In Touretzky, D., Mozer,
M., and Hasselmo, M., editors, Advances in Neural Information Processing Systems, volume 8.
MIT Press.
Gu, J. and Mulvey, J. M. (2021). Factor momentum and regime-switching overlay strategy. The
Journal of Financial Data Science, 3(4):101–129.
Hallac, D., Vare, S., Boyd, S., and Leskovec, J. (2017). Toeplitz inverse covariance-based clustering
of multivariate time series data. In Proceedings of the 23rd ACM SIGKDD International Confer-
ence on Knowledge Discovery and Data Mining, KDD ’17, pages 215–223, New York, NY, USA.
Association for Computing Machinery.
Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and
the business cycle. Econometrica, 57(2):357–384.
Hand, D. and Till, R. (2001). A simple generalisation of the area under the ROC curve for multiple
class classification problems. Machine Learning, 45(2):171–186.
Himberg, J., Korpiaho, K., Mannila, H., Tikanmaki, J., and Toivonen, H. T. (2001). Time series
segmentation for context recognition in mobile devices. In Proceedings 2001 IEEE international
conference on data mining, pages 203–210. IEEE.
Hsu, D., Kakade, S. M., and Zhang, T. (2012). A spectral algorithm for learning hidden Markov
models. Journal of Computer and System Sciences, 78(5):1460–1480. JCSS Special Issue: Cloud
Computing 2011.
Kim, S.-J., Koh, K., Boyd, S., and Gorinevsky, D. (2009). ℓ1 trend filtering. SIAM Review, 51(2):339–360.
Kowalski, M. (2009). Sparse regression using mixed norms. Applied and Computational Harmonic
Analysis, 27(3):303–324.
Levin, D. A., Peres, Y., and Wilmer, E. L. (2017). Markov Chains and Mixing Times. American
Mathematical Society, 2nd edition.
Li, X. and Mulvey, J. M. (2021). Portfolio optimization under regime switching and transaction
costs: Combining neural networks and dynamic programs. INFORMS Journal on Optimization,
3(4):398–417.
Li, X. and Mulvey, J. M. (2023). Optimal portfolio execution in a regime-switching market with non-linear impact costs: Combining dynamic program and neural network. Preprint.
Lin, M. (2023). Essays on Applications of Networks and Discrete Optimization. PhD dissertation, Princeton University.
Mulvey, J. M. and Liu, H. (2016). Identifying economic regimes: Reducing downside risks for
university endowments and foundations. The Journal of Portfolio Management, 43(1):100–108.
Ng, A., Jordan, M., and Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 14.
Nystrup, P., Kolm, P. N., and Lindström, E. (2020a). Greedy online classification of persistent
market states using realized intraday volatility features. The Journal of Financial Data Science,
2(3):25–39.
Nystrup, P., Kolm, P. N., and Lindström, E. (2021). Feature selection in jump models. Expert
Systems with Applications, 184:115558.
Nystrup, P., Lindström, E., and Madsen, H. (2020b). Learning hidden Markov models with persis-
tent states by penalizing jumps. Expert Systems with Applications, 150:113307.
Nystrup, P., Madsen, H., and Lindström, E. (2015). Stylised facts of financial time series and
hidden Markov models in continuous time. Quantitative Finance, 15(9):1531–1541.
Nystrup, P., Madsen, H., and Lindström, E. (2017). Long memory of financial time series and
hidden Markov models with time-varying parameters. Journal of Forecasting, 36(8):989–1002.
Pagan, A. R. and Sossounov, K. A. (2003). A simple framework for analysing bull and bear markets.
Journal of Applied Econometrics, 18(1):23–46.
Peyré, G. and Cuturi, M. (2019). Computational optimal transport. Foundations and Trends in
Machine Learning, 11(5-6):355–607.
Picard, F., Lebarbier, E., Budinskà, E., and Robin, S. (2011). Joint segmentation of multivari-
ate Gaussian processes using mixed linear models. Computational Statistics & Data Analysis,
55(2):1160–1170.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech
recognition. Proceedings of the IEEE, 77(2):257–286.
Reus, L. and Mulvey, J. M. (2016). Dynamic allocations for currency futures under switching
regimes signals. European Journal of Operational Research, 253(1):85–93.
Rydén, T. (2008). EM versus Markov chain Monte Carlo for estimation of hidden Markov models: A computational perspective. Bayesian Analysis, 3(4):659–688.
Rydén, T., Teräsvirta, T., and Åsbrink, S. (1998). Stylized facts of daily return series and the
hidden Markov model. Journal of Applied Econometrics, 13(3):217–244.
Sawhney, A. (2020). Regime identification, curse of dimensionality and deep generative models.
Technical report, Quantitative Brokers.
Schwert, G. W. (1989). Why does stock market volatility change over time? The Journal of
Finance, 44(5):1115–1153.
Uysal, A. S. and Mulvey, J. M. (2021). A machine learning approach in regime-switching risk parity
portfolios. The Journal of Financial Data Science, 3(2):87–108.
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding
algorithm. IEEE Transactions on Information Theory, 13(2):260–269.
Witten, D. M. and Tibshirani, R. (2010). A framework for feature selection in clustering. Journal
of the American Statistical Association, 105(490):713–726. PMID: 20811510.
Yang, F., Balakrishnan, S., and Wainwright, M. J. (2017). Statistical and computational guarantees
for the Baum-Welch algorithm. The Journal of Machine Learning Research, 18(1):4528–4580.
Zhang, M., Jiang, X., Fang, Z., Zeng, Y., and Xu, K. (2019). High-order hidden Markov model for
trend prediction in financial time series. Physica A: Statistical Mechanics and its Applications,
517:1–12.
Zheng, K., Li, Y., and Xu, W. (2021). Regime switching model estimation: Spectral clustering
hidden Markov model. Annals of Operations Research, 303:297–319.