This function fits a Variable Length Markov Chain with covariates (coVLMC) to a discrete time series coupled with a time series of covariates.
Usage
covlmc(
x,
covariate,
alpha = 0.05,
min_size = 5L,
max_depth = 100L,
keep_data = TRUE,
control = covlmc_control(...),
...
)
Arguments
- x
a discrete time series; can be numeric, character, factor or logical.
- covariate
a data frame of covariates.
- alpha
a number in (0,1) (default: 0.05): cut off value used in the pruning phase (in quantile scale).
- min_size
a number >= 1 (default: 5) that tunes the minimum number of observations for a context in the growing phase of the context tree (see below for details).
- max_depth
an integer >= 1 (default: 100): length of the longest context considered in the growing phase of the context tree.
- keep_data
logical (default: TRUE). If TRUE, the original data are stored in the resulting object to enable post pruning (see prune.covlmc()).
- control
a list with control parameters, see covlmc_control().
- ...
arguments passed to covlmc_control().
Details
The model is built using the algorithm described in Zanin Zambom et al. (2022). As for the vlmc() approach, the algorithm first builds a context tree (see ctx_tree()). The min_size parameter is used to compute the actual minimum number of observations per context in the growing phase of the tree. It is computed as min_size*(1+ncol(covariate)*d)*(s-1), where d is the length of the context (a.k.a. its depth in the tree) and s is the number of states. This corresponds to ensuring min_size observations per parameter of the logistic regression during the estimation phase.
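To make the growing-phase threshold concrete, the snippet below evaluates the formula above for the settings used in the Examples section (min_size = 15, one covariate, three states). The helper min_obs_per_context() is hypothetical and not part of mixvlmc; one way to read the formula is that each of the s-1 non-reference states gets an intercept plus one coefficient per covariate and per lag, which is an interpretation rather than a statement about the package internals.

```r
# Hypothetical helper (not part of mixvlmc) evaluating the documented formula
# min_size * (1 + ncol(covariate) * d) * (s - 1)
min_obs_per_context <- function(min_size, n_covariates, depth, n_states) {
  min_size * (1 + n_covariates * depth) * (n_states - 1)
}

# Settings of the Examples section: min_size = 15, one covariate (day_night),
# three states; thresholds for contexts of depth 0 to 3
thresholds <- sapply(0:3, function(d) min_obs_per_context(15, 1, d, 3))
thresholds
# [1]  30  60  90 120
```

The threshold grows linearly with the depth of the context, so deep contexts are only kept when the time series is long enough to estimate their logistic models.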
Then logistic models are adjusted in the leaves of the tree: the goal of each logistic model is to estimate the conditional distribution of the next state of the time series given the context (the recent past of the time series) and delayed versions of the covariates. A pruning strategy is used to simplify the models (mainly to reduce the time window associated with the covariates) and the tree itself.
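As a rough illustration of what a single leaf model looks like, the sketch below simulates a toy binary series driven by a lagged numeric covariate and fits, with stats::glm(), the logistic model attached to the depth-1 context "previous state is 1". It assumes that a context of length d uses d lags of each covariate (consistent with the parameter count in the min_size formula); it is a hypothetical illustration, not the code used by covlmc().

```r
# Hypothetical illustration, not mixvlmc's internal code: the logistic model of
# the depth-1 context "x[t-1] == 1" for a binary series, using one lag of a
# single covariate, i.e. 1 + ncol(covariate) * d = 2 parameters.
set.seed(42)
n <- 500
z <- rnorm(n)                        # a single numeric covariate
x <- integer(n)                      # binary time series, x[1] = 0
for (t in 2:n) {
  # the next state depends on the previous state and on the lagged covariate
  p <- plogis(-0.5 + 1.5 * x[t - 1] + 2 * z[t - 1])
  x[t] <- rbinom(1, 1, p)
}

# time indices t whose context (previous state) is 1
t_idx <- which(x[-n] == 1) + 1
ctx_data <- data.frame(next_state = x[t_idx], z_lag1 = z[t_idx - 1])

# binomial logistic regression of the next state on the delayed covariate
fit <- glm(next_state ~ z_lag1, family = binomial(), data = ctx_data)
coef(fit)
```

In covlmc() such models are fitted for every leaf, and the pruning phase then decides whether the covariate lags (and the context itself) are worth keeping.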
Parameters specified by control are used to fine tune the behaviour of the algorithm.
Logistic models
By default, covlmc uses two different computing engines for logistic models:
- when the time series has only two states, covlmc uses stats::glm() with a binomial link (stats::binomial());
- when the time series has at least three states, covlmc uses VGAM::vglm() with a multinomial link (VGAM::multinomial()).
Both engines are able to detect degenerate cases and lead to more robust results than using nnet::multinom(). It is nevertheless possible to replace stats::glm() and VGAM::vglm() with nnet::multinom() by setting the global option mixvlmc.predictive to "multinom" (the default value is "glm"). Notice that while results should be comparable, there is no guarantee that they will be identical.
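The engine selection rules above can be summarised by the small sketch below. The function describe_engine() is hypothetical and only restates the documented behaviour (number of states plus the mixvlmc.predictive option); it does not call or mirror mixvlmc's internal code.

```r
# Hypothetical helper (not part of mixvlmc): reports which logistic engine
# the documentation says covlmc() uses for a given time series.
describe_engine <- function(x, predictive = getOption("mixvlmc.predictive", "glm")) {
  n_states <- length(unique(x))  # number of distinct states observed in x
  if (n_states < 2) {
    stop("the time series must have at least two states")
  }
  if (identical(predictive, "multinom")) {
    "nnet::multinom()"           # replaces both glm and vglm when requested
  } else if (n_states == 2) {
    "stats::glm() with stats::binomial()"
  } else {
    "VGAM::vglm() with VGAM::multinomial()"
  }
}

describe_engine(c("a", "b", "a", "b"), predictive = "glm")
describe_engine(c("low", "mid", "high", "mid"), predictive = "glm")
describe_engine(c("a", "b"), predictive = "multinom")
```

In practice the option is set globally (for instance with options() or, as in the Examples section, locally with withr::with_options()) rather than passed as an argument.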
References
Bühlmann, P. and Wyner, A. J. (1999), "Variable length Markov chains." Ann. Statist. 27 (2) 480-513 doi:10.1214/aos/1018031204
Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022), "Variable length Markov chain with exogenous covariates." J. Time Ser. Anal., 43 (2) 312-328 doi:10.1111/jtsa.12615
See also
cutoff.covlmc() and prune.covlmc() for post-pruning.
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(1 / 3, 2 / 3, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 15)
draw(m_cov)
#> * (merging ((0.556,1.78] and (1.78,7.54]): 1.347e-96)
#> +-- (0,0.556] (0.001385 [ -2.885 1.237
#> | -4.185 -15.4 ])
#> '-- (0.556,1.78] (0.8622 [ 2.046
#> | 0.1372 ])
#> '-- (1.78,7.54] (0.227 [ 3.714
#> 5.684 ])
withr::with_options(
list(mixvlmc.predictive = "multinom"),
m_cov_nnet <- covlmc(dts, dts_cov, min_size = 15)
)
draw(m_cov_nnet)
#> * (merging ((0.556,1.78] and (1.78,7.54]): 1.347e-96)
#> +-- (0,0.556] (0.001386 [ -2.885 1.237
#> | -4.185 -7.944 ])
#> '-- (0.556,1.78] (0.8622 [ 2.046
#> | 0.1372 ])
#> '-- (1.78,7.54] (0.2274 [ 3.714
#> 5.684 ])