This function fits a Variable Length Markov Chain with covariates (coVLMC) to a discrete time series coupled with a time series of covariates.

Usage

covlmc(
  x,
  covariate,
  alpha = 0.05,
  min_size = 5L,
  max_depth = 100L,
  keep_data = TRUE,
  control = covlmc_control(...),
  ...
)

Arguments

x

a discrete time series; can be numeric, character, factor or logical.

covariate

a data frame of covariates.

alpha

number in (0,1) (default: 0.05). Cut-off value used in the pruning phase (in quantile scale).

min_size

number >= 1 (default: 5). Controls the minimum number of observations required for a context in the growing phase of the context tree (see Details).

max_depth

integer >= 1 (default: 100). Longest context considered in the growing phase of the context tree.

keep_data

logical (default: TRUE). If TRUE, the original data are stored in the resulting object to enable post-pruning (see prune.covlmc()).

control

a list with control parameters, see covlmc_control().

...

arguments passed to covlmc_control().

Value

a fitted covlmc model.

Details

The model is built using the algorithm described in Zanin Zambom et al. (2022). As with the vlmc() approach, the algorithm first builds a context tree (see ctx_tree()). The min_size parameter is used to compute the actual minimum number of observations required per context in the growing phase of the tree. This threshold is computed as min_size*(1+ncol(covariate)*d)*(s-1), where d is the length of the context (a.k.a. the depth in the tree) and s is the number of states. This ensures at least min_size observations per parameter of the logistic regression during the estimation phase.
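To make the threshold formula concrete, the following base R sketch evaluates it for a few context lengths. The helper covlmc_threshold() is hypothetical (it is not part of mixvlmc) and simply transcribes the formula given above; the values min_size = 15, one covariate and three states are chosen to match the example at the end of this page.

```r
# Hypothetical helper (not part of mixvlmc): minimum number of observations
# a context of length d must have, transcribing the formula from the text.
covlmc_threshold <- function(min_size, n_covariates, d, s) {
  min_size * (1 + n_covariates * d) * (s - 1)
}

# Three states, one covariate column, min_size = 15
depths <- 0:3
thresholds <- covlmc_threshold(min_size = 15, n_covariates = 1, d = depths, s = 3)

# One row per context length with the corresponding required sample size
data.frame(depth = depths, threshold = thresholds)
```

The threshold grows linearly with the context length, so deep contexts need many more observations than shallow ones before they are kept in the tree.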

Then logistic models are fitted in the leaves of the tree: the goal of each logistic model is to estimate the conditional distribution of the next state of the time series given the context (the recent past of the time series) and delayed versions of the covariates. A pruning strategy is used to simplify the models (mainly to reduce the time window associated with the covariates) and the tree itself.
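The idea of a logistic model attached to a context can be illustrated with a minimal base R sketch. It is not the implementation used by covlmc(): it assumes a simulated binary series x, a single binary covariate z, and one fixed context of length 1 ("the previous state is 1"), and it fits the model with stats::glm() directly. All variable names are illustrative.

```r
set.seed(1)
n <- 500
z <- rbinom(n, 1, 0.5)  # simulated binary covariate
x <- integer(n)         # simulated binary time series, starting in state 0
for (t in 2:n) {
  # the next state depends on the previous state and the previous covariate value
  p <- plogis(-1 + 1.5 * x[t - 1] + z[t - 1])
  x[t] <- rbinom(1, 1, p)
}

# Time indices whose recent past matches the context "previous state is 1"
idx <- which(x[-n] == 1L) + 1L

# Response: next state; predictor: covariate delayed by one time step
leaf_data <- data.frame(next_state = x[idx], z_lag1 = z[idx - 1])

# Logistic model of the next state given the context and the delayed covariate
leaf_model <- glm(next_state ~ z_lag1, family = binomial(), data = leaf_data)
coef(leaf_model)
```

A longer context would add further delayed copies of the covariate as predictors, which is why pruning the time window directly reduces the number of parameters.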

Parameters specified by control are used to fine tune the behaviour of the algorithm.

Logistic models

By default, covlmc uses two different computing engines for logistic models:

  • when the time series has only two states, covlmc uses stats::glm() with a binomial link;

  • when the time series has at least three states, covlmc uses VGAM::vglm() with a multinomial link.

Both engines are able to detect degenerate cases and lead to more robust results than using nnet::multinom(). It is nevertheless possible to replace stats::glm() and VGAM::vglm() with nnet::multinom() by setting the global option mixvlmc.predictive to "multinom" (the default value is "glm"). Note that while results should be comparable, there is no guarantee that they will be identical.
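As a sketch of how the global option could be changed for a whole session using base R only, the snippet below sets mixvlmc.predictive with options() and restores the previous value afterwards. Only the option name and its two values come from this page; the variable names are illustrative.

```r
# Switch the estimation engine, keeping the previous setting for later
old <- options(mixvlmc.predictive = "multinom")

# Value seen by code running while the option is set
engine_during <- getOption("mixvlmc.predictive")
engine_during

# ... calls to covlmc() made here would use nnet::multinom() ...

# Restore the previous setting
options(old)
```

Saving and restoring the old value keeps the change local to one part of a script, which matters because the two engines are not guaranteed to give identical results.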

References

  • Bühlmann, P. and Wyner, A. J. (1999), "Variable length Markov chains." Ann. Statist. 27 (2) 480-513 doi:10.1214/aos/1018031204

  • Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022), "Variable length Markov chain with exogenous covariates." J. Time Ser. Anal., 43 (2) 312-328 doi:10.1111/jtsa.12615

See also

cutoff.covlmc() and prune.covlmc() for post-pruning.

Examples

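# Keep a single week of the power consumption data set
pc <- powerconsumption[powerconsumption$week == 5, ]
# Discretise active power into three states using its terciles
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(1 / 3, 2 / 3, 1))))
# Logical covariate: TRUE for hours between 7 and 17 (inclusive)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
# Fit the coVLMC with the default engine
m_cov <- covlmc(dts, dts_cov, min_size = 15)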
draw(m_cov)
#> * (merging ((0.556,1.78] and (1.78,7.54]): 1.347e-96)
#> +-- (0,0.556] (0.001385 [ -2.885 1.237
#> |                         -4.185 -15.4 ])
#> '-- (0.556,1.78] (0.8622 [ 2.046 
#> |                          0.1372 ])
#> '-- (1.78,7.54] (0.227 [ 3.714
#>                          5.684 ])
# Refit the same model with nnet::multinom() as the estimation engine
withr::with_options(
  list(mixvlmc.predictive = "multinom"),
  m_cov_nnet <- covlmc(dts, dts_cov, min_size = 15)
)
draw(m_cov_nnet)
#> * (merging ((0.556,1.78] and (1.78,7.54]): 1.347e-96)
#> +-- (0,0.556] (0.001386 [ -2.885 1.237 
#> |                         -4.185 -7.944 ])
#> '-- (0.556,1.78] (0.8622 [ 2.046 
#> |                          0.1372 ])
#> '-- (1.78,7.54] (0.2274 [ 3.714
#>                           5.684 ])