Fit a Variable Length Markov Chain with Covariates (coVLMC)

This function fits a Variable Length Markov Chain with covariates (coVLMC) to a discrete time series coupled with a time series of covariates.

Usage

covlmc(
  x,
  covariate,
  alpha = 0.05,
  min_size = 5L,
  max_depth = 100L,
  keep_data = TRUE,
  control = covlmc_control(...),
  ...
)

Arguments

x: a discrete time series; can be numeric, character, factor or logical.
covariate: a data frame of covariates.
alpha: number in (0,1) (default: 0.05). Cut-off value used in the pruning phase (in quantile scale).
min_size: number >= 1 (default: 5). Tune the minimum number of observations for a context in the growing phase of the context tree (see below for details).
max_depth: integer >= 1 (default: 100). Longest context considered in growing phase of the context tree.
keep_data: logical (defaults to TRUE). If TRUE, the original data are stored in the resulting object to enable post pruning (see prune.covlmc()).
control: a list with control parameters, see covlmc_control().
...: arguments passed to covlmc_control().

Value

a fitted covlmc model.

Details

The model is built using the algorithm described in Zanin Zambom et al. (2022). As in the vlmc() approach, the algorithm first builds a context tree (see ctx_tree()). The min_size parameter is used to compute the minimum number of observations actually required for a context in the growing phase of the tree. This number is computed as min_size*(1+ncol(covariate)*d)*(s-1) where d is the length of the context (a.k.a. the depth in the tree) and s is the number of states. This corresponds to ensuring min_size observations per parameter of the logistic regression during the estimation phase.
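As a worked illustration of the formula above, the following sketch computes the required number of observations for contexts of increasing length. The helper required_obs() is hypothetical (it is not part of the package) and simply evaluates the expression: each of the s-1 logistic equations has one intercept plus ncol(covariate)*d covariate coefficients, and each parameter must be backed by min_size observations.

```r
# Hypothetical helper (not part of the package): number of observations
# required for a context of length d, following the formula given above.
required_obs <- function(min_size, n_covariates, d, s) {
  min_size * (1 + n_covariates * d) * (s - 1)
}

# One covariate, three states, default min_size = 5, context lengths 0 to 3
required_obs(min_size = 5, n_covariates = 1, d = 0:3, s = 3)
#> [1] 10 20 30 40
```

The requirement grows linearly with the context length, so deep contexts are kept only when they are observed often enough.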

Then logistic models are fitted in the leaves of the tree: the goal of each logistic model is to estimate the conditional distribution of the next state of the time series given the context (the recent past of the time series) and delayed versions of the covariates. A pruning strategy is used to simplify the models (mainly to reduce the time window associated with the covariates) and the tree itself.
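To make the role of a single logistic model more concrete, here is a minimal sketch in base R. It is not the internal code of covlmc(): the simulated data, the variable names and the choice of a context of length one are all assumptions made for illustration. The model estimates the distribution of the next state given that the previous state is 1, using the covariate delayed by one time step.

```r
set.seed(1)
n <- 500
z <- rnorm(n)          # simulated covariate (assumption for this sketch)
x <- integer(n)        # binary time series with states 0 and 1
for (t in 2:n) {
  # the next state depends on the previous state and on the lagged covariate
  p <- plogis(-0.5 + 1.5 * x[t - 1] + z[t - 1])
  x[t] <- rbinom(1, 1, p)
}

# Time steps t whose context (here the previous state x[t - 1]) is 1
idx <- which(x[-n] == 1L) + 1L
ctx_data <- data.frame(
  next_state = x[idx],   # state to predict
  z_lag1 = z[idx - 1]    # covariate delayed by one time step
)

# Logistic model attached to the context "1"
fit <- glm(next_state ~ z_lag1, data = ctx_data, family = binomial())
coef(fit)

# Estimated P(next state = 1 | context = 1, z_lag1) for a few covariate values
probs <- predict(fit, newdata = data.frame(z_lag1 = c(-1, 0, 1)),
                 type = "response")
probs
```

In a full coVLMC, one such model is attached to each retained context, and longer contexts use more lags of the covariates.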

Parameters specified by control are used to fine tune the behaviour of the algorithm.

Logistic models

By default, covlmc uses two different computing engines for logistic models:

when the time series has only two states, covlmc uses stats::glm() with a binomial family (stats::binomial());
when the time series has at least three states, covlmc uses VGAM::vglm() with a multinomial family (VGAM::multinomial()).

Both engines are able to detect degenerate cases and lead to more robust results than using nnet::multinom(). It is nevertheless possible to replace stats::glm() and VGAM::vglm() with nnet::multinom() by setting the global option mixvlmc.predictive to "multinom" (the default value is "glm"). Note that while results should be comparable, there is no guarantee that they will be identical.
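The option can also be changed with base R's options() for a whole block of code. The sketch below only manipulates the option itself, using the option name and value given above. It saves the previous value so that it can be restored afterwards; the variable names old and current are illustrative.

```r
# Switch the engine and keep the previous value (NULL if it was never set)
old <- options(mixvlmc.predictive = "multinom")

current <- getOption("mixvlmc.predictive")
current
#> [1] "multinom"

# ... calls to covlmc() placed here would use the "multinom" setting ...

# Restore the previous value
options(old)
```

Restoring the option right after use avoids silently changing the engine for later fits in the same session.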

References

Bühlmann, P. and Wyner, A. J. (1999), "Variable length Markov chains." Ann. Statist. 27 (2) 480-513 doi:10.1214/aos/1018031204
Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022), "Variable length Markov chain with exogenous covariates." J. Time Ser. Anal., 43 (2) 312-328 doi:10.1111/jtsa.12615

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(1 / 3, 2 / 3, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 15)
draw(m_cov)
#> * (merging ((0.556,1.78] and (1.78,7.54]): 1.347e-96)
#> +-- (0,0.556] (0.001385 [ -2.885 1.237
#> |                         -4.185 -15.4 ])
#> '-- (0.556,1.78] (0.8622 [ 2.046 
#> |                          0.1372 ])
#> '-- (1.78,7.54] (0.227 [ 3.714
#>                          5.684 ])
withr::with_options(
  list(mixvlmc.predictive = "multinom"),
  m_cov_nnet <- covlmc(dts, dts_cov, min_size = 15)
)
draw(m_cov_nnet)
#> * (merging ((0.556,1.78] and (1.78,7.54]): 1.347e-96)
#> +-- (0,0.556] (0.001386 [ -2.885 1.237 
#> |                         -4.185 -7.944 ])
#> '-- (0.556,1.78] (0.8622 [ 2.046 
#> |                          0.1372 ])
#> '-- (1.78,7.54] (0.2274 [ 3.714
#>                           5.684 ])