# Fit an optimal Variable Length Markov Chain with Covariates (coVLMC)

Source: `R/covlmc_tune.R`

This function fits a Variable Length Markov Chain with Covariates (coVLMC) to a discrete time series coupled with a time series of covariates by optimizing an information criterion (BIC or AIC).

## Arguments

- x
a discrete time series; can be numeric, character, factor or logical.

- covariate
a data frame of covariates.

- criterion
criterion used to select the best model. Either `"BIC"` (default) or `"AIC"` (see details).

- initial
specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. See `loglikelihood()` for details.

- alpha_init
if non-`NULL`, used as the initial cut off parameter (in quantile scale) to build the initial coVLMC.

- min_size
integer >= 1 (default: 5). Tune the minimum number of observations for a context in the growing phase of the context tree (see `covlmc()` for details).

- max_depth
integer >= 1 (default: 100). Longest context considered in the growing phase of the initial context tree (see details).

- verbose
integer >= 0 (default: 0). Verbosity level of the pruning process.

- save
specify which models are saved during the pruning process. The default value `"best"` asks the function to keep only the best model according to the `criterion`. When `save="initial"` the function keeps *in addition* the initial (complex) model which is then pruned during the selection process. When `save="all"`, the function returns all the models considered during the selection process. See details for memory occupation.

- trimming
specify the type of trimming used when saving the intermediate models (see details).

- best_trimming
specify the type of trimming used when saving the best model and the initial one (see details).

## Value

a list with the following components:

- `best_model`: the optimal coVLMC
- `criterion`: the criterion used to select the optimal coVLMC
- `initial`: the likelihood function used to select the optimal coVLMC
- `results`: a data frame with details about the pruning process
- `saved_models`: a list of intermediate coVLMCs if `save="initial"` or `save="all"`. It contains an `initial` component with the large coVLMC obtained first and an `all` component with a list of all the *other* coVLMCs obtained by pruning the initial one.
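As a quick illustration of the returned list, the sketch below runs the function with `save="all"` and looks at each documented component. It reuses the data preparation from the Examples section and assumes the `mixvlmc` package (which provides `tune_covlmc()` and the `powerconsumption` data) is installed; the exact columns of `results` are not described here, so they are only printed, not interpreted.

```r
library(mixvlmc)

# Same data preparation as in the Examples section of this page.
pc <- powerconsumption[powerconsumption$week %in% 6:7, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))

# Keep every model visited during the search, not only the best one.
res <- tune_covlmc(dts, dts_cov, criterion = "BIC", save = "all")

names(res)                     # components of the returned list
res$criterion                  # criterion used for the selection
head(res$results)              # first rows of the pruning trace
length(res$saved_models$all)   # number of intermediate models kept
best <- as_covlmc(res)         # extract the selected coVLMC, as in the Examples
```

With the default `save="best"`, only the selected model is kept, so the `saved_models` component should not be relied upon in that case.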

## Details

This function automates the process of fitting a large coVLMC to a discrete
time series with `covlmc()` and of pruning the tree (with `cutoff()` and
`prune()`) to get an optimal model with respect to an information criterion. To
avoid missing long term dependencies, the function uses the `max_depth`
parameter as an initial guess but then relies on an automatic increase of the
value to make sure the initial context tree is only limited by the `min_size`
parameter. The initial value of the `alpha` parameter of `covlmc()` is also
set to a conservative value (0.5) to avoid prior simplification of the
context tree. This can be overridden by setting the `alpha_init` parameter to
a more suitable value.

Once the initial coVLMC is obtained, the `cutoff()` and `prune()` functions
are used to build all the coVLMC models that could be generated using smaller
values of the alpha parameter. The best model is selected from this
collection, including the initial complex tree, as the one that minimizes the
chosen information criterion.
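The procedure described above can be approximated by hand. The following is a rough sketch, not the actual implementation of `tune_covlmc()`: it fits a large model with `alpha = 0.5`, then repeatedly prunes at the largest value returned by `cutoff()` and keeps the model with the smallest BIC. It assumes that `stats::BIC()` can be applied to a `covlmc` object and that `cutoff()` returns an empty result when nothing is left to prune; it also skips the automatic increase of `max_depth` and the `initial` likelihood option.

```r
library(mixvlmc)

# Same data preparation as in the Examples section of this page.
pc <- powerconsumption[powerconsumption$week %in% 6:7, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))

# Large initial model: alpha = 0.5 mirrors the conservative value described above.
model <- covlmc(dts, dts_cov, alpha = 0.5, min_size = 5, max_depth = 100)
best <- model
best_bic <- BIC(model)   # assumption: BIC() is supported for covlmc objects

# Prune step by step, always using the largest cut off still available.
for (step in seq_len(200)) {        # hard cap on the number of steps, as a safety net
  cuts <- cutoff(model)
  if (length(cuts) == 0) break      # nothing left to prune
  model <- prune(model, alpha = max(cuts))
  bic <- BIC(model)
  if (bic < best_bic) {
    best <- model
    best_bic <- bic
  }
}

draw(best)
```

In practice `tune_covlmc()` should be preferred, since it also handles the depth of the initial tree and the trimming of the saved models.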

## Memory occupation

`covlmc` objects tend to be large and saving all the models during the
search for the optimal model can lead to an unreasonable use of memory. To
avoid this problem, intermediate models are kept only in trimmed form, using
`trim.covlmc()` with `keep_model=FALSE`. Both the initial model and the
best one are saved untrimmed. This default behaviour corresponds to
`trimming="full"`. Setting `trimming="partial"` asks the function to use
`keep_model=TRUE` in `trim.covlmc()` for intermediate models. Finally,
`trimming="none"` turns off trimming, which is discouraged except for
small data sets.

In parallel processing contexts (e.g. using `foreach::%dopar%`), the memory
occupation of the results can become very large as models tend to keep
environments attached to the formulas. In this situation, it is highly
recommended to trim all saved models, including the best one and the
initial one. This can be done via the `best_trimming` parameter whose
possible values are identical to the ones of `trimming`.
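To get a feeling for the effect of these options, one possible check is to compare the serialized size of the results under different settings. The sketch below uses `length(serialize(x, NULL))` as a rough proxy for what would be transferred between workers; this measure is an assumption made for illustration, and the actual sizes depend on the data and on the package version.

```r
library(mixvlmc)

# Same data preparation as in the Examples section of this page.
pc <- powerconsumption[powerconsumption$week %in% 6:7, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))

# Three settings: default trimming, lighter trimming, and everything trimmed.
runs <- list(
  full    = tune_covlmc(dts, dts_cov, save = "all", trimming = "full"),
  partial = tune_covlmc(dts, dts_cov, save = "all", trimming = "partial"),
  compact = tune_covlmc(dts, dts_cov, save = "all", trimming = "full",
                        best_trimming = "full")
)

# Serialized size in bytes of each result.
sizes <- sapply(runs, function(r) length(serialize(r, NULL)))
print(sizes)
```

The `compact` setting trims the best and initial models as well, which is the configuration suggested above for parallel processing.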

## Examples

```
pc <- powerconsumption[powerconsumption$week %in% 6:7, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
dts_best_model_tune <- tune_covlmc(dts, dts_cov)
draw(as_covlmc(dts_best_model_tune))
#> *
#> +-- (0,1.26] (collapsing: 0.0003608)
#> |   +-- (0,1.26] (0.0117 [ -2.603 ])
#> |   '-- (1.26,6.48] (1 [ -1.52 ])
#> '-- (1.26,6.48]
#>     +-- (0,1.26] (0.4311 [ 1.705 ])
#>     '-- (1.26,6.48]
#>         +-- (0,1.26] (0.5816 [ 1.609 ])
#>         '-- (1.26,6.48] (collapsing: 7.999e-05)
#>             +-- (0,1.26] (0.0006256 [ 0.47 2.862 ])
#>             '-- (1.26,6.48] (0.9555 [ 2.856 ])
```