Simulate a discrete time series for a vlmc

This function simulates a time series from the distribution estimated by the given vlmc object.

Usage

# S3 method for class 'vlmc_cpp'
simulate(
  object,
  nsim = 1,
  seed = NULL,
  init = NULL,
  burnin = 0L,
  sample = c("fast", "slow", "R"),
  ...
)

Arguments

object: a fitted vlmc object.
nsim: length of the simulated time series (defaults to 1).
seed: an optional random seed (see the dedicated section).
init: an optional initial sequence for the time series.
burnin: number of initial observations to discard or "auto" (see the dedicated section).
sample: specifies which implementation of base::sample() to use. See the dedicated section.
...: additional arguments.

Value

A simulated discrete time series of the same type as the one used to build the vlmc, with a seed attribute (see the Random seed section). The result also has the dts class, which hides the seed attribute when using print or similar functions.

Details

The time series can be initialised with a fixed sequence specified via the init parameter.

Sampling method

The R backend for vlmc() uses base::sample() to generate samples for each context. Internally, this function sorts the state probabilities in decreasing order (among other things), which is not needed in our case. The C++ backend can be used with three different implementations:

sample="fast" uses a dedicated C++ implementation adapted to the data structures used internally. In general, the simulated time series obtained with this implementation will be different from the one generated with the R backend, even using the same seed.
sample="slow" uses another C++ implementation that mimics base::sample() in order to maximise the chance of producing identical simulation results regardless of the backend (when using the same random seed). This process is not perfect, as we use the C++ standard library sort algorithm, which is not guaranteed to give the same results as R's internal 'revsort'.
sample="R" uses direct calls to base::sample(). Results are guaranteed to be identical between the two backends, but at the price of a higher running time.
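
The three options can be compared on the same model and seed. The following is a minimal sketch, not a reference usage: it assumes that the mixvlmc package is installed and that vlmc() accepts backend = "C++" to build a vlmc_cpp object. The agreement rates it prints depend on the data and are not claimed here.

```r
library(mixvlmc)

# Same data preparation as in the Examples section
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))

# Assumption: backend = "C++" selects the C++ implementation of vlmc()
model_cpp <- vlmc(dts, min_size = 5, backend = "C++")

# One simulation per sampling implementation, all with the same seed
sim_fast <- simulate(model_cpp, 500, seed = 0, sample = "fast")
sim_slow <- simulate(model_cpp, 500, seed = 0, sample = "slow")
sim_r <- simulate(model_cpp, 500, seed = 0, sample = "R")

# Proportion of positions where each C++ sampler agrees with base::sample()
mean(as.character(sim_fast) == as.character(sim_r))
mean(as.character(sim_slow) == as.character(sim_r))
```

If identical results across backends matter more than speed, sample = "R" is the option to pick. Otherwise the default "fast" avoids the cost of calling back into R.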

Burn in (Warm up) period

When using a VLMC for simulation purposes, we are generally interested in the stationary distribution of the corresponding Markov chain. To reduce the dependence of the samples on the initial values and get closer to this stationary distribution (if it exists), it is recommended to discard the first samples, which are produced in a so-called "burn in" (or "warm up") period. The burnin parameter can be used to implement this approach: the VLMC is used to produce a sample of size burnin + nsim, but the first burnin values are discarded. Notice that these burn in values can be partially given by the init parameter if it is specified.

If burnin is set to "auto", the burnin period is set to 64 * context_number(object), following the heuristic proposed in Mächler and Bühlmann (2004).
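
The discard step can be illustrated in base R with a toy first-order chain. This is only a sketch of the principle, not the package's implementation: the two-state transition matrix and the helper simulate_chain() are hypothetical, and the chain always starts from a fixed state rather than from an init sequence.

```r
# Toy two-state chain standing in for a fitted VLMC (hypothetical values)
states <- c("low", "high")
# Row i holds the transition probabilities out of state i
trans <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                nrow = 2, byrow = TRUE,
                dimnames = list(states, states))

# Hypothetical helper: simulate burnin + nsim values, keep the last nsim
simulate_chain <- function(nsim, burnin = 0L) {
  total <- burnin + nsim
  path <- character(total)
  current <- "low" # fixed starting state
  for (i in seq_len(total)) {
    current <- sample(states, 1, prob = trans[current, ])
    path[i] <- current
  }
  # Discard the first `burnin` values
  path[seq_len(nsim) + burnin]
}

set.seed(1)
x <- simulate_chain(nsim = 500, burnin = 100L)
length(x)
```

With the same seed, the result equals the last 500 values of a run of length 600 without burn in, which is exactly what discarding the first burnin values means.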

Random seed

This function reproduces the behaviour of stats::simulate(). If seed is NULL, the function does not change the random generator state and returns the value of .Random.seed as a seed attribute in the return value. This can be used to reproduce the simulation results exactly by setting .Random.seed to this value. Notice that if the random seed has not been initialised by R so far, the function issues a call to runif(1) to perform this initialisation (as is done in stats::simulate()).

If seed is an integer, it is used in a call to set.seed() before the simulation takes place. The integer is saved as a seed attribute in the return value. The integer seed is completed by an attribute kind which contains the value as.list(RNGkind()), exactly as with stats::simulate(). The random generator state is reset to its original value at the end of the call.
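
The seed handling described in this section can be sketched in base R. The helper draw_with_seed() below is hypothetical and is not part of the package. It mimics the pattern of stats::simulate() on a plain call to sample(), assuming the package follows the same steps.

```r
# Hypothetical helper mimicking the seed handling of stats::simulate()
draw_with_seed <- function(n, seed = NULL) {
  # Initialise the generator if R has not done so yet
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1)
  }
  if (is.null(seed)) {
    # Record the state the draw starts from
    rng_state <- get(".Random.seed", envir = globalenv())
  } else {
    # Save the caller's state and restore it when the function exits
    old_state <- get(".Random.seed", envir = globalenv())
    set.seed(seed)
    rng_state <- structure(seed, kind = as.list(RNGkind()))
    on.exit(assign(".Random.seed", old_state, envir = globalenv()))
  }
  x <- sample(c("a", "b", "c"), n, replace = TRUE)
  attr(x, "seed") <- rng_state
  x
}

# seed = NULL: replay a draw by restoring the recorded state
first <- draw_with_seed(10)
assign(".Random.seed", attr(first, "seed"), envir = globalenv())
replay <- draw_with_seed(10)
identical(first, replay)

# Integer seed: identical draws, and the caller's generator state is preserved
before <- get(".Random.seed", envir = globalenv())
s1 <- draw_with_seed(10, seed = 42L)
s2 <- draw_with_seed(10, seed = 42L)
after <- get(".Random.seed", envir = globalenv())
identical(s1, s2)
identical(before, after)
```

The same replay idiom applies to the seed attribute returned by simulate() when seed is NULL.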

Extended contexts

As explained in detail in the loglikelihood.vlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first values of a time series do not in general have a proper context for a VLMC with a non-zero order. In order to simulate something meaningful for those values when init is not provided, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.vlmc() with the parameter initial="extended". All vlmc functions that need to manipulate initial values with no proper context use the same approach.

References

Mächler, M. and Bühlmann, P. (2004) "Variable Length Markov Chains: Methodology, Computing, and Software" Journal of Computational and Graphical Statistics, 13 (2), 435-455, doi:10.1198/1061860043524

Examples

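# Discretise one week of active power into quartile-based states
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts, min_size = 5)
# Simulate 500 values with a fixed seed
new_dts <- simulate(model, 500, seed = 0)
# Same, starting from the first five observed values
new_dts_2 <- simulate(model, 500, seed = 0, init = dts[1:5])
# Same, discarding a burn in period of 500 values
new_dts_3 <- simulate(model, 500, seed = 0, burnin = 500)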