This function extracts all the contexts from a fitted VLMC, possibly with some associated data.

Usage

# S3 method for vlmc
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  cutoff = NULL,
  metrics = FALSE,
  ...
)

# S3 method for vlmc_cpp
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  cutoff = NULL,
  metrics = FALSE,
  ...
)

Arguments

ct

a context tree.

sequence

if TRUE, the function returns its results as a data.frame; if FALSE (default), as a list of ctx_node objects (see details).

reverse

logical (defaults to FALSE). See details.

frequency

specifies the counts to be included in the result data.frame. The default value of NULL does not include anything. "total" gives the number of occurrences of each context in the original sequence. "detailed" includes in addition the breakdown of these occurrences into all the possible states.

positions

logical (defaults to FALSE). Specify whether the positions of each context in the time series used to build the context tree should be reported in a positions column of the result data frame. The availability of the positions depends on the way the context tree was built. See details for the definition of a position.

local

specifies how the counts reported by frequency are computed. When local is FALSE (default value) the counts include both counts that are specific to the context (if any) and counts from the descendants of the context in the tree. When local is TRUE the counts include only the number of times the context appears without being the last part of a longer context.

cutoff

specifies whether to include the cut off value associated to each context (see cutoff() and prune()). The default result with cutoff=NULL does not include those values. Setting cutoff="quantile" adds the cut off values in quantile scale, while cutoff="native" adds them in the native scale. The returned values are directly based on the log likelihood ratio computed in the context tree and are not modified to ensure pruning (as when cutoff() is called with raw=TRUE).

metrics

if TRUE, adds predictive metrics for each context (see metrics() for the definition of predictive metrics).

...

additional arguments for the contexts function.

Value

A list of class contexts containing the contexts represented in this tree (as ctx_node) or a data.frame.

Details

The default behaviour of the function is to return a list of all the contexts using ctx_node objects (as returned by find_sequence()). The properties of the contexts can then be explored using adapted functions such as counts(), cutoff.ctx_node(), metrics.ctx_node() and positions().

When sequence=TRUE the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values specified by the additional parameters. Setting any of those parameters to a value that asks for reporting information will toggle the result type of the function to data.frame.

The frequency parameter is described in detail in the documentation of contexts.ctx_tree(). When cutoff is non NULL, the resulting data.frame contains a cutoff column with the cut off values, either in quantile or in native scale. See cutoff.vlmc() and prune.vlmc() for the definitions of cut off values and of the two scales.
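
As a minimal sketch of the two return types, assuming the mixvlmc package is installed and reusing the kind of simulated series shown in the Examples section (the seed is arbitrary and only there for reproducibility):

```r
library(mixvlmc)

set.seed(0) # arbitrary seed, not part of the original example
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
model <- vlmc(dts, alpha = 0.5)

# default: a list of ctx_node objects
ctx_list <- contexts(model)
# explicit request for the data.frame format
ctx_df <- contexts(model, sequence = TRUE)
# asking for counts toggles the data.frame format even though sequence is FALSE
ctx_freq <- contexts(model, frequency = "total")

is.data.frame(ctx_df)
names(ctx_freq)
# the list and the data.frame describe the same contexts, one entry per context
length(ctx_list) == nrow(ctx_df)
```

The list form is convenient when each context is explored individually (for instance with counts() or positions()), while the data.frame form is easier to filter and sort.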

Cut off values

The cut off values reported by contexts.vlmc can be different from the ones reported by cutoff.vlmc() for three reasons:

  1. cutoff.vlmc() reports only useful cut off values, i.e., cut off values that should induce a simplification of the VLMC when used in prune(). This excludes cut off values associated to simple contexts that are smaller than the ones of their descendants in the context tree. Those values are reported by contexts.vlmc.

  2. contexts.vlmc reports only cut off values of actual contexts, while cutoff.vlmc() reports cut off values for all nodes of the context tree.

  3. values are not modified to induce pruning, contrary to the default behaviour of cutoff.vlmc().
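
The two sets of values can be put side by side as in the sketch below. It assumes the mixvlmc package is installed and that cutoff() on a vlmc accepts the scale and raw arguments mentioned in this page. The seed is arbitrary.

```r
library(mixvlmc)

set.seed(0) # arbitrary seed, not part of the original example
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
model <- vlmc(dts, alpha = 0.5)

# one unmodified cut off value per actual context, in native scale
per_context <- contexts(model, cutoff = "native")$cutoff

# model level cut off values: only the useful ones, taken over all nodes;
# raw = TRUE is assumed to return them without the pruning adjustment
model_level <- cutoff(model, scale = "native", raw = TRUE)

# the two vectors generally differ in length and in content
length(per_context)
length(model_level)
sort(unique(per_context))
model_level
```

With raw = TRUE only the first two reasons listed above remain as sources of difference between the two vectors.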

Positions

A position of a context ctx in the time series x is an index value t such that the context ends with x[t]. Thus x[t+1] is after the context. For instance if x=c(0, 0, 1, 1) and ctx=c(0, 1) (in standard state order), then the position of ctx in x is 3.
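
The definition can be checked with a few lines of base R. The helper context_positions() below is hypothetical and is not part of mixvlmc; it simply scans x for matches of ctx given in temporal order and returns the index of the last element of each match.

```r
# Hypothetical helper (not a mixvlmc function): all positions of `ctx` in `x`,
# a position being the index t of the last element of a match.
context_positions <- function(x, ctx) {
  k <- length(ctx)
  if (k == 0 || k > length(x)) {
    return(integer(0))
  }
  # candidate end indices t = k, ..., length(x)
  ends <- seq(k, length(x))
  # keep t when x[(t - k + 1):t] is equal to ctx element by element
  is_match <- vapply(
    ends,
    function(t) all(x[(t - k + 1):t] == ctx),
    logical(1)
  )
  ends[is_match]
}

x <- c(0, 0, 1, 1)
context_positions(x, c(0, 1))
#> [1] 3
```

The only match of c(0, 1) is x[2:3], which ends at index 3, in agreement with the example above.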

State order in a context

Notice that contexts are given by default in the temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(0, 1) is reported if the sequence 0, then 1 appeared in the time series used to build the context tree. Set reverse to TRUE for the reverse convention, which is somewhat easier to relate to the way the context trees are represented by draw() (i.e. recent values at the top of the tree).
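
The effect of reverse can be observed as in the sketch below, which assumes the mixvlmc package is installed and that both calls list the contexts in the same order (an assumption, not something stated on this page). The seed is arbitrary.

```r
library(mixvlmc)

set.seed(0) # arbitrary seed, not part of the original example
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
model <- vlmc(dts, alpha = 0.5)

# temporal order (default): older values on the left
temporal <- contexts(model, sequence = TRUE)
# reverse order: most recent value first
reversed <- contexts(model, sequence = TRUE, reverse = TRUE)

# first context under each convention
temporal$context[[1]]
reversed$context[[1]]

# under the same-order assumption, each reversed context should be the
# mirror image of the corresponding temporal one
mirrored <- mapply(
  function(a, b) identical(as.character(a), rev(as.character(b))),
  temporal$context, reversed$context
)
all(mirrored)
```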

See also

find_sequence() and find_sequence.covlmc() for direct access to a specific context, and contexts.ctx_tree(), contexts.vlmc() and contexts.covlmc() for concrete implementations of contexts().

Examples

dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
model <- vlmc(dts, alpha = 0.5)
## direct representation with ctx_node objects
model_ctxs <- contexts(model)
model_ctxs
#> Contexts:
#>  A, A, A
#>  B, A, A
#>  A, A
#>  B, B, B, A
#>  B, B, A
#>  A, C, B, A
#>  C, B, A
#>  B, A
#>  B, C, A
#>  A, C, C, A
#>  C, C, A
#>  C, A
#>  B, B, A, B
#>  C, B, A, B
#>  B, A, B
#>  C, A, B
#>  A, B
#>  A, B, B
#>  C, B, B
#>  B, B
#>  B, C, A, C, B
#>  C, A, C, B
#>  A, C, B
#>  C, B, C, B
#>  B, C, B
#>  C, B, C, C, B
#>  B, C, C, B
#>  C, C, B
#>  A, B, B, A, C
#>  B, B, A, C
#>  B, A, C
#>  C, A, C
#>  A, C
#>  B, A, B, C
#>  A, B, C
#>  B, B, C
#>  B, C
#>  A, C, C
#>  B, C, C
#>  A, C, C, C
#>  C, C, C
sapply(model_ctxs, cutoff, scale = "quantile")
#>  [1] 0.4897959 0.1574344 0.8930017 0.2222222 0.7491156 0.2500000 0.1573663
#>  [8] 0.9948751 0.4746094 0.5000000 0.7500000 0.6015948 0.5000000 0.5000000
#> [15] 0.5120000 0.4740741 0.3627739 0.3169333 0.4965458 0.2831756 0.4444444
#> [22] 0.6480000 0.1558617 0.5000000 0.7170617 0.4444444 0.4218750 0.7170617
#> [29] 0.4444444 0.4218750 0.5120000 0.1250000 0.4828227 0.5000000 0.8392869
#> [36] 0.3966942 0.9581648 0.1573663 0.1657851 0.4444444 0.9737040
sapply(model_ctxs, cutoff, scale = "native")
#>  [1] 0.713766468 1.848746401 0.113166844 1.504077397 0.288861954 1.386294361
#>  [7] 1.849179069 0.005138089 0.745263182 0.693147181 0.287682072 0.508171105
#> [13] 0.693147181 0.693147181 0.669430654 0.746391695 1.013975464 1.149063859
#> [19] 0.700079597 1.261688240 0.810930216 0.433864583 1.858785993 0.693147181
#> [25] 0.332593351 0.810930216 0.863046217 0.332593351 0.810930216 0.863046217
#> [31] 0.669430654 2.079441542 0.728105685 0.693147181 0.175202636 0.924589535
#> [37] 0.042735539 1.849179069 1.797063068 0.810930216 0.026647941
sapply(model_ctxs, function(x) metrics(x)$accuracy)
#>  [1] 0.5000000 0.6666667 1.0000000 0.5000000 0.7500000 1.0000000 0.0000000
#>  [8] 0.0000000 0.6666667 0.5000000 0.5000000 1.0000000 0.5000000 0.5000000
#> [15]       NaN 0.7500000 0.5000000 0.6000000 0.7500000 0.6250000 1.0000000
#> [22] 0.0000000 1.0000000 0.5000000 0.5000000 1.0000000 0.0000000 0.0000000
#> [29] 1.0000000 0.0000000 0.0000000 1.0000000 0.3333333 0.5000000 0.5000000
#> [36] 0.5000000 0.4000000 0.5000000 0.7500000 0.5000000 0.0000000
## data.frame format
contexts(model, frequency = "total")
#>         context freq
#> 1       A, A, A    2
#> 2       B, A, A    3
#> 3          A, A    7
#> 4    B, B, B, A    2
#> 5       B, B, A    6
#> 6    A, C, B, A    2
#> 7       C, B, A    4
#> 8          B, A   11
#> 9       B, C, A    3
#> 10   A, C, C, A    2
#> 11      C, C, A    4
#> 12         C, A    8
#> 13   B, B, A, B    2
#> 14   C, B, A, B    2
#> 15      B, A, B    4
#> 16      C, A, B    4
#> 17         A, B   10
#> 18      A, B, B    5
#> 19      C, B, B    4
#> 20         B, B   17
#> 21 B, C, A,....    2
#> 22   C, A, C, B    3
#> 23      A, C, B    5
#> 24   C, B, C, B    2
#> 25      B, C, B    4
#> 26 C, B, C,....    2
#> 27   B, C, C, B    3
#> 28      C, C, B    4
#> 29 A, B, B,....    2
#> 30   B, B, A, C    3
#> 31      B, A, C    4
#> 32      C, A, C    3
#> 33         A, C   10
#> 34   B, A, B, C    2
#> 35      A, B, C    4
#> 36      B, B, C    2
#> 37         B, C   11
#> 38      A, C, C    4
#> 39      B, C, C    4
#> 40   A, C, C, C    2
#> 41      C, C, C    3
contexts(model, cutoff = "quantile")
#>         context    cutoff
#> 1       A, A, A 0.4897959
#> 2       B, A, A 0.1574344
#> 3          A, A 0.8930017
#> 4    B, B, B, A 0.2222222
#> 5       B, B, A 0.7491156
#> 6    A, C, B, A 0.2500000
#> 7       C, B, A 0.1573663
#> 8          B, A 0.9948751
#> 9       B, C, A 0.4746094
#> 10   A, C, C, A 0.5000000
#> 11      C, C, A 0.7500000
#> 12         C, A 0.6015948
#> 13   B, B, A, B 0.5000000
#> 14   C, B, A, B 0.5000000
#> 15      B, A, B 0.5120000
#> 16      C, A, B 0.4740741
#> 17         A, B 0.3627739
#> 18      A, B, B 0.3169333
#> 19      C, B, B 0.4965458
#> 20         B, B 0.2831756
#> 21 B, C, A,.... 0.4444444
#> 22   C, A, C, B 0.6480000
#> 23      A, C, B 0.1558617
#> 24   C, B, C, B 0.5000000
#> 25      B, C, B 0.7170617
#> 26 C, B, C,.... 0.4444444
#> 27   B, C, C, B 0.4218750
#> 28      C, C, B 0.7170617
#> 29 A, B, B,.... 0.4444444
#> 30   B, B, A, C 0.4218750
#> 31      B, A, C 0.5120000
#> 32      C, A, C 0.1250000
#> 33         A, C 0.4828227
#> 34   B, A, B, C 0.5000000
#> 35      A, B, C 0.8392869
#> 36      B, B, C 0.3966942
#> 37         B, C 0.9581648
#> 38      A, C, C 0.1573663
#> 39      B, C, C 0.1657851
#> 40   A, C, C, C 0.4444444
#> 41      C, C, C 0.9737040
contexts(model, cutoff = "native", metrics = TRUE)
#>         context      cutoff  accuracy auc
#> 1       A, A, A 0.713766468 0.5000000  NA
#> 2       B, A, A 1.848746401 0.6666667  NA
#> 3          A, A 0.113166844 1.0000000  NA
#> 4    B, B, B, A 1.504077397 0.5000000  NA
#> 5       B, B, A 0.288861954 0.7500000  NA
#> 6    A, C, B, A 1.386294361 1.0000000  NA
#> 7       C, B, A 1.849179069 0.0000000  NA
#> 8          B, A 0.005138089 0.0000000  NA
#> 9       B, C, A 0.745263182 0.6666667  NA
#> 10   A, C, C, A 0.693147181 0.5000000  NA
#> 11      C, C, A 0.287682072 0.5000000  NA
#> 12         C, A 0.508171105 1.0000000  NA
#> 13   B, B, A, B 0.693147181 0.5000000  NA
#> 14   C, B, A, B 0.693147181 0.5000000  NA
#> 15      B, A, B 0.669430654       NaN  NA
#> 16      C, A, B 0.746391695 0.7500000  NA
#> 17         A, B 1.013975464 0.5000000  NA
#> 18      A, B, B 1.149063859 0.6000000 0.5
#> 19      C, B, B 0.700079597 0.7500000  NA
#> 20         B, B 1.261688240 0.6250000 0.5
#> 21 B, C, A,.... 0.810930216 1.0000000  NA
#> 22   C, A, C, B 0.433864583 0.0000000  NA
#> 23      A, C, B 1.858785993 1.0000000  NA
#> 24   C, B, C, B 0.693147181 0.5000000  NA
#> 25      B, C, B 0.332593351 0.5000000  NA
#> 26 C, B, C,.... 0.810930216 1.0000000  NA
#> 27   B, C, C, B 0.863046217 0.0000000  NA
#> 28      C, C, B 0.332593351 0.0000000  NA
#> 29 A, B, B,.... 0.810930216 1.0000000  NA
#> 30   B, B, A, C 0.863046217 0.0000000  NA
#> 31      B, A, C 0.669430654 0.0000000  NA
#> 32      C, A, C 2.079441542 1.0000000  NA
#> 33         A, C 0.728105685 0.3333333  NA
#> 34   B, A, B, C 0.693147181 0.5000000  NA
#> 35      A, B, C 0.175202636 0.5000000  NA
#> 36      B, B, C 0.924589535 0.5000000  NA
#> 37         B, C 0.042735539 0.4000000 0.5
#> 38      A, C, C 1.849179069 0.5000000  NA
#> 39      B, C, C 1.797063068 0.7500000  NA
#> 40   A, C, C, C 0.810930216 0.5000000  NA
#> 41      C, C, C 0.026647941 0.0000000  NA