Calculates sufficient statistics for the \(D\)-gaps model for the extremal index \(\theta\), which is based on left-censored threshold inter-exceedance times.

dgaps_stat(data, u, q_u, D = 1, inc_cens = TRUE)

Arguments

data

A numeric vector of raw data. No missing values are allowed.

u

A numeric scalar. Extreme value threshold applied to data.

q_u

A numeric scalar. An estimate of the probability with which the threshold u is exceeded. If q_u is missing then it is calculated using mean(data > u, na.rm = TRUE).

D

A numeric scalar. The censoring parameter \(D\), as defined in Holesovsky and Fusek (2020). Threshold inter-exceedance times that are not larger than \(D\) units are left-censored: they enter the likelihood only through the probability that an inter-exceedance time is less than or equal to \(D\).

inc_cens

A logical scalar indicating whether or not to include contributions from the right-censored inter-exceedance times relating to the first and last observations. For these times it is known only that they are greater than or equal to the time observed. See Attalides (2015) for details.
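The lines below sketch how u, q_u and D act on a series, using base R only. The toy vector x is hypothetical, and the code illustrates the definitions above rather than reproducing the package's internal code.

```r
# Hypothetical toy series and threshold
x <- c(0.2, 1.5, 1.7, 0.1, 0.3, 2.0, 0.4, 0.5, 0.6, 1.9)
u <- 1

# Default q_u: the sample proportion of threshold exceedances
q_u <- mean(x > u, na.rm = TRUE)   # 0.4

# Inter-exceedance times between successive exceedances (positions 2, 3, 6, 10)
T_u <- diff(which(x > u))          # 1 3 4

# Times not larger than D are left-censored; a larger D censors more times
left_cens_D1 <- T_u <= 1           # TRUE FALSE FALSE
left_cens_D3 <- T_u <= 3           # TRUE TRUE FALSE
```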

Value

A list containing the sufficient statistics, with components

N0

the number of left-censored inter-exceedance times.

N1

contribution from inter-exceedance times that are not left-censored (see Details).

sum_qtd

the sum of the (scaled) inter-exceedance times that are not left-censored, that is, \(q (I_0 T_0 + \cdots + I_N T_N)\), where \(q\) is estimated by the proportion of threshold exceedances.

n_dgaps

the number of inter-exceedance times that contribute to the log-likelihood.

q_u

the value of q_u used; by default, the sample proportion of values that exceed the threshold.

D

the input value of D.

Details

The sample inter-exceedance times are \(T_0, T_1, \ldots, T_{N-1}, T_N\), where \(T_1, \ldots, T_{N-1}\) are observed exactly and \(T_0\) and \(T_N\) are right-censored. Under the assumption that the inter-exceedance times are independent, the log-likelihood of the \(D\)-gaps model is given by $$l(\theta; T_0, \ldots, T_N) = N_0 \log(1 - \theta e^{-\theta d}) + 2 N_1 \log \theta - \theta q (I_0 T_0 + \cdots + I_N T_N),$$ where

  • \(q\) is the threshold exceedance probability, estimated by the proportion of threshold exceedances,

  • \(d = q D\),

  • \(I_j = 1\) if \(T_j > D\) and \(I_j = 0\) otherwise,

  • \(N_0\) is the number of sample inter-exceedance times that are left-censored, that is, are less than or equal to \(D\),

  • \(N_1\) is, apart from an adjustment for the contributions of \(T_0\) and \(T_N\), the number of inter-exceedance times that are uncensored, that is, are greater than \(D\),

  • specifically, if inc_cens = TRUE then \(N_1\) is equal to the number of \(T_1, \ldots, T_{N-1}\) that are uncensored plus \((I_0 + I_N) / 2\); if inc_cens = FALSE then \(T_0\) and \(T_N\) make no contribution.
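The definitions in the list above translate into a short computation. The base-R function dgaps_stat_sketch() below is a hypothetical illustration, not the package's implementation; in particular, taking \(T_0\) and \(T_N\) to be the numbers of observations before the first exceedance and after the last exceedance is an assumption.

```r
# Hypothetical sketch of the D-gaps sufficient statistics (not exdex code).
dgaps_stat_sketch <- function(data, u, q_u = mean(data > u, na.rm = TRUE),
                              D = 1, inc_cens = TRUE) {
  n <- length(data)
  exc <- which(data > u)            # positions of threshold exceedances
  n_exc <- length(exc)
  T_u <- diff(exc)                  # T_1, ..., T_{N-1}
  left_cens <- T_u <= D             # I_j = 0 for these times
  N0 <- sum(left_cens)
  N1 <- sum(!left_cens)
  sum_t <- sum(T_u[!left_cens])     # I_1 T_1 + ... + I_{N-1} T_{N-1}
  n_dgaps <- N0 + N1
  if (inc_cens && n_exc > 0) {
    # ASSUMPTION: T_0 and T_N are the numbers of observations before the
    # first exceedance and after the last exceedance
    T_cens <- c(exc[1] - 1, n - exc[n_exc])
    T_cens <- T_cens[T_cens > D]    # right-censored times <= D are dropped
    N1 <- N1 + length(T_cens) / 2   # adds (I_0 + I_N) / 2
    sum_t <- sum_t + sum(T_cens)    # adds I_0 T_0 + I_N T_N
    n_dgaps <- n_dgaps + length(T_cens)
  }
  list(N0 = N0, N1 = N1, sum_qtd = q_u * sum_t, n_dgaps = n_dgaps,
       q_u = q_u, D = D)
}

# Toy data: exceedances at positions 4, 5 and 8 out of 10
x <- c(0.1, 0.2, 0.3, 1.5, 1.7, 0.1, 0.3, 2.0, 0.4, 0.5)
s <- dgaps_stat_sketch(x, u = 1, D = 1)  # N0 = 1, N1 = 2, sum_qtd = 2.4, n_dgaps = 4
```

Here \(T_1 = 1\) is left-censored and \(T_2 = 3\) is not, while the assumed right-censored times \(T_0 = 3\) and \(T_N = 2\) both exceed \(D = 1\) and each add \(1/2\) to \(N_1\). With inc_cens = FALSE the same call gives N1 = 1, sum_qtd = 0.9 and n_dgaps = 2.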

The differing treatment of left-censored and uncensored inter-exceedance times reflects their differing contributions to the likelihood. A right-censored inter-exceedance time whose observed value is less than or equal to \(D\) adds no information to the likelihood, because we do not know to which part of the likelihood it should contribute.
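The weights in the log-likelihood can be traced to the contribution of each type of observation. Assuming the limiting model in which the scaled inter-exceedance time satisfies \(P(qT > t) = \theta e^{-\theta t}\) for \(t > 0\) (the form implied by the \(N_0\) term, since \(d = qD\)), the density for \(t > 0\) is \(\theta^2 e^{-\theta t}\), and, up to additive constants not involving \(\theta\),
$$\log P(qT_j \le d) = \log(1 - \theta e^{-\theta d}) \quad (T_j \le D,\ \text{left-censored}),$$
$$\log\left(\theta^2 e^{-\theta q T_j}\right) = 2 \log \theta - \theta q T_j \quad (T_j > D,\ \text{uncensored}),$$
$$\log P(qT > q T_j) = \log \theta - \theta q T_j \quad (j \in \{0, N\},\ T_j > D,\ \text{right-censored}).$$
Summing these terms gives \(l(\theta; T_0, \ldots, T_N)\): each uncensored time adds \(2 \log \theta\), whereas a right-censored \(T_0\) or \(T_N\) greater than \(D\) adds only \(\log \theta\), which is why it counts \(1/2\) towards \(N_1\).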

If \(N_1 = 0\) then we are in the degenerate case where there is one cluster (all inter-exceedance times are left-censored) and the likelihood is maximized at \(\theta = 0\).

If \(N_0 = 0\) then all exceedances occur singly (no inter-exceedance times are left-censored) and the likelihood is maximized at \(\theta = 1\).
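A minimal sketch of how the sufficient statistics feed into the log-likelihood in Details, checked against the two degenerate cases. The functions dgaps_loglik() and dgaps_mle_sketch() are hypothetical, and maximizing numerically with stats::optimize() over \((0, 1)\) is an assumption made for illustration, not necessarily how dgaps() computes its estimate.

```r
# Hypothetical helper: D-gaps log-likelihood from the sufficient statistics
dgaps_loglik <- function(theta, N0, N1, sum_qtd, q_u, D) {
  d <- q_u * D
  ll <- -theta * sum_qtd
  if (N1 > 0) ll <- ll + 2 * N1 * log(theta)
  if (N0 > 0) ll <- ll + N0 * log(1 - theta * exp(-theta * d))
  ll
}

# Hypothetical helper: numerical maximization over (0, 1)
dgaps_mle_sketch <- function(N0, N1, sum_qtd, q_u, D) {
  optimize(dgaps_loglik, interval = c(0, 1), maximum = TRUE,
           N0 = N0, N1 = N1, sum_qtd = sum_qtd, q_u = q_u, D = D)$maximum
}

# N1 = 0: all inter-exceedance times left-censored, estimate near 0
est0 <- dgaps_mle_sketch(N0 = 10, N1 = 0, sum_qtd = 0, q_u = 0.1, D = 1)
# N0 = 0: no left-censored times, estimate near 1
est1 <- dgaps_mle_sketch(N0 = 0, N1 = 50, sum_qtd = 50, q_u = 0.1, D = 1)
```

In the second case the log-likelihood reduces to \(2 N_1 \log \theta - \theta \times \texttt{sum\_qtd}\), which is increasing on \((0, 1)\) whenever \(2 N_1 / \texttt{sum\_qtd} \ge 1\), so the estimate sits at the upper boundary.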

References

Holesovsky, J. and Fusek, M. (2020) Estimation of the extremal index using censored distributions. Extremes, 23, 197-213. doi:10.1007/s10687-020-00374-3

Attalides, N. (2015) Threshold-based extreme value modelling. PhD thesis, University College London. https://discovery.ucl.ac.uk/1471121/1/Nicolas_Attalides_Thesis.pdf

See also

dgaps for maximum likelihood estimation of the extremal index \(\theta\) using the \(D\)-gaps model.

Examples

u <- quantile(newlyn, probs = 0.90)
dgaps_stat(newlyn, u = u, D = 1)
#> $N0
#> [1] 184
#> 
#> $N1
#> [1] 105
#> 
#> $sum_qtd
#> [1] 270.5256
#> 
#> $n_dgaps
#> [1] 290
#>