Sufficient statistics for the left-censored inter-exceedance times model

Calculates sufficient statistics for the left-censored inter-exceedance times $D$-gaps model for the extremal index $\theta$.

dgaps_stat(data, u, q_u, D = 1, inc_cens = TRUE)

Arguments

data: A numeric vector of raw data. No missing values are allowed.
u: A numeric scalar. Extreme value threshold applied to data.
q_u: A numeric scalar. An estimate of the probability with which the threshold u is exceeded. If q_u is missing then it is calculated using mean(data > u, na.rm = TRUE).
D: A numeric scalar. The left-censoring parameter $D$ of the $D$-gaps model of Holesovsky and Fusek (2020). Threshold inter-exceedance times that are not larger than D units are left-censored. Each such time contributes $\log(1 - \theta e^{-\theta d})$ to the log-likelihood, where $d = q D$ and $q$ is the probability with which the threshold u is exceeded.
inc_cens: A logical scalar indicating whether or not to include contributions from right-censored inter-exceedance times relating to the first and last observation. It is known that these times are greater than or equal to the time observed. See Attalides (2015) for details.
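
As a rough illustration of how data, u, q_u and D interact, the following base R sketch computes the default value of q_u and flags which inter-exceedance times would be left-censored. The toy data and the variable names exc_times, T_unc and left_censored are hypothetical, and the code does not call dgaps_stat itself.

```r
# Toy data (hypothetical), chosen so that the arithmetic is easy to follow
data <- c(1, 5, 6, 2, 1, 7, 1, 1, 1, 8, 9, 3)
u <- 4

# Default estimate of the exceedance probability, as described for q_u
q_u <- mean(data > u, na.rm = TRUE)

# Positions of the threshold exceedances and the times between them
exc_times <- which(data > u)
T_unc <- diff(exc_times)

# With D = 1, inter-exceedance times that are not larger than D
# are treated as left-censored
D <- 1
left_censored <- T_unc <= D
```

Here two of the four inter-exceedance times are equal to 1 and so are left-censored when D = 1.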

Value

A list containing the sufficient statistics, with components

N0: the number of left-censored inter-exceedance times.
N1: contribution from inter-exceedance times that are not left-censored (see Details).
sum_qtd: the sum of the (scaled) inter-exceedance times that are not left-censored, that is, $q (I_0 T_0 + \cdots + I_N T_N)$, where $q$ is estimated by the proportion of threshold exceedances.
n_dgaps: the number of inter-exceedance times that contribute to the log-likelihood.
q_u: the sample proportion of values that exceed the threshold.
D: the input value of D.

Details

The sample inter-exceedance times are $T_0, T_1, ..., T_{N-1}, T_N$, where $T_1, ..., T_{N-1}$ are uncensored and $T_0$ and $T_N$ are right-censored. Under the assumption that the inter-exceedance times are independent, the log-likelihood of the $D$-gaps model is given by $$l(\theta; T_0, \ldots, T_N) = N_0 \log(1 - \theta e^{-\theta d}) + 2 N_1 \log \theta - \theta q (I_0 T_0 + \cdots + I_N T_N),$$ where

$q$ is the threshold exceedance probability, estimated by the proportion of threshold exceedances,
$d = q D$,
$I_j = 1$ if $T_j > D$ and $I_j = 0$ otherwise,
$N_0$ is the number of sample inter-exceedance times that are left-censored, that is, are less than or equal to $D$,
$N_1$ is the number of inter-exceedance times that are not left-censored, that is, are greater than $D$, apart from an adjustment for the contributions of $T_0$ and $T_N$,
specifically, if inc_cens = TRUE then $N_1$ is equal to the number of $T_1, \ldots, T_{N-1}$ that are greater than $D$ plus $(I_0 + I_N) / 2$.
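
The definitions above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function name dgaps_stat_sketch is hypothetical, and it assumes that $T_0$ is the time from the first observation to the first exceedance and $T_N$ is the time from the last exceedance to the last observation. It returns only N0, N1 and sum_qtd.

```r
# Hypothetical helper: sufficient statistics from the definitions above
dgaps_stat_sketch <- function(data, u, D = 1, inc_cens = TRUE) {
  q <- mean(data > u)              # proportion of threshold exceedances
  exc <- which(data > u)           # positions of the exceedances
  n_exc <- length(exc)
  stopifnot(n_exc >= 1)
  T_unc <- diff(exc)               # T_1, ..., T_{N-1}
  keep <- T_unc > D                # I_j = 1: not left-censored
  N0 <- sum(!keep)                 # left-censored times
  N1 <- sum(keep)
  sum_qtd <- q * sum(T_unc[keep])
  if (inc_cens) {
    # Assumed definitions of the right-censored times T_0 and T_N
    T_cens <- c(exc[1] - 1, length(data) - exc[n_exc])
    keep_cens <- T_cens > D
    # Each right-censored time greater than D counts one half in N1
    N1 <- N1 + sum(keep_cens) / 2
    sum_qtd <- sum_qtd + q * sum(T_cens[keep_cens])
  }
  list(N0 = N0, N1 = N1, sum_qtd = sum_qtd)
}

# Toy data (hypothetical): exceedances of u = 4 at positions 3, 4, 7, 11, 12
toy <- c(1, 1, 5, 6, 2, 1, 7, 1, 1, 1, 8, 9, 3)
s <- dgaps_stat_sketch(toy, u = 4, D = 1)
```

In this toy example $T_0 = 2$ exceeds $D = 1$ and so adds one half to $N_1$, whereas $T_N = 1$ does not exceed $D$ and is dropped.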

The differing treatment of uncensored and right-censored inter-exceedance times reflects their differing contributions to the likelihood. Right-censored inter-exceedance times whose observed values are less than or equal to $D$ add no information to the likelihood because we do not know to which part of the likelihood they should contribute: the complete time could turn out to be either at most $D$ or greater than $D$.
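
To make the factor of 2 and the half-weights explicit, the log-likelihood above can be split into per-observation contributions when inc_cens = TRUE. This is only a term-by-term restatement of the formula and definitions already given, not an additional result.

$$T_j \le D, \ 1 \le j \le N-1: \quad \log(1 - \theta e^{-\theta d}),$$
$$T_j > D, \ 1 \le j \le N-1: \quad 2 \log \theta - \theta q T_j,$$
$$T_j > D, \ j \in \{0, N\}: \quad \log \theta - \theta q T_j,$$
$$T_j \le D, \ j \in \{0, N\}: \quad 0.$$

Summing these contributions over $j = 0, \ldots, N$ recovers $N_0 \log(1 - \theta e^{-\theta d}) + 2 N_1 \log \theta - \theta q (I_0 T_0 + \cdots + I_N T_N)$, with each right-censored time greater than $D$ counting one half in $N_1$.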

If $N_1 = 0$ then we are in the degenerate case where there is one cluster (all inter-exceedance times are left-censored) and the likelihood is maximized at $\theta = 0$.

If $N_0 = 0$ then all exceedances occur singly (no inter-exceedance times are left-censored) and the likelihood is maximized at $\theta = 1$.
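
Away from these degenerate cases the log-likelihood can be maximized numerically from the sufficient statistics alone. The sketch below uses a hypothetical function name, dgaps_loglik_sketch, and hypothetical toy values for the statistics. It relies on stats::optimize and is not necessarily how the package fits the model.

```r
# Hypothetical helper: D-gaps log-likelihood from the sufficient statistics
dgaps_loglik_sketch <- function(theta, N0, N1, sum_qtd, q, D) {
  d <- q * D
  N0 * log(1 - theta * exp(-theta * d)) +
    2 * N1 * log(theta) - theta * sum_qtd
}

# Toy sufficient statistics (hypothetical values, with N0 > 0 and N1 > 0)
fit <- optimize(dgaps_loglik_sketch, interval = c(0, 1), maximum = TRUE,
                N0 = 2, N1 = 2.5, sum_qtd = 45 / 13, q = 5 / 13, D = 1)

theta_hat <- fit$maximum    # numerical maximizer on (0, 1)
max_loglik <- fit$objective # log-likelihood at theta_hat
```

Because optimize searches the interior of the interval, the cases $N_1 = 0$ and $N_0 = 0$ are better handled separately, using the boundary values stated above.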

References

Holesovsky, J. and Fusek, M. (2020) Estimation of the extremal index using censored distributions. Extremes, 23, 197-213. doi:10.1007/s10687-020-00374-3

Attalides, N. (2015) Threshold-based extreme value modelling, PhD thesis, University College London. https://discovery.ucl.ac.uk/1471121/1/Nicolas_Attalides_Thesis.pdf

Examples

library(exdex)  # assumed to provide dgaps_stat() and the newlyn data
u <- quantile(newlyn, probs = 0.90)
dgaps_stat(newlyn, u = u, D = 1)
#> $N0
#> [1] 184
#> 
#> $N1
#> [1] 105
#> 
#> $sum_qtd
#> [1] 270.5256
#> 
#> $n_dgaps
#> [1] 290
#>