Sufficient statistics for the left-censored inter-exceedances time model
Source: R/dgaps.R
dgaps_stat.Rd
Calculates sufficient statistics for the left-censored inter-exceedances time \(D\)-gaps model for the extremal index \(\theta\).
Arguments
- data
A numeric vector of raw data. No missing values are allowed.
- u
A numeric scalar. Extreme value threshold applied to data.
- q_u
A numeric scalar. An estimate of the probability with which the threshold u is exceeded. If q_u is missing then it is calculated using mean(data > u, na.rm = TRUE).
- D
A numeric scalar. The left-censoring parameter \(D\), as defined in Holesovsky and Fusek (2020). Threshold inter-exceedance times that are not larger than D units are left-censored (see Details).
- inc_cens
A logical scalar indicating whether or not to include contributions from right-censored inter-exceedance times relating to the first and last observation. It is known that these times are greater than or equal to the time observed. See Attalides (2015) for details.
Value
A list containing the sufficient statistics, with components
- N0: the number of left-censored inter-exceedance times.
- N1: contribution from inter-exceedance times that are not left-censored (see Details).
- sum_qtd: the sum of the (scaled) inter-exceedance times that are not left-censored, that is, \(q (I_0 T_0 + \cdots + I_N T_N)\), where \(q\) is estimated by the proportion of threshold exceedances.
- n_dgaps: the number of inter-exceedance times that contribute to the log-likelihood.
- q_u: the sample proportion of values that exceed the threshold.
- D: the input value of D.
Details
The sample inter-exceedance times are \(T_0, T_1, ..., T_{N-1}, T_N\), where \(T_1, ..., T_{N-1}\) are uncensored and \(T_0\) and \(T_N\) are right-censored. Under the assumption that the inter-exceedance times are independent, the log-likelihood of the \(D\)-gaps model is given by $$l(\theta; T_0, \ldots, T_N) = N_0 \log(1 - \theta e^{-\theta d}) + 2 N_1 \log \theta - \theta q (I_0 T_0 + \cdots + I_N T_N),$$ where
- \(q\) is the threshold exceedance probability, estimated by the proportion of threshold exceedances,
- \(d = q D\),
- \(I_j = 1\) if \(T_j > D\) and \(I_j = 0\) otherwise,
- \(N_0\) is the number of sample inter-exceedance times that are left-censored, that is, are less than or equal to \(D\),
- \(N_1\) is (apart from an adjustment for the contributions of \(T_0\) and \(T_N\)) the number of inter-exceedance times that are not left-censored, that is, are greater than \(D\),
- specifically, if inc_cens = TRUE then \(N_1\) is equal to the number of \(T_1, ..., T_{N-1}\) that are greater than \(D\) plus \((I_0 + I_N) / 2\).
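The differing treatment of uncensored and right-censored inter-exceedance times reflects differing contributions to the likelihood: an uncensored time greater than \(D\) adds \(2 \log \theta\) to the log-likelihood, whereas a right-censored time greater than \(D\) adds only \(\log \theta\), hence the weight of 1/2 in \(N_1\). Right-censored inter-exceedance times whose observed values are less than or equal to \(D\) add no information to the likelihood because we do not know to which part of the likelihood they should contribute.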
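The definitions above can be sketched directly in R. The function below is a hypothetical illustration written from those definitions, not the package's implementation: the name dgaps_stat_sketch and the toy data are assumptions, as is the convention that \(T_0\) and \(T_N\) are the numbers of observations before the first and after the last exceedance.

```r
# Hypothetical helper (not part of the package): computes the D-gaps
# sufficient statistics from raw data, following the definitions in Details.
dgaps_stat_sketch <- function(data, u, D = 1, inc_cens = TRUE) {
  n <- length(data)
  q_u <- mean(data > u)          # proportion of threshold exceedances
  exc <- which(data > u)         # positions of the exceedances
  stopifnot(length(exc) >= 2)    # this sketch needs at least two exceedances
  T_u <- diff(exc)               # uncensored times T_1, ..., T_{N-1}
  I_u <- T_u > D                 # TRUE if the time is not left-censored
  N0 <- sum(!I_u)
  N1 <- sum(I_u)
  sum_qtd <- q_u * sum(T_u[I_u])
  n_dgaps <- N0 + N1
  if (inc_cens) {
    # Assumption: T_0 and T_N are the gaps before the first exceedance
    # and after the last exceedance
    T_cens <- c(exc[1] - 1, n - exc[length(exc)])
    I_cens <- T_cens > D
    # Each right-censored time greater than D counts 1/2 towards N1
    N1 <- N1 + sum(I_cens) / 2
    sum_qtd <- sum_qtd + q_u * sum(T_cens[I_cens])
    n_dgaps <- n_dgaps + sum(I_cens)
  }
  list(N0 = N0, N1 = N1, sum_qtd = sum_qtd, n_dgaps = n_dgaps,
       q_u = q_u, D = D)
}

# Toy data (made up for illustration): exceedances of u = 4 occur at
# positions 2, 5, 6 and 10, so the uncensored times are 3, 1 and 4
toy <- c(1, 5, 1, 1, 6, 7, 1, 1, 1, 8, 1, 1)
s <- dgaps_stat_sketch(toy, u = 4, D = 1)
```

In this toy case the time of 1 is left-censored, and only the censored gap after the last exceedance (2 observations) exceeds \(D\), so it contributes 1/2 to N1 while the gap before the first exceedance is ignored.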
If \(N_1 = 0\) then we are in the degenerate case where there is one cluster (all inter-exceedance times are left-censored) and the likelihood is maximized at \(\theta = 0\).
If \(N_0 = 0\) then all exceedances occur singly (no inter-exceedance times are left-censored) and the likelihood is maximized at \(\theta = 1\).
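As a rough illustration of how the sufficient statistics determine the estimate, the log-likelihood above can be coded and maximized numerically. This is a minimal sketch, not the method used by dgaps: the function names are hypothetical, the values of N0, N1 and sum_qtd are taken from the Examples section, and q_u = 0.1 is an assumption based on the threshold there being the 90% sample quantile.

```r
# Hypothetical helper: the D-gaps log-likelihood as a function of theta,
# written in terms of the sufficient statistics
dgaps_loglik_sketch <- function(theta, N0, N1, sum_qtd, q_u, D) {
  d <- q_u * D
  N0 * log(1 - theta * exp(-theta * d)) + 2 * N1 * log(theta) -
    theta * sum_qtd
}

# Hypothetical helper: maximize over (0, 1), treating the two degenerate
# cases described in Details separately
dgaps_mle_sketch <- function(N0, N1, sum_qtd, q_u, D) {
  if (N1 == 0) return(0)   # one cluster: maximized at theta = 0
  if (N0 == 0) return(1)   # all exceedances occur singly: theta = 1
  opt <- optimize(dgaps_loglik_sketch, interval = c(0, 1), maximum = TRUE,
                  N0 = N0, N1 = N1, sum_qtd = sum_qtd, q_u = q_u, D = D)
  opt$maximum
}

# Statistics from the Examples section; q_u = 0.1 is an assumed value
theta_hat <- dgaps_mle_sketch(N0 = 184, N1 = 105, sum_qtd = 270.5256,
                              q_u = 0.1, D = 1)
```

A one-dimensional search with optimize is used here only because the parameter lies in a bounded interval; the result depends on the assumed q_u and should not be read as the package's reported estimate.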
References
Holesovsky, J. and Fusek, M. (2020) Estimation of the extremal index using censored distributions. Extremes 23, 197-213. doi:10.1007/s10687-020-00374-3
Attalides, N. (2015) Threshold-based extreme value modelling, PhD thesis, University College London. https://discovery.ucl.ac.uk/1471121/1/Nicolas_Attalides_Thesis.pdf
See also
dgaps for maximum likelihood estimation of the
extremal index \(\theta\) using the \(D\)-gaps model.
Examples
u <- quantile(newlyn, probs = 0.90)
dgaps_stat(newlyn, u = u, D = 1)
#> $N0
#> [1] 184
#>
#> $N1
#> [1] 105
#>
#> $sum_qtd
#> [1] 270.5256
#>
#> $n_dgaps
#> [1] 290
#>