R/dgaps.R
dgaps_stat.Rd
Calculates sufficient statistics for the left-censored inter-exceedance time \(D\)-gaps model for the extremal index \(\theta\).
dgaps_stat(data, u, q_u, D = 1, inc_cens = TRUE)
A numeric vector of raw data. No missing values are allowed.
A numeric scalar. Extreme value threshold applied to data.
A numeric scalar. An estimate of the probability with which the threshold u is exceeded. If q_u is missing then it is calculated using mean(data > u, na.rm = TRUE).
A numeric scalar. The censoring parameter \(D\), as defined in Holesovsky and Fusek (2020). Threshold inter-exceedance times that are not larger than D units are left-censored: they contribute to the likelihood only through the count \(N_0\) of such times (see Details).
A logical scalar indicating whether or not to include contributions from the right-censored inter-exceedance times relating to the first and last observations. It is known that these times are greater than or equal to the time observed. See Attalides (2015) for details.
A list containing the sufficient statistics, with components
N0
the number of left-censored inter-exceedance times.
N1
contribution from inter-exceedance times that are not left-censored (see Details).
sum_qtd
the sum of the (scaled) inter-exceedance times that are not left-censored, that is, \(q (I_0 T_0 + \cdots + I_N T_N)\), where \(q\) is estimated by the proportion of threshold exceedances.
n_dgaps
the number of inter-exceedance times that contribute to the log-likelihood.
q_u
the sample proportion of values that exceed the threshold.
D
the input value of D.
The sample inter-exceedance times are \(T_0, T_1, ..., T_{N-1}, T_N\), where \(T_1, ..., T_{N-1}\) are uncensored and \(T_0\) and \(T_N\) are right-censored. Under the assumption that the inter-exceedance times are independent, the log-likelihood of the \(D\)-gaps model is given by $$l(\theta; T_0, \ldots, T_N) = N_0 \log(1 - \theta e^{-\theta d}) + 2 N_1 \log \theta - \theta q (I_0 T_0 + \cdots + I_N T_N),$$ where
\(q\) is the threshold exceedance probability, estimated by the proportion of threshold exceedances,
\(d = q D\),
\(I_j = 1\) if \(T_j > D\) and \(I_j = 0\) otherwise,
\(N_0\) is the number of sample inter-exceedance times that are left-censored, that is, are less than or equal to \(D\),
\(N_1\) is, apart from an adjustment for the contributions of \(T_0\) and \(T_N\), the number of inter-exceedance times that are not left-censored, that is, are greater than \(D\). Specifically, if inc_cens = TRUE then \(N_1\) is equal to the number of \(T_1, ..., T_{N-1}\) that are greater than \(D\) plus \((I_0 + I_N) / 2\).
The differing treatment of uncensored and right-censored inter-exceedance times reflects their differing contributions to the likelihood. Right-censored inter-exceedance times whose observed values are less than or equal to \(D\) add no information to the likelihood because we do not know to which part of the likelihood they should contribute.
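The definitions above can be sketched in a few lines of base R. This is an illustration only, not the package's implementation: dgaps_stat_sketch is a hypothetical name, the data vector x is made up, and the sketch assumes that \(T_0\) is the number of observations before the first exceedance and \(T_N\) is the number after the last one.

```r
# Hypothetical helper, not part of the package: computes the D-gaps
# statistics directly from the definitions given in Details.
dgaps_stat_sketch <- function(data, u, D = 1, inc_cens = TRUE) {
  q_u <- mean(data > u)              # proportion of threshold exceedances
  exc <- which(data > u)             # positions of the exceedances
  n_exc <- length(exc)
  if (n_exc == 0) {
    return(list(N0 = 0, N1 = 0, sum_qtd = 0, n_dgaps = 0, q_u = q_u, D = D))
  }
  times <- diff(exc)                 # T_1, ..., T_{N-1}
  keep <- times > D                  # I_j = 1: not left-censored
  N0 <- sum(!keep)
  N1 <- sum(keep)
  sum_qtd <- q_u * sum(times[keep])
  n_dgaps <- length(times)
  if (inc_cens) {
    # Assumption: T_0 counts the observations before the first exceedance
    # and T_N counts the observations after the last exceedance.
    ends <- c(exc[1] - 1, length(data) - exc[n_exc])
    ends <- ends[ends > D]           # ends that are <= D add no information
    N1 <- N1 + length(ends) / 2
    sum_qtd <- sum_qtd + q_u * sum(ends)
    n_dgaps <- n_dgaps + length(ends)
  }
  list(N0 = N0, N1 = N1, sum_qtd = sum_qtd, n_dgaps = n_dgaps,
       q_u = q_u, D = D)
}

# Made-up data: exceedances of u = 1 occur at positions 4, 5, 9 and 12
x <- c(0, 0, 0, 5, 6, 0, 0, 0, 7, 0, 0, 8, 0)
s <- dgaps_stat_sketch(x, u = 1, D = 1)
```

For this toy vector the inter-exceedance times are 1, 4 and 3, so one time is left-censored. Under the stated assumption \(T_0 = 3\) exceeds \(D\) and contributes one half to \(N_1\), whereas \(T_N = 1\) is dropped.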
If \(N_1 = 0\) then we are in the degenerate case where there is one cluster (all inter-exceedance times are left-censored) and the likelihood is maximized at \(\theta = 0\).
If \(N_0 = 0\) then all exceedances occur singly (no inter-exceedance times are left-censored) and the likelihood is maximized at \(\theta = 1\).
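As a rough illustration of how these statistics feed the log-likelihood, the sketch below evaluates the expression given above and maximises it numerically over \(\theta \in (0, 1)\) with stats::optimize. The function name dgaps_loglik_sketch and all numerical values are made up for the example, and the package's own fitting function may work differently.

```r
# Hypothetical helper: D-gaps log-likelihood as a function of theta,
# evaluated from the sufficient statistics.
dgaps_loglik_sketch <- function(theta, N0, N1, sum_qtd, q_u, D) {
  d <- q_u * D
  loglik <- -theta * sum_qtd
  # Add each log term only when its count is positive, which avoids
  # evaluating 0 * log(0)
  if (N0 > 0) loglik <- loglik + N0 * log(1 - theta * exp(-theta * d))
  if (N1 > 0) loglik <- loglik + 2 * N1 * log(theta)
  loglik
}

# Made-up statistics with N0 > 0 and N1 > 0: the maximiser is interior
fit <- optimize(dgaps_loglik_sketch, interval = c(0, 1), maximum = TRUE,
                N0 = 6, N1 = 4, sum_qtd = 12, q_u = 0.1, D = 1)
theta_hat <- fit$maximum

# Made-up statistics with N0 = 0: for these values the log-likelihood is
# increasing on (0, 1], so the maximiser sits at the upper boundary
# (up to the tolerance of optimize())
edge <- optimize(dgaps_loglik_sketch, interval = c(0, 1), maximum = TRUE,
                 N0 = 0, N1 = 4, sum_qtd = 5, q_u = 0.1, D = 1)
```

Here theta_hat lies strictly inside the unit interval, while edge$maximum is numerically indistinguishable from 1, matching the boundary case described for \(N_0 = 0\).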
Holesovsky, J. and Fusek, M. (2020) Estimation of the extremal index using censored distributions. Extremes 23, 197-213. doi:10.1007/s10687-020-00374-3
Attalides, N. (2015) Threshold-based extreme value modelling, PhD thesis, University College London. https://discovery.ucl.ac.uk/1471121/1/Nicolas_Attalides_Thesis.pdf
dgaps for maximum likelihood estimation of the extremal index \(\theta\) using the \(D\)-gaps model.
u <- quantile(newlyn, probs = 0.90)
dgaps_stat(newlyn, u = u, D = 1)
#> $N0
#> [1] 184
#>
#> $N1
#> [1] 105
#>
#> $sum_qtd
#> [1] 270.5256
#>
#> $n_dgaps
#> [1] 290
#>