Calculates sufficient statistics for the \(K\)-gaps model for the extremal
index \(\theta\). Called by kgaps.
kgaps_stat(data, u, q_u, k = 1, inc_cens = TRUE)
data
A numeric vector of raw data.
u
A numeric scalar. Extreme value threshold applied to data.
q_u
A numeric scalar. An estimate of the probability with which the threshold u is exceeded. If q_u is missing then it is calculated using mean(data > u, na.rm = TRUE).
k
A numeric scalar. Run parameter \(K\), as defined in Suveges and Davison (2010). Threshold inter-exceedance times that are not larger than k units are assigned to the same cluster, resulting in a \(K\)-gap equal to zero. Specifically, the \(K\)-gap \(S\) corresponding to an inter-exceedance time of \(T\) is given by \(S = \max(T - K, 0)\).
inc_cens
A logical scalar indicating whether or not to include contributions from right-censored inter-exceedance times relating to the first and last observations. It is known that these times are greater than or equal to the time observed. See Attalides (2015) for details.
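The \(K\)-gap definition \(S = \max(T - K, 0)\) can be illustrated with a minimal sketch. The function name kgaps_sketch and the toy vector x are hypothetical and are not part of the package. The sketch assumes that inter-exceedance times are differences between the positions of successive exceedances, and it returns only the uncensored \(K\)-gaps.

```r
# Hypothetical helper (not part of the package): uncensored K-gaps from raw data
kgaps_sketch <- function(data, u, k = 1) {
  # Positions (times) at which the threshold u is exceeded; NAs are ignored
  exc_times <- which(data > u)
  # Inter-exceedance times T between successive exceedances
  inter_exc <- diff(exc_times)
  # K-gaps: S = max(T - K, 0), applied elementwise
  pmax(inter_exc - k, 0)
}

# Toy data: exceedances of u = 4 occur at positions 2, 3, 6 and 10
x <- c(1, 5, 6, 1, 1, 7, 1, 1, 1, 8)
kgaps_sketch(x, u = 4, k = 1)
# K-gaps: 0 2 3
```

Increasing k merges more exceedances into the same cluster: with k = 3 the first two gaps above become zero.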
A list containing the sufficient statistics, with components
N0
the number of zero \(K\)-gaps.
N1
contribution from non-zero \(K\)-gaps (see Details).
sum_qs
the sum of the (scaled) \(K\)-gaps, that is, \(q (S_0 + \cdots + S_N)\), where \(q\) is estimated by the proportion of threshold exceedances.
n_kgaps
the number of \(K\)-gaps that contribute to the log-likelihood.
The sample \(K\)-gaps are \(S_0, S_1, ..., S_{N-1}, S_N\), where \(S_1, ..., S_{N-1}\) are uncensored and \(S_0\) and \(S_N\) are right-censored. Under the assumption that the \(K\)-gaps are independent, the log-likelihood of the \(K\)-gaps model is given by $$l(\theta; S_0, \ldots, S_N) = N_0 \log(1 - \theta) + 2 N_1 \log \theta - \theta q (S_0 + \cdots + S_N),$$ where
\(q\) is the threshold exceedance probability, estimated by the proportion of threshold exceedances,
\(N_0\) is the number of uncensored sample \(K\)-gaps that are equal to zero,
(apart from an adjustment for the contributions of \(S_0\) and \(S_N\)) \(N_1\) is the number of positive sample \(K\)-gaps,
specifically, if inc_cens = TRUE then \(N_1\) is equal to the number of \(S_1, ..., S_{N-1}\) that are positive plus \((I_0 + I_N) / 2\), where \(I_0 = 1\) if \(S_0\) is greater than zero and \(I_0 = 0\) otherwise, and similarly for \(I_N\).
The differing treatment of uncensored and right-censored \(K\)-gaps reflects differing contributions to the likelihood. Right-censored \(K\)-gaps that are equal to zero add no information to the likelihood. For full details see Suveges and Davison (2010) and Attalides (2015).
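As a rough illustration, the log-likelihood above can be evaluated directly from the sufficient statistics. The function name kgaps_loglik_sketch and the grid search are assumptions made for this sketch, not the package's implementation. The values of N0, N1 and sum_qs are copied from the example output for the newlyn data.

```r
# Hypothetical helper: K-gaps log-likelihood as a function of theta,
# written in terms of the sufficient statistics N0, N1 and sum_qs
kgaps_loglik_sketch <- function(theta, N0, N1, sum_qs) {
  # N0 log(1 - theta) + 2 N1 log(theta) - theta q (S_0 + ... + S_N)
  N0 * log(1 - theta) + 2 * N1 * log(theta) - theta * sum_qs
}

# Sufficient statistics taken from the kgaps_stat(newlyn, u) example output
ss <- list(N0 = 184, N1 = 105, sum_qs = 259.9402)

# Crude grid search over the interior of (0, 1)
theta_grid <- seq(0.01, 0.99, by = 0.01)
ll <- kgaps_loglik_sketch(theta_grid, ss$N0, ss$N1, ss$sum_qs)
theta_hat <- theta_grid[which.max(ll)]
theta_hat
```

The grid search is only for illustration; a finer grid or a one-dimensional optimiser such as optimize() would locate the maximum more precisely.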
If \(N_1 = 0\) then we are in the degenerate case where there is one cluster (all \(K\)-gaps are zero) and the likelihood is maximized at \(\theta = 0\).
If \(N_0 = 0\) then all exceedances occur singly (all \(K\)-gaps are positive) and the likelihood is maximized at \(\theta = 1\).
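When both \(N_0\) and \(N_1\) are positive, setting the derivative of the log-likelihood to zero gives the quadratic \(Q \theta^2 - (N_0 + 2 N_1 + Q) \theta + 2 N_1 = 0\), where \(Q = q (S_0 + \cdots + S_N)\), and the smaller root lies in \((0, 1)\). The sketch below combines this with the two boundary cases described above. The function name kgaps_mle_sketch is hypothetical, and the code is an illustration derived from the stated log-likelihood, not necessarily how kgaps computes its estimate.

```r
# Hypothetical helper: maximiser of the K-gaps log-likelihood
# from the sufficient statistics N0, N1 and sum_qs
kgaps_mle_sketch <- function(N0, N1, sum_qs) {
  # Boundary cases, following the text
  if (N1 == 0) return(0)  # one cluster: all K-gaps are zero
  if (N0 == 0) return(1)  # all exceedances occur singly
  # Score equation rearranged as a quadratic in theta:
  # sum_qs * theta^2 - (N0 + 2 * N1 + sum_qs) * theta + 2 * N1 = 0
  b <- N0 + 2 * N1 + sum_qs
  # The smaller root is the one inside (0, 1)
  (b - sqrt(b^2 - 8 * N1 * sum_qs)) / (2 * sum_qs)
}

# Sufficient statistics taken from the kgaps_stat(newlyn, u) example output
theta_hat <- kgaps_mle_sketch(N0 = 184, N1 = 105, sum_qs = 259.9402)
theta_hat
```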
Suveges, M. and Davison, A. C. (2010) Model misspecification in peaks over threshold analysis, Annals of Applied Statistics, 4(1), 203-221. doi:10.1214/09-AOAS292
Attalides, N. (2015) Threshold-based extreme value modelling, PhD thesis, University College London. https://discovery.ucl.ac.uk/1471121/1/Nicolas_Attalides_Thesis.pdf
kgaps for maximum likelihood estimation of the extremal index \(\theta\) using the \(K\)-gaps model.
u <- quantile(newlyn, probs = 0.90)
kgaps_stat(newlyn, u)
#> $N0
#> [1] 184
#>
#> $N1
#> [1] 105
#>
#> $sum_qs
#> [1] 259.9402
#>
#> $n_kgaps
#> [1] 290
#>