Sufficient statistics for the $K$-gaps model

Calculates sufficient statistics for the $K$-gaps model for the extremal index $\theta$. Called by kgaps.

kgaps_stat(data, u, q_u, k = 1, inc_cens = TRUE)

Arguments

data: A numeric vector of raw data.
u: A numeric scalar. Extreme value threshold applied to data.
q_u: A numeric scalar. An estimate of the probability with which the threshold u is exceeded. If q_u is missing then it is calculated using mean(data > u, na.rm = TRUE).
k: A numeric scalar. Run parameter $K$, as defined in Suveges and Davison (2010). Threshold inter-exceedance times that are not larger than k units are assigned to the same cluster, resulting in a $K$-gap equal to zero. Specifically, the $K$-gap $S$ corresponding to an inter-exceedance time $T$ is given by $S = \max(T - K, 0)$.
inc_cens: A logical scalar indicating whether or not to include contributions from the right-censored inter-exceedance times relating to the first and last observations. These times are known only to be greater than or equal to the times observed. See Attalides (2015) for details.
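A minimal sketch of how the sample $K$-gaps could be extracted from a raw series, written as a hypothetical helper kgaps_sample() that is not part of the package. It assumes that the two right-censored times are measured as the number of observations before the first exceedance and after the last exceedance; the package's exact convention may differ.

```r
# Hypothetical helper, not the package's implementation.
# Returns the uncensored K-gaps S_1, ..., S_{N-1} and the right-censored
# K-gaps S_0 and S_N for threshold u and run parameter k.
kgaps_sample <- function(data, u, k = 1) {
  exc <- which(data > u)  # indices of threshold exceedances; NAs are dropped
  if (length(exc) == 0) stop("no observations exceed the threshold u")
  n_exc <- length(exc)
  t_unc <- diff(exc)      # uncensored inter-exceedance times
  # Assumed convention for the right-censored times: the number of
  # observations before the first exceedance and after the last exceedance.
  t_first <- exc[1] - 1
  t_last <- length(data) - exc[n_exc]
  to_kgap <- function(t) pmax(t - k, 0)  # S = max(T - K, 0)
  list(S = to_kgap(t_unc), S_0 = to_kgap(t_first), S_N = to_kgap(t_last))
}

# Toy series: exceedances of u = 4 occur at positions 4, 5, 8 and 12
x <- c(0, 0, 0, 5, 6, 0, 0, 7, 0, 0, 0, 8, 0, 0, 0)
gaps <- kgaps_sample(x, u = 4, k = 1)
gaps$S  # K-gaps from inter-exceedance times 1, 3, 4: 0 2 3
```

With k = 1, the two adjacent exceedances at positions 4 and 5 fall in the same cluster and so produce a zero $K$-gap.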

Value

A list containing the sufficient statistics, with components

N0: the number of zero $K$-gaps.
N1: contribution from non-zero $K$-gaps (see Details).
sum_qs: the sum of the (scaled) $K$-gaps, that is, $q (S_0 + \cdots + S_N)$, where $q$ is estimated by the proportion of threshold exceedances.
n_kgaps: the number of $K$-gaps that contribute to the log-likelihood.

Details

The sample $K$-gaps are $S_0, S_1, \ldots, S_{N-1}, S_N$, where $S_1, \ldots, S_{N-1}$ are uncensored and $S_0$ and $S_N$ are right-censored. Under the assumption that the $K$-gaps are independent, the log-likelihood of the $K$-gaps model is given by $$l(\theta; S_0, \ldots, S_N) = N_0 \log(1 - \theta) + 2 N_1 \log \theta - \theta q (S_0 + \cdots + S_N),$$ where

$q$ is the threshold exceedance probability, estimated by the proportion of threshold exceedances,
$N_0$ is the number of uncensored sample $K$-gaps that are equal to zero,
$N_1$ is the number of positive sample $K$-gaps, apart from an adjustment for the contributions of $S_0$ and $S_N$: if inc_cens = TRUE then $N_1$ is the number of $S_1, \ldots, S_{N-1}$ that are positive plus $(I_0 + I_N) / 2$, where $I_0 = 1$ if $S_0 > 0$ and $I_0 = 0$ otherwise, and similarly for $I_N$.
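The definitions above translate directly into code. The sketch below, a hypothetical function kgaps_suff_stats() rather than the package's own code, computes the components listed under Value from the uncensored $K$-gaps S, the censored $K$-gaps S_0 and S_N, and an exceedance probability q. Two choices are assumptions: when inc_cens = FALSE the censored $K$-gaps are simply dropped, and n_kgaps counts the uncensored $K$-gaps plus only the positive censored ones, since right-censored zero $K$-gaps carry no information.

```r
# Hypothetical sketch of the sufficient statistics described in Details.
kgaps_suff_stats <- function(S, S_0, S_N, q, inc_cens = TRUE) {
  N0 <- sum(S == 0)       # uncensored K-gaps equal to zero
  N1 <- sum(S > 0)        # uncensored K-gaps that are positive
  sum_s <- sum(S)
  n_kgaps <- length(S)
  if (inc_cens) {
    I_0 <- as.numeric(S_0 > 0)
    I_N <- as.numeric(S_N > 0)
    N1 <- N1 + (I_0 + I_N) / 2       # each positive censored K-gap counts 1/2
    sum_s <- sum_s + S_0 + S_N
    n_kgaps <- n_kgaps + I_0 + I_N   # assumed: zero censored K-gaps not counted
  }
  list(N0 = N0, N1 = N1, sum_qs = q * sum_s, n_kgaps = n_kgaps)
}

# Uncensored K-gaps 0, 2, 3; censored S_0 = S_N = 2; q = 4 / 15
ss <- kgaps_suff_stats(S = c(0, 2, 3), S_0 = 2, S_N = 2, q = 4 / 15)
ss$N1  # 2 positive uncensored K-gaps + (1 + 1) / 2 = 3
```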

The differing treatment of uncensored and right-censored $K$-gaps reflects differing contributions to the likelihood. Right-censored $K$-gaps that are equal to zero add no information to the likelihood. For full details see Suveges and Davison (2010) and Attalides (2015).
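One decomposition consistent with the formula above, assuming each $K$-gap contributes independently, is: an uncensored zero $K$-gap contributes $\log(1 - \theta)$, an uncensored positive $K$-gap $s$ contributes $2 \log \theta - \theta q s$, a censored zero $K$-gap contributes nothing, and a censored positive $K$-gap $s$ contributes $\log \theta - \theta q s$. This is why each positive censored $K$-gap adds $1/2$ to $N_1$. The illustrative functions below (hypothetical, not from the package) check numerically that summing these contributions reproduces the log-likelihood written in terms of $N_0$, $N_1$ and sum_qs.

```r
# Log-likelihood from the sufficient statistics (formula in Details)
loglik_suff <- function(theta, N0, N1, sum_qs) {
  N0 * log(1 - theta) + 2 * N1 * log(theta) - theta * sum_qs
}

# Same log-likelihood accumulated K-gap by K-gap (hypothetical helper)
loglik_by_gap <- function(theta, S, S_cens, q) {
  unc <- ifelse(S == 0, log(1 - theta), 2 * log(theta) - theta * q * S)
  cens <- ifelse(S_cens == 0, 0, log(theta) - theta * q * S_cens)
  sum(unc) + sum(cens)
}

S <- c(0, 2, 3)    # uncensored K-gaps
S_cens <- c(2, 2)  # censored K-gaps S_0 and S_N
q <- 4 / 15
N0 <- sum(S == 0)
N1 <- sum(S > 0) + sum(S_cens > 0) / 2
sum_qs <- q * (sum(S) + sum(S_cens))

theta <- c(0.2, 0.5, 0.8)
by_gap <- sapply(theta, loglik_by_gap, S = S, S_cens = S_cens, q = q)
all.equal(by_gap, loglik_suff(theta, N0, N1, sum_qs))  # TRUE
```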

If $N_1 = 0$ then we are in the degenerate case where there is one cluster (all $K$-gaps are zero) and the likelihood is maximized at $\theta = 0$.

If $N_0 = 0$ then all exceedances occur singly (all $K$-gaps are positive) and the likelihood is maximized at $\theta = 1$.
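Between these boundary cases the maximizer has a closed form. Writing $Q = q(S_0 + \cdots + S_N)$ (the sum_qs component), setting the derivative of $l$ to zero and multiplying through by $\theta(1 - \theta)$ gives a quadratic in $\theta$:

$$\frac{\partial l}{\partial \theta} = -\frac{N_0}{1 - \theta} + \frac{2 N_1}{\theta} - Q = 0 \quad\Longleftrightarrow\quad Q \theta^2 - (N_0 + 2 N_1 + Q)\,\theta + 2 N_1 = 0.$$

The left-hand side equals $2 N_1 \ge 0$ at $\theta = 0$ and $-N_0 \le 0$ at $\theta = 1$, and $l$ is concave in $\theta$, so for $Q > 0$ the smaller root lies in $[0, 1]$ and maximizes $l$ there:

$$\hat{\theta} = \frac{(N_0 + 2 N_1 + Q) - \sqrt{(N_0 + 2 N_1 + Q)^2 - 8 Q N_1}}{2 Q}.$$

This gives $\hat{\theta} = 0$ when $N_1 = 0$ and, when $N_0 = 0$, $\hat{\theta} = \min(1, 2 N_1 / Q)$, which equals 1 whenever $2 N_1 \ge Q$.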

References

Suveges, M. and Davison, A. C. (2010) Model misspecification in peaks over threshold analysis, Annals of Applied Statistics, 4(1), 203-221. doi:10.1214/09-AOAS292

Attalides, N. (2015) Threshold-based extreme value modelling, PhD thesis, University College London. https://discovery.ucl.ac.uk/1471121/1/Nicolas_Attalides_Thesis.pdf

Examples

library(exdex)
u <- quantile(newlyn, probs = 0.90)
kgaps_stat(newlyn, u)
#> $N0
#> [1] 184
#> 
#> $N1
#> [1] 105
#> 
#> $sum_qs
#> [1] 259.9402
#> 
#> $n_kgaps
#> [1] 290
#>
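The statistics printed above are all that is needed to estimate $\theta$. Setting the derivative of the log-likelihood in Details to zero gives a quadratic in $\theta$ whose smaller root is the maximum likelihood estimate. A minimal sketch, using a hypothetical helper kgaps_mle() (not the package's kgaps(), which may handle edge cases and inference differently), evaluates that root in an algebraically equivalent form that avoids cancellation.

```r
# Hypothetical helper: maximum likelihood estimate of theta from the
# sufficient statistics, via the smaller root of
#   Q theta^2 - (N0 + 2 N1 + Q) theta + 2 N1 = 0,  with Q = sum_qs.
# 4 N1 / (b + sqrt(disc)) equals (b - sqrt(disc)) / (2 Q) but avoids
# cancellation when N1 is small relative to b.
kgaps_mle <- function(N0, N1, sum_qs) {
  b <- N0 + 2 * N1 + sum_qs
  disc <- b^2 - 8 * sum_qs * N1  # always >= 0
  4 * N1 / (b + sqrt(disc))
}

# Sufficient statistics printed by the example above
theta_hat <- kgaps_mle(N0 = 184, N1 = 105, sum_qs = 259.9402)
theta_hat

# Sanity check: the score is numerically zero at theta_hat
score <- function(theta, N0, N1, sum_qs) {
  -N0 / (1 - theta) + 2 * N1 / theta - sum_qs
}
score(theta_hat, 184, 105, 259.9402)
```

The same helper returns 0 when $N_1 = 0$ and 1 when $N_0 = 0$ and $2 N_1 \ge$ sum_qs, matching the degenerate cases described in Details.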
