Performs the information matrix test (IMT) of Suveges and Davison (2010) to diagnose misspecification of the \(K\)-gaps model.

kgaps_imt(data, u, k = 1, inc_cens = TRUE)

Arguments

data

A numeric vector or numeric matrix of raw data. If data is a matrix then the log-likelihood is constructed as the sum of (independent) contributions from different columns. A common situation is where each column relates to a different year.

If data contains missing values then split_by_NAs is used to divide the data into sequences of non-missing values.
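As a rough illustration of this splitting, the base R sketch below breaks a vector into runs of consecutive non-missing values. The helper name split_runs is hypothetical; it is not the package's split_by_NAs, whose return format may differ.

```r
# Hypothetical helper: split a numeric vector into runs of consecutive
# non-missing values. This is NOT the package's split_by_NAs().
split_runs <- function(x) {
  ok <- !is.na(x)
  # A new run starts at each non-missing value that is either the first
  # element or immediately follows an NA
  run_id <- cumsum(ok & !c(FALSE, ok[-length(ok)]))
  unname(split(x[ok], run_id[ok]))
}

# Toy data, for illustration only
x <- c(1.2, 3.4, NA, 5.6, NA, NA, 7.8, 9.0)
runs <- split_runs(x)
```

Each element of runs can then be treated as a separate sequence of non-missing values.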

u, k

Numeric vectors. u is a vector of extreme value thresholds applied to data. k is a vector of values of the run parameter \(K\), as defined in Suveges and Davison (2010). See kgaps for more details.

Any values in u that are greater than all the observations in data will be removed without a warning being given.
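Because this removal is silent, it can be worth checking the thresholds before the call. The sketch below is one way to do so in base R; how the function itself treats a threshold exactly equal to the largest observation is an assumption here.

```r
# Toy data and thresholds, for illustration only
x <- c(0.3, 1.7, 2.9, NA, 4.1)
u <- c(1, 3, 5)

# Drop thresholds that exceed every observation (assumed rule: keep a
# threshold that is equal to the maximum)
u_kept <- u[u <= max(x, na.rm = TRUE)]
n_dropped <- length(u) - length(u_kept)
```

Comparing length(u) with the number of rows in the returned matrices gives the same information after the fact.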

inc_cens

A logical scalar indicating whether or not to include contributions from censored inter-exceedance times, relating to the first and last observations. See Attalides (2015) for details.

Value

An object (a list) of class c("kgaps_imt", "exdex")

containing

imt

A length(u) by length(k) numeric matrix. Column i contains, for \(K\) = k[i], the values of the information matrix test statistic for the set of thresholds in u. The column names are the values in k. The row names are the approximate empirical percentage quantile levels of the thresholds in u.

p

A length(u) by length(k) numeric matrix containing the corresponding \(p\)-values for the test.

theta

A length(u) by length(k) numeric matrix containing the corresponding estimates of \(\theta\).

u, k

The input u and k.
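The layout of these matrices can be sketched in base R as follows. The data are simulated, and the way the row labels are computed (the rounded percentage of observations below each threshold) is an assumption about what "approximate empirical percentage quantile level" means, not the package's code.

```r
# Simulated data and thresholds, for illustration only
set.seed(1)
x <- rnorm(200)
u <- quantile(x, probs = c(0.5, 0.75, 0.9))
k <- 1:3

# Assumed row labels: rounded percentage of observations below each threshold
u_levels <- round(100 * sapply(u, function(v) mean(x < v)))

# An empty length(u) by length(k) matrix, labelled as described above
res <- matrix(NA_real_, nrow = length(u), ncol = length(k),
              dimnames = list(as.character(u_levels), as.character(k)))
```

With this layout, res["90", "2"] would hold the entry for the threshold at roughly the 90% level and \(K\) = 2.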

Details

The \(K\)-gaps IMT is performed over a grid of all combinations of threshold and \(K\) in the vectors u and k. If the estimate of \(\theta\) is 0 then the IMT statistic and its associated \(p\)-value are NA.
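To show how a statistic might map to a \(p\)-value, the sketch below assumes that the IMT statistic is referred to a chi-squared distribution with 1 degree of freedom. That reference distribution is an assumption of this sketch, and the statistic values are made up rather than output from kgaps_imt.

```r
# Made-up IMT statistics, for illustration only; NA stands for a case
# where the estimate of theta is 0
imt_stat <- c(0.5, 3.84, 10, NA)

# Upper-tail probability under a chi-squared distribution with 1 df
# (assumed reference distribution)
p_val <- pchisq(imt_stat, df = 1, lower.tail = FALSE)
```

An NA statistic propagates to an NA \(p\)-value, and larger statistics give smaller \(p\)-values, that is, stronger evidence of misspecification.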

For details of the IMT see Suveges and Davison (2010). There are some typographical errors on pages 18-19 that have been corrected in producing the code: the penultimate term inside {...} in the middle equation on page 18 should be \((c_j(K))^2\), as should the penultimate term in the first equation on page 19; the {...} bracket should be squared in the 4th equation on page 19; the factor \(n\) should be \(N-1\) in the final equation on page 19.

References

Suveges, M. and Davison, A. C. (2010) Model misspecification in peaks over threshold analysis, Annals of Applied Statistics, 4(1), 203-221. doi:10.1214/09-AOAS292

Attalides, N. (2015) Threshold-based extreme value modelling, PhD thesis, University College London. https://discovery.ucl.ac.uk/1471121/1/Nicolas_Attalides_Thesis.pdf

See also

kgaps for maximum likelihood estimation of the extremal index \(\theta\) using the \(K\)-gaps model.

choose_uk for a graphical diagnostic to aid the choice of the threshold \(u\) and the run parameter \(K\).

Examples

### Newlyn sea surges

u <- quantile(newlyn, probs = seq(0.1, 0.9, by = 0.1))
imt <- kgaps_imt(newlyn, u = u, k = 1:5)

### S&P 500 index

u <- quantile(sp500, probs = seq(0.1, 0.9, by = 0.1))
imt <- kgaps_imt(sp500, u = u, k = 1:5)

### Cheeseboro wind gusts (a matrix containing some NAs)

probs <- c(seq(0.5, 0.98, by = 0.025), 0.99)
u <- quantile(cheeseboro, probs = probs, na.rm = TRUE)
imt <- kgaps_imt(cheeseboro, u = u, k = 1:5)