Performs the information matrix test (IMT) of Suveges and Davison (2010) to diagnose misspecification of the \(K\)-gaps model.
kgaps_imt(data, u, k = 1, inc_cens = TRUE)
A numeric vector or numeric matrix of raw data. If data is a matrix then the log-likelihood is constructed as the sum of (independent) contributions from different columns. A common situation is where each column relates to a different year.
If data contains missing values then split_by_NAs is used to divide the data into sequences of non-missing values.
Numeric vectors. u is a vector of extreme value thresholds applied to data. k is a vector of values of the run parameter \(K\), as defined in Suveges and Davison (2010). See kgaps for more details.
Any values in u that are greater than all the observations in data will be removed without a warning being given.
A logical scalar indicating whether or not to include contributions from censored inter-exceedance times, relating to the first and last observations. See Attalides (2015) for details.
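The two preprocessing rules described above (splitting data at missing values, and silently dropping thresholds that exceed every observation) can be illustrated with a small base R sketch. The helper split_runs_sketch and the toy vectors are hypothetical, introduced only for illustration; this is not the package's split_by_NAs, and the threshold filter simply follows the wording above rather than the package's internal code.

```r
# Hypothetical helper (not part of exdex): split a numeric vector into
# maximal runs of consecutive non-missing values.
split_runs_sketch <- function(x) {
  ok <- !is.na(x)
  # Start a new run id every time the missing/non-missing status changes
  run_id <- cumsum(c(TRUE, diff(ok) != 0))
  # Keep only the non-missing values, grouped by their run id
  unname(split(x[ok], run_id[ok]))
}

# Toy data with two blocks of NAs
x <- c(1.2, 0.4, NA, NA, 2.5, 0.9, 3.1, NA, 0.7)
runs <- split_runs_sketch(x)
print(runs)

# Drop thresholds that are greater than all the (non-missing) observations
u <- c(1, 2, 3, 4)
u_kept <- u[u <= max(x, na.rm = TRUE)]
print(u_kept)
```

Each element of runs would then contribute separately to the log-likelihood, in the same way as separate columns of a matrix.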
An object (a list) of class c("kgaps_imt", "exdex") containing
A length(u) by length(k) numeric matrix. Column i contains, for \(K\) = k[i], the values of the information matrix test statistic for the set of thresholds in u. The column names are the values in k. The row names are the approximate empirical percentage quantile levels of the thresholds in u.
A length(u) by length(k) numeric matrix containing the corresponding \(p\)-values for the test.
A length(u) by length(k) numeric matrix containing the corresponding estimates of \(\theta\).
The input u and k.
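As a rough illustration of how the matrix of test statistics relates to the matrix of \(p\)-values, the sketch below assumes the asymptotic chi-squared null distribution with 1 degree of freedom given in Suveges and Davison (2010). The statistic values, row names and column names are made up for illustration and are not output from kgaps_imt.

```r
# Illustrative (made-up) IMT statistics: rows are thresholds, columns are K.
# An NA entry mimics a case where the statistic is not available.
imt_stat <- matrix(
  c(0.5, 3.2, 7.1,   # K = 1
    NA,  1.1, 4.4),  # K = 2
  nrow = 3,
  dimnames = list(c("70", "80", "90"), c("1", "2"))
)

# Upper-tail chi-squared(1) probabilities; dimensions and names are kept
p_val <- pchisq(imt_stat, df = 1, lower.tail = FALSE)
print(round(p_val, 3))

# Flag the (threshold, K) combinations rejected at the 5% level
rejected <- !is.na(p_val) & p_val < 0.05
print(rejected)
```

Small \(p\)-values suggest that the \(K\)-gaps model is misspecified for that combination of threshold and \(K\).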
The \(K\)-gaps IMT is performed over a grid of all combinations of threshold and \(K\) in the vectors u and k. If the estimate of \(\theta\) is 0 then the IMT statistic and its associated \(p\)-value are NA.
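To make the grid and the \(\theta\) = 0 case concrete, here is a minimal base R sketch of the \(K\)-gaps maximum likelihood estimate evaluated over all combinations of threshold and \(K\). It assumes the uncensored \(K\)-gaps log-likelihood of Suveges and Davison (2010), \(N_0 \log(1-\theta) + 2 N_1 \log\theta - \theta q \sum_i S_i\), where \(S_i = \max(T_i - K, 0)\) are the \(K\)-gaps, \(T_i\) the inter-exceedance times, \(q\) the exceedance probability, and \(N_0\), \(N_1\) the numbers of zero and positive \(K\)-gaps. The function kgaps_mle_sketch and the simulated data are hypothetical; this is not the package's implementation, it ignores the censored contributions controlled by inc_cens, and it does not compute the IMT statistic itself.

```r
# Hypothetical helper (not part of exdex): closed-form K-gaps MLE of theta,
# ignoring censored inter-exceedance times.
kgaps_mle_sketch <- function(x, u, k = 1) {
  exc_times <- which(x > u)           # times at which x exceeds u
  if (length(exc_times) < 2) return(NA_real_)
  q_u <- mean(x > u)                  # empirical exceedance probability
  gaps <- pmax(diff(exc_times) - k, 0)  # K-gaps S_i = max(T_i - K, 0)
  n0 <- sum(gaps == 0)                # number of zero K-gaps
  n1 <- sum(gaps > 0)                 # number of positive K-gaps
  if (n1 == 0) return(0)              # all K-gaps are zero: estimate is 0
  s_sum <- q_u * sum(gaps)
  # Smaller root of s_sum * theta^2 - b * theta + 2 * n1 = 0
  b <- n0 + 2 * n1 + s_sum
  min(1, (b - sqrt(b^2 - 8 * n1 * s_sum)) / (2 * s_sum))
}

# Simulated independent data, for illustration only
set.seed(1)
x <- rexp(500)
u <- quantile(x, probs = c(0.8, 0.9))
k <- 1:3

# Evaluate the estimate over the full grid of thresholds (rows) and K (columns)
theta_hat <- matrix(NA_real_, nrow = length(u), ncol = length(k),
                    dimnames = list(names(u), as.character(k)))
for (i in seq_along(u)) {
  for (j in seq_along(k)) {
    theta_hat[i, j] <- kgaps_mle_sketch(x, u[i], k[j])
  }
}
print(round(theta_hat, 3))

# A series in which every observation exceeds u has only zero K-gaps,
# so the estimate of theta is 0
print(kgaps_mle_sketch(rep(1, 10), u = 0.5, k = 1))
```

In the degenerate case shown last, the estimate sits on the boundary of the parameter space, which is the situation in which the IMT statistic and \(p\)-value are reported as NA.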
For details of the IMT see Suveges and Davison
(2010). There are some typing errors on pages 18-19 that have been
corrected in producing the code: the penultimate term inside {...}
in the middle equation on page 18 should be \((c_j(K))^2\), as should
the penultimate term in the first equation on page 19; the {...}
bracket should be squared in the 4th equation on page 19; the factor
\(n\) should be \(N-1\) in the final equation on page 19.
Suveges, M. and Davison, A. C. (2010) Model misspecification in peaks over threshold analysis, Annals of Applied Statistics, 4(1), 203-221. doi:10.1214/09-AOAS292
Attalides, N. (2015) Threshold-based extreme value modelling, PhD thesis, University College London. https://discovery.ucl.ac.uk/1471121/1/Nicolas_Attalides_Thesis.pdf
library(exdex)

### Newlyn sea surges
u <- quantile(newlyn, probs = seq(0.1, 0.9, by = 0.1))
imt <- kgaps_imt(newlyn, u = u, k = 1:5)
### S&P 500 index
u <- quantile(sp500, probs = seq(0.1, 0.9, by = 0.1))
imt <- kgaps_imt(sp500, u = u, k = 1:5)
### Cheeseboro wind gusts (a matrix containing some NAs)
probs <- c(seq(0.5, 0.98, by = 0.025), 0.99)
u <- quantile(cheeseboro, probs = probs, na.rm = TRUE)
imt <- kgaps_imt(cheeseboro, u = u, k = 1:5)