R/dgaps.R
dgaps.Rd
Calculates maximum likelihood estimates of the extremal index \(\theta\) based on a model for threshold inter-exceedances times of Holesovsky and Fusek (2020). We refer to this as the \(D\)-gaps model, because it uses a tuning parameter \(D\), whereas the related \(K\)-gaps model of Suveges and Davison (2010) has a tuning parameter \(K\).
dgaps(data, u, D = 1, inc_cens = TRUE)
A numeric vector or numeric matrix of raw data. If data
is a matrix then the log-likelihood is constructed as the sum of
(independent) contributions from different columns. A common situation is
where each column relates to a different year.
If data
contains missing values then split_by_NAs
is
used to divide the data further into sequences of non-missing values,
stored in different columns in a matrix. Again, the log-likelihood
is constructed as a sum of contributions from different columns.
A numeric scalar. Extreme value threshold applied to data.
A numeric scalar. The censoring parameter \(D\). Threshold
inter-exceedances times that are not larger than D
units are
left-censored, occurring with probability
\(\log(1 - \theta e^{-\theta d})\),
where \(d = q D\) and \(q\) is the probability with which the
threshold \(u\) is exceeded.
A logical scalar indicating whether or not to include
contributions from right-censored inter-exceedance times, relating to the
first and last observations. It is known that these times are greater
than or equal to the time observed.
If data
has multiple columns then there will be right-censored
first and last inter-exceedance times for each column.
An object (a list) of class c("dgaps", "exdex")
containing
theta
The maximum likelihood estimate (MLE) of \(\theta\).
se
The estimated standard error of the MLE, calculated
using an algebraic expression for the observed information. If the
estimate of \(\theta\) is 0 then se
is NA
.
se_exp
The estimated standard error of the MLE,
calculated using an algebraic expression for the expected information.
If the estimate of \(\theta\) is 0 then se_exp
is NA
.
This is provided because cases may be encountered where the observed
information is not positive.
ss
The list of summary statistics returned from
dgaps_stat
.
D, u, inc_cens
The input values of D
,
u
and inc_cens
.
max_loglik
The value of the log-likelihood at the MLE.
call
The call to dgaps
.
If inc_cens = FALSE
then the maximum likelihood estimate of
the extremal index \(\theta\) under the \(D\)-gaps model of
Holesovsky and Fusek (2020) is calculated. Under this model
inter-exceedance times that are less than or equal to \(D\) are
left-censored, as a strategy to mitigate model mis-specification resulting
from the fact that inter-exceedance times that are equal to 0 are expected
under the model but only positive inter-exceedance times can be observed
in practice.
If inc_cens = TRUE
then information from the right-censored
first and last inter-exceedance times are also included in the likelihood
to be maximized.
For an explanation of the idea see Attalides (2015). The form of the
log-likelihood is given in the Details section of
dgaps_stat
.
It is possible that the estimate of \(\theta\) is equal to 1, and also
possible that it is equal to 0. dgaps_stat
explains the
respective properties of the data that cause these events to occur.
Holesovsky, J. and Fusek, M. Estimation of the extremal index using censored distributions. Extremes 23, 197-213 (2020). doi:10.1007/s10687-020-00374-3
Suveges, M. and Davison, A. C. (2010) Model misspecification in peaks over threshold analysis, Annals of Applied Statistics, 4(1), 203-221. doi:10.1214/09-AOAS292
dgaps_confint
to estimate confidence intervals
for \(\theta\).
dgaps_methods
for S3 methods for "dgaps"
objects.
dgaps_imt
for the information matrix test, which
may be used to inform the choice of the pair (u, D
).
choose_ud
for a diagnostic plot based on
dgaps_imt
.
dgaps_stat
for the calculation of sufficient
statistics for the \(D\)-gaps model.
### S&P 500 index
u <- quantile(sp500, probs = 0.60)
theta <- dgaps(sp500, u = u, D = 1)
theta
#>
#> Call:
#> dgaps(data = sp500, u = u, D = 1)
#>
#> Estimate of the extremal index theta:
#> theta
#> 0.9692
summary(theta)
#>
#> Call:
#> dgaps(data = sp500, u = u, D = 1)
#>
#> Estimate Std. Error
#> theta 0.9692 0.01561
coef(theta)
#> theta
#> 0.9691789
nobs(theta)
#> [1] 2901
vcov(theta)
#> theta
#> theta 0.0002438123
logLik(theta)
#> 'log Lik.' -3658.418 (df=1)
### Newlyn sea surges
u <- quantile(newlyn, probs = 0.60)
theta <- dgaps(newlyn, u = u, D = 2)
theta
#>
#> Call:
#> dgaps(data = newlyn, u = u, D = 2)
#>
#> Estimate of the extremal index theta:
#> theta
#> 0.1999
summary(theta)
#>
#> Call:
#> dgaps(data = newlyn, u = u, D = 2)
#>
#> Estimate Std. Error
#> theta 0.1999 0.01176
### Uccle July temperatures
# Using vector input, which merges data from different years
u <- quantile(uccle720$temp, probs = 0.9, na.rm = TRUE)
theta <- dgaps(uccle720$temp, u = u, D = 2)
theta
#>
#> Call:
#> dgaps(data = uccle720$temp, u = u, D = 2)
#>
#> Estimate of the extremal index theta:
#> theta
#> 0.5152
# Using matrix input to separate data from different years
u <- quantile(uccle720m, probs = 0.9, na.rm = TRUE)
theta <- dgaps(uccle720m, u = u, D = 2)
theta
#>
#> Call:
#> dgaps(data = uccle720m, u = u, D = 2)
#>
#> Estimate of the extremal index theta:
#> theta
#> 0.5541