Maximum likelihood estimation using left-censored inter-exceedances times
Source:R/dgaps.R
dgaps.RdCalculates maximum likelihood estimates of the extremal index \(\theta\) based on a model for threshold inter-exceedances times of Holesovsky and Fusek (2020). We refer to this as the \(D\)-gaps model, because it uses a tuning parameter \(D\), whereas the related \(K\)-gaps model of Suveges and Davison (2010) has a tuning parameter \(K\).
Arguments
- data
A numeric vector or numeric matrix of raw data. If
datais a matrix then the log-likelihood is constructed as the sum of (independent) contributions from different columns. A common situation is where each column relates to a different year.If
datacontains missing values thensplit_by_NAsis used to divide the data further into sequences of non-missing values, stored in different columns in a matrix. Again, the log-likelihood is constructed as a sum of contributions from different columns.- u
A numeric scalar. Extreme value threshold applied to data.
- D
A numeric scalar. The censoring parameter \(D\). Threshold inter-exceedances times that are not larger than
Dunits are left-censored, occurring with probability \(\log(1 - \theta e^{-\theta d})\), where \(d = q D\) and \(q\) is the probability with which the threshold \(u\) is exceeded.- inc_cens
A logical scalar indicating whether or not to include contributions from right-censored inter-exceedance times, relating to the first and last observations. It is known that these times are greater than or equal to the time observed. If
datahas multiple columns then there will be right-censored first and last inter-exceedance times for each column.
Value
An object (a list) of class c("dgaps", "exdex") containing
thetaThe maximum likelihood estimate (MLE) of \(\theta\).
seThe estimated standard error of the MLE, calculated using an algebraic expression for the observed information. If the estimate of \(\theta\) is 0 then
seisNA.se_expThe estimated standard error of the MLE, calculated using an algebraic expression for the expected information. If the estimate of \(\theta\) is 0 then
se_expisNA. This is provided because cases may be encountered where the observed information is not positive.ssThe list of summary statistics returned from
dgaps_stat.D, u, inc_censThe input values of
D,uandinc_cens.max_loglikThe value of the log-likelihood at the MLE.
callThe call to
dgaps.
Details
If inc_cens = FALSE then the maximum likelihood estimate of
the extremal index \(\theta\) under the \(D\)-gaps model of
Holesovsky and Fusek (2020) is calculated. Under this model
inter-exceedance times that are less than or equal to \(D\) are
left-censored, as a strategy to mitigate model mis-specification resulting
from the fact that inter-exceedance times that are equal to 0 are expected
under the model but only positive inter-exceedance times can be observed
in practice.
If inc_cens = TRUE then information from the right-censored
first and last inter-exceedance times are also included in the likelihood
to be maximized.
For an explanation of the idea see Attalides (2015). The form of the
log-likelihood is given in the Details section of
dgaps_stat.
It is possible that the estimate of \(\theta\) is equal to 1, and also
possible that it is equal to 0. dgaps_stat explains the
respective properties of the data that cause these events to occur.
References
Holesovsky, J. and Fusek, M. Estimation of the extremal index using censored distributions. Extremes 23, 197-213 (2020). doi:10.1007/s10687-020-00374-3
Suveges, M. and Davison, A. C. (2010) Model misspecification in peaks over threshold analysis, Annals of Applied Statistics, 4(1), 203-221. doi:10.1214/09-AOAS292
See also
dgaps_confint to estimate confidence intervals
for \(\theta\).
dgaps_methods for S3 methods for "dgaps"
objects.
dgaps_imt for the information matrix test, which
may be used to inform the choice of the pair (u, D).
choose_ud for a diagnostic plot based on
dgaps_imt.
dgaps_stat for the calculation of sufficient
statistics for the \(D\)-gaps model.
Examples
### S&P 500 index
u <- quantile(sp500, probs = 0.60)
theta <- dgaps(sp500, u = u, D = 1)
theta
#>
#> Call:
#> dgaps(data = sp500, u = u, D = 1)
#>
#> Estimate of the extremal index theta:
#> theta
#> 0.9692
summary(theta)
#>
#> Call:
#> dgaps(data = sp500, u = u, D = 1)
#>
#> Estimate Std. Error
#> theta 0.9692 0.01561
coef(theta)
#> theta
#> 0.9691789
nobs(theta)
#> [1] 2901
vcov(theta)
#> theta
#> theta 0.0002438123
logLik(theta)
#> 'log Lik.' -3658.418 (df=1)
### Newlyn sea surges
u <- quantile(newlyn, probs = 0.60)
theta <- dgaps(newlyn, u = u, D = 2)
theta
#>
#> Call:
#> dgaps(data = newlyn, u = u, D = 2)
#>
#> Estimate of the extremal index theta:
#> theta
#> 0.1999
summary(theta)
#>
#> Call:
#> dgaps(data = newlyn, u = u, D = 2)
#>
#> Estimate Std. Error
#> theta 0.1999 0.01176
### Uccle July temperatures
# Using vector input, which merges data from different years
u <- quantile(uccle720$temp, probs = 0.9, na.rm = TRUE)
theta <- dgaps(uccle720$temp, u = u, D = 2)
theta
#>
#> Call:
#> dgaps(data = uccle720$temp, u = u, D = 2)
#>
#> Estimate of the extremal index theta:
#> theta
#> 0.5152
# Using matrix input to separate data from different years
u <- quantile(uccle720m, probs = 0.9, na.rm = TRUE)
theta <- dgaps(uccle720m, u = u, D = 2)
theta
#>
#> Call:
#> dgaps(data = uccle720m, u = u, D = 2)
#>
#> Estimate of the extremal index theta:
#> theta
#> 0.5541