Maximum likelihood estimation using left-censored inter-exceedances times

Calculates maximum likelihood estimates of the extremal index \(\theta\) based on a model for threshold inter-exceedances times of Holesovsky and Fusek (2020). We refer to this as the \(D\)-gaps model, because it uses a tuning parameter \(D\), whereas the related \(K\)-gaps model of Suveges and Davison (2010) has a tuning parameter \(K\).

dgaps(data, u, D = 1, inc_cens = TRUE)

Arguments

data

A numeric vector or numeric matrix of raw data. If data is a matrix then the log-likelihood is constructed as the sum of (independent) contributions from different columns. A common situation is where each column relates to a different year.

If data contains missing values then split_by_NAs is used to divide the data further into sequences of non-missing values, stored in different columns in a matrix. Again, the log-likelihood is constructed as a sum of contributions from different columns.

u

A numeric scalar. Extreme value threshold applied to data.

D

A numeric scalar. The censoring parameter \(D\). Threshold inter-exceedances times that are not larger than D units are left-censored, occurring with probability \(\log(1 - \theta e^{-\theta d})\), where \(d = q D\) and \(q\) is the probability with which the threshold \(u\) is exceeded.

inc_cens

A logical scalar indicating whether or not to include contributions from right-censored inter-exceedance times, relating to the first and last observations. It is known that these times are greater than or equal to the time observed. If data has multiple columns then there will be right-censored first and last inter-exceedance times for each column.

Value

An object (a list) of class c("dgaps", "exdex") containing

theta: The maximum likelihood estimate (MLE) of \(\theta\).
se: The estimated standard error of the MLE, calculated using an algebraic expression for the observed information. If the estimate of \(\theta\) is 0 then se is NA.
se_exp: The estimated standard error of the MLE, calculated using an algebraic expression for the expected information. If the estimate of \(\theta\) is 0 then se_exp is NA. This is provided because cases may be encountered where the observed information is not positive.
ss: The list of summary statistics returned from dgaps_stat.
D, u, inc_cens: The input values of D, u and inc_cens.
max_loglik: The value of the log-likelihood at the MLE.
call: The call to dgaps.

Details

If inc_cens = FALSE then the maximum likelihood estimate of the extremal index \(\theta\) under the \(D\)-gaps model of Holesovsky and Fusek (2020) is calculated. Under this model inter-exceedance times that are less than or equal to \(D\) are left-censored, as a strategy to mitigate model mis-specification resulting from the fact that inter-exceedance times that are equal to 0 are expected under the model but only positive inter-exceedance times can be observed in practice.

If inc_cens = TRUE then information from the right-censored first and last inter-exceedance times are also included in the likelihood to be maximized. For an explanation of the idea see Attalides (2015). The form of the log-likelihood is given in the Details section of dgaps_stat.

It is possible that the estimate of \(\theta\) is equal to 1, and also possible that it is equal to 0. dgaps_stat explains the respective properties of the data that cause these events to occur.

References

Holesovsky, J. and Fusek, M. Estimation of the extremal index using censored distributions. Extremes 23, 197-213 (2020). doi:10.1007/s10687-020-00374-3

Suveges, M. and Davison, A. C. (2010) Model misspecification in peaks over threshold analysis, Annals of Applied Statistics, 4(1), 203-221. doi:10.1214/09-AOAS292

Examples

### S&P 500 index

u <- quantile(sp500, probs = 0.60)
theta <- dgaps(sp500, u = u, D = 1)
theta
#> 
#> Call:
#> dgaps(data = sp500, u = u, D = 1)
#> 
#> Estimate of the extremal index theta:
#>  theta  
#> 0.9692  
summary(theta)
#> 
#> Call:
#> dgaps(data = sp500, u = u, D = 1)
#> 
#>       Estimate Std. Error
#> theta   0.9692    0.01561
coef(theta)
#>     theta 
#> 0.9691789 
nobs(theta)
#> [1] 2901
vcov(theta)
#>              theta
#> theta 0.0002438123
logLik(theta)
#> 'log Lik.' -3658.418 (df=1)

### Newlyn sea surges

u <- quantile(newlyn, probs = 0.60)
theta <- dgaps(newlyn, u = u, D = 2)
theta
#> 
#> Call:
#> dgaps(data = newlyn, u = u, D = 2)
#> 
#> Estimate of the extremal index theta:
#>  theta  
#> 0.1999  
summary(theta)
#> 
#> Call:
#> dgaps(data = newlyn, u = u, D = 2)
#> 
#>       Estimate Std. Error
#> theta   0.1999    0.01176

### Uccle July temperatures

# Using vector input, which merges data from different years
u <- quantile(uccle720$temp, probs = 0.9, na.rm = TRUE)
theta <- dgaps(uccle720$temp, u = u, D = 2)
theta
#> 
#> Call:
#> dgaps(data = uccle720$temp, u = u, D = 2)
#> 
#> Estimate of the extremal index theta:
#>  theta  
#> 0.5152  

# Using matrix input to separate data from different years
u <- quantile(uccle720m, probs = 0.9, na.rm = TRUE)
theta <- dgaps(uccle720m, u = u, D = 2)
theta
#> 
#> Call:
#> dgaps(data = uccle720m, u = u, D = 2)
#> 
#> Estimate of the extremal index theta:
#>  theta  
#> 0.5541