Fits a GEV distribution to block maxima using maximum likelihood estimation, with the option to make an adjustment for the numbers of non-missing raw values in each block. The GEV location and scale parameters are adjusted to reflect the proportion of raw values that are missing.
Arguments
- data
Either
a numeric vector containing a time series of raw data,
an object returned from
block_maxima, a list with componentsmaxima,notNAandn,a data frame or named list containing the same information, that is, the variables
maxima,notNAandn, as an object returned fromblock_maxima, such as the data frameBrestSurgeMaxima.
- block_length
A numeric scalar. Used calculate the maxima of disjoint blocks of
block_lengthcontiguous values in the vectordata. Iflength(data)is not an integer multiple ofblock_lengththen the values at the end ofdatathat do not constitute a complete block of lengthblock_lengthare discarded, without warning.- block
A numeric vector with the same length as
data. The value ofblock[i]indicates the block into whichdata[i]falls. For example,blockcould provide the year in which observationiwas observed.- adjust
A logical scalar or a numeric scalar in
[0, 100].If
adjust = TRUEthen the adjustment, described in Details, for the numbers of non-missing values underlying each block maximum is performed.If
adjust = FALSEthen no adjustment is made, that is, the block maxima are treated as if the underlying raw data have no missing values.
- discard
A numeric scalar. Any block maximum for which greater than
discardpercent of the underlying raw values were missing is discarded. Whether or not an adjustment for missingness is made for the block maxima that remain is determined byadjust.- init
Either a character scalar, one of
"quartiles"or"moments", or a numeric vector of length 3 giving initial estimates of the GEV location, scale and shape parameters: \(\mu\), \(\sigma\) and \(\xi\). Ifinit = "quartiles"then initial estimates of \(\mu\) and \(\sigma\) are based on sample quartiles of block maxima, ignoring the underlying numbers of non-missing raw data, and a value of 0 for \(\xi\). Ifinit = "moments"then instead we use the sample mean and variance of these maxima and an initial value of 0.1 for \(\xi\).- ...
Further arguments to be passed to
stats::optim.
Value
A list, returned from stats::optim (the MLEs are in the
component par), with the additional components:
loglik: value of the maximised log-likelihood.vcov: estimated variance-covariance matrix of the parameters.se: estimated standard errors of the parameters.maxima: the vector of block maxima used to fit the model.notNA: the number of non-missing raw values on which the maxima inmaximaare based.n: the maximal block length, that is, the largest number of values that could have been observed in each of these blocks.adjust,discard: the values of these input arguments.
The call to gev_mle is provided in the attribute "call".
The class of the returned object is c("evmissing", "mle", "list").
Objects inheriting from class "evmissing" have coef, logLik, nobs,
summary, vcov and confint methods. See evmissing_methods.
Details
If data is numeric vector then exactly one of the arguments
block_length or block must be supplied. The parameters are fitted
using maximum likelihood estimation.
The adjustment for the numbers of non-missing values underlying the block maxima is based on the strong assumption that missing values occur completely at random. We suppose that a block maximum \(M_n\) based on a full (no missing raw values) block of length \(n\) has a \(\text{GEV}(\mu, \sigma, \xi)\) distribution, with distribution function \(G(x)\). Let \(n_i\) be the number of non-missing values in block \(i\) and let \(M_{n_i}\) denote the block maximum of such a block. We suppose that \(M_{n_i}\) has a \(\text{GEV}(\mu(n_i), \sigma(n_i), \xi)\) distribution, where $$\mu(n_i) = \mu + \sigma [(n_i/n)^\xi -1] / \xi,$$ $$\sigma(n_i) = \sigma (n_i/n)^\xi.$$
These expressions are based on inferring the parameters of an approximate GEV distribution for \(M_{n_i}\) from its approximate distribution function \([G(x)]^{n_i/n}\).
A likelihood is constructed as the product of contributions from the maxima from distinct blocks, under the assumption that these maxima are independent. Let \(\theta = (\mu, \sigma, \xi)\) and let \(\ell_F(\underline{z}; \theta)\) denote the usual, unadjusted, GEV log-likelihood for the full-data case where there are no missing values. It can be shown that our adjusted log-likelihood \(\ell(\theta, \underline{z})\) is given by
$$\ell(\theta, \underline{z}) = \ell_F(\underline{z}; \theta) - \sum_{i=1}^n p_i \log G(z_i; \theta)$$
where \(p_i = 1 - n_i / n\) is the proportion of missing values in block \(i\).
The negated log-likelihood is minimised using a call to
stats::optim with hessian = TRUE. If stats::optim throws an error
then a warning is produced and the returned object has NA values for
the components par, loglik, vcov and se and an extra component
optim_error containing the error message. If the estimated observed
information matrix is singular then a warning is produced and the returned
object has NA values for the components vcov and se.
Examples
## Simulate raw data from an exponential distribution
set.seed(13032025)
blocks <- 50
block_length <- 365
sdata <- sim_data(blocks = blocks, block_length = block_length)
# sdata$data_full have no missing values
# sdata$data_miss have had missing values created artificially
# Fit a GEV distribution to block maxima from the full data
fit1 <- gev_mle(sdata$data_full, block_length = sdata$block_length)
summary(fit1)
#>
#> Call:
#> gev_mle(data = sdata$data_full, block_length = sdata$block_length)
#>
#> Estimate Std. Error
#> mu 5.82700 0.1769
#> sigma 1.08400 0.1306
#> xi -0.01449 0.1243
# An identical fit supplying the block indicator instead of block_length
fit1b <- gev_mle(sdata$data_full, block = sdata$block)
summary(fit1b)
#>
#> Call:
#> gev_mle(data = sdata$data_full, block = sdata$block)
#>
#> Estimate Std. Error
#> mu 5.82700 0.1769
#> sigma 1.08400 0.1306
#> xi -0.01449 0.1243
# Make adjustment for the numbers of non-missing values per block
fit2 <- gev_mle(sdata$data_miss, block_length = sdata$block_length)
summary(fit2)
#>
#> Call:
#> gev_mle(data = sdata$data_miss, block_length = sdata$block_length)
#>
#> Estimate Std. Error
#> mu 5.97300 0.1907
#> sigma 1.07400 0.1157
#> xi -0.05945 0.1177
# Do not make the adjustment
fit3 <- gev_mle(sdata$data_miss, block_length = sdata$block_length,
adjust = FALSE)
summary(fit3)
#>
#> Call:
#> gev_mle(data = sdata$data_miss, block_length = sdata$block_length,
#> adjust = FALSE)
#>
#> Estimate Std. Error
#> mu 5.66300 0.1785
#> sigma 1.10600 0.1295
#> xi -0.06951 0.1152
# Remove all block maxima with greater than 25% missing values and
# do not make the adjustment
fit4 <- gev_mle(sdata$data_miss, block_length = sdata$block_length,
adjust = FALSE, discard = 25)
summary(fit4)
#>
#> Call:
#> gev_mle(data = sdata$data_miss, block_length = sdata$block_length,
#> adjust = FALSE, discard = 25)
#>
#> Estimate Std. Error
#> mu 5.67200 0.2454
#> sigma 1.08200 0.1845
#> xi -0.01368 0.1880
## Plymouth ozone data
# Calculate the values in Table 3 of Simpson and Northrop (2025)
# discard = 50 is chosen to discard data from 2001 and 2006
fit1 <- gev_mle(PlymouthOzoneMaxima)
fit2 <- gev_mle(PlymouthOzoneMaxima, adjust = FALSE)
fit3 <- gev_mle(PlymouthOzoneMaxima, discard = 50)
fit4 <- gev_mle(PlymouthOzoneMaxima, adjust = FALSE, discard = 50)
se <- function(x) return(sqrt(diag(vcov(x))))
MLEs <- cbind(coef(fit1), coef(fit2), coef(fit3), coef(fit4))
SEs <- cbind(se(fit1), se(fit2), se(fit3), se(fit4))
round(MLEs, 2)
#> [,1] [,2] [,3] [,4]
#> mu 128.77 126.52 129.55 128.59
#> sigma 18.81 25.50 17.65 17.78
#> xi 0.00 -0.28 0.04 0.04
round(SEs, 2)
#> [,1] [,2] [,3] [,4]
#> mu 4.40 5.53 4.60 4.55
#> sigma 2.63 4.00 3.41 3.61
#> xi 0.16 0.15 0.26 0.27