Weighted GEV ML Inference with Adjustment for Missing Data

Fits a GEV distribution to block maxima using maximum likelihood estimation, with the option to make an adjustment for the numbers of non-missing raw values in each block using one of the two weighting schemes proposed in McVittie and Murphy (2025).

Usage

gev_weighted(data, scheme = 1, block_length, block, init = "quartiles", ...)

Arguments

data

Either

a numeric vector containing a time series of raw data,
an object returned from block_maxima, a list with components maxima, notNA and n,
a data frame or named list containing the same information (variables maxima, notNA and n) as an object returned from block_maxima, such as the data frame BrestSurgeMaxima.

scheme

A numeric scalar. Pass scheme = 1 for the first weighting scheme, or scheme = 2 for the second weighting scheme, in McVittie and Murphy (2025). Any value other than 1 or 2 result in an unweighted fit, that is, all weight are set to 1. See Details.

block_length

A numeric scalar. Used calculate the maxima of disjoint blocks of block_length contiguous values in the vector data. If length(data) is not an integer multiple of block_length then the values at the end of data that do not constitute a complete block of length block_length are discarded, without warning.

block

A numeric vector with the same length as data. The value of block[i] indicates the block into which data[i] falls. For example, block could provide the year in which observation i was observed.

init

Either a character scalar, one of "quartiles" or "moments", or a numeric vector of length 3 giving initial estimates of the GEV location, scale and shape parameters: \(\mu\), \(\sigma\) and \(\xi\). If init = "quartiles" then initial estimates of \(\mu\) and \(\sigma\) are based on sample quartiles of block maxima, ignoring the underlying numbers of non-missing raw data, and a value of 0 for \(\xi\). If init = "moments" then instead we use the sample mean and variance of these maxima and an initial value of 0.1 for \(\xi\).

...

Further arguments to be passed to stats::optim.

Value

A list, returned from stats::optim (the MLEs are in the component par), with the additional components:

loglik: value of the maximised log-likelihood.
vcov: estimated variance-covariance matrix of the parameters.
se: estimated standard errors of the parameters.
maxima: the vector of block maxima used to fit the model.
notNA: the number of non-missing raw values on which the maxima in maxima are based.
n: the maximal block length, that is, the largest number of values that could have been observed in each of these blocks.
weights: the weights used in the fitting.

The call to gev_mle is provided in the attribute "call".

The class of the returned object is c("evmissing", "weighted_mle", "list").

Objects inheriting from class "evmissing" have coef, logLik, nobs, summary, vcov and confint methods. See evmissing_methods.

Details

Suppose that a full (no missing values) block will contain \(n\) raw values. Let \(n_i\) be the number of non-missing values, and \(m_i\) the observed block maximum, in block \(i\). The contribution of block maximum \(m_i\) to the GEV log-likelihood is weighted (multiplied) by the weight \(w_i\). The set of weights depends on the weighting scheme chosen by scheme as follows.

If scheme = 1 then \(w_i = n_i / n\), for \(i = 1, ..., n\).
If scheme = 2 then \(w_i = \hat{F}(m_i) ^ {n - n_i}\), for \(i = 1, ..., n\), where \(\hat{F}\) is the empirical distribution function of \(m_1, ..., m_n\).

For any other value of scheme all weights are set to 1, that is, an unweighted fit is performed.

For each weighting scheme, the larger the number \(n - n_i\) of missing values the smaller the weight. See McVittie and Murphy (2025) for further details.

References

McVittie, J. H. and Murphy, O. A. (2025) Weighted Parameter Estimators of the Generalized Extreme Value Distribution in the Presence of Missing Observations. doi:10.48550/arXiv.2506.15964

Examples

## Simulate raw data from an exponential distribution

set.seed(13032025)
blocks <- 50
block_length <- 365
sdata <- sim_data(blocks = blocks, block_length = block_length)

# sdata$data_full have no missing values
# sdata$data_miss have had missing values created artificially

## Fits to full data: fit0, fit 1 and fit2 should give the same results

# Fit a GEV distribution to block maxima from the full data
fit0 <- gev_mle(sdata$data_full, block_length = sdata$block_length)
summary(fit0)
#> 
#> Call:
#> gev_mle(data = sdata$data_full, block_length = sdata$block_length)
#> 
#>       Estimate Std. Error
#> mu     5.82700     0.1769
#> sigma  1.08400     0.1306
#> xi    -0.01449     0.1243

# Fit to the full data using weighting scheme 1
fit1 <- gev_weighted(sdata$data_full, scheme = 1,
                     block_length = sdata$block_length)
summary(fit1)
#> 
#> Call:
#> gev_weighted(data = sdata$data_full, scheme = 1, block_length = sdata$block_length)
#> 
#>       Estimate Std. Error
#> mu     5.82700     0.1769
#> sigma  1.08400     0.1306
#> xi    -0.01449     0.1243

# Fit to the full data using weighting scheme 2
fit2 <- gev_weighted(sdata$data_full, scheme = 2,
                     block_length = sdata$block_length)
summary(fit2)
#> 
#> Call:
#> gev_weighted(data = sdata$data_full, scheme = 2, block_length = sdata$block_length)
#> 
#>       Estimate Std. Error
#> mu     5.82700     0.1769
#> sigma  1.08400     0.1306
#> xi    -0.01449     0.1243

## Fits to the data with missings

#  Make adjustment for the numbers of non-missing values per block
fit0 <- gev_mle(sdata$data_miss, block_length = sdata$block_length)
summary(fit0)
#> 
#> Call:
#> gev_mle(data = sdata$data_miss, block_length = sdata$block_length)
#> 
#>       Estimate Std. Error
#> mu     5.97300     0.1907
#> sigma  1.07400     0.1157
#> xi    -0.05945     0.1177

# Make adjustment using weighting scheme 1
fit1 <- gev_weighted(sdata$data_miss, scheme = 1,
                     block_length = sdata$block_length)
summary(fit1)
#> 
#> Call:
#> gev_weighted(data = sdata$data_miss, scheme = 1, block_length = sdata$block_length)
#> 
#>       Estimate Std. Error
#> mu     5.67500     0.2054
#> sigma  1.10200     0.1494
#> xi    -0.07253     0.1350

# Make adjustment using weighting scheme 2
fit2 <- gev_weighted(sdata$data_miss, scheme = 2,
                     block_length = sdata$block_length)
summary(fit2)
#> 
#> Call:
#> gev_weighted(data = sdata$data_miss, scheme = 2, block_length = sdata$block_length)
#> 
#>       Estimate Std. Error
#> mu     5.97000     0.2023
#> sigma  1.10100     0.1464
#> xi    -0.09054     0.1321