Fit a Generalized Extreme value (GEV) GAMLSS Model

Fits a Generalized Additive Model (GAM) for Location, Scale and Shape with a GEV response distribution, using the function gamlss::gamlss().

Usage

fitGEV(
  formula,
  data,
  scoring = c("fisher", "quasi"),
  mu.link = "identity",
  sigma.link = "log",
  xi.link = "identity",
  stepLength = 1,
  stepAttempts = 2,
  stepReduce = 2,
  steps = FALSE,
  ...
)

Arguments

formula: a formula object, with the response on the left of an ~ operator, and the terms, separated by \(+\) operators, on the right. Nonparametric smoothing terms are indicated by pb() for penalised beta splines, cs for smoothing splines, lo for loess smooth terms and random or ra for random terms, e.g. y~cs(x,df=5)+x1+x2*x3. Additional smoothers can be added by creating the appropriate interface. Interactions with nonparametric smooth terms are not fully supported, but will not produce errors; they will simply produce the usual parametric interaction
data: a data frame containing the variables occurring in the formula, e.g. data=aids. If this is missing, the variables should be on the search list.
scoring: A character scalar. If scoring = "fisher" then the weights used in the fitting algorithm are based on the expected Fisher information, that is, a Fisher's scoring algorithm is used. If scoring = "quasi" then these weights are based on the cross products of the first derivatives of the log-likelihood, leading to a quasi Newton scoring algorithm.
mu.link, sigma.link, xi.link: Character scalars to set the respective link functions for the location (mu), scale (sigma) and shape (xi) parameters. The latter is passed to gamlss::gamlss() as nu.link.
stepLength: A numeric vector of positive values. The initial values of the step lengths mu.step, sigma.step and nu.step passed to gamlss::gamlss.control() in the first attempt to fit the model by calling gamlss::gamlss(). If stepLength has a length that is less than 3 then stepLength is recycled to have length 3.
stepAttempts: A non-negative integer. If the first call to gamlss::gamlss() throws an error then we make stepAttempts further attempts to fit the model, each time dividing by 2 the values of mu.step, sigma.step and nu.step supplied to gamlss::gamlss.control(). If stepAttempts < 1 then no further attempts are made.
stepReduce: A number greater than 1. The factor by which the step lengths in stepLength are reduced for each extra attempt to fit the model. The default, stepReduce = 2 means that the step lengths are halved for each extra attempt.
steps: A logical scalar. Pass steps = TRUE to write to the console the current value of stepLength for each call to gamlss::gamlss().
...: Further arguments passed to gamlss::gamlss(), in particular method, which sets the fitting algorithm, with options RS(), CG() or mixed(). The default, method = RS() seems to work well, as does method = mixed(). In contrast, method = CG() often requires the step length to be reduced before convergence is achieved. fitGEV() attempts to do this automatically. See stepAttempts. Pass trace = FALSE (to gamlss::gamlss.control()) to avoid writing to the console the global deviance after each outer iteration of the gamlss fitting algorithm.

Value

Returns a gamlss object. See the Value section of gamlss::gamlss(). The class of the returned object is c("gamlssx", "gamlss", "gam", "glm", "lm").

Details

See gamlss::gamlss() for information about the model and the fitting algorithm.

Examples

# Load gamlss, for the function pb()
library(gamlss)
#> Loading required package: splines
#> Loading required package: gamlss.data
#> 
#> Attaching package: 'gamlss.data'
#> The following object is masked from 'package:datasets':
#> 
#>     sleep
#> Loading required package: gamlss.dist
#> Loading required package: nlme
#> Loading required package: parallel
#>  **********   GAMLSS Version 5.4-22  ********** 
#> For more on GAMLSS look at https://www.gamlss.com/
#> Type gamlssNews() to see new features/changes/bug fixes.

##### Simulated data

set.seed(17012023)
n <- 100
x <- stats::runif(n)
mu <- 1 + 2 * x
sigma <- 1
xi <- 0.25
y <- nieve::rGEV(n = 1, loc = mu, scale = sigma, shape = xi)
data <- data.frame(y = as.numeric(y), x = x)
plot(x, y)


# Fit model using the default RS method with Fisher's scoring
mod <- fitGEV(y ~ gamlss::pb(x), data = data)
#> GAMLSS-RS iteration 1: Global Deviance = 323.1515 
#> GAMLSS-RS iteration 2: Global Deviance = 320.4994 
#> GAMLSS-RS iteration 3: Global Deviance = 320.1414 
#> GAMLSS-RS iteration 4: Global Deviance = 320.0911 
#> GAMLSS-RS iteration 5: Global Deviance = 320.084 
#> GAMLSS-RS iteration 6: Global Deviance = 320.0831 
# Summary of model fit
summary(mod)
#> ******************************************************************
#> Family:  c("GEV", "Generalized Extreme Value") 
#> 
#> Call:  
#> gamlss::gamlss(formula = y ~ gamlss::pb(x), family = GEVfisher(mu.link = "identity",  
#>     sigma.link = "log", nu.link = "identity"), data = data,  
#>     mu.step = c(1, 1, 1)[1], sigma.step = c(1, 1, 1 
#>     )[2], nu.step = c(1, 1, 1)[3]) 
#> 
#> Fitting method: RS() 
#> 
#> ------------------------------------------------------------------
#> Mu link function:  identity
#> Mu Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)     1.2726     0.1921   6.626 2.00e-09 ***
#> gamlss::pb(x)   1.3830     0.3193   4.332 3.63e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ------------------------------------------------------------------
#> Sigma link function:  log
#> Sigma Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.08760    0.09058  -0.967    0.336
#> 
#> ------------------------------------------------------------------
#> Nu link function:  identity 
#> Nu Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  0.19264    0.08477   2.272   0.0253 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ------------------------------------------------------------------
#> No. of observations in the fit:  100 
#> Degrees of Freedom for the fit:  4
#>       Residual Deg. of Freedom:  96 
#>                       at cycle:  6 
#>  
#> Global Deviance:     320.0831 
#>             AIC:     328.0831 
#>             SBC:     338.5037 
#> ******************************************************************
# Residual diagnostic plots
plot(mod, xlab = "x", ylab = "y")

#> ******************************************************************
#> 	      Summary of the Quantile Residuals
#>                            mean   =  -0.0005916633 
#>                        variance   =  1.009922 
#>                coef. of skewness  =  -0.007492234 
#>                coef. of kurtosis  =  2.78055 
#> Filliben correlation coefficient  =  0.9973717 
#> ******************************************************************
# Data plus fitted curve
plot(data$x, data$y, xlab = "x", ylab = "y")
lines(data$x, fitted(mod))


# Fit model using the mixed method and quasi-Newton scoring
# Use trace = FALSE to prevent writing the global deviance to the console
mod <- fitGEV(y ~ pb(x), data = data, method = mixed(), scoring = "quasi",
              trace = FALSE)

# Fit model using the CG method
# The default step length of 1 needs to be reduced to enable convergence
# Use steps = TRUE to write the step lengths to the console
mod <- fitGEV(y ~ pb(x), data = data, method = CG(), steps = TRUE)
#> stepLength = 1 1 1 
#> stepLength = 0.5 0.5 0.5 
#> GAMLSS-CG iteration 1: Global Deviance = 337.4163 
#> GAMLSS-CG iteration 2: Global Deviance = 322.5637 
#> GAMLSS-CG iteration 3: Global Deviance = 321.03 
#> GAMLSS-CG iteration 4: Global Deviance = 320.4416 
#> GAMLSS-CG iteration 5: Global Deviance = 320.2167 
#> GAMLSS-CG iteration 6: Global Deviance = 320.1324 
#> GAMLSS-CG iteration 7: Global Deviance = 320.1011 
#> GAMLSS-CG iteration 8: Global Deviance = 320.0896 
#> GAMLSS-CG iteration 9: Global Deviance = 320.0855 
#> GAMLSS-CG iteration 10: Global Deviance = 320.0839 
#> GAMLSS-CG iteration 11: Global Deviance = 320.0834 

##### Fremantle annual maximum sea levels
##### See also the gamlssx package README file

# Transform Year so that it is centred on 0
fremantle <- transform(fremantle, cYear = Year - median(Year))

# Plot sea level against year and against SOI
plot(fremantle$Year, fremantle$SeaLevel, xlab = "year", ylab = "sea level (m)")

plot(fremantle$SOI, fremantle$SeaLevel, xlab = "SOI", ylab = "sea level (m)")


# Fit a model with P-spline effects of cYear and SOI on location and scale
# The default links are identity for location and log for scale
mod <- fitGEV(SeaLevel ~ pb(cYear) + pb(SOI),
             sigma.formula = ~ pb(cYear) + pb(SOI),
             data = fremantle)
#> Error in fitGEV(SeaLevel ~ pb(cYear) + pb(SOI), sigma.formula = ~pb(cYear) +     pb(SOI), data = fremantle): No convergence. An error was thrown from the last call to gamlss()

# Summary of model fit
summary(mod)
#> ******************************************************************
#> Family:  c("GEV", "Generalized Extreme Value") 
#> 
#> Call:  
#> gamlss::gamlss(formula = y ~ pb(x), family = GEVfisher(mu.link = "identity",  
#>     sigma.link = "log", nu.link = "identity"), data = data,  
#>     method = CG(), mu.step = c(0.5, 0.5, 0.5)[1], sigma.step = c(0.5,  
#>     0.5, 0.5)[2], nu.step = c(0.5, 0.5, 0.5)[3]) 
#> 
#> Fitting method: CG() 
#> 
#> ------------------------------------------------------------------
#> Mu link function:  identity
#> Mu Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   1.2746     0.1923   6.628 1.99e-09 ***
#> pb(x)         1.3801     0.3198   4.315 3.88e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ------------------------------------------------------------------
#> Sigma link function:  log
#> Sigma Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.08693    0.09060  -0.959     0.34
#> 
#> ------------------------------------------------------------------
#> Nu link function:  identity 
#> Nu Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  0.19171    0.08467   2.264   0.0258 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ------------------------------------------------------------------
#> NOTE: Additive smoothing terms exist in the formulas: 
#>  i) Std. Error for smoothers are for the linear effect only. 
#> ii) Std. Error for the linear terms maybe are not accurate. 
#> ------------------------------------------------------------------
#> No. of observations in the fit:  100 
#> Degrees of Freedom for the fit:  4.000271
#>       Residual Deg. of Freedom:  95.99973 
#>                       at cycle:  11 
#>  
#> Global Deviance:     320.0834 
#>             AIC:     328.0839 
#>             SBC:     338.5053 
#> ******************************************************************
# Model diagnostic plots
plot(mod)

#> ******************************************************************
#> 	      Summary of the Quantile Residuals
#>                            mean   =  -0.0006504536 
#>                        variance   =  1.009723 
#>                coef. of skewness  =  -0.004767227 
#>                coef. of kurtosis  =  2.781026 
#> Filliben correlation coefficient  =  0.99738 
#> ******************************************************************
# Worm plot
wp(mod)
#> Error in as.environment(DaTa): invalid object for 'as.environment'
# Plot of the fitted component smooth functions
# Note: gamlss::term.plot() does not include uncertainty about the intercept
# Location mu
term.plot(mod, rug = TRUE, pages = 1)

# Scale sigma
term.plot(mod, what = "sigma", rug = TRUE, pages = 1)
#> Error in term.plot(mod, what = "sigma", rug = TRUE, pages = 1): The model for sigma has only the constant fitted