vignettes/revdbayes-b-using-rcpp-vignette.Rmd
revdbayes-b-using-rcpp-vignette.Rmd
This vignette introduces a new feature of revdbayes:
reducing posterior simulation time by performing the most time-consuming
tasks using C++ functions. This achieved using a new facility in the
rust package (Northrop 2017),
which in turn uses the Rcpp package (Eddelbuettel and Francois 2011). The result is
a new function rpost_rcpp
, which has the same structure as
the existing function rpost
. From a user’s perspective the
only difference between these two functions occurs if they wish to
supply their own prior distribution: rpost_rcpp
requires an
external pointer to a C++ function (see Providing a
user-defined prior), whereas rpost
requires an input R
function (see the vignette Introducing revdbayes.
Before we deal with user-supplied priors we compare posterior
simulation times using rpost
and rpost_rcpp
for examples based on in-built prior distributions. We use the default
settings of rpost
and rpost_rcpp
throughout.
We also compare the speed of these functions with the function
posterior
in the evdbayes package (Stephenson and Ribatet 2014), using the
microbenchmark package (Mersmann
2015).
library(revdbayes)
# Is the microbenchmark package available?
got_microbenchmark <- requireNamespace("microbenchmark", quietly = TRUE)
if (got_microbenchmark) {
library(microbenchmark)
}
# Set the number of posterior samples required.
n <- 1000
set.seed(46)
We repeat the analysis of the Gulf of Mexico Wave Height Data from the Introducing revdbayes vignette to check that using Rcpp does indeed reduce computation time.
u <- quantile(gom, probs = 0.65)
fp <- set_prior(prior = "flat", model = "gp", min_xi = -1)
if (got_microbenchmark) {
res <- microbenchmark(
rpost = rpost(n = n, model = "gp", prior = fp, thresh = u, data = gom),
rpost_rcpp = rpost_rcpp(n = n, model = "gp", prior = fp, thresh = u,
data = gom)
)
print(res, signif = 3)
options(microbenchmark.unit = "relative")
print(res, signif = 2)
}
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> rpost 83.8 89.4 105.0 95.1 103.0 409.0 100
#> rpost_rcpp 16.4 17.4 21.5 18.6 21.7 97.4 100
#> Unit: relative
#> expr min lq mean median uq max neval
#> rpost 5.1 5.1 4.9 5.1 4.7 4.2 100
#> rpost_rcpp 1.0 1.0 1.0 1.0 1.0 1.0 100
In this example rpost_rcpp
is indeed much faster than
rpost
.
We repeat the analysis of the Port Pirie annual maximum sea level data from the Introducing revdbayes.
mat <- diag(c(10000, 10000, 100))
pn <- set_prior(prior = "norm", model = "gev", mean = c(0,0,0), cov = mat)
if (got_microbenchmark) {
res <- microbenchmark(
rpost = rpost(n = n, model = "gev", prior = pn, data = portpirie),
rpost_rcpp = rpost_rcpp(n = n, model = "gev", prior = pn,
data = portpirie)
)
}
options(microbenchmark.unit = NULL)
print(res, signif = 3)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> rpost 216.0 230.0 248 236.0 252.0 439 100
#> rpost_rcpp 58.3 63.6 71 67.1 72.3 174 100
options(microbenchmark.unit = "relative")
print(res, signif = 2)
#> Unit: relative
#> expr min lq mean median uq max neval
#> rpost 3.7 3.6 3.5 3.5 3.5 2.5 100
#> rpost_rcpp 1.0 1.0 1.0 1.0 1.0 1.0 100
Comparison to the example calculations that feature in the
evdbayes user guide, based on the
posterior
function, are not shown because
evdbayes is archived on CRAN. This comparison shows
that rpost_rcpp
is approximately a factor 3 faster than
posterior
. This comparison is generous to
posterior
because the burn-in was set to zero and
posterior
produces a dependent sample rather than a random
sample. The effective sample size of an MCMC sample from
posterior
varies between simulations and across parameters.
The effectiveSize
function in the coda
package (Plummer et al. 2006) suggests
that the effective sample size in this example is of the order of 100 to
200, whereas the revdbayes functions rpost
and rpost_rcpp
produce random samples of size 1000.
rpost
is a little slower than posterior
.
We compare the computational efficiencies of rpost
and
rpost_rcpp
when performing the analysis of daily rainfall
totals from the Introducing
revdbayes.
# Informative prior set using revdbayes
pr2 <- set_prior(prob = 10^-(1:3), shape = c(38.9, 7.1, 47),
scale = c(1.5, 6.3, 2.6), model = "gev", prior = "quant")
if (got_microbenchmark) {
res <- microbenchmark(
rpost = rpost(n = n, model = "pp", prior = pr2, data = rainfall,
thresh = 40, noy = 54),
rpost_rcpp = rpost_rcpp(n = n, model = "pp", prior = pr2,
data = rainfall, thresh = 40, noy = 54)
)
}
options(microbenchmark.unit = NULL)
print(res, signif = 3)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> rpost 357 376.0 418.0 387.0 450.0 625.0 100
#> rpost_rcpp 36 38.8 41.6 40.7 43.5 52.5 100
options(microbenchmark.unit = "relative")
print(res, signif = 2)
#> Unit: relative
#> expr min lq mean median uq max neval
#> rpost 9.9 9.7 10 9.5 10 12 100
#> rpost_rcpp 1.0 1.0 1 1.0 1 1 100
Again, a comparison with the function posterior
in
evdbayes is not shown, but rpost
is slower
and rpost_rcpp
substantially faster than
posterior
.
If the user wishes to supply their own prior to
rpost_rcpp
then they must first write a C++ function that
evaluates the log of the prior density. The general way that
rust (and hence revdbayes) enables
users to provide their own C++ log-prior functions uses external
pointers and is based on the Rcpp
Gallery article Passing
user-supplied C++ functions by Dirk Eddelbuettel.
The implementation in rust requires this C++
function to have a particular structure: it must take a constant
reference to an Rcpp::NumericVector
, say x
, a
constant reference to an Rcpp::List
, say
ppars
, and return a double
precision scalar.
Here x
is the argument of the prior density, i.e. the
parameter vector of the extreme value model, and ppars
is a
list containing the values of prior parameters whose values are not
specified inside the function. Thus values of any parameters in the
prior can be changed without editing the function. If there are no such
parameters then the argument ppars
must still be present in
the C++ function, even though the list provided to the function will be
empty.
A simple way to provide C++ log-prior functions is to put them in a
file, say user_fns.cpp
, perhaps taking advantage of the
R-like syntax made available by Rcpp
sugar. Example content is provided below. This file is available on
the revdbayes
Github page.The functions in this file are compiled and made
available to R, either using the Rcpp::sourceCpp
function
(e.g. Rcpp::sourceCpp("user_fns.cpp")
) or using RStudio’s
Source button on the editor toolbar. The example content below also
includes the function create_prior_xptr
, which creates an
external pointer to a C++ function. See . It is this external pointer
that is passed to set_prior
to set the prior. If the user
has written a C++ function, say new_name
, they need to add
to create_prior_xptr
two lines of code:
else if (fstr == "new_name")
return(Rcpp::XPtr<funcPtr>(new funcPtr(&new_name))) ;
in order that they can create an external pointer for
new_name
using create_xptr
.
The following excerpt from the example user_fns.cpp
file
contains a C++ function user_gp_flat
to evaluate (the log
of) a prior density \(\pi(\sigma, \xi) \propto
\sigma^{-1}, \, \sigma > 0\), with an extra parameter
min_xi
enabling a prior lower bound to be set for \(\xi\). The same prior can be set, using an
in-built prior function, using
set_prior(prior = "flat", model = "gp", min_xi = -1)
, where
we have set min_xi = -1
. Note that . Hence in
user_gp_flat
\(\sigma\)
and \(\xi\) are x[0]
and
x[1]
not x[1]
and x[2]
.
// [[Rcpp::depends(Rcpp)]]
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::interfaces(r, cpp)]]
// Generalized Pareto log-priors
// [[Rcpp::export]]
double user_gp_flat(const Rcpp::NumericVector& x, const Rcpp::List& ppars) {
double min_xi = ppars["min_xi"] ;
if (x[0] <= 0 || x[1] < min_xi)
return R_NegInf ;
return -log(x[0]) ;
}
// [[Rcpp::export]]
SEXP create_prior_xptr(std::string fstr) {
typedef double (*priorPtr)(const Rcpp::NumericVector& x,
const Rcpp::List& ppars) ;
if (fstr == "gp_flat")
return(Rcpp::XPtr<priorPtr>(new priorPtr(&user_gp_flat))) ;
else
return(Rcpp::XPtr<priorPtr>(R_NilValue)) ;
}
// We could create an external pointer when this file is sourced using
// this embedded R code below and/or (re)create them using the relevant
// pointer-creation functions in an R session or R package.
/*** R
ptr_gp_flat <- create_prior_xptr("gp_flat")
*/
Once the external pointer to the user-supplied prior C++ function has
been created it is passed to set_prior
, along with any
required parameter values. The following example repeats the example in
Generalised Pareto (GP) model. The difference
is that now we create the pointer ptr_gp_flat
and pass it
to set_prior
using prior = ptr_gp_flat
rather
than using the arguments prior = "flat", model = "gp"
to
specify the equivalent in-built prior.
# GP model, user-defined prior
ptr_gp_flat <- create_prior_xptr("gp_flat")
p_user <- set_prior(prior = ptr_gp_flat, model = "gp", min_xi = -1)
gpg <- rpost_rcpp(n = 1000, model = "gp", prior = p_user, thresh = u,
data = gom)