Simulates data from a user-supplied distribution and creates missing values
artificially. The functions mcar and mcar2 provide example mechanisms
for doing this based on a Missing Completely At Random (MCAR) assumption.
Arguments
- blocks
A numeric scalar. The number of blocks of data required. Usually, this will be a positive integer, but
blocks = 0 returns a list containing the input arguments, in particular distn, distn_args and block_length. This feature is provided so that a simulation setup can be replicated without actually simulating data.
- block_length
A numeric scalar. The number of raw observations per block.
- distn
A character scalar. Specifies the distribution from which raw data are simulated. This is the name in the
xxx part of the dxxx, pxxx, qxxx and rxxx distributional functions in the stats package. See stats::Distributions.
- missing_fn
A function to simulate the positions of the missing values within each block. See Details.
- missing_args
Arguments to be passed to
missing_fn. If missing_fn is mcar then a subset of p0miss, min and max may be supplied in the list missing_args. The values of the remaining components will be set at their default values.
- ...
Further arguments to the function
stats::rxxx. The argument n is set within sim_data to be equal to block_length * blocks.
- sim_data
A numeric vector of raw observations, some of which will be made missing.
Value
If blocks > 0, a list with the following components:
- data_full: simulated raw data with no missing values.
- data_miss: simulated data after missing values have been created.
- blocks, block_length: the respective input values of blocks and block_length.
- block: a block indicator vector, suitable as an argument to gev_mle.
- distn: the input argument distn.
- distn_args: further arguments to stats::rxxx supplied via ....
If blocks = 0, a list containing all the input arguments.
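As a rough illustration of how these components fit together, the following base R sketch builds a list of the same shape without calling sim_data. The helper sketch_sim is hypothetical and is not the package code. Making the first observation of each block missing is an arbitrary choice made only to keep the example short.

```r
# Hypothetical stand-in for the structure returned by sim_data (not package code)
sketch_sim <- function(blocks = 3, block_length = 4, distn = "exp", ...) {
  # Look up the random number generator rxxx in the stats package
  rfn <- get(paste0("r", distn), envir = asNamespace("stats"))
  n <- block_length * blocks
  data_full <- rfn(n = n, ...)
  # Block indicator: 1, ..., 1, 2, ..., 2, ...
  block <- rep(seq_len(blocks), each = block_length)
  # Illustration only: make the first observation in each block missing
  data_miss <- data_full
  data_miss[seq(1, n, by = block_length)] <- NA
  list(data_full = data_full, data_miss = data_miss, blocks = blocks,
       block_length = block_length, block = block, distn = distn,
       distn_args = list(...))
}

set.seed(1)
res <- sketch_sim(blocks = 3, block_length = 4, distn = "exp", rate = 2)
# Cross-tabulate block membership against missingness
table(res$block, is.na(res$data_miss))
```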
Details
The function missing_fn must return a, possibly empty,
subset of c(1, 2, ..., block_length). This function is applied within
each simulated block, independently of other blocks.
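A user-written missing_fn might look like the sketch below. The function last_k_missing and its argument k are hypothetical. The sketch also assumes that missing_fn is called with block_length plus the components of missing_args, an interface that this page does not spell out and that should be checked against the package source.

```r
# Hypothetical missing_fn: the last k observations of each block are missing.
# Assumption: the function receives block_length plus any extra arguments.
last_k_missing <- function(block_length, k = 0) {
  # Keep k within 0, ..., block_length
  k <- min(max(k, 0), block_length)
  # Returns integer(0), the empty subset, when k = 0
  pos <- seq_len(block_length)
  pos[pos > block_length - k]
}

last_k_missing(10, k = 3)  # 8 9 10
last_k_missing(10, k = 0)  # integer(0)
```

This mechanism is deterministic, so it only illustrates the required return value (a possibly empty subset of the positions in a block), not an MCAR scheme.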
The default function mcar simulates the numbers of missing values in the
blocks as follows.
- A proportion p0miss of the blocks have no missing values.
- In the other blocks, the number of missing values is ceiling(prop_miss * block_length), where prop_miss is a value simulated from a Uniform(min, max) distribution. The positions of these missing values within the block are random.
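The mechanism just described can be sketched in base R as follows. This is not the package's mcar: the name mcar_sketch is hypothetical, the argument values are illustrative rather than the package defaults, and p0miss is treated as a per-block probability, which is one possible reading of "a proportion of the blocks".

```r
# Sketch of the mechanism described above; not the package's mcar
mcar_sketch <- function(block_length, p0miss, min, max) {
  # With probability p0miss, the block has no missing values
  if (runif(1) < p0miss) {
    return(integer(0))
  }
  # Otherwise, simulate the proportion of missing values in this block
  prop_miss <- runif(1, min = min, max = max)
  n_miss <- ceiling(prop_miss * block_length)
  # Choose the positions of the missing values at random
  sort(sample.int(block_length, size = n_miss))
}

set.seed(42)
# One call per block, independently across blocks
miss_pos <- replicate(5, mcar_sketch(block_length = 20, p0miss = 0.5,
                                     min = 0, max = 0.5), simplify = FALSE)
# Number of missing values in each of the 5 blocks
lengths(miss_pos)
```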
The function mcar2 identifies at random a proportion pmiss of the
simulated raw observations to become missing.
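A minimal sketch of this idea is given below. The name mcar2_sketch is hypothetical, and the sketch assumes that exactly round(pmiss * n) observations are made missing. The package's mcar2 may differ, for example by making each observation missing independently with probability pmiss.

```r
# Sketch only: make a proportion pmiss of all observations missing
# Assumption: the number of missing values is round(pmiss * n)
mcar2_sketch <- function(x, pmiss) {
  n <- length(x)
  idx <- sample.int(n, size = round(pmiss * n))
  x[idx] <- NA
  x
}

set.seed(7)
x_miss <- mcar2_sketch(rnorm(200), pmiss = 0.25)
mean(is.na(x_miss))  # 0.25
```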
Care may need to be taken if these simulated data are used as input to
gev_mle using an approach that discards block maxima based on more
than a certain percentage of missing values, that is, with discard > 0.
For example, using the default argument blocks = 50 and
missing_fn = mcar, with its default missing_args, may result
in a sample size of retained block maxima that contains insufficient
information to make reliable inferences, leading to difficulties finding
an appropriate MLE for the shape parameter \(\xi\) and/or a singular
observed information matrix.
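Before fitting, it can help to count how many blocks would survive a given threshold. The sketch below uses toy data and assumes that discard is a percentage and that a block is dropped when its percentage of missing values exceeds discard. This is our reading of the description above, not the gev_mle code.

```r
# Toy data: 6 blocks of length 10, with missing values in blocks 1 to 3
set.seed(3)
block_length <- 10
blocks <- 6
block <- rep(seq_len(blocks), each = block_length)
data_miss <- rnorm(block_length * blocks)
data_miss[c(1:6, 11:12, 21:29)] <- NA  # 60%, 20% and 90% missing

# Percentage of missing values in each block
pct_miss <- 100 * tapply(is.na(data_miss), block, mean)

# Blocks kept if those with more than `discard` percent missing are dropped
# (an assumed reading of the discard rule, not package code)
discard <- 50
retained <- pct_miss <= discard
sum(retained)  # 4
```

If the number of retained blocks is small, increasing blocks or relaxing the threshold may be worth considering before interpreting the fit.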