Species richness example

2024-08-20

In this example we will show a quick example of a species Richness model using intSDM. We consider 7 randomly selected species of Vascular plants distibuted across the Netherlands. These species were obtained from two different datasets: Dutch vegetation database, which we treat as structured detection/non-detection data, and iNaturalist, which we treat as unstructured presence-only data.


library(intSDM)
library(INLA)

To begin the workflow, we use the function startWorkflow and specify the names of the species for which we want to obtain estimates for. To construct a richness model, we specify Richness = TRUE, which will construct a multi-species model rather than create models for each species independently. This richness model will include an iid random effect unique for each species, a species-level environmental effect, and a species-level random effect. In order to speed up the estimation procedure, we will assume that the hyperparameters are the same for each of the species-level random effects.


Rich <- startWorkflow(Species = c("Lolium perenne L.","Rubus caesius L.",
                                  "Rosa spinosissima L.",
                                  "Poa trivialis L.",
                                  "Galium verum L.",
                                  "Tanacetum vulgare L.",
                                  "Viola tricolor L.",
                                  "Epilobium L."),
        Projection =  '+proj=utm +zone=32 +ellps=WGS84 +datum=WGS84 +units=km +no_defs',
                      Save = FALSE, Richness = TRUE,
                      saveOptions = list(projectName = 'Richness'))

Next, we need to obtain a boundary object of the Netherlands and simplify it to assist with the estimation.


Ned <- giscoR::gisco_get_countries(country = 'Netherlands', resolution = 60)
Ned <- st_cast(st_as_sf(Ned), 'POLYGON')
Ned <- Ned[which.max(st_area(Ned)),]
Ned <- rmapshaper::ms_simplify(Ned, keep = 0.5)
Rich$addArea(Ned)

We add the data in directly from GBIF using the .$addGBIF function. There are no recorded absences for the Dutch vegetation database for our selected species. Therefore we generate absences using generateAbsences = TRUE, which will pool all the observations for all species within the dataset together, and generate absences where that species was not found.


Rich$addGBIF(datasetName = 'DVD', 
             datasetKey = '740df67d-5663-41a2-9d12-33ec33876c47', 
             datasetType = 'PA', generateAbsences = TRUE)

Rich$addGBIF(datasetName = 'iNat', 
             datasetKey = '50c9509d-22c7-4a22-a47d-8c48425ef4a7')

To estimate the spatial effect, we need to create a triangulated mesh. We create a fine mesh, however you may speed up the analysis by making the mesh coarser.


Rich$addMesh(max.edge = c(5, 10))
Rich$plot(Mesh = TRUE)

We furthermore specify a second spatial effect for the unstructured data to account for the bias sampling. In addition, we specify penalizing complexity priors for all of the random effects (species-level random effect, bias random effect and the precision parameter of the iid intercept term) to reduce overfitting of the model.


Rich$specifySpatial(prior.range = c(0.2,0.1),
                    prior.sigma = c(2, 0.1))

Rich$biasFields('iNat', prior.range = c(0.1, 0.1),
                prior.sigma = c(0.2, 0.1))

Rich$specifyPriors(priorIntercept = list(prior = 'pc.prec', param = c(0.02, 0.01)))

We assume one covariate in the model (average temperature) which we download using .$addCovariates. We then use the function .$modelFormula to specify the formula of the process model, which in this case includes both the linear and quadratic effect of the covariate.


Rich$addCovariates(worldClim = c('tavg'), res = 10, Function = scale)
Rich$modelFormula(covariateFormula = ~ tavg + I(tavg^2))

Estimating richness requires selecting one of the intercepts in the model for which to scale the intensity function. This dataset needs to be one where the sampling bias is assumed to be minimal, and the sampling area for each sampling location is known and available.

Specifying the intercept used to scale the intenstiy function may be done using predictionIntercept in the Richness options. We may then specify the sampling size of the sampling locations using samplingSize, however given that these values are not constant for our dataset, we include it as an offset using the Offset argument in the ISDM options.


Rich$modelOptions(ISDM = list(Offset = 'sampleSizeValue'), 
                  Richness = list(predictionIntercept = 'DVD'))

Finally, we select our output as richness maps, and estimate the model.


Rich$workflowOutput('Maps')

RichModel <- sdmWorkflow(Rich, inlaOptions = list(verbose = TRUE))

And provide maps of the predictions along with estimated measures of uncertainty.


ggplot() + gg(RichModel$Richness$Richness, aes(col = q0.025))
ggplot() + gg(RichModel$Richness$Richness, aes(col = q0.5))
ggplot() + gg(RichModel$Richness$Richness, aes(col = q0.975))

The species-level probabilities of occurrence are also found within this return object. We may thus plot how these vary across space for each species. For example, we may plot the probabilities for Galium verumL. and Tanacetum vulgare L.


ggplot() + gg(RichModel$Richness$Probabilities$Galium_verum_L., aes(col = mean)) 
ggplot() + gg(RichModel$Richness$Probabilities$Tanacetum_vulgare_L., aes(col = mean))