This package provides the boilerplate code for the Correlated Pseudo-Marginal method of (Deligiannidis, Doucet, and Pitt 2015). This vignette demonstrates its use by providing two examples that are closely related to ones given in their paper.
Assume that you are a Bayesian and have some data \(y\), and that you would like to perform inference for a collection of parameters \(\theta\). After specifying a likelihood \(p(y \mid \theta)\) and a prior \(p(\theta)\), you are interested in sampling from the posterior \(p(\theta \mid y)\) using a Markov chain Monte Carlo (MCMC) algorithm. Such an algorithm will draw correlated samples \(\theta^1, \theta^2, \ldots, \theta^{N_{S}}\) from a Markov chain, and under suitable regularity conditions, the average of your large collection of samples will approximate posterior expectations such as the posterior mean: \(E[\theta \mid y]\).
This package implements Metropolis-Hastings (MH) type algorithms, which is a specific sub-class of MCMC methods. MH algorithms moves from iteration to iteration by following a two-step procedure. Step one starts with proposing a new value in the chain. Step two makes a decision on whether or not to accept this new proposal. In the event that the sample is rejected, the chain stays in the same place. other MCMC methods, such as Gibbs sampling, produce samples that cannot stay in the same place.
This software is more specifically tailored to an implementation of the Correlated Pseudo-Marginal sampler (CPM). CPM is an extension of the Pseudo-Marginal Metropolis-Hastings algorithm (PMMH) (Andrieu and Roberts 2009), which is itself an extension of the Metropolis-Hastings (MH) algorithm. CPM and PMMH are useful for sampling from the posterior of a model with an intractable likelihood. While the original MH algorithm requires the ability to evaluate the model’s likelihood, CPM and PMMH only requires unbiased approximations of it.
makeCPMSampler()
FunctionThis package provides a function makeCPMSampler()
that implements all three of these algorithms (although it is more suited for PMMH and CPM algorithms). It abstracts away many implementation details, so, mathematically speaking, only a small degree of familiarity with the algorithms is required to use this software.
Programmatically speaking, makeCPMSampler()
is a function factory. This means that it is a function that creates and returns another function. The returned function will be the function that generates parameter samples.
Furthermore, many of makeCPMSampler()
’s required inputs are functions, as well. Here is a description of all the inputs it takes.
Any MH algorithm (e.g. MH, PMMH, CPM) requires that you are able to sample from a proposal distribution, and to evaluate its density. You provide it these to makeCPMSampler()
in the first two inputs: paramKernSamp=
and logParamKernEval=
.
You will need to provide a function that evaluates the logarithm of the prior distribution. This is the third argument: logPriorEval
.
Unlike the standard Metropolis-Hastings algorithm, you will not be required to provide a function that evaluates the logarithm of the likelihood. Recall from earlier that PMMH and CPM only require an unbiased estimate of the likelihood. The logarithm of an unbiased approximation is provided as the fourth argument: logLikeApproxEval=
. If you’d like, you may provide for this argument a function giving exact evaluations. In this case, makeCPMSampler()
will return a standard MH sampler. This can be useful, for instance, if you are comparing the relative efficiency of pseudo-marginal and non-pseudo-marginal methods.
Finally, you must provide the non-functional inputs. yData
is the entire data set of observations. numU
is the number of standard univariate normal samples that are used for each approximate log-likelihood evaluation. numIters
is the total number of samples drawn. This does not count burn-in or thinning. rho
is the correlation between standard normal variates at each MCMC iteration. Finally, change storeEvery
to some integer greater than \(1\) if you are concerned about memory on your computer, and you’d like to use thinning. If you set it to \(2\), say, then every other sample will be retained.
The first example is the same as the provided in this package’s documentation. This section breaks up the code into chunks in order to explain it more easily. The model has two parameters: \(\theta = (\theta_1, \theta_2)\). The conditional likelihood is
\[ \begin{eqnarray} p(y_{1:N} \mid x_{1:N}, \theta) = \prod_{i=1}^N p(y_{i} \mid x_{i}, \theta) \tag{1} \end{eqnarray} \]
where \[ \begin{eqnarray} y_{i} \mid x_{i}, \theta \sim \text{Normal}(x_i, \theta_1 - \theta_2) \end{eqnarray} \] for \(i=1,\ldots,N\). Here \(x_1, \ldots, x_N := x_{1:N}\) is a collection of latent/hidden/unobserved random variables. They have the following distribution:
\[ \begin{eqnarray} p(x_{1:N} \mid \theta) = \prod_{i=1}^N p( x_{i} \mid \theta) \tag{2} \end{eqnarray} \]
where
\[ \begin{eqnarray} x_{i} \mid \theta \sim \text{Normal}(x_i, \theta_2) \end{eqnarray} \]
for \(i=1,\ldots,N\).
We can simulate \(N=10\) observations from the model with the following code.
<- .2 + .3
realTheta1 <- .2
realTheta2 <- c(realTheta1, realTheta2)
realParams <- 10
numObs <- rnorm(numObs, mean = 0, sd = sqrt(realTheta2))
realX <- rnorm(numObs, mean = realX, sd = sqrt(realTheta1 - realTheta2)) realY
Next, construct the sampler function sampler
using makeCPMSampler()
. We assume a uniform prior over the region \(50 > \theta_1 > \theta_2 > 0\). A simple multivariate normal is used for the proposal distribution.
library(cPseudoMaRg)
<- 1000
numImportanceSamps <- 1000
numMCMCIters <- 1.5
randomWalkScale <- 1
recordEveryTh
# create the function that performs sampling
<- makeCPMSampler(
sampler paramKernSamp = function(params){
+ rnorm(2)*randomWalkScale
params
},logParamKernEval = function(oldTheta, newTheta){
dnorm(newTheta[1], oldTheta[1], sd = randomWalkScale, log = TRUE)
+ dnorm(newTheta[2], oldTheta[2], sd = randomWalkScale, log = TRUE)
},logPriorEval = function(theta){
if( (50 > theta[1]) & (theta[1] > theta[2]) & (theta[2] > 0) ){
-7.130899 # - log of 50^2/2
else{
}-Inf
}
},logLikeApproxEval = function(y, thetaProposal, uProposal){
if( (50 > thetaProposal[1]) & (thetaProposal[1] > thetaProposal[2]) & (thetaProposal[2] > 0) ){
<- uProposal*sqrt(thetaProposal[2])
xSamps <- sapply(xSamps,
logCondLikes function(xsamp) {
sum(dnorm(y,
xsamp, sqrt(thetaProposal[1] - thetaProposal[2]),
log = TRUE)) })
<- max(logCondLikes)
m log(sum(exp(logCondLikes - m))) + m - log(length(y))
else{
}-Inf
}
},
realY,
numImportanceSamps,
numMCMCIters, 99, # change to 0 for original pseudo-marginal method
. recordEveryTh)
The approximate log-likelihood, provided to the logLikeApproxEval=
parameter, uses importance sampling: \[
\begin{align*}
\log \hat{p}(y_{1:N} \mid \theta)
&= \log \left\{
N_X^{-1} \sum_{i=1}^{N_X} \frac{p(y_{1:N} \mid X^i_{1:N}, \theta) p(X^i_{1:N} \mid \theta)}{q(X^i_{1:N} \mid y_{1:N}, \theta)}
\right\}
\end{align*}
\] where \(X^1_{1:N}, \ldots, X^{N_X}_{1:N} \overset{\text{iid}}{\sim} q( \cdot \mid y_{1:N}, \theta)\).
Unlike many importance sampling implementations that call pseudo-random number generating functions (e.g. rnorm
) directly, this calculates the approximation based on standard normal samples generated from within makeCPMSampler()
. We choose \(q(x_{1:N} \mid y_{1:N}, \theta) = p(x_{1:N} \mid \theta)\), despite it not being the optimal proposal/instrumental distribution. Note the use of the “log-sum-exp” trick to avoid numerical underflow.
Finally, sample the Markov chain. Call the samples res
, and examine them.
<- sampler(realParams)
res print(res)
#> CPM Results: at a glance...
#> 1000 iterations performed
#> 1000 samples retained
#> posterior means (in the supplied parameterization): 1.772459 1.357395
#> acceptance rate: 0.076
plot(res)
You might have noticed that this particular model has a tractable likelihood, so it is more efficient to use regular MH. The likelihood is equal to the following
\[ \begin{align*} p(y_{1:N} \mid \theta) &= \int p(y_{1:N} \mid x_{1:N}, \theta) p(x_{1:N} \mid \theta) \text{d}x_{1:N} \\ &= \prod_{i=1}^N \text{Normal}(y_i ; 0, \theta_1) \end{align*} \]
<- makeCPMSampler(
samplerExact paramKernSamp = function(params){
return(params + rnorm(2)*randomWalkScale)
},logParamKernEval = function(oldTheta, newTheta){
dnorm(newTheta[1], oldTheta[1], sd = randomWalkScale, log = TRUE)
+ dnorm(newTheta[2], oldTheta[2], sd = randomWalkScale, log = TRUE)
},logPriorEval = function(theta){
if( (50 > theta[1]) & (theta[1] > theta[2]) & (theta[2] > 0) ){
-7.130899 # - log of 50^2/2
else{
}-Inf
}
},logLikeApproxEval = function(y, thetaProposal, uProposal){
# this is exact now!
if( (50 > thetaProposal[1]) & (thetaProposal[1] > thetaProposal[2]) & (thetaProposal[2] > 0) ){
sum(dnorm(y, mean = 0, sd = sqrt(thetaProposal[1]), log = TRUE))
else{
}-Inf
}
},
realY, # doesn't this matter because Us are not used
numImportanceSamps,
numMCMCIters, 99, # doesn't this matter because Us are not used
.
recordEveryTh)<- samplerExact(realParams)
res2 print(res2)
#> CPM Results: at a glance...
#> 1000 iterations performed
#> 1000 samples retained
#> posterior means (in the supplied parameterization): 0.5457526 0.2342012
#> acceptance rate: 0.034
plot(res2)
Consider the following stochastic volatility (Taylor 1982), which is a model that is simpler than the one explored in (Deligiannidis, Doucet, and Pitt 2015). It assumes a latent AR(1) process. It also assumes that, after conditioning on contemporaneous values of the latent process, that the observed returns are normal.
In other words, let \(Z_1, \ldots, Z_N\) and \(W_1, \ldots, W_N\) be iid standard normal random variables, and for \(t > 1\), define the latent process as
\[ X_t = \phi X_{t-1} + \sigma Z_t, \] and \(X_1 \sim \text{Normal}(0, \sigma^2/(1-\phi^2))\). The distribution of the observed returns is defined by \[ Y_t = \beta \exp(X_t/2) W_t \] for \(t = 1, \ldots, N\). The collection of all unknown parameters is \(\theta = \phi, \beta, \sigma^2\).
\(Y_{1:N} \mid X_{1:N}, \theta\) has the same factorization as equation (1), but unlike equation (2), we have the following Markovian factorization for the distribution of all latent variables:
\[ \begin{eqnarray} p(x_{1:N} \mid \theta) = p(x_1) \prod_{t=2}^N p( x_{t} \mid x_{t-1}, \theta) \tag{3}. \end{eqnarray} \] Assume that all three parameters are independent a priori: \(\pi(\phi)\pi(\beta)\pi(\sigma^2)\), and select a uniform, Gaussian, and Inverse-Gamma distribution for these three distributions, respectively. For the proposal distribution, we use a multivariate normal in a transformed parameter space. \(\phi\) is transformed with \(\text{logit}((\phi+1)/2)\), \(\beta\) is transformed with the identity map, and \(\sigma^2\) is transformed with the natural logarithm.
The primary bottleneck in the previous example was the function passed in to the logLikeApproxEval=
input of makeCPMSampler()
. It was written entirely in R, and so, as a result, was not as fast as it could have been. In this example, we provide a compiled function for this input that makes use of C++ code taken from the PF C++ library (Brown 2020). A growing collection of examples of calling PF code from R is provided as an R package called pfexamplesinr
that is hosted on Github. Interfacing to c++ from R was facilitated by the RcppEigen package (Bates and Eddelbuettel 2013).
The particle filter used to compute the approximate log-likelihood is described in (Deligiannidis, Doucet, and Pitt 2015). Like the example above, it calculates the approximation using common random normal variables generated from within makeCPMSampler()
. As detailed in (Deligiannidis, Doucet, and Pitt 2015), the particle filter that calculates this number uses a resampling technique that makes of the pseudo-inverse of a Hilbert space-filling function. This function was implemented with C++ code lightly edited from the C code provided in (Skilling 2004).
Since version 1.0.1, this package provides an extra input to makeCPMSampler()
called nansInLLFatal=
. If set to FALSE
, then whenever NaN
is returned from the approximate log-likelihood function, the parameter proposal is disregarded. Alternatively, if it is set to TRUE
, then the entire chain comes to a halt, and all the samples obtained up until that iteration are disregarded. When using particle filters to produce approximations to the likelihood, I recommend setting this to FALSE
, especially when the proposal distribution has a high amount of dispersion. This is because, despite one’s best efforts to guard against it, particle filters can frequently generate NaN
s.
Please also note that the number of particles used in this particle filter is selected in two places. One place is the code below, and another in a c++ #define
directive in the file src/likelihoods.cpp
in the pfexamples
library. If you are interested in increasing or decreasing the number of particles, fork pfexamplesinr
, edit your copy of the files, and run the following example after replacing the appropriate line with devtools::install_github("<your Github username>/pfexamplesinr")
.
library(cPseudoMaRg)
::install_github("tbrown122387/pfexamplesinr@e4e2a80")
devtoolslibrary(pfexamplesinr)
<- read.csv("data/return_data.csv", header=F)[,1]
returnsData <- 500 # THIS MUST MATCH "#define NP 500" in src/likelihoods.cpp
numParticles <- 500
numMCMCIters <- .1
randomWalkScale <- 1
recordEveryTh <- length(returnsData)*(numParticles+1)
numUs
# some helper functions
<- function(untrans){
transformParams <- vector(mode = "numeric", length = 3)
p 1] <- boot::logit(.5*(untrans[1] + 1))
p[2] <- untrans[2]
p[3] <- log(untrans[3])
p[return(p)
}<- function(trans){
revTransformParams <- vector(mode = "numeric", length = 3)
p 1] <- 2*boot::inv.logit( trans[1] )-1
p[2] <- trans[2]
p[3] <- exp(trans[3])
p[return(p)
}
<- makeCPMSampler(
sampler paramKernSamp = function(params){
revTransformParams(transformParams(params) + rnorm(3)*randomWalkScale)
},logParamKernEval = function(oldTheta, newTheta){
<- transformParams(newTheta)
unconstrainedNew <- transformParams(oldTheta)
unconstrainedOld dnorm(unconstrainedNew[1], unconstrainedOld[1], sd = randomWalkScale, log = TRUE) #phi
+ 0.6931472 + unconstrainedNew[1] - 2*log(1 + exp(unconstrainedNew[1])) # phi jacobian
+ dnorm(unconstrainedNew[2], unconstrainedOld[2], sd = randomWalkScale, log = TRUE) #beta
+ dnorm(unconstrainedNew[3], unconstrainedOld[3], sd = randomWalkScale, log = TRUE) #sigmaSquared
- unconstrainedNew[3] # jacobian
},logPriorEval = function(theta){
if( (abs(theta[1]) >= 1.0) || theta[3] <= 0.0 ){
-Inf
else{
}log(.5) +
dnorm(theta[2], mean = 0, sd = 10, log = T) +
dgamma(x = 1/theta[3], shape = 1.3, rate = .3, log = T)
}
},logLikeApproxEval = svolApproxLL, # c++ function from pfexamplesinr
99, recordEveryTh, FALSE
returnsData, numUs, numMCMCIters, . )
<- sampler( c(.9, 1, .1))
svolSampleResults mean(svolSampleResults)
print(svolSampleResults)