One of the earliest extensions to incorporate the temporal dimension into Geographically Weighted Regression (GWR) is the Geographically and Temporally Weighted Regression (GTWR). GTWR enables regression coefficients to vary both spatially and temporally by employing spatiotemporal weights that account for spatial and temporal distances. This approach more effectively captures evolving local dynamics compared to traditional GWR (Huang, B., Wu, B., & Barry, M., 2010). The inclusion of temporal elements enhances the understanding of how relationships change over time, in addition to their spatial variations.
In this package, we have extended the functionality to incorporate spatiotemporal (separable) kernels for all available models in the mgwrsar package, significantly expanding the range of temporal kernel options. Existing GTWR packages typically offer only symmetric temporal kernels, which do not account for seasonality. However, when working with spatiotemporal data, it is often more appropriate to use asymmetric spatiotemporal kernels for the temporal dimension. The asymmetric aspect is particularly crucial in predictive frameworks, as it ensures that only past observations are considered in local samples. Additionally, unlike purely spatial models, certain spatiotemporal processes may involve cyclic temporal proximities (Du et al., 2018; Senanayake, O’Callaghan, & Ramos, 2016).
Here, we consider a multiplicative separable kernel composed of two components:
The space-time kernel is then computed as:
\[ K(h_s, h_t, \alpha, cycling, \theta) =
\alpha * (K_s(d_s, h_s) * K_t(d_t, h_t, cycling, \theta)) + (1 - \alpha)
* I_{t_i < t_j} \]
The name of the temporal kernel consists of two to three terms separated by underscores:
gauss
,
bisq
, rectangle
, triangle
,
epane
).past
for an asymmetric kernel or
symmetric
for a kernel that includes all
observations.gauss_past_365
).In the mgwrsar package, to use a spatiotemporal kernel, you must provide a complete parameterization for both the spatial and temporal kernels, including the kernel names, whether they are adaptive, and their corresponding bandwidth values.
A spatiotemporal kernel can be specified using the control parameter
type=GDT
. For example:
kernels = c('gauss', 'gauss_past_365'),
H = c(20000, 30),
control = list(type = 'GDT', adaptive = c(FALSE, FALSE), alpha = 0.85)
The following graph illustrates this kernel with two different
parameterizations. The spatial dimension is constrained to a single axis
(longitude) to facilitate a 2D representation. The grey areas represent
null weights.
library(mgwrsar)
#> Loading required package: Rcpp
#> Loading required package: sp
#> Loading required package: leaflet
#> Loading required package: Matrix
### Simulation of a spatiotemporal DGP
simu2=simu_multiscale(n=1000,myseed=1,type='GG2024',constant=NULL,nuls=NULL,config_beta='spatiotemp',config_snr=0.9,config_eps='normal',ratiotime=1.5)
mydata=simu2$mydata
head(mydata)
#> Y X1 X2 X3 Beta1 Beta2
#> 1 14.485825 1.2879704 0.8500435 -1.8054836 1.38895237 4.0797414
#> 2 7.275414 1.6184181 -0.9253130 -0.6780407 2.17421410 4.2422579
#> 3 8.723902 1.2710352 0.8935812 -0.4733581 1.87402111 4.8731258
#> 4 7.624326 -0.2157605 -0.9410097 1.0274171 4.61026118 0.3023768
#> 5 2.973350 -1.0671588 0.5389521 -0.5973876 -0.04012401 1.4197379
#> 6 5.773858 -1.4469746 -0.1819744 1.1598494 1.82607635 0.2902590
#> Beta3 Beta4 eps x y Intercept
#> 1 1.091891e-01 -3.5080176 1.4158034 0.2655087 0.53080879 1
#> 2 9.714243e-01 0.4348605 -0.5708219 0.3721239 0.68486090 1
#> 3 -4.437261e-01 4.1313048 3.0080579 0.5728534 0.38328339 1
#> 4 -1.833294e+00 3.9273697 -2.6808888 0.9082078 0.95498800 1
#> 5 5.477149e-05 -4.5964812 1.7826499 0.2016819 0.11835658 1
#> 6 -2.724324e+00 4.5418611 -1.3958531 0.8983897 0.03910006 1
#> time
#> 1 0.000000000
#> 2 0.001001001
#> 3 0.002002002
#> 4 0.003003003
#> 5 0.004004004
#> 6 0.005005005
coords=simu2$coords
head(coords)
#> x y
#> [1,] 0.2655087 0.53080879
#> [2,] 0.3721239 0.68486090
#> [3,] 0.5728534 0.38328339
#> [4,] 0.9082078 0.95498800
#> [5,] 0.2016819 0.11835658
#> [6,] 0.8983897 0.03910006
myformula=as.formula('Y~X1+X2+X3')
# storage of coefficients from the data generating process
TRUEBETA<-as.matrix(mydata[,c('Beta1','Beta2','Beta3','Beta4')],ncol=4)
ggplot(mydata[mydata$time<0.3,],aes(x,y))+geom_point(aes(color=Beta1))+ ggtitle('Space time Pattern of coefficient Beta1 for time<0.3')+scale_colour_gradientn(colours = terrain.colors(10),limits=c(0,13))
ggplot(mydata[mydata$time>0.6,],aes(x,y))+geom_point(aes(color=Beta1))+ ggtitle('Space time Pattern of coefficient Beta1 for time>0.6')+scale_colour_gradientn(colours = terrain.colors(10),limits=c(0,13))
The golden_search_bandwidth
function is employed to
determine the optimal spatial bandwidth \(h_s\) for a given temporal bandwidth \(h_t\). This process should be iterated
across various values of \(h_t\) to
identify the best pair of bandwidths.
Note: In the function parameterizations, \(h_t\) corresponds to \(H_2\), while \(h_s\) corresponds to \(H\).
As anticipated, when evaluating the Root Mean Square Error (RMSE) of the coefficient estimates against the true values from the Data Generating Process (DGP), the use of a spatiotemporal kernel yields a significantly better fit compared to a Geographically Weighted Regression (GWR) that solely considers spatial weights.
For estimation and bandwidth search, it is essential
to provide the in-sample time vector through the control
argument. Here is an example:
control = list(..., Z = mydata$time[index_in], ...)
##### Comparing GWR estimation using spatial kernel (type='GD') and spatiotemproal kernel Type='GDT'
verbose=F
# function to compute rmse from a vector of residuals
rmse<-function(x) sqrt(mean(x^2))
### GWR with spatial non-adaptive Gaussian kernel
res_gwr_GD=golden_search_bandwidth(formula=myformula,H2=NULL,data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=F,verbose=verbose,criterion='AICc',doMC=F,ncore=1),lower.bound=0, upper.bound=1.4,tolerance=0.001)
model_GWR_GD<-res_gwr_GD$model
model_GWR_GD@H # 0.05698921
#> [1] 0.05698921
model_GWR_GD@AICc # 5645.197
#> [1] 5643.34
Beta_RMSE=apply((model_GWR_GD@Betav-TRUEBETA) %>% as.data.frame(),2,rmse)
Beta_RMSE # 1.166121 1.403245 1.352598 1.756254
#> Intercept X1 X2 X3
#> 1.166121 1.403245 1.352598 1.756254
Beta_RMSE%>% mean() # 1.419555
#> [1] 1.419555
### GWR model with a spatio-temporal non-adaptive kernel
# Repeating the following estimation (optimization of the spatial bandwidth)
# with various temporal bandwidth (H2)
# lead to an optimal value for H2 equal 0.55 regarding AICc criteria
# here is the procedure for H2 = 0.55
# you need to provide the insample time vectorthrough the control argument} : control=list(...,Z=mydata$time)
res_gwr_GDT=golden_search_bandwidth(myformula,H2=0.55,data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss','gauss'),Model = 'GWR',control=list(doMC=F,ncore=1,Z=mydata$time,SE=F,Type='GDT',adaptive=c(F,F),criterion='AICc',get_ts=TRUE,verbose=verbose,NN=nrow(mydata)),lower.bound=0, upper.bound=1.4)
# results
model_GWR_GDT<-res_gwr_GDT$model
model_GWR_GDT@H # 0.05892351
#> [1] 0.05885385
model_GWR_GDT@H2 # 0.55
#> [1] 0.55
model_GWR_GDT@AICc # 5632.56
#> [1] 5630.537
Beta_RMSE=apply((model_GWR_GDT@Betav-TRUEBETA) %>% as.data.frame(),2,rmse)
Beta_RMSE # 1.027104 1.253701 1.368510 1.786674
#> Intercept X1 X2 X3
#> 1.027352 1.253882 1.368100 1.785387
Beta_RMSE%>% mean() # 1.35899
#> [1] 1.35868
In this analysis, we compare Geographically Weighted Regression (GWR) out-of-sample predictions for a spatiotemporal Data Generating Process (DGP) with a larger sample size using two approaches:
The reduction in out-of-sample Root Mean Square Error (RMSE) for the dependent variable is of the same order of magnitude as the decrease in in-sample beta RMSE (mean), approximately 3%.
Several methods are available for out-of-sample predictions (see the
Prediction section in the vignette:
GWR-and-mixed-GWR-with-spatial-autocorrelation). For this
comparison, we use the method method_pred='tWtp_model'
.
This method extrapolates the Beta coefficients using a
weighting matrix derived from the expected weights of out-of-sample
data, as if they were added to in-sample data to estimate the
corresponding model.
For out-of-sample prediction, ensure the following:
Provide the in-sample time vector through the
control
argument:
control = list(…, Z = mydata$time[index_in], …)
Provide a matrix with three columns for the new data (longitude, latitude, time):
new_data=matrix(cbind(coords[index_out,], mydata$time[index_out]), ncol = 3)
## simulations
simu3=simu_multiscale(n=2000,myseed=1,type='GG2024',constant=NULL,
nuls=NULL,config_beta='spatiotemp',config_snr=0.9,
config_eps='normal',ratiotime=1.5)
mydata=simu3$mydata
head("mydata")
#> [1] "mydata"
coords=simu3$coords
head("coords")
#> [1] "coords"
myformula=as.formula('Y~X1+X2+X3')
# betas from the DGP
TRUEBETA<-as.matrix(mydata[,c('Beta1','Beta2','Beta3','Beta4')],ncol=4)
# build in sample and ousample folds
index_in=which(mydata$time<0.7)
index_out=(1:2000)[-index_in]
### model estimation, insample estimates comparison
## Comparing GWR estimation using spatial kernel (type='GD') and spatiotemproal kernel Type='GDT'
# (same as in the previous part, comparison of insample beta estimates)
verbose=F
## GWR model with a Gaussian non-adaptive kernel
# model estimation
res_gwr_GD2=golden_search_bandwidth(formula=myformula,H2=NULL,data = mydata[index_in,],coords=coords[index_in,], fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=F,verbose=verbose,criterion='AICc',doMC=F,ncore=1),lower.bound=0, upper.bound=1.4,tolerance=0.001)
#results
model_GWR_GD<-res_gwr_GD2$model
model_GWR_GD@H # 0.0441879
#> [1] 0.0441879
model_GWR_GD@AICc # 7205.019
#> [1] 7205.019
Beta_RMSE=apply((model_GWR_GD@Betav-TRUEBETA[index_in,]) %>% as.data.frame(),2,rmse)
# rmse for insample beta estimates with GWR-GD
Beta_RMSE # 00.8841014 1.1196300 1.0467956 1.2906708
#> Intercept X1 X2 X3
#> 0.8841014 1.1196300 1.0467956 1.2906708
Beta_RMSE%>% mean() # 1.085299
#> [1] 1.085299
## Repeating the following estimation with various H2 lead to an optimal value for H2 equal 0.45 regarding AICc criteria
res_gwr_GDT2=golden_search_bandwidth(myformula,H2=0.45,data = mydata[index_in,],coords=coords[index_in,], fixed_vars=NULL,kernels=c('gauss','gauss'),Model = 'GWR',control=list(doMC=F,ncore=1,Z=mydata$time[index_in],SE=F,Type='GDT',adaptive=c(F,F),criterion='AICc',get_ts=TRUE,verbose=verbose),lower.bound=0, upper.bound=1.4)
model_GWR_GDT<-res_gwr_GDT2$model
model_GWR_GDT@H # 0.04572969
#> [1] 0.04572969
model_GWR_GDT@H2 # 0.45
#> [1] 0.45
model_GWR_GDT@AICc # 7190.152
#> [1] 7190.152
Beta_RMSE=apply((model_GWR_GDT@Betav-TRUEBETA[index_in,]) %>% as.data.frame(),2,rmse)
# rmse for insample beta estimates with GWR-GDT
Beta_RMSE # 0.8042042 1.0365272 1.0625390 1.3247835
#> Intercept X1 X2 X3
#> 0.8042042 1.0365272 1.0625390 1.3247835
Beta_RMSE%>% mean() # 1.057013
#> [1] 1.057013
#### outsample PREDICTIONS comparison
# outsample data
newdata<-mydata[index_out,]
newdata_coords=coords[index_out,]
# you need to make a matrix with space and time coordinates
newdata_coords_time=matrix(cbind(coords[index_out,],mydata$time[index_out]),ncol=3)
### GWR
Y_pred17=predict(model_GWR_GD, newdata=newdata, newdata_coords=newdata_coords,method_pred='tWtp_model')
head(Y_pred17)
#> [1] 2.019934 2.953993 7.316086 6.511827 -2.397841 9.925443
head(mydata$Y[index_out])
#> [1] 5.309515 3.799516 12.770301 9.447239 -5.857138 9.814067
# RMSE for outsample predictions (GWR)
sqrt(mean((mydata$Y[index_out]-Y_pred17)^2)) # RMSE 4.297125
#> [1] 4.297125
### GWR GDT
Y_pred18=predict(model_GWR_GDT, newdata=newdata, newdata_coords=newdata_coords_time,method_pred='tWtp_model')
head(Y_pred18)
#> [1] 2.218985 2.717337 7.233807 6.770594 -2.972868 10.288854
head(mydata$Y[index_out])
#> [1] 5.309515 3.799516 12.770301 9.447239 -5.857138 9.814067
# RMSE for outsample predictions (GWR GDT), method_pred='tWtp_model'
sqrt(mean((mydata$Y[index_out]-Y_pred18)^2)) # RMSE 4.122512
#> [1] 4.122512
Y_pred19=predict(model_GWR_GDT, newdata=newdata, newdata_coords=newdata_coords_time,method_pred='shepard')
head(Y_pred19)
#> [1] 2.221355 2.713575 7.245124 6.772069 -2.976258 10.290390
head(mydata$Y[index_out])
#> [1] 5.309515 3.799516 12.770301 9.447239 -5.857138 9.814067
# RMSE for outsample predictions (GWR GDT), method_pred='tWtp_model'
sqrt(mean((mydata$Y[index_out]-Y_pred19)^2)) # RMSE 4.121472
#> [1] 4.121472
Huang, B., Wu, B., & Barry, M. (2010). Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. International journal of geographical information science, 24(3), 383-401.
Du et al., 2018, “Extending geographically and temporally weighted regression to account for both spatiotemporal heterogeneity and seasonal variations in coastal seas,” Ecological Informatics, 43, 185-199;
Senanayake, O’Callaghan, & Ramos, 2016, “Predicting spatiotemporal propagation of seasonal influenza using variational Gaussian process regression,” Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30, No. 1