In most application fields and disciplines, the continuous growth of spatial data samples, in terms of both volume and richness of explanatory variables, has raised problems for the use of Geographically Weighted Regression (GWR; Brunsdon et al. 1996; McMillen, 1996) in recent years. As the spatial extent of a sample increases, its spatial resolution and the richness of the explanatory variables also increase, and it becomes increasingly necessary to take spatial heterogeneity into account. This can impose computation times that may seem prohibitive for GWR methods. In 2017, using GWR on samples of more than 50,000 observations was not really feasible, and several authors have since proposed improvements to meet the challenges of using GWR with big data. The two main problems are the time required to compute each set of local coefficients and the memory required to store the \(n \times n\) hat matrix used to estimate the variance of the parameters. To address these two issues, various avenues have been explored:
Indeed, one can use a limited set of evaluation points (or target points) together with interpolation methods to generalize the fit to the other points, thereby speeding up the computation. In the more general context of locally weighted regression, Loader (1999) proposed an interpolation method based on an adaptive decision-tree approach using the local density of points. In the non-parametric literature, this type of two-stage strategy, which consists in approximating a function on a limited set of points and extrapolating it elsewhere, is fairly common: the famous Super Smoother algorithm of Friedman (1984) is based on this idea. For kernel density estimators and regression kernels, there is a wide variety of proposals on how to choose a subset of points so as to obtain the best extrapolations with the least possible computational load: the subset can be based on the density of observations, on the complexity of the curves, or on equidistributing points over the support of the function.
Release 1.0 of the mgwrsar package proposes two improvements to speed up the computation of the GWR-like models considered in the previous release, with or without spatial autocorrelation, through the joint use of a rough Gaussian kernel and estimation based on a subset of target points. It proposes:
For a sample of 4,000 observations, the use of a rough Gaussian kernel accelerates GWR computation by a factor of 2.5 without significant bias in the coefficients. Moreover, the use of a suitable set of target points accelerates GWR computation by a factor of 50 and, surprisingly, reduces the bias of the final local coefficients by improving the choice of the optimal bandwidth (see Geniaux 2024).
This package proposes two ways of selecting target points: first, a quadcell algorithm based on the local density of locations; second, a spatial smoothing of the residuals of a first-stage OLS model. The latter method generally outperforms the density-based method, particularly when the number of target points is smaller than 25% of the locations, and is therefore the default method in this release.
library(mgwrsar)
#> Loading required package: Rcpp
#> Loading required package: sp
#> Loading required package: leaflet
#> Loading required package: Matrix
## loading data example
data(mydata)
coord=as.matrix(mydata[,c("x","y")])
head(coord)
#> x y
#> [1,] 872142.8 6234075
#> [2,] 872894.3 6233366
#> [3,] 878615.1 6229884
#> [4,] 869936.0 6228408
#> [5,] 866441.8 6231471
#> [6,] 866952.9 6222411
head(mydata)
#> Y_mgwrsar_1_0_kv X0 X1 X2 X3 Beta0 Beta1
#> 1 0.44146014 1 2.396274 0.4763857 0.6854775 0.11599551 0.26922206
#> 2 0.02223952 1 1.684776 0.3303000 0.8994048 0.11533789 0.25784575
#> 3 3.41607993 1 2.369278 1.5117796 0.5808293 0.20973774 0.55723064
#> 4 0.28497090 1 4.208743 1.0916590 1.4058201 -0.08706293 0.06795084
#> 5 -0.79417336 1 3.109543 0.6487661 1.6977387 -0.08209832 0.21763086
#> 6 6.78538775 1 7.284831 0.8828887 1.5085908 -0.28330886 0.52468859
#> Beta2 Beta3 lambda Y_ols Y_sar Y_gwr Y_mgwr
#> 1 0.6772792 -1.1106223 0.2408643 0.9890454 2.010387 0.32246490 0.39829402
#> 2 0.6751576 -1.0326101 0.2273072 0.2732833 1.539485 -0.15597959 -0.12664995
#> 3 0.5623073 -0.5412975 0.3384363 2.1155894 3.637831 2.06565538 1.79922754
#> 4 1.0796267 -0.9258913 0.3543732 1.7902102 3.201578 0.07587226 -0.02831127
#> 5 1.1019372 -1.2759568 0.3772657 0.5057989 1.460611 -0.85670748 -0.38820500
#> 6 1.5488474 -0.7649646 0.4933481 3.0167134 4.654380 3.75240014 3.39782791
#> Y_mgwrsar_0_0_kv Y_mgwrsar_0_kc_kv Y_mgwrsar_1_kc_kv Y_mgwr_outlier X2dummy
#> 1 0.5673431 0.7137761 0.55150700 0.3982940 FALSE
#> 2 0.2031676 0.2611103 0.06388119 -0.1266500 TRUE
#> 3 3.8183989 3.1882294 2.86908705 1.7992275 TRUE
#> 4 0.3314998 0.1832260 0.14519215 -0.0188881 TRUE
#> 5 -0.7858941 -0.1114446 -0.13694311 -0.3882050 FALSE
#> 6 5.8227160 5.2859121 6.16333174 3.3978279 TRUE
#> Y_mgwr_X2dummy x y
#> 1 0.07564794 872142.8 6234075
#> 2 -0.34965450 872894.3 6233366
#> 3 0.94914284 878615.1 6229884
#> 4 -1.20689547 869936.0 6228408
#> 5 -1.10310449 866441.8 6231471
#> 6 2.03036799 866952.9 6222411
## plot (assumes ggplot2 is attached)
library(ggplot2)
ggplot(mydata,aes(x,y))+geom_point(aes(color=Beta0))+ ggtitle('Spatial Pattern of coefficient Beta0')+scale_colour_gradientn(colours = terrain.colors(10))
ggplot(mydata,aes(x,y))+geom_point(aes(color=Beta1))+ ggtitle('Spatial Pattern of coefficient Beta1')+scale_colour_gradientn(colours = terrain.colors(10))
The GWR model can be expressed as: \[\begin{equation} y_{i}=\sum_{j=1}^{J}\beta_{j}(u_{i},v_{i})x_{ij}+\epsilon_{i} \;\;\; i=1,2,\ldots,n \label{modelGWR} \end{equation}\] where \(J\) is the number of exogenous regressors and \(x_{ij}\) is regressor \(j\) for observation \(i\), possibly an intercept.
The estimator of \(\beta_{j}(u_{i},v_{i})\) is: \[\begin{equation} \hat{\beta}(u_{i},v_{i})=(X'W_{i}X)^{-1} X'W_{i}Y \label{BetaGWR} \end{equation}\]
If we denote by \(W\) a spatial weight matrix based on a distance-decay kernel or on k-nearest neighbors, then \(W_{i}\) is the \(n\times n\) diagonal spatial weight matrix specific to location \(i\), whose diagonal is the i-th row of \(W\), all other elements being zero. Usual kernels are the Gaussian and bisquare kernels, with a bandwidth defined as a distance or, for adaptive kernels, as a number of neighbors.
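A minimal base-R sketch (an illustration with simulated data, not mgwrsar's implementation) of the estimator above: for a single location \(i\), Gaussian kernel weights are plugged into a weighted least-squares fit.

```r
set.seed(1)
n <- 200
coords <- cbind(runif(n), runif(n))
X <- cbind(1, rnorm(n))                       # intercept + one regressor
beta1 <- 1 + coords[, 2]                      # spatially varying slope
y <- coords[, 1] + X[, 2] * beta1 + rnorm(n, sd = 0.1)

# hat{beta}(u_i, v_i) = (X' W_i X)^{-1} X' W_i y with Gaussian weights
gwr_beta_i <- function(i, X, y, coords, h) {
  d <- sqrt((coords[, 1] - coords[i, 1])^2 + (coords[, 2] - coords[i, 2])^2)
  w <- exp(-0.5 * (d / h)^2)                  # Gaussian kernel weights
  solve(crossprod(X, w * X), crossprod(X, w * y))
}
beta_hat <- gwr_beta_i(1, X, y, coords, h = 0.1)
beta_hat                                      # local intercept and slope at location 1
```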
One way to speed up GWR estimation and reduce the required amount of memory is to increase the sparsity of the weight matrix \(W\). When one uses a k-nearest-neighbors weighting scheme or an adaptive kernel with null weights beyond the bandwidth (like the bisquare, trisquare, Epanechnikov, triangle or rectangle kernels), \(W\) is very sparse when the bandwidth is small. When Gaussian-like kernels are used, one way to increase the sparsity of \(W\) is to use truncated Gaussian kernels that set weights to zero when they are very small or when they concern observations beyond a given number of neighbors.
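As a sketch (illustrative only, not the package's internal code), truncating a Gaussian kernel amounts to zeroing the weights beyond the NN-th nearest neighbor, which makes each row of \(W\) sparse:

```r
set.seed(1)
# one row of W: Gaussian weights from the distances to all other observations
rough_gauss_row <- function(d, h, NN) {
  w <- exp(-0.5 * (d / h)^2)
  w[rank(d, ties.method = "first") > NN] <- 0  # keep only the NN closest
  w
}
d <- abs(rnorm(1000))                          # distances to 1000 neighbors
w <- rough_gauss_row(d, h = 0.3, NN = 300)
sum(w > 0)                                     # at most 300 nonzero weights
```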
In the first example, we show how to compute a GWR model with and without the rough Gaussian kernel. We use an optimal bandwidth \(H\) of 0.03, corresponding on average to the distance to the 25th neighbor; for the rough Gaussian kernel, only the first 300 neighbors (NN=300) are taken into account in the distance and weight calculations. In this case the computation time is divided by 2, and the larger the sample, the greater the relative improvement. In this example, where the optimal bandwidth is far from the bound used to truncate the kernel, the rough Gaussian kernel has no detrimental effect. When the optimal bandwidth is closer to the truncation neighborhood, the differences in the approximation will be larger and it may be advisable to increase the \(NN\) parameter that defines the truncation neighborhood of the Gaussian kernel.
## without rough gaussian kernel
ptm1<-proc.time()
model_GWR<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.03, Model = 'GWR',control=list(SE=TRUE))
(proc.time()-ptm1)[3]
#> elapsed
#> 0.972
## with rough gaussian kernel
ptm1<-proc.time()
model_GWR_grk<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.03, Model = 'GWR',control=list(SE=TRUE,NN=300))
(proc.time()-ptm1)[3]
#> elapsed
#> 0.491
summary(model_GWR@Betav)
#> Intercept X1 X2 X3
#> Min. :-2.5638 Min. :0 Min. :0 Min. :0
#> 1st Qu.: 0.7103 1st Qu.:0 1st Qu.:0 1st Qu.:0
#> Median : 1.7639 Median :0 Median :0 Median :0
#> Mean : 2.0984 Mean :0 Mean :0 Mean :0
#> 3rd Qu.: 3.1799 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
#> Max. : 9.7722 Max. :0 Max. :0 Max. :0
summary(model_GWR_grk@Betav)
#> Intercept X1 X2 X3
#> Min. :-2.5638 Min. :0 Min. :0 Min. :0
#> 1st Qu.: 0.7103 1st Qu.:0 1st Qu.:0 1st Qu.:0
#> Median : 1.7639 Median :0 Median :0 Median :0
#> Mean : 2.0984 Mean :0 Mean :0 Mean :0
#> 3rd Qu.: 3.1799 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
#> Max. : 9.7722 Max. :0 Max. :0 Max. :0
In the second example, we show how to compute a GWR model with a sample of target points using mgwrsar 1.0. Choosing target points with respect to the spatial distribution of the residuals of a first-stage OLS regression is preferable to selecting them with respect to the density of locations; this option is chosen by passing type='residuals' to the find_TP function.
The selection of target points with type='residuals' involves three steps:
In order to capture local maxima at a fine scale, we use between twelve and sixteen neighbors (the default) for \(ks\). In the last step, the value of \(kt\) allows the number of selected local maxima to be increased or decreased; the choice of \(ks\) is less critical than that of \(kt\). The resulting number of target points depends both on \(kt\) and on the spatial distribution of the residuals: the larger \(kt\), the fewer target points, and the size of the resulting set is less than \(n/kt\).
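The three steps can be sketched in base R as follows (a hedged illustration with simulated residuals; the actual find_TP implementation may differ in its smoothing and neighbor handling):

```r
set.seed(2)
n <- 500
coords <- cbind(runif(n), runif(n))
abs_res <- abs(rnorm(n))                 # step 1: |residuals| of a first-stage OLS
knn <- function(i, k)                    # indices of the k+1 closest points (incl. i)
  order((coords[, 1] - coords[i, 1])^2 + (coords[, 2] - coords[i, 2])^2)[1:(k + 1)]
ks <- 12; kt <- 6
# step 2: spatial smoothing of |residuals| over the ks nearest neighbors
smoothed <- sapply(1:n, function(i) mean(abs_res[knn(i, ks)]))
# step 3: keep locations that are a local maximum among their kt nearest neighbors
TP <- which(sapply(1:n, function(i) all(smoothed[i] >= smoothed[knn(i, kt)])))
length(TP)                               # a small subset of the n locations
```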
To illustrate the methodology, we simulate a single run of the GWR DGP proposed by Geniaux and Martinetti (2017a) and apply the selection of target points with respect to the first-stage OLS residuals. Figure 1 maps the absolute value of the smoothed (with \(ks = 16\)) residuals of the linear model and the chosen target points when \(kt\) is set to 48 (green points) and when \(kt\) is set to 8 (blue points). The map on the left shows that the absolute smoothed residual of each green target point is higher than that of its 48 closest neighbors.
In the following example, the find_TP function is used with \(kt=6\) to identify a set of 92 target points, which is then passed to the MGWRSAR function: MGWRSAR first computes the local \(\beta(u_i,v_i)\) coefficients for the 92 target points (i %in% TP) and then extrapolates the local \(\beta(u_i,v_i)\) coefficients to the 908 other locations.
As expected, the approximations of the \(\beta(u_i,v_i)\) coefficients obtained with this target point set differ from those of the full GWR, with small differences in the means of the coefficients but a smaller range and variance for the target point results. The computation time is divided by 10 and, as mentioned before, the larger the sample, the greater the relative improvement: computation time is divided by 50 for big samples, particularly when this method is used in conjunction with the rough Gaussian kernel (see Geniaux 2024).
## identify the target points
TP=find_TP(formula = 'Y_gwr~X1+X2+X3', data =mydata,coords=coord,kt=6,type='residuals',ks=12)
# only 92 target points are used
length(TP)
#> [1] 92
## comparison of computation times
ptm1<-proc.time()
# benchmark GWR without target points
model_GWR<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.03, Model = 'GWR',control=list(SE=TRUE))
(proc.time()-ptm1)[3]
#> elapsed
#> 0.92
ptm1<-proc.time()
# GWR with target points
model_GWR_tp<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.03, Model = 'GWR',control=list(SE=TRUE,TP=TP))
(proc.time()-ptm1)[3]
#> elapsed
#> 0.104
# GWR with target points and rough gaussian kernel
ptm1<-proc.time()
model_GWR_tp_NN<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.03, Model = 'GWR',control=list(SE=TRUE,TP=TP,NN=300))
(proc.time()-ptm1)[3]
#> elapsed
#> 0.066
## comparison of betas estimates
#GWR
summary(model_GWR@Betav)
#> Intercept X1 X2 X3
#> Min. :-2.5638 Min. :0 Min. :0 Min. :0
#> 1st Qu.: 0.7103 1st Qu.:0 1st Qu.:0 1st Qu.:0
#> Median : 1.7639 Median :0 Median :0 Median :0
#> Mean : 2.0984 Mean :0 Mean :0 Mean :0
#> 3rd Qu.: 3.1799 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
#> Max. : 9.7722 Max. :0 Max. :0 Max. :0
# GWR with target points
summary(model_GWR_tp@Betav)
#> Intercept X1 X2 X3
#> Min. :-1.0507 Min. :0 Min. :0 Min. :0
#> 1st Qu.: 0.0000 1st Qu.:0 1st Qu.:0 1st Qu.:0
#> Median : 0.0000 Median :0 Median :0 Median :0
#> Mean : 0.2258 Mean :0 Mean :0 Mean :0
#> 3rd Qu.: 0.0000 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
#> Max. : 7.5508 Max. :0 Max. :0 Max. :0
# GWR with target points and rough gaussian kernel
summary(model_GWR_tp_NN@Betav)
#> Intercept X1 X2 X3
#> Min. :-1.0507 Min. :0 Min. :0 Min. :0
#> 1st Qu.: 0.0000 1st Qu.:0 1st Qu.:0 1st Qu.:0
#> Median : 0.0000 Median :0 Median :0 Median :0
#> Mean : 0.2258 Mean :0 Mean :0 Mean :0
#> 3rd Qu.: 0.0000 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
#> Max. : 7.5508 Max. :0 Max. :0 Max. :0
The two approximation methods proposed above can also be used to reduce the time needed to find the optimal bandwidth, and are available in the bandwidths_mgwrsar function. The next example compares the time for finding an optimal bandwidth for the full GWR and for the GWR with a target point sample and a rough Gaussian kernel.
The time for finding the optimal bandwidth is divided by around 8 and, here again, larger improvements can be achieved with larger samples. In this example, the optimal bandwidth is smaller for the methods with target points; this result is specific to this simulation and the opposite can be true with different data. Although the GWR with target points reaches a lower LOOCV in this example, it cannot be compared to the LOOCV of the full GWR: computing a Leave-One-Out Cross-Validation (LOOCV) criterion that is comparable between the full GWR and the GWR with target points is extremely complex and time consuming, so mgwrsar 1.0 uses only the LOOCV of the target points for choosing the optimal bandwidth. If the user wants to compare the out-of-sample accuracy of the two methods, k-fold cross-validation can be performed.
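The suggested k-fold comparison can be sketched as follows (illustrative base R; the fit and prediction steps are placeholders to be replaced by MGWRSAR on each training fold and predict on the held-out fold):

```r
set.seed(3)
n <- 1000; k <- 5
y <- rnorm(n)
folds <- sample(rep(1:k, length.out = n))   # random assignment of observations to folds
rmse <- numeric(k)
for (f in 1:k) {
  test  <- which(folds == f)
  train <- which(folds != f)
  # placeholder model: predict the training mean
  # (replace by fitting MGWRSAR on train and calling predict() on test)
  pred <- rep(mean(y[train]), length(test))
  rmse[f] <- sqrt(mean((y[test] - pred)^2))
}
mean(rmse)                                  # out-of-sample RMSE, averaged over folds
```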
## identify optimal bandwidth for a GWR model non-adaptive kernel (the bandwidth is a distance)
res1=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='AICc',adaptive=F),lower.bound=0.003, upper.bound=1,tolerance=0.001)
res1$minimum
#> [1] 0.77666
res1$objective
#> [1] -Inf
model_opt<-res1$model
summary(model_opt)
#> Call:
#> MGWRSAR(formula = "Y_gwr~X1+X2+X3", data = mydata, coords = coord,
#> fixed_vars = fixed_vars, kernels = c("gauss"), H = c(res,
#> H2), Model = Model, control = list(verbose = F, NN = 300,
#> criterion = "AICc", adaptive = F))
#> Model: GWR
#> Kernels function: gauss
#> Kernels adaptive: NO
#> Kernels type: GD
#> Bandwidth: 0.77666
#> Computation time: 0.278
#> Use of parallel computing: FALSE ncore = 1
#> Use of rough kernel: YES, 300 neighbors / 1000
#> Use of Target Points: NO
#> Number of data points: 1000
#> Varying parameters: Intercept X1 X2 X3
#> Intercept X1 X2 X3
#> Min. -2.56384 0.00000 0.00000 0
#> 1st Qu. 0.71031 0.00000 0.00000 0
#> Median 1.76387 0.00000 0.00000 0
#> Mean 2.09842 0.00000 0.00000 0
#> 3rd Qu. 3.17989 0.00000 0.00000 0
#> Max. 9.77215 0.00000 0.00000 0
#> Effective degrees of freedom: 0
#> AICc: -Inf
#> Residual sum of squares: 0
#> RMSE: 0
## identify optimal bandwidth for a GWR model adaptive gaussian kernel (the bandwidth is a number of neighbors)
res2=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='AICc',adaptive=T),lower.bound=2, upper.bound=300,tolerance=0.001)
res2$minimum # 5
#> [1] 5
res2$objective # -1767.636
#> [1] -1767.636
## target points with AICc criteria
#identify the target points
TP=find_TP(formula = 'Y_gwr~X1+X2+X3', data =mydata,coords=coord,kt=6,type='residuals')
length(TP) # 91
#> [1] 91
# identify the optimal bandwidth using GWR estimations with target points and AICc criterion
res3=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='AICc',adaptive=T,TP=TP),lower.bound=2, upper.bound=300,tolerance=0.001)
res3$minimum # 15
#> [1] 15
res3$objective # -115.3895
#> [1] -115.3895
## target points with CV criteria
# identify the optimal bandwidth using GWR estimations with target points and CV criterion
# for a gaussian kernel the support is non-finite (NN allows to truncate)
res4=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = mydata,coords=coord, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='CV',adaptive=T,TP=TP),lower.bound=2, upper.bound=300,tolerance=0.001)
res4$minimum # 9
#> [1] 9
## comparison with other adaptive kernels, with criterion AICc and target points
# epanechnikov kernel ('epane')
res5=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = mydata,coords=coord, fixed_vars=NULL,kernels=c('epane'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='AICc',adaptive=T,TP=TP),lower.bound=8, upper.bound=300,tolerance=0.001)
res5$minimum # 69
#> [1] 69
res5$objective # -7.869311
#> [1] -7.869311
# triangle kernel ('triangle') (NN is the max number of neighbors used to compute the kernel)
res6=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = mydata,coords=coord, fixed_vars=NULL,kernels=c('triangle'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='AICc',adaptive=T,TP=TP),lower.bound=8, upper.bound=300,tolerance=0.001)
res6$minimum # 83
#> [1] 83
res6$objective # -1.808431
#> [1] -1.808431
# bisquare kernel ('bisq')
res7=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = mydata,coords=coord, fixed_vars=NULL,kernels=c('bisq'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='AICc',adaptive=T,TP=TP),lower.bound=8, upper.bound=300,tolerance=0.001)
res7$minimum # 84
#> [1] 84
res7$objective # -86.07554
#> [1] -86.07554
In this example, we use an in-sample of size 800 for model estimation and an out-sample of size 200 for prediction. Note that for models with spatial autocorrelation you must provide a global weight matrix of size 1000, ordered by in-sample then out-sample locations.
For GWR and MGWR_1_0_kv, where all coefficients vary in space, the predictions can be carried out by re-estimating the corresponding model with the out-sample locations as target points, so neither the estimated coefficients nor the target points chosen for estimating the model are used: only the call and the data of the estimated model are used. For mixed models that have some constant coefficients (MGWR, MGWRSAR_0_0_kv, MGWRSAR_1_kc_kv, MGWRSAR_1_kc_0), the estimated coefficients (only those of the target points) are extrapolated using a weighting matrix. By default, the weighting matrix for prediction is based on the expected weights of the out-sample data if they had been added to the in-sample data to estimate the corresponding MGWRSAR model (Geniaux 2024). The user can also choose to extrapolate the model coefficients using a Shepard kernel with a custom number of neighbors, or using the same kernel and bandwidth as the estimated model.
In the following examples, whatever the model, the out-sample coefficients are extrapolated using a weighting matrix (method_pred='tWtp_model').
## build insample and outsample folds
length_in=800
index_in=sample(1:1000,length_in)
index_out=(1:1000)[-index_in]
#insample data
data_in=mydata[index_in,]
coord_in=coord[index_in,]
#outsample data
newdata=mydata[index_out,]
newdata_coords=coord[index_out,]
newdata$Y_gwr=0
### benchmark: GWR model
# GWR model estimate on insample data, bandwidth optimization with CV
res_CV=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = data_in,coords=coord_in, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='CV',adaptive=T),lower.bound=2, upper.bound=300,tolerance=0.001)
res_CV$minimum # 4
# GWR model estimate on insample data, bandwidth optimization with AICc
res_AICc=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = data_in,coords=coord_in, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='AICc',adaptive=T),lower.bound=2, upper.bound=300,tolerance=0.001)
res_AICc$minimum # 5
### GWR model with target points
# identification of target points
TP=find_TP(formula = 'Y_gwr~X1+X2+X3', data =data_in,coords=coord_in,ks=16,kt=4,type='residuals')
# only 158 target points are used
length(TP)
## bandwidth optimization with target points
#cross-validation
resTP_CV=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = data_in,coords=coord_in, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='CV',adaptive=T,TP=TP),lower.bound=2, upper.bound=300,tolerance=0.001)
resTP_CV$minimum # 4
# AICc
resTP_AICc=golden_search_bandwidth(formula='Y_gwr~X1+X2+X3',H2=NULL,data = data_in,coords=coord_in, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='AICc',adaptive=T,TP=TP),lower.bound=2, upper.bound=300,tolerance=0.001)
resTP_AICc$minimum # 11
### Predictions
### without TP
# with CV
Y_pred=predict(res_CV$model, newdata=newdata, newdata_coords=newdata_coords,method_pred='tWtp_model')
head(Y_pred)
head(mydata$Y_gwr[index_out])
sqrt(mean((mydata$Y_gwr[index_out]-Y_pred)^2)) # RMSE 0.1015483
# with AICc
Y_pred=predict(res_AICc$model, newdata=newdata, newdata_coords=newdata_coords,method_pred='tWtp_model')
head(Y_pred)
head(mydata$Y_gwr[index_out])
sqrt(mean((mydata$Y_gwr[index_out]-Y_pred)^2)) # RMSE 0.1089349
### with TP
# with CV
Y_pred=predict(resTP_CV$model, newdata=newdata, newdata_coords=newdata_coords,method_pred='tWtp_model')
head(Y_pred)
head(mydata$Y_gwr[index_out])
sqrt(mean((mydata$Y_gwr[index_out]-Y_pred)^2)) # RMSE 0.1247921
# with AICc
Y_pred=predict(resTP_AICc$model, newdata=newdata, newdata_coords=newdata_coords,method_pred='tWtp_model')
head(Y_pred)
head(mydata$Y_gwr[index_out])
sqrt(mean((mydata$Y_gwr[index_out]-Y_pred)^2)) # RMSE 0.1508341
Prediction of the MGWRSAR_1_0_kv model, with and without target points (TP), and with and without the Best Linear Unbiased Predictor (BLUP):
### Spatial weigths matrices
##Global Spatial Weight matrix W
#The global spatial weight matrix W should be ordered by in-sample (S) then out-sample locations
W=kernel_matW(H=4,kernels='rectangle',coords=rbind(coord[index_in,],coord[index_out,]),NN=4,adaptive=TRUE,diagnull=TRUE)
# W insample
W_in=kernel_matW(H=4,kernels='rectangle',coords=coord[index_in,],NN=4,adaptive=TRUE,diagnull=TRUE)
### identification of target points
TP=find_TP(formula = 'Y_mgwrsar_1_0_kv~X1+X2+X3', data =data_in,coords=coord_in,kt=4,type='residuals')
# only 153 target points are used
length(TP)
### estimate models
## model MGWRSAR_1_0_kv, adaptive gaussian kernel, target points, criterion='AICc'
res_MGWRSAR_1_0_kv_TP=golden_search_bandwidth(formula='Y_mgwrsar_1_0_kv~X1+X2+X3',H2=NULL,data = data_in,coords=coord_in, fixed_vars=NULL,kernels=c('gauss'), Model ='MGWRSAR_1_0_kv',control=list(W=W_in,verbose=F,NN=300,criterion='AICc',adaptive=T,TP=TP),lower.bound=2, upper.bound=300,tolerance=0.001)
#optimal bandwidth
res_MGWRSAR_1_0_kv_TP$minimum # 214
summary(res_MGWRSAR_1_0_kv_TP$model)
## model MGWRSAR_1_0_kv, adaptive gaussian kernel, criterion='AICc'
res_MGWRSAR_1_0_kv=golden_search_bandwidth(formula='Y_mgwrsar_1_0_kv~X1+X2+X3',H2=NULL,data = data_in,coords=coord_in, fixed_vars=NULL,kernels=c('gauss'), Model ='MGWRSAR_1_0_kv',control=list(W=W_in,verbose=F,NN=300,criterion='AICc',adaptive=T),lower.bound=2, upper.bound=300,tolerance=0.001)
#optimal bandwidth
res_MGWRSAR_1_0_kv$minimum # 21
summary(res_MGWRSAR_1_0_kv$model)
### Predictions
# build outsample fold
newdata=mydata[index_out,]
newdata_coord=coord[index_out,]
newdata$Y_mgwrsar_1_0_kv=0
## without Best Linear Unbiased Predictor (type='YTC')
## without TP
Y_pred=predict(res_MGWRSAR_1_0_kv$model, newdata=newdata, newdata_coords=newdata_coord,W=W,type='YTC')
head(Y_pred)
RMSE_YTC=sqrt(mean((mydata$Y_mgwrsar_1_0_kv[index_out]-Y_pred)^2))
RMSE_YTC # 1.42473
## with TP
Y_pred=predict(res_MGWRSAR_1_0_kv_TP$model, newdata=newdata, newdata_coords=newdata_coord,W=W,type='YTC')
head(Y_pred)
RMSE_YTC=sqrt(mean((mydata$Y_mgwrsar_1_0_kv[index_out]-Y_pred)^2))
RMSE_YTC # 2.938081
## Using Best Linear Unbiased Predictor (type='BPN')
## without TP
Y_pred=predict(res_MGWRSAR_1_0_kv$model, newdata=newdata, newdata_coords=newdata_coord,W=W,type='BPN')
head(Y_pred)
RMSE_YTC=sqrt(mean((mydata$Y_mgwrsar_1_0_kv[index_out]-Y_pred)^2))
RMSE_YTC # 0.4471206
## with TP
Y_pred=predict(res_MGWRSAR_1_0_kv_TP$model, newdata=newdata, newdata_coords=newdata_coord,W=W,type='BPN')
head(Y_pred)
RMSE_YTC=sqrt(mean((mydata$Y_mgwrsar_1_0_kv[index_out]-Y_pred)^2))
RMSE_YTC # 0.7313973
Brunsdon, C., Fotheringham, A., Charlton, M., 1996. Geographically weighted regression: a method for exploring spatial nonstationarity. Geogr. Anal. 28 (4), 281–298.
Friedman, J. H. 1984. A variable span smoother. Technical Report 5, Laboratory for Computational Statistics, Department of Statistics, Stanford University.
Geniaux, G. and Martinetti, D. 2018. A new method for dealing simultaneously with spatial autocorrelation and spatial heterogeneity in regression models. Regional Science and Urban Economics, 72, 74-85. https://doi.org/10.1016/j.regsciurbeco.2017.04.001
Geniaux, G. and Martinetti, D. 2017b. R package mgwrsar: Functions for computing (mixed) geographically weighted regression with spatial autocorrelation. https://CRAN.R-project.org/package=mgwrsar.
Geniaux, G. 2024. Speeding up estimation of spatially varying coefficients models. Journal of Geographical Systems 26, 293–327. https://doi.org/10.1007/s10109-024-00442-3
Loader, C. 1999. Local Regression and Likelihood, volume 47. Springer, New York.
McMillen, D. P. 1996. One hundred fifty years of land values in Chicago: A nonparametric approach. Journal of Urban Economics, 40(1):100–124.