One of the most widely used data analysis techniques under fairly regularity conditions is the general linear regression analysis. In general, it proposes to model the relation between a dependent variable and a set of independent variables by means of a suitable mathematical model in order to understand whether the independent variables predict the dependent variable and how they explain the empirical data which are modeled. In the modelling process some assumptions should be imposed [1]. In practice, however, data are frequently limited and some of these conditions are usually not satisfied.
[…] the available information is most often insufficient to provide a unique answer or solution for most interesting decisions or inferences we wish to make. In fact, insufficient information - including limited, incomplete, complex, noisy and uncertain information - is the norm for most problems across all disciplines. — Golan [2]
Given this, regression models may become ill-posed which implies that
the application of traditional estimation methods, such as the Ordinary
Least Squares (OLS) estimators, may lead to non-unique or highly
unstable solutions, unless simplifying assumptions/procedures are
imposed to generate seemingly well-posed statistical models [3]. One of the
major concerns in univariate multiple linear regression is the
collinearity problem, which is one of the responsible for inflating the
variance associated with the regression coefficients estimates and, in
general, to affect the signs of the estimates, as well as statistical
inference [4,5].
Supported by the maximum entropy principle [6], which advocates to evaluate
events’ probabilities using a distribution that maximizes entropy among
those that satisfy certain expectations’ constraints, Golan, Judge and
Miller (from now on referred as GJM) [3] proposed a generalization to the
regression framework and called it the generalized maximum entropy (GME)
method. The GME is a suitable and flexible method, widely used, where no
parametric assumptions are necessary, and a global measure of fit is
available.
Consider the general linear model given by
\[\begin{align} \qquad \qquad \qquad \qquad \qquad \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad \qquad \qquad \qquad \qquad (1) \end{align}\]
where \(\mathbf{y}\) denotes a \((N \times 1)\) vector of noisy
observations, \(N \in \mathbb{N}\),
\(\boldsymbol{\beta}\) is a \(((K+1) \times 1)\) vector of unknown
parameters or coefficients, \(K \in
\mathbb{N}\), \(\mathbf{X}\) is
a known \((N \times (K+1))\) design
matrix and \(\boldsymbol{\epsilon}\) is
a \((N \times 1)\) vector of random
disturbances (errors), usually assumed to have a conditional expected
value of zero and representing spherical disturbances, i.e, \(E\left[ \boldsymbol{\epsilon} |
\mathbf{X}\right]=\mathbf{0}\) and \(E[\boldsymbol{\epsilon
\epsilon}'|\mathbf{X}]=\sigma^2\mathbf{I}\), where \(\mathbf{I}\) is a (\(N \times N\)) identity matrix and \(\sigma^2\) is the error variance.
GJM [3]
generalized the Maximum Entropy formalism [6] to linear inverse problems with
noise, expressed in model (1). The idea is to treat each \(\beta_k,\space k\in\left\lbrace
0,\dots,K\right\rbrace\), as a discrete random variable with a
compact support and \(2 \leq M <
\infty\) possible outcomes, and each \(\epsilon_n, \space n\in\left\lbrace
1,\dots,N\right\rbrace\), as a finite and discrete random
variable with \(2 \leq J < \infty\)
possible outcomes. Assuming that both the unknown parameters and the
unknown error terms may be bounded a priori, the linear model
(1) can be presented as \[\begin{equation}
    \mathbf{y}=\mathbf{XZp} + \mathbf{Vw},
\end{equation}\]
where
\[\begin{equation}
    \boldsymbol{\beta}=\mathbf{Zp}= \left[
    \begin{array}{cccc}
        \mathbf{z}'_0   & \mathbf{0}      & \cdots &
\mathbf{0}   \\
        \mathbf{0}      & \mathbf{z}'_1   & \cdots &
\mathbf{0}   \\
        \vdots & \vdots & \ddots & \vdots\\
        \mathbf{0}      & \mathbf{0}      & \cdots &
\mathbf{z}'_K
    \end{array}\right]
    \left[
    \begin{array}{c}
        \mathbf{p}_0 \\
        \mathbf{p}_1 \\
        \vdots\\
        \mathbf{p}_K
    \end{array}\right],
\end{equation}\]
with \(\mathbf{Z}\) a \(((K+1) \times (K+1)M)\) matrix of support
values and \(\mathbf{p}\) a \(((K+1)M \times 1)\) vector of unknown
probabilities, and
\[\begin{equation}
    \boldsymbol{\epsilon}=\mathbf{Vw}= \left[
    \begin{array}{cccc}
        \mathbf{v}'_1   & \mathbf{0}      & \cdots &
\mathbf{0}   \\
        \mathbf{0}      & \mathbf{v}'_2   & \cdots &
\mathbf{0}   \\
        \vdots & \vdots & \ddots & \vdots\\
        \mathbf{0}      & \mathbf{0}      & \cdots &
\mathbf{v}'_N
    \end{array}\right]
    \left[ \begin{array}{c}
        \mathbf{w}_1 \\
        \mathbf{w}_2 \\
        \vdots\\
        \mathbf{w}_N
    \end{array}\right],
\end{equation}\]
with \(\mathbf{V}\) a \((N \times NJ)\) matrix of support values
and \(\mathbf{w}\) a \((NJ \times 1)\) vector of unknown
probabilities. For the linear regression model specified in (1), the
Generalized Maximum Entropy (GME) estimator is given by
\[\begin{equation}
   \hat{\boldsymbol{\beta}}^{GME}(\mathbf{Z},\mathbf{V}) =
\underset{\mathbf{p},\mathbf{w}}{\operatorname{argmax}}
   \left\{-\mathbf{p}' \ln \mathbf{p} - \mathbf{w}' \ln
\mathbf{w} \right\},
\end{equation}\]
subject to the model constraint
\[\begin{equation}
   \mathbf{y}=\mathbf{XZp} + \mathbf{Vw},
\end{equation}\]
and the additivity constraints for \(\mathbf{p}\) and \(\mathbf{w}\), respectively,
\[\begin{equation}
        \begin{array}{c}
            \mathbf{1}_{K+1}=(\mathbf{I}_{K+1} \otimes
\mathbf{1}'_M)\mathbf{p},\\
            \mathbf{1}_N=(\mathbf{I}_N \otimes
\mathbf{1}'_J)\mathbf{w},
        \end{array}
\end{equation}\]
where \(\otimes\) represents the
Kronecker product, \(\mathbf{1}\) is a
column vector of ones with a specific dimension, \(\mathbf{I}\) is an identity matrix with a
specific dimension and \(\mathbf{Z}\)
and \(\mathbf{V}\) are the matrices of
supports, and \(\mathbf{p}>\mathbf{0}\) and \(\mathbf{w}>\mathbf{0}\) are probability
vectors to be estimated.
These equations can be rewritten using set notation as follows: \[\begin{align}
  &\text{maximize} & H(p,w) &=-\sum_{m=1}^M\sum_{k=0}^{K}
p_{km}ln(p_{km}) -\sum_{j=1}^J\sum_{n=1}^N w_{nj}ln(w_{nj}) \\
  &\text{subject to} & y_n &= \sum_{m=1}^M\sum_{k=0}^{K}
X_{kn}Z_{kj}p_{kj} + \sum_{m=1}^M V_{nm}w_{nm} \\
  & & \sum_{m=1}^M p_{km} = 1, \forall k\\
  & & \sum_{j=1}^J w_{kj} = 1, \forall k.
\end{align}\]
The Lagrangian equation \[\begin{equation}
    \mathcal{L}=-\mathbf{p}' \ln \mathbf{p} - \mathbf{w}' \ln
\mathbf{w} + \boldsymbol{\lambda}' \left( \mathbf{y} - \mathbf{XZp}
- \mathbf{Vw}  \right) + \boldsymbol{\theta}'\left(
\mathbf{1}_{K+1}-(\mathbf{I}_{K+1} \otimes \mathbf{1}'_M)\mathbf{p}
\right) + \boldsymbol{\tau}'\left( \mathbf{1}_N-(\mathbf{I}_N
\otimes \mathbf{1}'_J)\mathbf{w}\right)
\end{equation}\]
can be used to find the interior solution, where \(\lambda\), \(\theta\), and \(\tau\) are \((N\times 1)\), \(((K+1)\times 1)\), \((N\times 1)\) associated vectors of
Lagrangian multipliers, respectively.
Taking the gradient of the Lagrangian and solving the first-order
conditions yields the solutions for \(\mathbf{\hat p}\) and \(\mathbf{\hat w}\)
\[\begin{equation} \hat p_{km} = \frac{exp(-z_{km}\sum_{n=1}^N \hat\lambda_n x_{nk})}{\sum_{m=1}^M exp(-z_{km}\sum_{n=1}^N \hat\lambda_n x_{nk})} \end{equation}\] and \[\begin{equation} \hat w_{nj} = \frac{exp(-\hat\lambda_n v_{n})}{\sum_{j=1}^J exp(-\hat\lambda_n v_{n})}. \end{equation}\]
The GME estimator generates the probability vectors \(\hat{\mathbf{p}}\) and \(\hat{\mathbf{w}}\) that can be used to form point estimates of the unknown parameters and the unknown random errors through the reparameterizations. Since the objective function is strictly concave in the interior of the additivity constraint set, a unique solution for the GME estimator is guaranteed if the intersection of the model and the additivity constraint sets is non-empty [3]. However, there is no closed-from solution for \(\hat\lambda_n\), and hence no closed form solution for \(\hat p\) and \(\hat w\), so numerical optimization techniques must be used.
The supports in matrices \(\mathbf{Z}\) and \(\mathbf{V}\) are defined as being closed and bounded intervals within which each parameter or error is restricted to lie, implying that researchers need to provide exogenous information. Unfortunately, researchers generally do not know the true values of the coefficients they are attempting to estimate. This is considered the main weakness of the GME estimator [7]. GJM [3] discuss these issues in the case of minimal prior information: for the unknown parameters, the authors recommend the use of “wide bounds” \(\left(z_k^{lower},z_k^{upper}\right)\) for the supports in \(\mathbf{Z}\), without extreme risk consequences. Furthermore, in order to reflect that the analyst has essentially no a priori information about the true values of the coefficients and errors, they assume that the support interval of each coefficient is symmetrically centered about the origin
\[\begin{eqnarray} \mathbf{z}_k = \left[ \begin{array}{c} -z_k^{upper} \\ z_k^{upper}\frac{2\times 1 - \left( M-1 \right)} {M-1} \\ \vdots\\ 0\\ \vdots\\ z_k^{upper}\frac{2\times \left( M-2 \right) - \left( M-1 \right)} {M-1} \\ z_k^{upper} \end{array}\right]. \end{eqnarray}\]
The number of points \(M\) in the supports is less controversial and are usually used in the literature between \(3\) and \(7\) points, since authors claim that there is likely no significant improvement in the estimation with more points in the supports [3].
For the unknown errors, the authors suggest the use of the three-sigma rule with a sample scale parameter [8] and \(J=3\) points in the symmetrically centered about the origin supports
\[\begin{eqnarray} \mathbf{v}_n = \left[ \begin{array}{c} -3 \hat\sigma_\mathbf{y} \\ 3 \hat\sigma_\mathbf{y} \times \frac{2\times 1 - \left( J-1 \right)} {J-1} \\ \vdots\\ 0\\ \vdots\\ 3\hat\sigma_\mathbf{y}\frac{2\times \left( J-2 \right) - \left( J-1 \right)} {J-1} \\ 3\hat\sigma_\mathbf{y} \end{array}\right], \end{eqnarray}\] where \(\hat\sigma_\mathbf{y}\) is the sample standard deviation of \(\mathbf{y}\).
The parameters will also be referred to as the signal and the errors will also be referred to as the noise.
Using fngendata a data set under certain conditions will
be generated. To define an ill-conditioned design matrix, \(\mathbf{X}\), with a specific condition
number value, namely \(cn=50\), the
traditional singular value decomposition will be obtained and the
singular values in \(\mathbf{S}\), a
diagonal matrix with the same dimension of \(\mathbf{X}\), will be modified such that
\(cond(\mathbf{X})=cond(\mathbf{USV}')=cn\),
where \(\mathbf{U}\) and \(\mathbf{V}\) are square unitary matrices,
and \(cond(\mathbf{X'X})=cn^2\).
Errors will be defined by a normal distribution \(e_i \sim N(0,\sigma^2)\) were \(\sigma = \sqrt{var(\mathbf{X}\boldsymbol{\beta}) /
snr}\) and \(snr\) is the signal
to noise ratio.
The following matrix shows the Pearson coefficients of correlation between the variables generated.
#>             X001       X002        X003        X004        X005           y
#> X001  1.00000000 -0.5398470  0.26099144  0.07050249  0.48089010  0.55658908
#> X002 -0.53984698  1.0000000 -0.32644880 -0.33715433 -0.63996338 -0.85454082
#> X003  0.26099144 -0.3264488  1.00000000 -0.17977652  0.04219345  0.19096240
#> X004  0.07050249 -0.3371543 -0.17977652  1.00000000 -0.38741387  0.08334328
#> X005  0.48089010 -0.6399634  0.04219345 -0.38741387  1.00000000  0.77417585
#> y     0.55658908 -0.8545408  0.19096240  0.08334328  0.77417585  1.00000000
 Although obviously questionable, under a “no a priori
information” scenario, one can assume that \(z_k^{upper}=100000\), \(k\in\left\lbrace 0,\dots,6\right\rbrace\)
is a “wide upper bound” for the signal support space. Using
lmgce a model can be fitted using the GME framework. The
formula is defined as in stats::lm, for
instance. support.signal is used to define \(\mathbf{Z}\). If the same lower limit (LL)
and upper limit (UL) for the signal support space is assumed for each
\(\boldsymbol{\beta}_k\),
support.signal = c(LL, UL) should be defined.
support.signal.points can be an integer and in that case it
corresponds to \(M\). The noise support
space is set by support.noise. If one wants to apply the
three-sigma rule then the default support.noise = NULL
should be used. support.noise.points can be an integer and
in that case it corresponds to \(J\).
The optimization method primal.solnp uses the augmented
Lagrange multiplier method with an Sequential Quadratic Programming
(SQP) interior algorithm (see Rsolnp) and tries
to solve the primal version of the optimization problem previously
defined.
res.lmgce.100000 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = c(-100000, 100000),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp"
  )The estimated GME coefficients are \(\widehat{\boldsymbol{\beta}}^{GME_{(100000)}}=\) (1.022, -0.594, 6.822, 6.084, 14.641, 17.448).
(coef.res.lmgce.100000 <- coef(res.lmgce.100000))
#> (Intercept)        X001        X002        X003        X004        X005 
#>   1.0221150  -0.5942798   6.8215367   6.0841492  14.6412458  17.4483566They can be compared with the OLS estimation given by \(\widehat{\boldsymbol{\beta}}^{OLS}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\).
The estimated OLS coefficients are \(\widehat{\boldsymbol{\beta}}^{OLS}=\)
(1.022, -0.589, 6.846, 6.089, 14.668, 17.469).
Those results can be obtained using stats::lm
res.lm <- lm(y ~ ., data = dataGCE)
(coef.res.lm <- coef(res.lm))
#> (Intercept)        X001        X002        X003        X004        X005 
#>   1.0219926  -0.5888888   6.8461549   6.0887435  14.6678589  17.4690058or setting OLS = TRUE in lmgce, which is
the default.
res.lmgce.100000 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = c(-100000, 100000),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE
  )coef(res.lmgce.100000$results$OLS$res)
#> (Intercept)        X001        X002        X003        X004        X005 
#>   1.0219926  -0.5888888   6.8461549   6.0887435  14.6678589  17.4690058Consider the root mean squared error as a measure of fit of prediction (\(RMSE_{\mathbf{\hat y}}\)) defined as
\[\begin{equation} RMSE_{\mathbf{\hat y}} = \sqrt{\frac{\sum_{n=1}^N {\left( y_n - \hat y_n \right)^2}}{N}}, \end{equation}\] where \(\mathbf{\hat y}=\mathbf{X}\boldsymbol{\hat\beta}\).
The prediction errors are approximately equal: \(RMSE_{\mathbf{\hat y}}^{GME_{(100000)}} \approx\) 0.405 and \(RMSE_{\mathbf{\hat y}}^{OLS} \approx\) 0.405.
(RMSE_y.lmgce.100000 <- 
  GCEstim::accmeasure(fitted(res.lmgce.100000), dataGCE$y, which = "RMSE"))
#> [1] 0.404874
# or
# res.lmgce.100000$error.measure
(RMSE_y.lm <-
    GCEstim::accmeasure(fitted(res.lm), dataGCE$y, which = "RMSE"))
#> [1] 0.4048724In order to provide an estimate of how well the model generalizes to
an independent data set one can compute the root mean squared error
using cross-validation, which involves randomly partition the data set
into \(\kappa\) folds of approximately
the same size, training the model on \(\kappa-1\) folds, and validating it on the
remaining one.
Let
\[\begin{eqnarray}
  \mathbf{X} = \left[ \begin{array}{c}
        \mathbf{X_{(1)}} \\
        \mathbf{X_{(2)}} \\
        \vdots\\
        \mathbf{X_{(\kappa)}}
    \end{array}\right],
\end{eqnarray}\] where \(\mathbf{X_{(cv)}}\) is a \((N_{cv} \times (K+1))\) design matrix such
that \(\sum_{cv=1}^\kappa N_{cv}=N\)
and \(N_{cv}\approx \frac N \kappa\),
\(cv\in\left\lbrace
1,\dots,\kappa\right\rbrace\)
The prediction \(\kappa\)-fold
cross-validation root mean squared error can be defined as
\[\begin{equation} CV\text{-}RMSE_{\mathbf{\hat y}}=\frac{\sum_{cv=1}^\kappa RMSE_{\mathbf{\hat y_{cv}}}}{\kappa}, \end{equation}\] where \(\mathbf{\hat y_{cv}}=X_{(cv)}\boldsymbol{\hat\beta_{(cv)}}\) and \(\boldsymbol{\hat\beta_{(cv)}}\) was estimated in \[\begin{eqnarray} \mathbf{X_{(cv)}} = \left[ \begin{array}{c} \mathbf{X_{(1)}} \\ \vdots\\ \mathbf{X_{(cv-1)}} \\ \mathbf{X_{(cv+1)}} \\ \vdots\\ \mathbf{X_{(\kappa)}} \end{array}\right]. \end{eqnarray}\]
To perform \(\kappa\)-fold
cross-validation with lmgce, the default, one should set
cv = TRUE and cv.nfolds = k. The default value
for \(\kappa\) is \(5\). For reproducibility a
seed should also be defined: the default is
seed = 230676.
res.lmgce.100000 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    cv = TRUE,
    cv.nfolds = 5,
    support.signal = c(-100000, 100000),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE,
    seed = 230676
  )The prediction cross-validation errors are again approximately equal: \(CV\text{-}RMSE_{\mathbf{\hat y}}^{GME_{(100000)}} \approx\) 0.436 and \(CV\text{-}RMSE_{\mathbf{\hat y}}^{OLS} \approx\) 0.436.
(CV_RMSE_y.lmgce.100000 <- res.lmgce.100000$error.measure.cv.mean)
#> [1] 0.4362289
(CV_RMSE_y.lm <- mean(res.lmgce.100000$results$OLS$error))
#> [1] 0.4361665In the example given, the “true” value of \(\boldsymbol{\beta}\) is known so the root
mean squared error can be used as a measure of fit of precision.
Let \(RMSE_{\boldsymbol{\hat\beta}}\)
be
\[\begin{equation} RMSE_{\boldsymbol{\hat\beta}} = \sqrt{\frac{\sum_{k=0}^K {\left( \beta_k - \hat\beta_k \right)^2}}{K+1}}. \end{equation}\]
The precision errors are similar in magnitude with a slight advantage for the GME approach: \(RMSE_{\boldsymbol{\hat\beta}}^{GME_{(100000)}} \approx\) 5.809 and \(RMSE_{\boldsymbol{\hat\beta}}^{OLS} \approx\) 5.825.
(RMSE_beta.lmgce.100000 <-
   GCEstim::accmeasure(coef.res.lmgce.100000, coef.dataGCE, which = "RMSE"))
#> [1] 5.808684
(RMSE_beta.lm <-
    GCEstim::accmeasure(coef.res.lm, coef.dataGCE, which = "RMSE"))
#> [1] 5.825423From the given example it seems that OLS and GME have similar
performance. But there as been a great deal of concern about the
sensitivity of the GME coefficient estimates to the choice of their
support intervals [7,9,10]. Some state that estimates are not
invariant to the design of the support intervals [9].
Suppose that one had assumed that \(z_k^{upper}=100\), \(k\in\left\lbrace 0,\dots,6\right\rbrace\)
was a “wide upper bound” for the signal support space.
res.lmgce.100 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    cv = TRUE,
    cv.nfolds = 5,
    support.signal = c(-100, 100),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE,
    seed = 230676
  )The estimated GME coefficients under these conditions are \(\widehat{\boldsymbol{\beta}}^{GME_{(100)}}=\) (1.026, -0.158, 1.823, 3.321, 8.397, 11.469).
The prediction errors are still approximately the same, \(RMSE_{\mathbf{\hat y}}^{GME_{(100)}} \approx\) 0.407 and \(CV\text{-}RMSE_{\mathbf{\hat y}}^{GME_{(100)}} \approx\) 0.437, but a significant decrease was found for the precision error, \(RMSE_{\boldsymbol{\hat\beta}}^{GME_{(100)}} \approx\) 1.597.
If \(z_k^{upper}=50\) then \(\widehat{\boldsymbol{\beta}}^{GME_{(50)}}=\) (1.028, 0.168, -1.842, 1.293, 3.808, 7.067), \(RMSE_{\mathbf{\hat y}}^{GME_{(50)}} \approx\) 0.411, \(CV\text{-}RMSE_{\mathbf{\hat y}}^{GME_{(50)}} \approx\) 0.441 and \(RMSE_{\boldsymbol{\hat\beta}}^{GME_{(50)}} \approx\) 1.575.
As stated before, the “true” value of \(\boldsymbol{\beta}\) was predefined so there is a prior information. That information can be incorporated in the GME estimations. If one wants to have symmetrically centered supports it is possible to define, for instance, the following: \(\mathbf{z}_0'= \left[ -5, -2.5, 0, 2.5, 5\right]\), \(\mathbf{z}_1' = \mathbf{z}_2'= \left[ -2, -1, 0, 1, 2\right]\), \(\mathbf{z}_3'= \left[ -6, -3, 0, 3, 6\right]\), \(\mathbf{z}_4'= \left[ -10, -5, 0, 5, 10\right]\) and \(\mathbf{z}_5'= \left[ -15, -7.5, 0, 7.5, 15\right]\).
res.lmgce.apriori.centered.zero <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = matrix(c(-5, 5,
                       -2, 2,
                       -2, 2,
                       -6, 6,
                       -10, 10,
                       -10, 15),
                     ncol = 2,
                     byrow = TRUE), 
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE,
    seed = 230676
  )The estimated GME coefficients with a prior information and zero centered signal supports are \(\widehat{\boldsymbol{\beta}}^{GME_{(apriori.centered.zero)}}=\) (1.009, 0.232, -0.437, 1.38, 4.094, 7.797).
The prediction error is slightly higher, \(RMSE_{\mathbf{\hat y}}^{GME_{(apriori.centered.zero)}} \approx\) 0.426, as well as the cross-validation prediction error, \(CV\text{-}RMSE_{\mathbf{\hat y}}^{GME_{(apriori.centered.zero)}} \approx\) 0.454, but again a significant decrease was found for the precision error, \(RMSE_{\boldsymbol{\hat\beta}}^{GME_{(apriori.centered.zero)}} \approx\) 1.151.
Some authors state that a narrow support around this a priori knowledge will provide a far better fit than its OLS counterpart, particularly in small sample sizes [11]. So, a different approach could be centering the signal support spaces at the “true” values of \(\boldsymbol{\beta}\) and setting, for instance, a range equal to \(4\) for each support space: \(\mathbf{z}_0'= \left[ -1, 0, 1, 2, 3\right]\), \(\mathbf{z}_1' = \mathbf{z}_2'= \left[ -2, -1, 0, 1, 2\right]\), \(\mathbf{z}_3'= \left[ 1, 2, 3, 4, 5\right]\), \(\mathbf{z}_4'= \left[ 4, 5, 6, 7, 8\right]\) and \(\mathbf{z}_5'= \left[ 7, 8, 9, 10, 11\right]\).
res.lmgce.apriori.centered.beta <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = matrix(c(-1, 3,
                       -2, 2,
                       -2, 2,
                        1, 5,
                        4, 8,
                        7, 11),
                     ncol = 2,
                     byrow = TRUE), 
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE,
    seed = 230676
  )The estimated GME coefficients when the support is centered in \(\boldsymbol\beta\) are \(\widehat{\boldsymbol{\beta}}^{GME_{(apriori.centered.beta)}}=\) (1.027, -0.006, -0.001, 2.823, 6.027, 9.068).
The prediction error (\(RMSE_{\mathbf{\hat y}}^{GME_{(apriori.centered.beta)}} \approx\) 0.413), the cross-validation prediction error (\(CV\text{-}RMSE_{\mathbf{\hat y}}^{GME_{(apriori.centered.beta)}} \approx\) 0.423) and precision error (\(RMSE_{\boldsymbol{\hat\beta}}^{GME_{(apriori.centered.beta)}} \approx\) 0.079) improved when compared with the centered at zero approach.
The following table summarizes the previous results.
| \(OLS\) | \(GME_{(100000)}\) | \(GME_{(100)}\) | \(GME_{(50)}\) | \(GME_{(apriori.centered.zero)}\) | \(GME_{(apriori.centered.beta)}\) | |
|---|---|---|---|---|---|---|
| Prediction RMSE | 0.405 | 0.405 | 0.407 | 0.411 | 0.426 | 0.413 | 
| Prediction CV-RMSE | 0.436 | 0.436 | 0.437 | 0.441 | 0.454 | 0.423 | 
| Precision RMSE | 5.825 | 5.809 | 1.597 | 1.575 | 1.151 | 0.079 | 
Given that the absence of knowledge about the true value of coefficients is generally the observed situation in applied work, it is manifestly important to come to some general procedure to define those supports.
This work was supported by Fundação para a Ciência e Tecnologia (FCT) through CIDMA and projects https://doi.org/10.54499/UIDB/04106/2020 and https://doi.org/10.54499/UIDP/04106/2020.