Title: | Distance-Based Statistics |
---|---|
Description: | Prediction methods where explanatory information is coded as a matrix of distances between individuals. Distances can either be directly input as a distances matrix, a squared distances matrix, an inner-products matrix or computed from observed predictors. |
Authors: | Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>. |
Maintainer: | Eva Boj <[email protected]> |
License: | GPL-2 |
Version: | 2.0.2 |
Built: | 2024-12-22 06:20:51 UTC |
Source: | CRAN |
This package contains functions for distance-based prediction methods.
These are methods for prediction where predictor information is coded as a matrix of distances between individuals.
In the currently implemented methods the response is a univariate variable as in the ordinary linear model or in the generalized linear model.
Distances can either be directly input as an distances matrix,
a squared distances matrix, an inner-products matrix
(see GtoD2
) or computed from observed
explanatory variables.
Notation convention: in distance-based methods we must distinguish observed explanatory variables which we denote by Z or z, from Euclidean coordinates which we denote by X or x. For explanation on the meaning of both terms see the bibliography references below.
Observed explanatory variables z are possibly a mixture of continuous and qualitative explanatory variables or more general quantities.
dbstats does not provide specific functions for computing distances, depending instead on other functions and packages, such as:
dist
in the stats package.
dist
in the proxy package. When the
proxy package is loaded, its dist
function
supersedes the one in the stats package.
daisy
in the cluster package.
Compared to both instances of dist
above whose input must be
numeric variables, the main feature of daisy
is
its ability to handle other variable types as well (e.g. nominal, ordinal,
(a)symmetric binary) even when different types occur in the same data set.
Actually the last statement is not hundred percent true: it refers only to
the default behaviour of both dist
functions, whereas the
dist
function in the proxy package can
evaluate distances between observations with a user-provided function,
entered as a parameter, hence it can deal with any type of data. See the
examples in pr_DB
.
Functions of dbstats package:
Linear and local linear models with a continuous response:
dblm
for distance-based linear models.
ldblm
for local distance-based linear models.
dbplsr
for distance-based partial least squares.
Generalized linear and local generalized linear models with a numeric response:
dbglm
for distance-based generalized linear models.
ldbglm
for local distance-based generalized linear models.
Package: | dbstats |
Type: | Package |
Version: | 2.0.2 |
Date: | 2024-01-26 |
License: | GPL-2 |
LazyLoad: | yes |
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Caballe, A., Delicado P, Esteve, A., Fortiana J (2016). Global and local distance-based generalized linear models. TEST 25, 170-195.
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Implementing PLS for distance-based regression: computational issues. Computational Statistics 22, 237-248.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
as.D2
attempts to turn its argument into a D2
class object.
is.D2
tests if its argument is a (strict) D2
class object.
as.D2(x) is.D2(x)
as.D2(x) is.D2(x)
x |
an R object. |
An object of class D2
containing the squared distances
matrix between individuals.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
D2toG
, disttoD2
, D2toDist
and
GtoD2
for conversions.
as.Gram
attempts to turn its argument into a Gram
class object.
is.Gram
tests if its argument is a (strict) Gram
class object.
as.Gram(x) is.Gram(x)
as.Gram(x) is.Gram(x)
x |
an R object. |
A Gram
class object. Weighted centered inner products matrix of the
squared distances matrix.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
D2toG
, disttoD2
, D2toDist
and
GtoD2
for conversions.
Converts D2
class object into dist
class object.
D2toDist(D2)
D2toDist(D2)
D2 |
|
An object of class dist
. See function dist
for
details.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
X <- matrix(rnorm(100*3),nrow=100) distance <- daisy(X,"manhattan") D2 <- disttoD2(distance) distance2 <- D2toDist(D2)
X <- matrix(rnorm(100*3),nrow=100) distance <- daisy(X,"manhattan") D2 <- disttoD2(distance) distance2 <- D2toDist(D2)
Converts D2
class object into Gram
class object.
D2toG(D2,weights)
D2toG(D2,weights)
D2 |
|
weights |
an optional numeric vector of weights. By default all individuals have the same weight. |
An object of class Gram
containing the Doubly centered
inner product matrix of D2
.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
X <- matrix(rnorm(100*3),nrow=100) D2 <- as.matrix(dist(X)^2) class(D2) <- "D2" G <- D2toG(D2,weights=NULL)
X <- matrix(rnorm(100*3),nrow=100) D2 <- as.matrix(dist(X)^2) class(D2) <- "D2" G <- D2toG(D2,weights=NULL)
dbglm
is a variety of generalized linear model where explanatory
information is coded as distances between individuals. These distances
can either be computed from observed explanatory variables or directly
input as a squared distances matrix.
Response and link function as in the glm
function for ordinary
generalized linear models.
Notation convention: in distance-based methods we must distinguish observed explanatory variables which we denote by Z or z, from Euclidean coordinates which we denote by X or x. For explanation on the meaning of both terms see the bibliography references below.
## S3 method for class 'formula' dbglm(formula, data, family=gaussian, method ="GCV", full.search=TRUE,..., metric="euclidean", weights, maxiter=100, eps1=1e-10, eps2=1e-10, rel.gvar=0.95, eff.rank=NULL, offset, mustart=NULL, range.eff.rank) ## S3 method for class 'dist' dbglm(distance,y,family=gaussian, method ="GCV", full.search=TRUE, weights, maxiter=100,eps1=1e-10,eps2=1e-10,rel.gvar=0.95,eff.rank=NULL, offset,mustart=NULL, range.eff.rank,...) ## S3 method for class 'D2' dbglm(D2,y,...,family=gaussian, method ="GCV", full.search=TRUE, weights,maxiter=100, eps1=1e-10,eps2=1e-10,rel.gvar=0.95,eff.rank=NULL,offset, mustart=NULL, range.eff.rank) ## S3 method for class 'Gram' dbglm(G,y,...,family=gaussian, method ="GCV", full.search=TRUE, weights,maxiter=100, eps1=1e-10,eps2=1e-10,rel.gvar=0.95,eff.rank=NULL, offset,mustart=NULL, range.eff.rank)
## S3 method for class 'formula' dbglm(formula, data, family=gaussian, method ="GCV", full.search=TRUE,..., metric="euclidean", weights, maxiter=100, eps1=1e-10, eps2=1e-10, rel.gvar=0.95, eff.rank=NULL, offset, mustart=NULL, range.eff.rank) ## S3 method for class 'dist' dbglm(distance,y,family=gaussian, method ="GCV", full.search=TRUE, weights, maxiter=100,eps1=1e-10,eps2=1e-10,rel.gvar=0.95,eff.rank=NULL, offset,mustart=NULL, range.eff.rank,...) ## S3 method for class 'D2' dbglm(D2,y,...,family=gaussian, method ="GCV", full.search=TRUE, weights,maxiter=100, eps1=1e-10,eps2=1e-10,rel.gvar=0.95,eff.rank=NULL,offset, mustart=NULL, range.eff.rank) ## S3 method for class 'Gram' dbglm(G,y,...,family=gaussian, method ="GCV", full.search=TRUE, weights,maxiter=100, eps1=1e-10,eps2=1e-10,rel.gvar=0.95,eff.rank=NULL, offset,mustart=NULL, range.eff.rank)
formula |
an object of class |
data |
an optional data frame containing the variables in the model (both response and explanatory variables, either the observed ones, Z, or a Euclidean configuration X). |
y |
(required if no formula is given as the principal argument). Response (dependent variable) must be numeric, factor, matrix or data.frame. |
distance |
a |
D2 |
a |
G |
a |
family |
a description of the error distribution and link function to be used
in the model.
This can be a character string naming a family function, a family
function or the result of a call to a family function.
(See |
metric |
metric function to be used when computing distances from observed
explanatory variables.
One of |
weights |
an optional numeric vector of prior weights to be used in the fitting process. By default all individuals have the same weight. |
method |
sets the method to be used in deciding the effective rank,
which is defined as the number of linearly independent Euclidean
coordinates used in prediction.
There are five different methods: |
full.search |
sets which optimization procedure will be used to
minimize the modelling criterion specified in |
maxiter |
maximum number of iterations in the iterated |
eps1 |
stopping criterion 1, |
eps2 |
stopping criterion 2, |
rel.gvar |
relative geometric variability (a real number between 0 and 1).
In each |
eff.rank |
integer between 1 and the number of observations minus one.
Number of Euclidean coordinates used for model fitting in
each |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. |
mustart |
starting values for the vector of means. |
range.eff.rank |
vector of size two defining the range of values for the effective rank with which the dblm iterations
will be evaluated (must be specified when |
... |
arguments passed to or from other methods to the low level. |
The various possible ways for inputting the model explanatory
information through distances, or their squares, etc., are the
same as in dblm
.
For gamma distributions, the domain of the canonical link function
is not the same as the permitted range of the mean. In particular,
the linear predictor might be negative, obtaining an impossible
negative mean. Should that event occur, dbglm
stops with
an error message. Proposed alternative is to use a non-canonical link
function.
A list of class dbglm
containing the following components:
residuals |
the |
fitted.values |
the fitted mean values, results of final |
family |
the |
deviance |
measure of discrepancy or badness of fit. Proportional to twice the difference between the maximum achievable log-likelihood and that achieved by the current model. |
aic.model |
a version of Akaike's Information Criterion. Equal to minus twice the maximized log-likelihood plus twice the number of parameters. Computed by the aic component of the family. For binomial and Poison families the dispersion is fixed at one and the number of parameters is the number of coefficients. For gaussian, Gamma and inverse gaussian families the dispersion is estimated from the residual deviance, and the number of parameters is the number of coefficients plus one. For a gaussian family the MLE of the dispersion is used so this is a valid value of AIC, but for Gamma and inverse gaussian families it is not. For families fitted by quasi-likelihood the value is NA. |
bic.model |
a version of the Bayessian Information Criterion. Equal to minus twice the maximized log-likelihood plus the logarithm of the number of observations by the number of parameters (see, e.g., Wood 2006). |
gcv.model |
a version of the Generalized Cross-Validation Criterion. We refer to Wood (2006) pp. 177-178 for details. |
null.deviance |
the deviance for the null model. The null model will include the offset, and an intercept if there is one in the model. Note that this will be incorrect if the link function depends on the data other than through the fitted mean: specify a zero offset to force a correct calculation. |
iter |
number of Fisher scoring ( |
prior.weights |
the original weights. |
weights |
the |
df.residual |
the residual degrees of freedom. |
df.null |
the residual degrees of freedom for the null model. |
y |
the response vector used. |
convcrit |
convergence criterion. One of: |
H |
hat matrix projector of the last |
rel.gvar |
the relative geometric variabiliy in the last |
eff.rank |
the |
varmu |
vector of estimated variance of each observation. |
dev.resids |
deviance residuals |
call |
the matched call. |
Objects of class "dbglm"
are actually of class
c("dbglm", "dblm")
, inheriting the plot.dblm
method
from class "dblm"
.
When the Euclidean distance is used the dbglm
model reduces
to the generalized linear model (glm
).
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Caballe, A., Delicado P, Esteve, A., Fortiana J (2016). Global and local distance-based generalized linear models. TEST 25, 170-195.
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
Wood SN (2006). Generalized Additive Models: An Introduction with R. Chapman & Hall, Boca Raton.
summary.dbglm
for summary.plot.dbglm
for plots.predict.dbglm
for predictions.dblm
for distance-based linear models.
## CASE POISSON z <- rnorm(100) y <- rpois(100, exp(1+z)) glm1 <- glm(y ~z, family = poisson(link = "log")) D2 <- as.matrix(dist(z))^2 class(D2) <- "D2" dbglm1 <- dbglm(D2,y,family = poisson(link = "log"), method="rel.gvar") plot(z,y) points(z,glm1$fitted.values,col=2) points(z,dbglm1$fitted.values,col=3) sum((glm1$fitted.values-y)^2) sum((dbglm1$fitted.values-y)^2) ## CASE BINOMIAL y <- rbinom(100, 1, plogis(z)) # needs to set a starting value for the next fit glm2 <- glm(y ~z, family = binomial(link = "logit")) D2 <- as.matrix(dist(z))^2 class(D2) <- "D2" dbglm2 <- dbglm(D2,y,family = binomial(link = "logit"), method="rel.gvar") plot(z,y) points(z,glm2$fitted.values,col=2) points(z,dbglm2$fitted.values,col=3) sum((glm2$fitted.values-y)^2) sum((dbglm2$fitted.values-y)^2)
## CASE POISSON z <- rnorm(100) y <- rpois(100, exp(1+z)) glm1 <- glm(y ~z, family = poisson(link = "log")) D2 <- as.matrix(dist(z))^2 class(D2) <- "D2" dbglm1 <- dbglm(D2,y,family = poisson(link = "log"), method="rel.gvar") plot(z,y) points(z,glm1$fitted.values,col=2) points(z,dbglm1$fitted.values,col=3) sum((glm1$fitted.values-y)^2) sum((dbglm1$fitted.values-y)^2) ## CASE BINOMIAL y <- rbinom(100, 1, plogis(z)) # needs to set a starting value for the next fit glm2 <- glm(y ~z, family = binomial(link = "logit")) D2 <- as.matrix(dist(z))^2 class(D2) <- "D2" dbglm2 <- dbglm(D2,y,family = binomial(link = "logit"), method="rel.gvar") plot(z,y) points(z,glm2$fitted.values,col=2) points(z,dbglm2$fitted.values,col=3) sum((glm2$fitted.values-y)^2) sum((dbglm2$fitted.values-y)^2)
dblm
is a variety of linear model where explanatory information
is coded as distances between individuals. These distances can either
be computed from observed explanatory variables or directly input as
a squared distances matrix. The response is a continuous variable as
in the ordinary linear model. Since distances can be computed from a mixture
of continuous and qualitative explanatory variables or,
in fact, from more general quantities, dblm
is a proper extension of
lm
.
Notation convention: in distance-based methods we must distinguish observed explanatory variables which we denote by Z or z, from Euclidean coordinates which we denote by X or x. For explanation on the meaning of both terms see the bibliography references below.
## S3 method for class 'formula' dblm(formula,data,...,metric="euclidean",method="OCV",full.search=TRUE, weights,rel.gvar=0.95,eff.rank) ## S3 method for class 'dist' dblm(distance,y,...,method="OCV",full.search=TRUE, weights,rel.gvar=0.95,eff.rank) ## S3 method for class 'D2' dblm(D2,y,...,method="OCV",full.search=TRUE,weights,rel.gvar=0.95, eff.rank) ## S3 method for class 'Gram' dblm(G,y,...,method="OCV",full.search=TRUE,weights,rel.gvar=0.95, eff.rank)
## S3 method for class 'formula' dblm(formula,data,...,metric="euclidean",method="OCV",full.search=TRUE, weights,rel.gvar=0.95,eff.rank) ## S3 method for class 'dist' dblm(distance,y,...,method="OCV",full.search=TRUE, weights,rel.gvar=0.95,eff.rank) ## S3 method for class 'D2' dblm(D2,y,...,method="OCV",full.search=TRUE,weights,rel.gvar=0.95, eff.rank) ## S3 method for class 'Gram' dblm(G,y,...,method="OCV",full.search=TRUE,weights,rel.gvar=0.95, eff.rank)
formula |
an object of class |
data |
an optional data frame containing the variables in the model (both response and explanatory variables, either the observed ones, Z, or a Euclidean configuration X). |
y |
(required if no formula is given as the principal argument). Response (dependent variable) must be numeric, matrix or data.frame. |
distance |
a |
D2 |
a |
G |
a |
metric |
metric function to be used when computing distances from observed
explanatory variables.
One of |
method |
sets the method to be used in deciding the effective rank,
which is defined as the number of linearly independent Euclidean
coordinates used in prediction.
There are six different methods: When method is When method is |
full.search |
sets which optimization procedure will be used to
minimize the modelling criterion specified in |
weights |
an optional numeric vector of weights to be used in the fitting process. By default all individuals have the same weight. |
rel.gvar |
relative geometric variability (real between 0 and 1). Take the
lowest effective rank with a relative geometric variability higher
or equal to |
eff.rank |
integer between 1 and the number of observations minus one.
Number of Euclidean coordinates used for model fitting. Applies only
if |
... |
arguments passed to or from other methods to the low level. |
The dblm
model uses the distance matrix between individuals
to find an appropriate prediction method.
There are many ways to compute and calculate this matrix, besides
the three included as parameters in this function.
Several packages in R also study this problem. In particular
dist
in the package stats
and daisy
in the package cluster
(the three metrics in dblm
call
the daisy
function).
Another way to enter a distance matrix to the model is through an object
of class "D2"
(containing the squared distances matrix).
An object of class "dist"
or "dissimilarity"
can
easily be transformed into one of class "D2"
. See disttoD2
.
Reciprocally, an object of class "D2"
can be transformed into one
of class "dist"
. See D2toDist
.
S3 method Gram uses the Doubly centered inner product matrix G=XX'.
Its also easily to transformed into one of class "D2"
.
See D2toG
and GtoD2
.
The weights array is adequate when responses for different individuals have different variances. In this case the weights array should be (proportional to) the reciprocal of the variances vector.
When using method method="eff.rank"
or method="rel.gvar"
,
a compromise between possible consequences of a bad choice has to be
reached. If the rank is too large, the model can be overfitted, possibly
leading to an increased prediction error for new cases
(even though R2 is higher). On the other hand, a small rank suggests
a model inadequacy (R2 is small). The other four methods are less error
prone (but still they do not guarantee good predictions).
A list of class dblm
containing the following components:
residuals |
the residuals (response minus fitted values). |
fitted.values |
the fitted mean values. |
df.residuals |
the residual degrees of freedom. |
weights |
the specified weights. |
y |
the response used to fit the model. |
H |
the hat matrix projector. |
call |
the matched call. |
rel.gvar |
the relative geometric variabiliy, used to fit the model. |
eff.rank |
the dimensions chosen to estimate the model. |
ocv |
the ordinary cross-validation estimate of the prediction error. |
gcv |
the generalized cross-validation estimate of the prediction error. |
aic |
the Akaike Value Criterium of the model (only if |
bic |
the Bayesian Value Criterium of the model (only if |
When the Euclidean distance is used the dblm
model reduces to the linear
model (lm
).
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Caballe, A., Delicado P, Esteve, A., Fortiana J (2016). Global and local distance-based generalized linear models. TEST 25, 170-195.
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
summary.dblm
for summary.plot.dblm
for plots.predict.dblm
for predictions.ldblm
for distance-based local linear models.
# easy example to illustrate usage of the dblm function n <- 100 p <- 3 k <- 5 Z <- matrix(rnorm(n*p),nrow=n) b <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b + e D<-dist(Z) dblm1 <- dblm(D,y) lm1 <- lm(y~Z) # the same fitted values with the lm mean(lm1$fitted.values-dblm1$fitted.values)
# easy example to illustrate usage of the dblm function n <- 100 p <- 3 k <- 5 Z <- matrix(rnorm(n*p),nrow=n) b <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b + e D<-dist(Z) dblm1 <- dblm(D,y) lm1 <- lm(y~Z) # the same fitted values with the lm mean(lm1$fitted.values-dblm1$fitted.values)
dbplsr
is a variety of partial least squares regression
where explanatory information is coded as distances between individuals.
These distances can either be computed from observed explanatory variables
or directly input as a squared distances matrix.
Since distances can be computed from a mixture of continuous and
qualitative explanatory variables or, in fact, from more general
quantities, dbplsr
is a proper extension of plsr
.
Notation convention: in distance-based methods we must distinguish observed explanatory variables which we denote by Z or z, from Euclidean coordinates which we denote by X or x. For explanation on the meaning of both terms see the bibliography references below.
## S3 method for class 'formula' dbplsr(formula,data,...,metric="euclidean", method="ncomp",weights,ncomp) ## S3 method for class 'dist' dbplsr(distance,y,...,weights,ncomp=ncomp,method="ncomp") ## S3 method for class 'D2' dbplsr(D2,y,...,weights,ncomp=ncomp,method="ncomp") ## S3 method for class 'Gram' dbplsr(G,y,...,weights,ncomp=ncomp,method="ncomp")
## S3 method for class 'formula' dbplsr(formula,data,...,metric="euclidean", method="ncomp",weights,ncomp) ## S3 method for class 'dist' dbplsr(distance,y,...,weights,ncomp=ncomp,method="ncomp") ## S3 method for class 'D2' dbplsr(D2,y,...,weights,ncomp=ncomp,method="ncomp") ## S3 method for class 'Gram' dbplsr(G,y,...,weights,ncomp=ncomp,method="ncomp")
formula |
an object of class |
data |
an optional data frame containing the variables in the model (both response and explanatory variables, either the observed ones, Z, or a Euclidean configuration X). |
y |
(required if no formula is given as the principal argument). Response (dependent variable) must be numeric, matrix or data.frame. |
distance |
a |
D2 |
a |
G |
a |
metric |
metric function to be used when computing distances from observed
explanatory variables.
One of |
method |
sets the method to be used in deciding how many components needed to fit
the best model for new predictions.
There are five different methods, |
weights |
an optional numeric vector of weights to be used in the fitting process. By default all individuals have the same weight. |
ncomp |
the number of components to include in the model. |
... |
arguments passed to or from other methods to the low level. |
Partial least squares (PLS) is a method for constructing
predictive models when the factors (Z) are many and highly collinear.
A PLS model will try to find the multidimensional direction
in the Z space that explains the maximum multidimensional variance direction
in the Y space. dbplsr
is particularly suited when the matrix of
predictors has more variables than observations.
By contrast, standard regression (dblm
) will fail in these cases.
The various possible ways for inputting the model explanatory
information through distances, or their squares, etc., are the
same as in dblm
.
The number of components to fit is specified with the argument ncomp
.
A list of class dbplsr
containing the following components:
residuals |
a list containing the residuals (response minus fitted values) for each iteration. |
fitted.values |
a list containing the fitted values for each iteration. |
fk |
a list containing the scores for each iteration. |
bk |
regression coefficients. |
Pk |
orthogonal projector on the one-dimensional linear space by |
ncomp |
number of components included in the model. |
ncomp.opt |
optimum number of components according to the selected method. |
weights |
the specified weights. |
method |
the using method. |
y |
the response used to fit the model. |
H |
the hat matrix projector. |
G0 |
initial weighted centered inner products matrix of the squared distance matrix. |
Gk |
weighted centered inner products matrix in last iteration. |
gvar |
total weighted geometric variability. |
gvec |
the diagonal entries in |
gvar.iter |
geometric variability for each iteration. |
ocv |
the ordinary cross-validation estimate of the prediction error. |
gcv |
the generalized cross-validation estimate of the prediction error. |
aic |
the Akaike Value Criterium of the model. |
bic |
the Bayesian Value Criterium of the model. |
When the Euclidean distance is used the dbplsr
model reduces to the
traditional partial least squares (plsr
).
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Implementing PLS for distance-based regression: computational issues. Computational Statistics 22, 237-248.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
summary.dbplsr
for summary.plot.dbplsr
for plots.predict.dbplsr
for predictions.
#require(pls) library(pls) data(yarn) ## Default methods: yarn.dbplsr <- dbplsr(density ~ NIR, data = yarn, ncomp=6, method="GCV")
#require(pls) library(pls) data(yarn) ## Default methods: yarn.dbplsr <- dbplsr(density ~ NIR, data = yarn, ncomp=6, method="GCV")
Converts dist
or dissimilarity
class object into D2
class object.
disttoD2(distance)
disttoD2(distance)
distance |
|
An object of class D2
containing the squared distances matrix
between individuals.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
X <- matrix(rnorm(100*3),nrow=100) distance <- daisy(X,"manhattan") D2 <- disttoD2(distance)
X <- matrix(rnorm(100*3),nrow=100) distance <- daisy(X,"manhattan") D2 <- disttoD2(distance)
Converts Gram
class object into D2
class object
GtoD2(G)
GtoD2(G)
G |
|
An object of class D2
containing the squared distances matrix
between individuals.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
X <- matrix(rnorm(100*3),nrow=100) D2 <- as.matrix(dist(X)^2) class(D2) <- "D2" G <- D2toG(D2,weights=NULL) class(G) <- "Gram" D22 <- GtoD2(G)
X <- matrix(rnorm(100*3),nrow=100) D2 <- as.matrix(dist(X)^2) class(D2) <- "D2" G <- D2toG(D2,weights=NULL) class(G) <- "Gram" D22 <- GtoD2(G)
ldbglm
is a localized version of a distance-based generalized linear
model. As in the global model dbglm
, explanatory information is
coded as distances between individuals.
Neighborhood definition for localizing is done by the (semi)metric
dist1
whereas a second (semi)metric dist2
(which may coincide
with dist1
) is used for distance-based prediction.
Both dist1
and dist2
can either be computed from observed
explanatory variables or directly input as a squared distances
matrix or as a Gram
matrix. Response and link function are as in the
dbglm
function for ordinary generalized linear models.
The model allows for a mixture of continuous and qualitative explanatory
variables or, in fact, from more general quantities such as functional data.
Notation convention: in distance-based methods we must distinguish observed explanatory variables which we denote by Z or z, from Euclidean coordinates which we denote by X or x. For explanation on the meaning of both terms see the bibliography references below.
## S3 method for class 'formula' ldbglm(formula,data,...,family=gaussian(),kind.of.kernel=1, metric1="euclidean",metric2=metric1,method.h="GCV",weights, user.h=NULL,h.range=NULL,noh=10,k.knn=3, rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10, eps2=1e-10) ## S3 method for class 'dist' ldbglm(dist1,dist2=dist1,y,family=gaussian(),kind.of.kernel=1, method.h="GCV",weights,user.h=quantile(dist1,.25), h.range=quantile(as.matrix(dist1),c(.05,.5)),noh=10,k.knn=3, rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10,eps2=1e-10,...) ## S3 method for class 'D2' ldbglm(D2.1,D2.2=D2.1,y,family=gaussian(),kind.of.kernel=1, method.h="GCV",weights,user.h=quantile(D2.1,.25)^.5, h.range=quantile(as.matrix(D2.1),c(.05,.5))^.5,noh=10, k.knn=3,rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10, eps2=1e-10,...) ## S3 method for class 'Gram' ldbglm(G1,G2=G1,y,kind.of.kernel=1,user.h=NULL, family=gaussian(),method.h="GCV",weights,h.range=NULL,noh=10, k.knn=3,rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10, eps2=1e-10,...)
## S3 method for class 'formula' ldbglm(formula,data,...,family=gaussian(),kind.of.kernel=1, metric1="euclidean",metric2=metric1,method.h="GCV",weights, user.h=NULL,h.range=NULL,noh=10,k.knn=3, rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10, eps2=1e-10) ## S3 method for class 'dist' ldbglm(dist1,dist2=dist1,y,family=gaussian(),kind.of.kernel=1, method.h="GCV",weights,user.h=quantile(dist1,.25), h.range=quantile(as.matrix(dist1),c(.05,.5)),noh=10,k.knn=3, rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10,eps2=1e-10,...) ## S3 method for class 'D2' ldbglm(D2.1,D2.2=D2.1,y,family=gaussian(),kind.of.kernel=1, method.h="GCV",weights,user.h=quantile(D2.1,.25)^.5, h.range=quantile(as.matrix(D2.1),c(.05,.5))^.5,noh=10, k.knn=3,rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10, eps2=1e-10,...) ## S3 method for class 'Gram' ldbglm(G1,G2=G1,y,kind.of.kernel=1,user.h=NULL, family=gaussian(),method.h="GCV",weights,h.range=NULL,noh=10, k.knn=3,rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10, eps2=1e-10,...)
formula |
an object of class |
data |
an optional data frame containing the variables in the model (both response and explanatory variables, either the observed ones, Z, or a Euclidean configuration X). |
y |
(required if no formula is given as the principal argument). Response (dependent variable) must be numeric, matrix or data.frame. |
dist1 |
a |
dist2 |
a |
D2.1 |
a |
D2.2 |
a |
G1 |
a |
G2 |
a |
family |
a description of the error distribution and link function to be used
in the model. This can be a character string naming a family
function, a family function or the result of a call to a family function.
(See |
kind.of.kernel |
integer number between 1 and 6 which determines the user's choice of smoothing kernel. (1) Epanechnikov (Default), (2) Biweight, (3) Triweight, (4) Normal, (5) Triangular, (6) Uniform. |
metric1 |
metric function to be used when computing |
metric2 |
metric function to be used when computing |
method.h |
sets the method to be used in deciding the optimal bandwidth h.
There are four different methods, |
weights |
an optional numeric vector of weights to be used in the fitting process. By default all individuals have the same weight. |
user.h |
global bandwidth |
h.range |
a vector of length 2 giving the range for automatic bandwidth
choice. (Default: quantiles 0.05 and 0.5 of d(i,j) in |
noh |
number of bandwidth |
k.knn |
minimum number of observations with positive weight
in neighborhood localizing. To avoid runtime errors
due to a too small bandwidth originating neighborhoods
with only one observation. By default |
rel.gvar |
relative geometric variability (a real number between 0 and 1).
In each |
eff.rank |
integer between 1 and the number of observations minus one.
Number of Euclidean coordinates used for model fitting in
each |
maxiter |
maximum number of iterations in the iterated |
eps1 |
stopping criterion 1, |
eps2 |
stopping criterion 2, |
... |
arguments passed to or from other methods to the low level. |
The various possible ways for inputting the model explanatory
information through distances, or their squares, etc., are the
same as in dblm
.
The set of bandwidth h
values checked in automatic
bandwidth choice is defined by h.range
and noh
,
together with k.knn
. For each h
in it a local generalized
linear model is fitted and the optimal h
is decided according to the
statistic specified in method.h
.
kind.of.kernel
designates which kernel function is to be used
in determining individual weights from dist1
values.
See density
for more information.
For gamma distributions, the domain of the canonical link function
is not the same as the permitted range of the mean. In particular,
the linear predictor might be negative, obtaining an impossible
negative mean. Should that event occur, dbglm
stops with
an error message. Proposed alternative is to use a non-canonical link
function.
A list of class ldbglm
containing the following components:
residuals |
the residuals (response minus fitted values). |
fitted.values |
the fitted mean values. |
h.opt |
the optimal bandwidth |
family |
the |
y |
the response variable used. |
S |
the Smoother hat projector. |
weights |
the specified weights. |
call |
the matched call. |
dist1 |
the distance matrix (object of class |
dist2 |
the distance matrix (object of class |
Objects of class "ldbglm"
are actually of class
c("ldbglm", "ldblm")
, inheriting the plot.ldblm
and
summary.ldblm
method from class "ldblm"
.
Model fitting is repeated n
times (n=
number of observations)
for each bandwidth (noh*n
times).
For a noh
too large or a sample with many observations, the time of
this function can be very high.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Caballe, A., Delicado P, Esteve, A., Fortiana J (2016). Global and local distance-based generalized linear models. TEST 25, 170-195.
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
dbglm
for distance-based generalized linear models.ldblm
for local distance-based linear models.summary.ldbglm
for summary.plot.ldbglm
for plots.predict.ldbglm
for predictions.
# example of ldbglm usage z <- rnorm(100) y <- rbinom(100, 1, plogis(z)) D2 <- as.matrix(dist(z))^2 class(D2) <- "D2" # Distance-based generalized linear model dbglm2 <- dbglm(D2,y,family=binomial(link = "logit"), method="rel.gvar") # Local Distance-based generalized linear model ldbglm2 <- ldbglm(D2,y=y,family=binomial(link = "logit"),noh=3) # check the difference of both sum((y-ldbglm2$fit)^2) sum((y-dbglm2$fit)^2) plot(z,y) points(z,ldbglm2$fit,col=3) points(z,dbglm2$fit,col=2)
# example of ldbglm usage z <- rnorm(100) y <- rbinom(100, 1, plogis(z)) D2 <- as.matrix(dist(z))^2 class(D2) <- "D2" # Distance-based generalized linear model dbglm2 <- dbglm(D2,y,family=binomial(link = "logit"), method="rel.gvar") # Local Distance-based generalized linear model ldbglm2 <- ldbglm(D2,y=y,family=binomial(link = "logit"),noh=3) # check the difference of both sum((y-ldbglm2$fit)^2) sum((y-dbglm2$fit)^2) plot(z,y) points(z,ldbglm2$fit,col=3) points(z,dbglm2$fit,col=2)
ldblm
is a localized version of a distance-based linear model.
As in the global model dblm
, explanatory information is coded as
distances between individuals.
Neighborhood definition for localizing is done by the (semi)metric
dist1
whereas a second (semi)metric dist2
(which may coincide
with dist1
) is used for distance-based prediction.
Both dist1
and dist2
can either be computed from observed
explanatory variables or directly input as a squared distances
matrix or as a Gram
matrix. The response is a continuous variable
as in the ordinary linear model. The model allows for a mixture of
continuous and qualitative explanatory variables or, in fact, from more
general quantities such as functional data.
Notation convention: in distance-based methods we must distinguish observed explanatory variables which we denote by Z or z, from Euclidean coordinates which we denote by X or x. For explanation on the meaning of both terms see the bibliography references below.
## S3 method for class 'formula' ldblm(formula,data,...,kind.of.kernel=1, metric1="euclidean",metric2=metric1,method.h="GCV",weights, user.h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95,eff.rank=NULL) ## S3 method for class 'dist' ldblm(dist1,dist2=dist1,y,kind.of.kernel=1, method.h="GCV",weights,user.h=quantile(dist1,.25), h.range=quantile(as.matrix(dist1),c(.05,.5)),noh=10, k.knn=3,rel.gvar=0.95,eff.rank=NULL,...) ## S3 method for class 'D2' ldblm(D2.1,D2.2=D2.1,y,kind.of.kernel=1,method.h="GCV", weights,user.h=quantile(D2.1,.25)^.5, h.range=quantile(as.matrix(D2.1),c(.05,.5))^.5,noh=10,k.knn=3, rel.gvar=0.95,eff.rank=NULL,...) ## S3 method for class 'Gram' ldblm(G1,G2=G1,y,kind.of.kernel=1,method.h="GCV", weights,user.h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95, eff.rank=NULL,...)
## S3 method for class 'formula' ldblm(formula,data,...,kind.of.kernel=1, metric1="euclidean",metric2=metric1,method.h="GCV",weights, user.h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95,eff.rank=NULL) ## S3 method for class 'dist' ldblm(dist1,dist2=dist1,y,kind.of.kernel=1, method.h="GCV",weights,user.h=quantile(dist1,.25), h.range=quantile(as.matrix(dist1),c(.05,.5)),noh=10, k.knn=3,rel.gvar=0.95,eff.rank=NULL,...) ## S3 method for class 'D2' ldblm(D2.1,D2.2=D2.1,y,kind.of.kernel=1,method.h="GCV", weights,user.h=quantile(D2.1,.25)^.5, h.range=quantile(as.matrix(D2.1),c(.05,.5))^.5,noh=10,k.knn=3, rel.gvar=0.95,eff.rank=NULL,...) ## S3 method for class 'Gram' ldblm(G1,G2=G1,y,kind.of.kernel=1,method.h="GCV", weights,user.h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95, eff.rank=NULL,...)
formula |
an object of class |
data |
an optional data frame containing the variables in the model (both response and explanatory variables, either the observed ones, Z, or a Euclidean configuration X). |
y |
(required if no formula is given as the principal argument). Response (dependent variable) must be numeric, matrix or data.frame. |
dist1 |
a |
dist2 |
a |
D2.1 |
a |
D2.2 |
a |
G1 |
a |
G2 |
a |
kind.of.kernel |
integer number between 1 and 6 which determines the user's choice of smoothing kernel. (1) Epanechnikov (Default), (2) Biweight, (3) Triweight, (4) Normal, (5) Triangular, (6) Uniform. |
metric1 |
metric function to be used when computing |
metric2 |
metric function to be used when computing |
method.h |
sets the method to be used in deciding the optimal bandwidth h.
There are five different methods, |
weights |
an optional numeric vector of weights to be used in the fitting process. By default all individuals have the same weight. |
user.h |
global bandwidth |
h.range |
a vector of length 2 giving the range for automatic bandwidth
choice. (Default: quantiles 0.05 and 0.5 of d(i,j) in |
noh |
number of bandwidth |
k.knn |
minimum number of observations with positive weight
in neighborhood localizing. To avoid runtime errors
due to a too small bandwidth originating neighborhoods
with only one observation. By default |
rel.gvar |
relative geometric variability (a real number between 0 and 1).
In each |
eff.rank |
integer between 1 and the number of observations minus one.
Number of Euclidean coordinates used for model fitting in
each |
... |
arguments passed to or from other methods to the low level. |
There are two semi-metrics involved in local linear distance-based estimation:
dist1
and dist2
. Both semi-metrics can coincide.
For instance, when dist1=||xi-xj||
and
dist2=||(xi,xi^2,xi^3)-(xj,xj^2,xj^3)||
the estimator
for new observations coincides with fitting a local cubic polynomial
regression.
The set of bandwidth h
values checked in automatic
bandwidth choice is defined by h.range
and noh
,
together with k.knn
. For each h
in it a local linear
model is fitted and the optimal h
is decided according to the
statistic specified in method.h
.
kind.of.kernel
designates which kernel function is to be used
in determining individual weights from dist1
values.
See density
for more information.
A list of class ldblm
containing the following components:
residuals |
the residuals (response minus fitted values). |
fitted.values |
the fitted mean values. |
h.opt |
the optimal bandwidth h used in the fitting proces
( |
S |
the Smoother hat projector. |
weights |
the specified weights. |
y |
the response variable used. |
call |
the matched call. |
dist1 |
the distance matrix (object of class |
dist2 |
the distance matrix (object of class |
Model fitting is repeated n
times (n=
number of observations)
for each bandwidth (noh*n
times).
For a noh
too large or a sample with many observations, the time of
this function can be very high.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Caballe, A., Delicado P, Esteve, A., Fortiana J (2016). Global and local distance-based generalized linear models. TEST 25, 170-195.
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
dblm
for distance-based linear models.ldbglm
for local distance-based generalized linear models.summary.ldblm
for summary.plot.ldblm
for plots.predict.ldblm
for predictions.
# example to use of the ldblm function n <- 100 p <- 1 k <- 5 Z <- matrix(rnorm(n*p),nrow=n) b1 <- matrix(runif(p)*k,nrow=p) b2 <- matrix(runif(p)*k,nrow=p) b3 <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b1 + Z^2%*%b2 +Z^3%*%b3 + e D2 <- as.matrix(dist(Z)^2) class(D2) <- "D2" ldblm1 <- ldblm(y~Z,kind.of.kernel=1,method="GCV",noh=3,k.knn=3) ldblm2 <- ldblm(D2.1=D2,D2.2=D2,y,kind.of.kernel=1,method="user.h",k.knn=3)
# example to use of the ldblm function n <- 100 p <- 1 k <- 5 Z <- matrix(rnorm(n*p),nrow=n) b1 <- matrix(runif(p)*k,nrow=p) b2 <- matrix(runif(p)*k,nrow=p) b3 <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b1 + Z^2%*%b2 +Z^3%*%b3 + e D2 <- as.matrix(dist(Z)^2) class(D2) <- "D2" ldblm1 <- ldblm(y~Z,kind.of.kernel=1,method="GCV",noh=3,k.knn=3) ldblm2 <- ldblm(D2.1=D2,D2.2=D2,y,kind.of.kernel=1,method="user.h",k.knn=3)
Six plots (selected by which
) are available: a plot of residual vs
fitted values, the Q-Qplot of normality, a Scale-Location plot of
sqrt(|residuals|)
against fitted values. A plot of Cook's distances
versus row labels, a plot of residuals against leverages, and the optimal
effective rank of "OCV"
, "GCV"
, "AIC"
or "BIC"
method (only if one of these four methods have been chosen in function dblm
).
By default, only the first three and 5
are provided.
## S3 method for class 'dblm' plot(x,which=c(1:3, 5),id.n=3,main="", cook.levels = c(0.5, 1),cex.id = 0.75, type.pred=c("link","response"),...)
## S3 method for class 'dblm' plot(x,which=c(1:3, 5),id.n=3,main="", cook.levels = c(0.5, 1),cex.id = 0.75, type.pred=c("link","response"),...)
x |
|
which |
if a subset of the plots is required, specify a subset of the numbers 1:6. |
id.n |
number of points to be labelled in each plot, starting with the most extreme. |
main |
an overall title for the plot. Only if one of the six plots is selected. |
cook.levels |
levels of Cook's distance at which to draw contours. |
cex.id |
magnification of point labels. |
type.pred |
the type of prediction (required only for a |
... |
other parameters to be passed through to plotting functions. |
The five first plots are very useful to the residual analysis and are
the same that plot.lm
. A plot of residuals against fitted
values sees if the variance is constant. The qq-plot checks if the residuals
are normal (see qqnorm
).
The plot between "Scale-Location"
and the fitted values takes the
square root of the absolute residuals in order to diminish skewness.
The Cook's distance against the row labels, measures the effect of deleting a
given observation (estimate of the influence of a data point). Points with a
large Cook's distance are considered to merit closer examination in the analysis.
Finally, the Residual-Leverage plot also shows the most influence points
(labelled by Cook's distance). See cooks.distance
.
The last plot, allows to view the "OCV"
(just for dblm
), "GCV"
, "AIC"
or "BIC"
criterion according to the used rank in the
dblm
or dbglm
functions, and chosen the minimum. Applies only if
the parameter full.search
its TRUE
.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.
dblm
for distance-based linear models.dbglm
for distance-based generalized linear models.
n <- 64 p <- 4 k <- 3 Z <- matrix(rnorm(n*p),nrow=n) b <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b + e dblm1 <- dblm(y~Z,metric="gower",method="GCV", full.search=FALSE) plot(dblm1) plot(dblm1,which=4)
n <- 64 p <- 4 k <- 3 Z <- matrix(rnorm(n*p),nrow=n) b <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b + e dblm1 <- dblm(y~Z,metric="gower",method="GCV", full.search=FALSE) plot(dblm1) plot(dblm1,which=4)
Four plots (selected by which
) are available: plot of scores,
response vs scores, R2 contribution in each component and the value of
"OCV"
, "GCV"
, "AIC"
or "BIC"
vs the number
of component chosen.
## S3 method for class 'dbplsr' plot(x,which=c(1L:4L),main="",scores.comps=1:2, component=1,method=c("OCV","GCV","AIC","BIC"),...)
## S3 method for class 'dbplsr' plot(x,which=c(1L:4L),main="",scores.comps=1:2, component=1,method=c("OCV","GCV","AIC","BIC"),...)
x |
an object of class |
which |
if a subset of the plots is required, specify a subset of the numbers 1:4. |
main |
an overall title for the plot. Only if one of the four plots is selected. |
scores.comps |
array containing the component scores crossed in the first plot (default the first two). |
component |
numeric value. Component vs response in the second plot (Default the first component). |
method |
choosen method |
... |
other parameters to be passed through to plotting functions. |
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Implementing PLS for distance-based regression: computational issues. Computational Statistics 22, 237-248.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.
dbplsr
for distance-based partial least squares.
#require(pls) library(pls) data(yarn) ## Default methods: yarn.dbplsr <- dbplsr(density ~ NIR, data = yarn, ncomp=6, method="GCV") plot(yarn.dbplsr,scores.comps=1:3)
#require(pls) library(pls) data(yarn) ## Default methods: yarn.dbplsr <- dbplsr(density ~ NIR, data = yarn, ncomp=6, method="GCV") plot(yarn.dbplsr,scores.comps=1:3)
Three plots (selected by which
) are available: a plot of
fitted values vs response, a plot of residuals vs fitted and the
optimal bandwidth h
of "OCV"
, "GCV"
, "AIC"
or
"BIC"
criterion (only if one of these four methods have been chosen
in the ldblm
function). By default, only the first and the second
are provided.
## S3 method for class 'ldblm' plot(x,which=c(1,2),id.n=3,main="",...)
## S3 method for class 'ldblm' plot(x,which=c(1,2),id.n=3,main="",...)
x |
|
which |
if a subset of the plots is required, specify a subset of the numbers 1:3. |
id.n |
number of points to be labelled in each plot, starting with the most extreme. |
main |
an overall title for the plot. Only if one of the three plots is selected. |
... |
other parameters to be passed through to plotting functions. |
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.
ldblm
for local distance-based linear models.ldbglm
for local distance-based generalized linear models.
# example to use of the ldblm function n <- 100 p <- 1 k <- 5 Z <- matrix(rnorm(n*p),nrow=n) b1 <- matrix(runif(p)*k,nrow=p) b2 <- matrix(runif(p)*k,nrow=p) b3 <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b1 + Z^2%*%b2 +Z^3%*%b3 + e D2 <- as.matrix(dist(Z))^2 class(D2) <- "D2" ldblm1 <- ldblm(D2,y=y,kind.of.kernel=1,method.h="AIC",noh=5,h.knn=NULL) plot(ldblm1) plot(ldblm1,which=3)
# example to use of the ldblm function n <- 100 p <- 1 k <- 5 Z <- matrix(rnorm(n*p),nrow=n) b1 <- matrix(runif(p)*k,nrow=p) b2 <- matrix(runif(p)*k,nrow=p) b3 <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b1 + Z^2%*%b2 +Z^3%*%b3 + e D2 <- as.matrix(dist(Z))^2 class(D2) <- "D2" ldblm1 <- ldblm(D2,y=y,kind.of.kernel=1,method.h="AIC",noh=5,h.knn=NULL) plot(ldblm1) plot(ldblm1,which=3)
predict.dbglm
returns the predicted values, obtained by tested the
generalized distance regression function in the new data (newdata
).
## S3 method for class 'dbglm' predict(object,newdata,type.pred=c("link", "response"), type.var="Z",...)
## S3 method for class 'dbglm' predict(object,newdata,type.pred=c("link", "response"), type.var="Z",...)
object |
an object of class |
newdata |
data.frame or matrix which contains the values of Z (if |
type.pred |
the type of prediction (required). The default |
type.var |
set de type of newdata. Can be |
... |
arguments passed to or from other methods to the low level. |
The predicted values may be the expected mean values of response for the new
data (type.pred="response"
), or the linear predictors evaluated in the
estimated dblm
of the last iteration.
In classical linear models the mean and the linear predictor are the same
(makes use of the identity link). However, other distributions such as
Poisson or binomial, the link could change. It's easy to get the predicted
mean values, as these are calculated by the inverse link of linear predictors.
See family
to view how to use linkfun
and linkinv
.
predict.dbglm
produces a vector of predictions for the k new individuals.
Look at which way (or type.var
) was made the dbglm call.
The parameter type.var
must be consistent with the data type
that is introduced to dbglm
.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
dbglm
for distance-based generalized linear models.
z <- rnorm(100) y <- rpois(100, exp(1+z)) glm1 <- glm(y ~z, family=quasi("identity")) dbglm1 <- dbglm(y~z,family=quasi("identity"), method="rel.gvar") newdata<-0 pr1 <- predict(dbglm1,newdata,type.pred="response",type.var="Z") print(pr1) plot(z,y) points(z,dbglm1$fitt,col=2) points(0,pr1,col=2) abline(v=0,col=2) abline(h=pr1,col=2)
z <- rnorm(100) y <- rpois(100, exp(1+z)) glm1 <- glm(y ~z, family=quasi("identity")) dbglm1 <- dbglm(y~z,family=quasi("identity"), method="rel.gvar") newdata<-0 pr1 <- predict(dbglm1,newdata,type.pred="response",type.var="Z") print(pr1) plot(z,y) points(z,dbglm1$fitt,col=2) points(0,pr1,col=2) abline(v=0,col=2) abline(h=pr1,col=2)
predict.dblm
returns the predicted values, obtained by evaluating the
distance regression function in the new data (newdata
).
newdata
can be the values of the explanatory variables of these new
cases, the squared distances between these new individuals
and the originals ones, or rows of new doubly weighted and centered inner
products matrix G.
## S3 method for class 'dblm' predict(object,newdata,type.var="Z",...)
## S3 method for class 'dblm' predict(object,newdata,type.var="Z",...)
object |
an object of class |
newdata |
data.frame or matrix which contains the values of Z (if |
type.var |
set de type.var of newdata. Can be |
... |
arguments passed to or from other methods to the low level. |
predict.dblm
produces a vector of predictions for the k new individuals.
Look at which way (or type.var
) was made the dblm
call.
The parameter type.var
must be consistent with the data type that is
introduced to dblm
.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
dblm
for distance-based linear models.
# prediction of new observations newdata n <- 100 p <- 3 k <- 5 Z <- matrix(rnorm(n*p),nrow=n) b <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b + e D <- dist(Z) D2 <- disttoD2(D) D2_train <- D2[1:90,1:90] class(D2_train)<-"D2" dblm1 <- dblm(D2_train,y[1:90]) newdata <- D2[91:100,1:90] predict(dblm1,newdata,type.var="D2")
# prediction of new observations newdata n <- 100 p <- 3 k <- 5 Z <- matrix(rnorm(n*p),nrow=n) b <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b + e D <- dist(Z) D2 <- disttoD2(D) D2_train <- D2[1:90,1:90] class(D2_train)<-"D2" dblm1 <- dblm(D2_train,y[1:90]) newdata <- D2[91:100,1:90] predict(dblm1,newdata,type.var="D2")
predict.dbplsr
returns the predicted values, obtained by evaluating the
Distance-based partial least squares function in the new data (newdata
).
newdata
can be the values of the explanatory variables of these new
cases, the squared distances between these new individuals
and the originals ones, or rows of new doubly weighted and centered inner
products matrix G.
## S3 method for class 'dbplsr' predict(object,newdata,type.var="Z",...)
## S3 method for class 'dbplsr' predict(object,newdata,type.var="Z",...)
object |
an object of class |
newdata |
data.frame or matrix which contains the values of Z (if |
type.var |
set de type of newdata. Can be |
... |
arguments passed to or from other methods to the low level. |
predict.dbplsr
produces a vector of predictions for the k new individuals.
Look at which way (or type.var
) was made the dbplsr
call.
The parameter type.var
must be consistent with the data type that is
introduced to dbplsr
.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Implementing PLS for distance-based regression: computational issues. Computational Statistics 22, 237-248.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
dbplsr
for distance-based partial least squares.
#require(pls) # prediction of new observations newdata library(pls) data(yarn) ## Default methods: yarn.dbplsr <- dbplsr(density[1:27] ~ NIR[1:27,], data = yarn, ncomp=6, method="GCV") pr_yarn_28 <- predict(yarn.dbplsr,newdata=t(as.matrix(yarn$NIR[28,]))) print(pr_yarn_28) print(yarn$density[28])
#require(pls) # prediction of new observations newdata library(pls) data(yarn) ## Default methods: yarn.dbplsr <- dbplsr(density[1:27] ~ NIR[1:27,], data = yarn, ncomp=6, method="GCV") pr_yarn_28 <- predict(yarn.dbplsr,newdata=t(as.matrix(yarn$NIR[28,]))) print(pr_yarn_28) print(yarn$density[28])
predict.ldbglm
returns the predicted values, obtained by evaluating
the local distance-based generalized linear model in the new data
(newdata2
), using newdata1
to estimate the "kernel weights".
## S3 method for class 'ldbglm' predict(object,newdata1,newdata2=newdata1, new.k.knn=3,type.pred=c("link","response"), type.var="Z",...)
## S3 method for class 'ldbglm' predict(object,newdata1,newdata2=newdata1, new.k.knn=3,type.pred=c("link","response"), type.var="Z",...)
object |
an object of class |
newdata1 |
data.frame or matrix which contains the values of Z (if |
newdata2 |
the same logic as |
new.k.knn |
setting a minimum bandwidth in order to check that a candidate bandwidth h
doesn't contains DB linear models with only one observation.
If |
type.pred |
the type of prediction (required). The default |
type.var |
set de type of the newdata paramater. Can be |
... |
arguments passed to or from other methods to the low level. |
A list of class predict.ldbglm
containing the following components:
fit |
predicted values for the k new individuals. |
newS |
matrix (with dimension (k,n)) of weights used to compute the predictions. |
Look at which way (or type.var
) was made the ldbglm
call. The parameter
type.var
must be consistent with the data type that is introduced
to ldbglm
.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
ldbglm
for local distance-based generalized linear models.
# example to use of the predict.ldbglm function z <- rnorm(100) y <- rpois(100, exp(1+z)) glm5 <- glm(y ~z, family=quasi("identity")) ldbglm5 <- ldbglm(dist(z),y=y,family=quasi("identity"),noh=3) plot(z,y) points(z,glm5$fitt,col=2) points(z,ldbglm5$fitt,col=3) pr_ldbglm5 <- predict(ldbglm5,as.matrix(dist(z)^2),type.pred="response",type.var="D2") max(pr_ldbglm5$fit-ldbglm5$fitt)
# example to use of the predict.ldbglm function z <- rnorm(100) y <- rpois(100, exp(1+z)) glm5 <- glm(y ~z, family=quasi("identity")) ldbglm5 <- ldbglm(dist(z),y=y,family=quasi("identity"),noh=3) plot(z,y) points(z,glm5$fitt,col=2) points(z,ldbglm5$fitt,col=3) pr_ldbglm5 <- predict(ldbglm5,as.matrix(dist(z)^2),type.pred="response",type.var="D2") max(pr_ldbglm5$fit-ldbglm5$fitt)
predict.ldblm
returns the predicted values, obtained by evaluating
the local distance-based linear model in the new data
(newdata2
), using newdata1
to estimate the "kernel weights".
## S3 method for class 'ldblm' predict(object,newdata1,newdata2=newdata1, new.k.knn=3,type.var="Z",...)
## S3 method for class 'ldblm' predict(object,newdata1,newdata2=newdata1, new.k.knn=3,type.var="Z",...)
object |
an object of class |
newdata1 |
data.frame or matrix which contains the values of Z (if |
newdata2 |
the same logic as |
new.k.knn |
setting a minimum bandwidth in order to check that a candidate bandwidth h
doesn't contains DB linear models with only one observation.
If |
type.var |
set de type of the newdata paramater. Can be |
... |
arguments passed to or from other methods to the low level. |
A list of class predict.ldblm
containing the following components:
fit |
predicted values for the k new individuals. |
newS |
matrix (with dimension (k,n)) of weights used to compute the predictions. |
Look at which way (or type.var
) was made the ldblm
call.
The parameter type.var
must be consistent with the data type that
is introduced to ldblm
.
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
ldblm
for local distance-based linear models.
# example to use of the predict.ldblm function n <- 100 p <- 1 k <- 5 Z <- matrix(rnorm(n*p),nrow=n) b1 <- matrix(runif(p)*k,nrow=p) b2 <- matrix(runif(p)*k,nrow=p) b3 <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b1 + Z^2%*%b2 +Z^3%*%b3 + e D <- as.matrix(dist(Z)) D2 <- D^2 newdata1 <- 0 ldblm1 <- ldblm(y~Z,kind.of.kernel=1,method="GCV",noh=3,k.knn=3) pr1 <- predict(ldblm1,newdata1) print(pr1) plot(Z,y) points(0,pr1$fit,col=2) abline(v=0,col=2) abline(h=pr1$fit,col=2)
# example to use of the predict.ldblm function n <- 100 p <- 1 k <- 5 Z <- matrix(rnorm(n*p),nrow=n) b1 <- matrix(runif(p)*k,nrow=p) b2 <- matrix(runif(p)*k,nrow=p) b3 <- matrix(runif(p)*k,nrow=p) s <- 1 e <- rnorm(n)*s y <- Z%*%b1 + Z^2%*%b2 +Z^3%*%b3 + e D <- as.matrix(dist(Z)) D2 <- D^2 newdata1 <- 0 ldblm1 <- ldblm(y~Z,kind.of.kernel=1,method="GCV",noh=3,k.knn=3) pr1 <- predict(ldblm1,newdata1) print(pr1) plot(Z,y) points(0,pr1$fit,col=2) abline(v=0,col=2) abline(h=pr1$fit,col=2)
summary
method for class "dbglm"
## S3 method for class 'dbglm' summary(object,dispersion,...)
## S3 method for class 'dbglm' summary(object,dispersion,...)
object |
an object of class |
dispersion |
the dispersion parameter for the family used.
Either a single numerical value or |
... |
arguments passed to or from other methods to the low level. |
A list of class summary.dbglm
containing the following components:
call |
the matched call. |
family |
the |
deviance |
measure of discrepancy or goodness of fitt. Proportional to twice the difference between the maximum log likelihood achievable and that achieved by the model under investigation. |
aic |
Akaike's An Information Criterion. |
df.residual |
the residual degrees of freedom. |
null.deviance |
the deviance for the null model. |
df.null |
the residual degrees of freedom for the null model. |
iter |
number of Fisher Scoring ( |
deviance.resid |
the deviance residuals for each observation: sign(y-mu)*sqrt(di). |
pears.resid |
the raw residual scaled by the estimated standard
deviation of |
dispersion |
the dispersion is taken as 1 for the binomial and Poisson families, and otherwise estimated by the residual Chisquared statistic (calculated from cases with non-zero weights) divided by the residual degrees of freedom. |
gvar |
weighted geometric variability of the squared distance matrix. |
gvec |
diagonal entries in weighted inner products matrix G. |
convcrit |
convergence criterion. One of: |
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
dbglm
for distance-based generalized linear models.
summary
method for class "dblm"
## S3 method for class 'dblm' summary(object,...)
## S3 method for class 'dblm' summary(object,...)
object |
an object of class |
... |
arguments passed to or from other methods to the low level. |
A list of class summary.dblm
containing the following components:
residuals |
the residuals (response minus fitted values). |
sigma |
the residual standard error. |
r.squared |
the coefficient of determination R2. |
adj.r.squared |
adjusted R-squared. |
rdf |
the residual degrees of freedom. |
call |
the matched call. |
gvar |
weighted geometric variability of the squared distance matrix. |
gvec |
diagonal entries in weighted inner products matrix G. |
method |
method used to decide the effective rank. |
eff.rank |
integer between 1 and the number of observations minus one.
Number of Euclidean coordinates used for model fitting. Applies only
if |
rel.gvar |
relative geometric variability (real between 0 and 1). Take the
lowest effective rank with a relative geometric variability higher
or equal to |
crit.value |
value of criterion defined in |
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
dblm
for distance-based linear models.
summary
method for class "dbplsr"
## S3 method for class 'dbplsr' summary(object,...)
## S3 method for class 'dbplsr' summary(object,...)
object |
an object of class |
... |
arguments passed to or from other methods to the low level. |
A list of class summary.dbplsr
containing the following components:
ncomp |
the number of components of the model. |
r.squared |
the coefficient of determination R2. |
adj.r.squared |
adjusted R-squared. |
call |
the matched call. |
residuals |
a list containing the residuals for each iteration (response minus fitted values). |
sigma |
the residual standard error. |
gvar |
total weighted geometric variability. |
gvec |
the diagonal entries in G0. |
gvar.iter |
geometric variability for each iteration. |
method |
the using method to set |
crit.value |
value of criterion defined in |
ncomp.opt |
optimum number of components according to the selected method. |
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Implementing PLS for distance-based regression: computational issues. Computational Statistics 22, 237-248.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
dbplsr
for distance-based partial least squares.
# require(pls) library(pls) data(yarn) ## Default methods: yarn.dbplsr <- dbplsr(density ~ NIR, data = yarn, ncomp=6, method="GCV") summary(yarn.dbplsr)
# require(pls) library(pls) data(yarn) ## Default methods: yarn.dbplsr <- dbplsr(density ~ NIR, data = yarn, ncomp=6, method="GCV") summary(yarn.dbplsr)
summary
method for class "ldbglm"
.
## S3 method for class 'ldbglm' summary(object,dispersion = NULL,...)
## S3 method for class 'ldbglm' summary(object,dispersion = NULL,...)
object |
an object of class |
dispersion |
the dispersion parameter for the family used.
Either a single numerical value or |
... |
arguments passed to or from other methods to the low level. |
A list of class summary.ldgblm
containing the following components:
nobs |
number of observations. |
trace.hat |
Trace of smoother matrix. |
call |
the matched call. |
family |
the |
deviance |
measure of discrepancy or goodness of fitt. Proportional to twice the difference between the maximum log likelihood achievable and that achieved by the model under investigation. |
df.residual |
the residual degrees of freedom. |
null.deviance |
the deviance for the null model. |
df.null |
the residual degrees of freedom for the null model. |
iter |
number of Fisher Scoring ( |
deviance.resid |
the deviance residuals for each observation: sign(y-mu)*sqrt(di). |
pears.resid |
the raw residual scaled by the estimated standard
deviation of |
dispersion |
the dispersion is taken as 1 for the binomial and Poisson families, and otherwise estimated by the residual Chisquared statistic (calculated from cases with non-zero weights) divided by the residual degrees of freedom. |
kind.kernel |
smoothing kernel function. |
method.h |
method used to decide the optimal bandwidth. |
h.opt |
the optimal bandwidth h used in the fitting proces
( |
crit.value |
value of criterion defined in |
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
ldbglm
for local distance-based generalized linear models.
summary
method for class "ldblm"
.
## S3 method for class 'ldblm' summary(object,...)
## S3 method for class 'ldblm' summary(object,...)
object |
an object of class |
... |
arguments passed to or from other methods to the low level. |
A list of class summary.ldblm
containing the following components:
nobs |
number of observations. |
r.squared |
the coefficient of determination R2. |
trace.hat |
Trace of smoother matrix . |
call |
the matched call. |
residuals |
the residuals (the response minus fitted values). |
family |
the |
kind.kernel |
smoothing kernel function. |
method.h |
method used to decide the optimal bandwidth. |
h.opt |
the optimal bandwidth h used in the fitting proces
( |
crit.value |
value of criterion defined in |
Boj, Eva <[email protected]>, Caballe, Adria <[email protected]>, Delicado, Pedro <[email protected]> and Fortiana, Josep <[email protected]>
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
ldblm
for local distance-based linear models.