Title: | Nonparametric Estimation of Regression Models with Factor-by-Curve Interactions |
---|---|
Description: | A method for obtaining nonparametric estimates of regression models with or without factor-by-curve interactions using local polynomial kernel smoothers or splines. Additionally, a parametric model (allometric model) can be estimated. |
Authors: | Marta Sestelo [aut, cre], Nora M. Villanueva [aut], Javier Roca-Pardinas [aut] |
Maintainer: | Marta Sestelo <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.5.2 |
Built: | 2024-11-27 06:43:30 UTC |
Source: | CRAN |
Bootstrap-based procedure that tests whether the data can be modelled by an allometric model.
allotest( formula, data, na.action = "na.omit", nboot = 500, seed = NULL, cluster = TRUE, ncores = NULL, test = "res", ... )
formula |
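An object of class formula: a symbolic description of the model to be fitted. |
data |
An optional data frame, matrix or list required by the formula. If not found in data, the variables are taken from environment(formula), typically the environment from which allotest is called. |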
na.action |
A function which indicates what should happen when the data contain 'NA's. The default is 'na.omit'. |
nboot |
Number of bootstrap repeats. |
seed |
Seed to be used in the bootstrap procedure. |
cluster |
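A logical value. If TRUE (default), the bootstrap procedure is parallelized. |
ncores |
An integer value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores used is the number of cores of the machine minus 1. |
test |
Statistic test to be used, based on residuals on the null model ("res", the default) or on the likelihood ratio test ("lrt"). |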
... |
Other options. |
To help choose a model appropriate to the data while minimising the loss of information, a bootstrap-based procedure was developed that tests whether the data can be modelled by an allometric model. allotest tests the null hypothesis of an allometric model taking into account the logarithm of the original variables ($X^* = \log(X)$ and $Y^* = \log(Y)$).
Based on a general model of the type
$$Y^* = m(X^*) + \varepsilon,$$
the aim here is to test the null hypothesis of an allometric model
$$H_0: m(x^*) = a^* + b^* x^*$$
versus the general hypothesis $H_1$, with $m$ being an unknown nonparametric function; or analogously,
$$H_1: m(x^*) = a^* + b^* x^* + g(x^*),$$
with $g(x^*)$ being an unknown function not equal to zero.
To implement this test we have used the wild bootstrap.
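The idea of the test can be illustrated with a minimal base-R sketch. This is not the package's internal code: the data are simulated, the helper stat_fun is hypothetical, loess stands in for the package's smoother, and the statistic (the size of the smooth signal left in the null-model residuals) is only one plausible residual-based choice. Mammen's two-point multipliers are assumed for the wild bootstrap.

```r
# Sketch only: simulated data, hypothetical helper, not npregfast internals.
set.seed(1)
n <- 200
x <- runif(n, 5, 25)
y <- 0.002 * x^2.5 * exp(rnorm(n, sd = 0.2))  # data that follow an allometric law

lx <- log(x)
ly <- log(y)

# Residual-based statistic: fit the null (linear on the log scale) model,
# smooth its residuals and add up the absolute smoothed values.
stat_fun <- function(lx, ly) {
  r <- residuals(lm(ly ~ lx))
  g_hat <- fitted(loess(r ~ lx))
  sum(abs(g_hat))
}

null_fit <- lm(ly ~ lx)
t_obs <- stat_fun(lx, ly)

# Wild bootstrap: perturb the null residuals with multipliers of mean 0 and
# variance 1 (Mammen's two-point distribution), then recompute the statistic.
nboot <- 200
a <- (1 - sqrt(5)) / 2
b <- (1 + sqrt(5)) / 2
p_a <- (5 + sqrt(5)) / 10
t_boot <- replicate(nboot, {
  v <- ifelse(runif(n) < p_a, a, b)
  ly_star <- fitted(null_fit) + residuals(null_fit) * v
  stat_fun(lx, ly_star)
})

# Bootstrap p-value: share of bootstrap statistics at least as large as t_obs
p_value <- mean(t_boot >= t_obs)
```

Because the bootstrap samples are generated under the null model, a small p_value would indicate that the log-log relationship is not well described by a straight line.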
An object is returned with the following elements:
statistic |
the value of the test statistic. |
value |
the p-value of the test. |
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
Sestelo, M. and Roca-Pardinas, J. (2011). A new approach to estimation of length-weight relationship of Pollicipes pollicipes (Gmelin, 1789) on the Atlantic coast of Galicia (Northwest Spain): some aspects of its biology and management. Journal of Shellfish Research, 30(3), 939–948.
Sestelo, M. (2013). Development and computational implementation of estimation and inference methods in flexible regression models. Applications in Biology, Engineering and Environment. PhD Thesis, Department of Statistics and O.R. University of Vigo.
library(npregfast)
data(barnacle)
allotest(DW ~ RC, data = barnacle, nboot = 50, seed = 130853, cluster = FALSE)
frfast objects with ggplot2 graphics
Useful for drawing the estimated regression function and its first and second derivatives (for each factor level) using ggplot2 graphics. Additionally, the diffwith argument makes it possible to draw the differences between two factor levels.
## S3 method for class 'frfast'
autoplot(object = model, fac = NULL, der = 0, diffwith = NULL, points = TRUE,
  xlab = model$name[2], ylab = model$name[1], ylim = NULL, main = NULL,
  col = "black", CIcol = "black", CIlinecol = "transparent", pcol = "grey80",
  abline = TRUE, ablinecol = "red", lty = 1, CIlty = 2, lwd = 1, CIlwd = 1,
  cex = 1.4, alpha = 0.2, ...)
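object |
frfast object. |
fac |
Factor's level to be taken into account in the plot. By default is NULL. |
der |
Number which determines any inference process. By default der = 0, which draws the estimate. If it is 1 or 2, the first or second derivative is drawn, respectively. |
diffwith |
Factor's level used for drawing the differences with respect to the level specified in the fac argument. By default, NULL. |
points |
Draw the original data into the plot. By default it is TRUE. |
xlab |
A title for the x axis. |
ylab |
A title for the y axis. |
ylim |
The y limits of the plot. |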
main |
An overall title for the plot. |
col |
A specification for the default plotting color. |
CIcol |
A specification for the default confidence intervals plotting color (for the fill). |
CIlinecol |
A specification for the default confidence intervals plotting color (for the edge). |
pcol |
A specification for the points color. |
abline |
Draw a horizontal line into the plot of the second derivative of the model. |
ablinecol |
The color to be used for abline. |
lty |
The line type. Line types can be specified as an integer (0 = blank, 1 = solid (default), 2 = dashed, 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodash). See details in par. |
CIlty |
The line type for confidence intervals. Line types can be specified as an integer (0 = blank, 1 = solid, 2 = dashed (default), 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodash). |
lwd |
The line width, a positive number, defaulting to 1. See details in par. |
CIlwd |
The line width for confidence intervals, a positive number, defaulting to 1. |
cex |
A numerical value giving the amount by which plotting symbols should be magnified relative to the default. See details in par. |
alpha |
Alpha transparency for overlapping elements expressed as a fraction between 0 (complete transparency) and 1 (complete opacity). |
... |
Other options. |
A ggplot object, so you can use common features from ggplot2 package to manipulate the plot.
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
library(npregfast)
library(ggplot2)
data(barnacle)

# Nonparametric regression without interactions
fit <- frfast(DW ~ RC, data = barnacle, nboot = 50)
autoplot(fit)
autoplot(fit, points = FALSE) + ggtitle("Title")
autoplot(fit, der = 1) + xlim(4, 20)
# autoplot(fit, der = 1, col = "red", CIcol = "blue")

# Nonparametric regression with interactions
fit2 <- frfast(DW ~ RC : F, data = barnacle, nboot = 50)
autoplot(fit2, fac = "barca")
# autoplot(fit2, der = 1, fac = "lens")

# Visualization of the differences between two factor's levels
autoplot(fit2, fac = "barca", diffwith = "lens")
# autoplot(fit2, der = 1, fac = "barca", diffwith = "lens")

# Plotting in the same graphics device
## Not run:
if (requireNamespace("gridExtra", quietly = TRUE)) {
  # For plotting two derivatives in the same graphic windows
  ders <- lapply(0:1, function(x) autoplot(fit, der = x))
  gridExtra::grid.arrange(grobs = ders, ncol = 2, nrow = 1)

  # For plotting two levels in the same graphic windows
  facs <- lapply(c("barca", "lens"), function(x) autoplot(fit2, der = 0, fac = x))
  gridExtra::grid.arrange(grobs = facs, ncol = 2, nrow = 1)
}
## End(Not run)
This barnacle data set gives the measurements of the variables dry weight (in g.) and rostro-carinal length (in mm) for 2000 barnacles collected along the intertidal zone from two sites of the Atlantic coast of Galicia (Spain).
barnacle
barnacle is a data frame with 2000 cases (rows) and 3 variables (columns).
DW: Dry weight (in g.)
RC: Rostro-carinal length (in mm).
F: Factor indicating the sites of harvest: barca and lens.
Sestelo, M. and Roca-Pardinas, J. (2011). A new approach to estimation of length-weight relationship of Pollicipes pollicipes (Gmelin, 1789) on the Atlantic coast of Galicia (Northwest Spain): some aspects of its biology and management. Journal of Shellfish Research, 30(3), 939–948.
Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.
data(barnacle)
head(barnacle)
This children data set contains the age and height measurements of 2500 children aged 5 to 19 years, split by sex (1292 females and 1208 males).
children
children is a data frame with 2500 cases (rows) and 3 variables (columns).
sex: Individual's gender (female or male).
height: Height measured in centimeters.
age: Age in years.
Other data sets of this type can be obtained from https://www.who.int/toolkits/child-growth-standards.
data(children)
head(children)
This function draws inference about some critical point in the support of $X$ which is associated with some features of the regression function (e.g., minimum, maximum or inflection points which indicate changes in the sign of curvature). It returns, for each level of the factor, the value of the covariate x which maximizes the estimate of the function, the value of x which maximizes the first derivative, and the value of x at which the second derivative equals zero.
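What these three quantities mean can be seen on a toy curve evaluated on a grid. The sketch below uses base R only and a made-up function m(x) = x^2 exp(-x) with its analytical derivatives; it illustrates the definitions and is not how critical computes its results from a fitted frfast object.

```r
# Toy curve on a grid (an assumption for illustration, not package output)
x  <- seq(0, 5, by = 0.01)
m  <- x^2 * exp(-x)                  # regression function
d1 <- (2 * x - x^2) * exp(-x)        # first derivative
d2 <- (x^2 - 4 * x + 2) * exp(-x)    # second derivative

# x value which maximizes the function
x_max_est <- x[which.max(m)]

# x value which maximizes the first derivative
x_max_d1 <- x[which.max(d1)]

# x values where the second derivative changes sign (zero crossings);
# each reported value is the grid point just before the crossing
x_zero_d2 <- x[which(diff(sign(d2)) != 0)]
```

For this curve the maximum is at x = 2 and the second derivative vanishes at 2 - sqrt(2) and 2 + sqrt(2); the first of these is also where the first derivative peaks. The grid answers agree with these values up to the grid spacing.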
critical(model, der = NULL)
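model |
Parametric or nonparametric regression model obtained by the frfast function. |
der |
Number which determines any inference process. By default der is NULL. If it is 0, the calculation is for the point which maximizes the estimate. If it is 1, it is for the first derivative, and if it is 2, it returns the point at which the second derivative equals zero. |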
An object is returned with the following elements:
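Estimation |
Value of the covariate x which maximizes the estimate of the function (for each level). |
First_der |
Value of the covariate x which maximizes the first derivative (for each level). |
Second_der |
Value of the covariate x at which the second derivative equals zero (for each level). |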
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
Sestelo, M. (2013). Development and computational implementation of estimation and inference methods in flexible regression models. Applications in Biology, Engineering and Environment. PhD Thesis, Department of Statistics and O.R. University of Vigo.
Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.
library(npregfast)
data(barnacle)

fit <- frfast(DW ~ RC, data = barnacle) # without interactions
critical(fit)
critical(fit, der = 0)
critical(fit, der = 1)
critical(fit, der = 2)

# fit2 <- frfast(DW ~ RC : F, data = barnacle) # with interactions
# critical(fit2)
# critical(fit2, der = 0)
# critical(fit2, der = 1)
# critical(fit2, der = 2)
Differences between the critical points estimated by critical for two factor levels.
criticaldiff(model, level1 = NULL, level2 = NULL, der = NULL)
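model |
Parametric or nonparametric regression model obtained by the frfast function. |
level1 |
First factor's level at which to perform the differences between critical points. |
level2 |
Second factor's level at which to perform the differences between critical points. |
der |
Number which determines any inference process. By default der is NULL. If it is 0, the differences are computed for the point which maximizes the estimate. If it is 1 or 2, they are computed for the first or second derivative, respectively. |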
Differences are calculated by subtracting the critical point of one factor level from that of another (level2 - level1). By default level2 and level1 are NULL, so the differences are calculated for all possible pairs of factor levels. Additionally, the 95% confidence interval of each difference is obtained, which allows inference to be made about it.
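The pairing and sign convention can be sketched in a few lines of base R. The helper pairwise_diff and the three critical points below are made up for illustration; the package derives the points and their confidence intervals from the bootstrap of a fitted frfast model.

```r
# Hypothetical helper: all pairwise differences, computed as level2 - level1
pairwise_diff <- function(x0) {
  pairs <- t(combn(names(x0), 2))   # one row per pair of levels
  data.frame(
    level1 = pairs[, 1],
    level2 = pairs[, 2],
    diff   = unname(x0[pairs[, 2]] - x0[pairs[, 1]])
  )
}

# Made-up critical points for three hypothetical levels
x0 <- c(a = 10, b = 12.5, c = 11)
out <- pairwise_diff(x0)
```

With three levels there are three pairs, (a, b), (a, c) and (b, c), which matches the behaviour described above when level1 and level2 are left as NULL.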
An object is returned with the following elements:
critical.diff |
a table listing each pair of factor levels used to calculate the differences between the critical points, together with the 95% confidence interval of each difference (for the estimation, first and second derivative). |
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
Sestelo, M. (2013). Development and computational implementation of estimation and inference methods in flexible regression models. Applications in Biology, Engineering and Environment. PhD Thesis, Department of Statistics and O.R. University of Vigo.
Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.
library(npregfast)
data(barnacle)

fit2 <- frfast(DW ~ RC : F, data = barnacle, seed = 130853, nboot = 100) # with interactions
criticaldiff(fit2)
criticaldiff(fit2, der = 1)
criticaldiff(fit2, der = 1, level1 = "lens", level2 = "barca")
This function is used to fit nonparametric models by using local polynomial kernel smoothers or splines. These models may or may not include factor-by-curve interactions. Additionally, a parametric model (allometric model) can be estimated.
frfast( formula, data, na.action = "na.omit", model = "np", smooth = "kernel", h0 = -1, h = -1, nh = 30, weights = NULL, kernel = "epanech", p = 3, kbin = 100, nboot = 500, rankl = NULL, ranku = NULL, seed = NULL, cluster = TRUE, ncores = NULL, ... )
formula |
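An object of class formula: a symbolic description of the model to be fitted. The details of model specification are given under 'Details'. |
data |
An optional data frame, matrix or list required by the formula. If not found in data, the variables are taken from environment(formula), typically the environment from which frfast is called. |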
na.action |
A function which indicates what should happen when the data contain 'NA's. The default is 'na.omit'. |
model |
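Type model used: model = "np" for the nonparametric regression model, model = "allo" for the allometric model. |
smooth |
Type smoother used: smooth = "kernel" for local polynomial kernel smoothers and smooth = "splines" for splines (using the mgcv package). |
h0 |
The kernel bandwidth smoothing parameter for the global effect (see references for more details at the estimation). Large values of the bandwidth lead to smoothed estimates; smaller values of the bandwidth lead to undersmoothed estimates. By default, cross validation is used to obtain the bandwidth. |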
h |
The kernel bandwidth smoothing parameter for the partial effects. |
nh |
Integer number of equally-spaced bandwidths in which h is discretised, to speed up computation in the kernel-based regression. |
weights |
Prior weights on the data. |
kernel |
A character string specifying the desired kernel.
Defaults to "epanech" (the Epanechnikov kernel). |
p |
Polynomial degree to be used in the kernel-based regression. Its value must be the value of derivative + 1. The default value is 3, returning the estimation, first and second derivative. |
kbin |
Number of binning nodes over which the function is to be estimated. |
nboot |
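Number of bootstrap repeats. Defaults to 500 bootstrap repeats. The wild bootstrap is used when model = "np" and the simple bootstrap when model = "allo". |
rankl |
Number or vector specifying the minimum value for the interval at which to search the x value which maximizes the estimate, first or second derivative (for each level). The default is the minimum data value. |
ranku |
Number or vector specifying the maximum value for the interval at which to search the x value which maximizes the estimate, first or second derivative (for each level). The default is the maximum data value. |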
seed |
Seed to be used in the bootstrap procedure. |
cluster |
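A logical value. If TRUE (default), the bootstrap procedure is parallelized (only for smooth = "splines"). |
ncores |
An integer value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores used is the number of cores of the machine minus 1. |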
... |
Other options. |
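To make the roles of h, kernel and p concrete, here is a minimal base-R sketch of a local polynomial kernel smoother. It is not the package's implementation, which works on kbin binning nodes and selects the bandwidth by cross-validation; the helper names epanech and local_poly are hypothetical and the bandwidth is fixed by hand.

```r
# Epanechnikov kernel: non-zero only for |u| <= 1
epanech <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Local polynomial fit of degree p at each grid point x0 (requires p >= 2 here).
# Around x0 the curve is approximated by sum_j beta_j (x - x0)^j, so that
# beta_j * factorial(j) estimates the j-th derivative at x0.
local_poly <- function(x, y, grid, h, p = 3) {
  t(sapply(grid, function(x0) {
    d <- x - x0
    w <- epanech(d / h)                      # kernel weights with bandwidth h
    X <- outer(d, 0:p, `^`)                  # columns 1, d, d^2, ..., d^p
    beta <- unname(lm.wfit(X, y, w)$coefficients)
    c(est = beta[1], der1 = beta[2], der2 = 2 * beta[3])
  }))
}

# Check on an exact quadratic, where the local cubic fit is exact
x <- seq(0, 10, length.out = 201)
y <- x^2
fit <- local_poly(x, y, grid = c(3, 5, 7), h = 1)
```

With p = 3 a single local fit yields the estimate and its first and second derivatives at once, which is why the default degree is tied to the highest derivative that is returned.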
The models fitted by the frfast function are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a predictor specified symbolically by model. The possible terms consist of a variable name or a variable name and a factor name separated by the : operator. Such a term is interpreted as the interaction of the continuous variable and the factor. However, if smooth = "splines", the formula is based on the function formula.gam of the mgcv package.
According to the model argument, if model = "np" the estimated regression model will be of the type
$$Y = m(X) + \varepsilon,$$
with $m$ being a smooth and unknown function and $\varepsilon$ the regression error with zero mean. If model = "allo", users can estimate the classical allometric model (Huxley, 1924) with a regression curve
$$m(X) = a X^b,$$
with $a$ and $b$ being the parameters of the model.
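One common way to estimate the allometric curve is a linear fit on the log-log scale, since log m(X) = log a + b log X. The base-R sketch below assumes that approach on simulated data; whether frfast uses exactly this estimator internally is not stated here, and the names fit_log, a_hat, b_hat and m_hat are illustrative.

```r
# Simulated allometric data (not the barnacle data set)
set.seed(42)
x <- runif(500, 5, 25)
y <- 0.002 * x^2.5 * exp(rnorm(500, sd = 0.1))

# Linear regression on the log-log scale
fit_log <- lm(log(y) ~ log(x))
a_hat <- exp(unname(coef(fit_log)[1]))   # back-transformed intercept
b_hat <- unname(coef(fit_log)[2])        # slope = allometric exponent

# Estimated regression curve m(x) = a * x^b
m_hat <- function(x) a_hat * x^b_hat
```

The exponent b_hat should land close to the value 2.5 used in the simulation, while a_hat is recovered on the multiplicative scale of the noise.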
An object is returned with the following elements:
x |
Vector of values of the grid points at which the model is to be estimated. |
p |
Matrix of values of the grid points at which to compute the estimate, their first and second derivative. |
pl |
Lower values of 95% confidence interval for the estimate, their first and second derivative. |
pu |
Upper values of 95% confidence interval for the estimate, their first and second derivative. |
diff |
Differences between the estimation values of a couple of levels (i. e. level 2 - level 1). The same procedure for their first and second derivative. |
diffl |
Lower values of 95% confidence interval for the differences between the estimation values of a couple of levels. It is performed for their first and second derivative. |
diffu |
Upper values of 95% confidence interval for the differences between the estimation values of a couple of levels. It is performed for their first and second derivative. |
nboot |
Number of bootstrap repeats. |
n |
Sample size. |
dp |
Degree of polynomial to be used. |
h0 |
The kernel bandwidth smoothing parameter for the global effect. |
h |
The kernel bandwidth smoothing parameter for the partial effects. |
fmod |
Factor's level for each data. |
xdata |
Original x values. |
ydata |
Original y values. |
w |
Weights on the data. |
kbin |
Number of binning nodes over which the function is to be estimated. |
nf |
Number of levels. |
max |
Value of covariate x which maximizes the estimate, first or second derivative. |
maxu |
Upper value of 95% confidence interval for the value max. |
maxl |
Lower value of 95% confidence interval for the value max. |
diffmax |
Differences between the estimation of max for a couple of levels (i.e. level 2 - level 1). |
diffmaxu |
Upper value of 95% confidence interval for the value diffmax. |
diffmaxl |
Lower value of 95% confidence interval for the value diffmax. |
repboot |
Matrix of values of the grid points at which to compute the estimate, their first and second derivative for each bootstrap repeat. |
rankl |
Minimum value for the interval at which to search the x value which maximizes the estimate, first or second derivative. |
ranku |
Maximum value for the interval at which to search the x value which maximizes the estimate, first or second derivative. |
nmodel |
Type of model used (nonparametric or allometric). |
label |
Labels of the variables in the model. |
numlabel |
Number of labels. |
kernel |
A character string specifying the desired kernel. |
a |
Estimated coefficient in the case of fitting an allometric model. |
al |
Lower value of 95% confidence interval for the value of a. |
au |
Upper value of 95% confidence interval for the value of a. |
b |
Estimated coefficient in the case of fitting an allometric model. |
bl |
Lower value of 95% confidence interval for the value of b. |
bu |
Upper value of 95% confidence interval for the value of b. |
name |
Name of the variables in the model. |
formula |
A symbolic description of the model to be fitted. |
nh |
Integer number of equally-spaced bandwidths on which h is discretised. |
r2 |
Coefficient of determination (in the case of the allometric model). |
smooth |
Type smoother used. |
cluster |
Is the procedure parallelized? (for splines smoothers). |
ncores |
Number of cores used in the parallelized procedure (for splines smoothers). |
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
Huxley, J. S. (1924). Constant differential growth-ratios and their significance. Nature, 114:895–896.
Sestelo, M. (2013). Development and computational implementation of estimation and inference methods in flexible regression models. Applications in Biology, Engineering and Environment. PhD Thesis, Department of Statistics and O.R. University of Vigo.
Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.
library(npregfast)
data(barnacle)

# Nonparametric regression without interactions
fit <- frfast(DW ~ RC, data = barnacle, nboot = 100, smooth = "kernel")
fit
summary(fit)

# using splines
# fit <- frfast(DW ~ s(RC), data = barnacle, nboot = 100,
#               smooth = "splines", cluster = TRUE, ncores = 2)
# fit
# summary(fit)

# Change the number of binning nodes and bootstrap replicates
fit <- frfast(DW ~ RC, data = barnacle, kbin = 200, nboot = 100, smooth = "kernel")

# Nonparametric regression with interactions
fit2 <- frfast(DW ~ RC : F, data = barnacle, nboot = 100)
fit2
summary(fit2)

# using splines
# fit2 <- frfast(DW ~ s(RC, by = F), data = barnacle,
#                nboot = 100, smooth = "splines", cluster = TRUE, ncores = 2)
# fit2
# summary(fit2)

# Allometric model
fit3 <- frfast(DW ~ RC, data = barnacle, model = "allo", nboot = 100)
summary(fit3)

# fit4 <- frfast(DW ~ RC : F, data = barnacle, model = "allo", nboot = 100)
# summary(fit4)
This function can be used to test the equality of the $M$ curves specific to each level.
globaltest( formula, data, na.action = "na.omit", der, smooth = "kernel", weights = NULL, nboot = 500, h0 = -1, h = -1, nh = 30, kernel = "epanech", p = 3, kbin = 100, seed = NULL, cluster = TRUE, ncores = NULL, ... )
formula |
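An object of class formula: a symbolic description of the model to be fitted. |
data |
An optional data frame, matrix or list required by the formula. If not found in data, the variables are taken from environment(formula), typically the environment from which globaltest is called. |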
na.action |
A function which indicates what should happen when the data contain 'NA's. The default is 'na.omit'. |
der |
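Number which determines any inference process. If it is 0, the test is applied to the estimate. If it is 1 or 2, it is applied to the first or second derivative, respectively. |
smooth |
Type smoother used: smooth = "kernel" for local polynomial kernel smoothers and smooth = "splines" for splines (using the mgcv package). |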
weights |
Prior weights on the data. |
nboot |
Number of bootstrap repeats. |
h0 |
The kernel bandwidth smoothing parameter for the global effect (see references for more details at the estimation). Large values of the bandwidth lead to smoothed estimates; smaller values of the bandwidth lead to undersmoothed estimates. By default, cross validation is used to obtain the bandwidth. |
h |
The kernel bandwidth smoothing parameter for the partial effects. |
nh |
Integer number of equally-spaced bandwidths on which h is discretised, to speed up computation in the kernel-based regression. |
kernel |
A character string specifying the desired kernel. Defaults to "epanech" (the Epanechnikov kernel). |
p |
Degree of polynomial to be used. Its value must be the value of derivative + 1. The default value is 3 because the function returns the estimation, first and second derivative. |
kbin |
Number of binning nodes over which the function is to be estimated. |
seed |
Seed to be used in the bootstrap procedure. |
cluster |
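A logical value. If TRUE (default), the bootstrap procedure is parallelized (only for smooth = "splines"). |
ncores |
An integer value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores used is the number of cores of the machine minus 1. |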
... |
Other options. |
globaltest can be used to test the equality of the $M$ curves specific to each level. This bootstrap-based test assumes the following null hypothesis:
$$H_0^r: m_1^r(\cdot) = \ldots = m_M^r(\cdot)$$
versus the general alternative
$$H_1^r: m_i^r(\cdot) \ne m_j^r(\cdot) \quad \text{for some} \quad i, j \in \{1, \ldots, M\},$$
where $r$ is the order of the derivative selected with der. Note that, if $H_0$ is not rejected, then the equality of critical points will also be accepted.
To test the null hypothesis, a test statistic, $T$, based on direct nonparametric estimates of the curves is used. If the null hypothesis is true, the $T$ value should be close to zero but is generally greater. The test rule based on $T$ consists of rejecting the null hypothesis if $T > T^{1-\alpha}$, where $T^p$ is the empirical $p$-percentile of $T$ under the null hypothesis. To obtain this percentile, we have used bootstrap techniques. See details in references.
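The decision rule itself is simple once the observed statistic and its bootstrap replicates under the null hypothesis are available. The base-R sketch below assumes both are already computed; the helper decide is hypothetical and the numbers are placeholders, not output of globaltest.

```r
# Hypothetical helper: apply the percentile rule to a bootstrap sample of T
decide <- function(t_obs, t_boot, alpha = 0.05) {
  crit <- quantile(t_boot, probs = 1 - alpha, names = FALSE)  # empirical (1 - alpha)-percentile
  list(
    critical = crit,
    p_value  = mean(t_boot >= t_obs),   # share of replicates at least as large
    reject   = t_obs > crit
  )
}

# Placeholder values standing in for bootstrap replicates of T
t_boot <- 1:100
res_large <- decide(t_obs = 99, t_boot = t_boot)
res_small <- decide(t_obs = 50, t_boot = t_boot)
```

Here res_large rejects the null hypothesis because 99 exceeds the 95th percentile of the replicates, while res_small does not.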
Note that the models fitted by the globaltest function are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a predictor specified symbolically by model. The possible terms consist of a variable name or a variable name and a factor name separated by the : operator. Such a term is interpreted as the interaction of the continuous variable and the factor. However, if smooth = "splines", the formula is based on the function formula.gam of the mgcv package.
The $T$ value and the $p$-value are returned. Additionally, the decision of the global test (accepted or rejected) is shown. The null hypothesis is rejected if the $p$-value $< 0.05$.
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
Sestelo, M. (2013). Development and computational implementation of estimation and inference methods in flexible regression models. Applications in Biology, Engineering and Environment. PhD Thesis, Department of Statistics and O.R. University of Vigo.
Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.
library(npregfast)
data(barnacle)
globaltest(DW ~ RC : F, data = barnacle, der = 1, seed = 130853, nboot = 100)

# globaltest(height ~ s(age, by = sex), data = children,
#            seed = 130853, der = 0, smooth = "splines")
This function can be used to test the equality of the
critical points estimated from the respective level-specific curves.
localtest( formula, data = data, na.action = "na.omit", der, smooth = "kernel", weights = NULL, nboot = 500, h0 = -1, h = -1, nh = 30, kernel = "epanech", p = 3, kbin = 100, rankl = NULL, ranku = NULL, seed = NULL, cluster = TRUE, ncores = NULL, ci.level = 0.95, ... )
formula |
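An object of class formula: a symbolic description of the model to be fitted. |
data |
An optional data frame, matrix or list required by the formula. If not found in data, the variables are taken from environment(formula), typically the environment from which localtest is called. |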
na.action |
A function which indicates what should happen when the data contain 'NA's. The default is 'na.omit'. |
der |
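Number which determines any inference process. If it is 0, the test is applied to the estimate. If it is 1 or 2, it is applied to the first or second derivative, respectively. |
smooth |
Type smoother used: smooth = "kernel" for local polynomial kernel smoothers and smooth = "splines" for splines (using the mgcv package). |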
weights |
Prior weights on the data. |
nboot |
Number of bootstrap repeats. |
h0 |
The kernel bandwidth smoothing parameter for the global effect (see references for more details at the estimation). Large values of the bandwidth lead to smoothed estimates; smaller values of the bandwidth lead to undersmoothed estimates. By default, cross validation is used to obtain the bandwidth. |
h |
The kernel bandwidth smoothing parameter for the partial effects. |
nh |
Integer number of equally-spaced bandwidths on which h is discretised, to speed up computation in the kernel-based regression. |
kernel |
A character string specifying the desired kernel. Defaults to "epanech" (the Epanechnikov kernel). |
p |
Degree of polynomial to be used. Its value must be the value of derivative + 1. The default value is 3 because the function returns the estimation, first and second derivative. |
kbin |
Number of binning nodes over which the function is to be estimated. |
rankl |
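Number or vector specifying the minimum value for the interval at which to search the x value which maximizes the estimate, first or second derivative (for each level). The default is the minimum data value. |
ranku |
Number or vector specifying the maximum value for the interval at which to search the x value which maximizes the estimate, first or second derivative (for each level). The default is the maximum data value. |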
seed |
Seed to be used in the bootstrap procedure. |
cluster |
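A logical value. If TRUE (default), the bootstrap procedure is parallelized (only for smooth = "splines"). |
ncores |
An integer value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores used is the number of cores of the machine minus 1. |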
ci.level |
Level of bootstrap confidence interval. Defaults to 0.95 (corresponding to 95%). Note that the function accepts a vector of levels. |
... |
Other options. |
localtest can be used to test the equality of the $M$ critical points estimated from the respective level-specific curves. Note that, even if the curves and/or their derivatives are different, it is possible for these points to be equal.
For instance, taking the maxima of the first derivatives into account, interest lies in testing the following null hypothesis
$$H_0: x_{01} = \ldots = x_{0M}$$
versus the general alternative
$$H_1: x_{0i} \ne x_{0j} \quad \text{for some} \quad i, j \in \{1, \ldots, M\}.$$
The above hypothesis is true if $d = x_{0j} - x_{0k} = 0$, where
$$(j, k) = \arg\max_{1 \le l < m \le M} |x_{0l} - x_{0m}|;$$
otherwise $H_0$ is false. It is important to highlight that, in practice, the true $x_{0j}$ are not known, and consequently neither is $d$, so an estimate $\hat d = \hat x_{0j} - \hat x_{0k}$ is used, where, in general, $\hat x_{0l}$ are the estimates of $x_{0l}$ based on the estimated curves $\hat m_l$ with $l = 1, \ldots, M$.
Since $\hat d$ is only an estimate of the true $d$, the sampling uncertainty of these estimates needs to be acknowledged. Hence, a confidence interval $(a, b)$ is created for $d$ for a specific level of confidence (95%). Based on this, the null hypothesis is rejected if zero is not contained in the interval.
Note that if this hypothesis is rejected (and the factor has more than two levels), one option could be to use the criticaldiff function in order to obtain the differences between each pair of factor's levels.
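The construction of $\hat d$ and the interval-based decision can be sketched in base R. The helper local_decision is hypothetical, the critical points and bootstrap replicates are simulated, and a plain percentile interval is assumed; the interval actually built by localtest may differ.

```r
# Hypothetical helper: x0_hat holds one estimated critical point per level,
# x0_boot holds bootstrap replicates (rows) for each level (columns).
local_decision <- function(x0_hat, x0_boot, ci.level = 0.95) {
  pairs <- combn(length(x0_hat), 2)                  # all pairs with l < m
  gaps <- abs(x0_hat[pairs[1, ]] - x0_hat[pairs[2, ]])
  jk <- pairs[, which.max(gaps)]                     # pair with the largest gap
  d_hat <- x0_hat[jk[1]] - x0_hat[jk[2]]
  d_boot <- x0_boot[, jk[1]] - x0_boot[, jk[2]]
  alpha <- 1 - ci.level
  ci <- quantile(d_boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(d = d_hat, ci = ci, reject = ci[1] > 0 || ci[2] < 0)
}

# Simulated inputs for two levels whose critical points clearly differ
set.seed(7)
x0_hat <- c(10, 14)
x0_boot <- cbind(rnorm(500, mean = 10, sd = 0.5), rnorm(500, mean = 14, sd = 0.5))
res <- local_decision(x0_hat, x0_boot)
```

In this simulated case the interval for d lies entirely below zero, so the equality of the critical points would be rejected.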
Note that the models fitted by the localtest function are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a predictor specified symbolically by model. The possible terms consist of a variable name or a variable name and a factor name separated by the : operator. Such a term is interpreted as the interaction of the continuous variable and the factor. However, if smooth = "splines", the formula is based on the function formula.gam of the mgcv package.
The estimate of the $d$ value is returned together with its confidence interval for a specific level of confidence, i.e. 95%. Additionally, the decision of the local test (accepted or rejected) is shown. The null hypothesis is rejected if zero is not within the interval.
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
Sestelo, M. (2013). Development and computational implementation of estimation and inference methods in flexible regression models. Applications in Biology, Engineering and Environment. PhD Thesis, Department of Statistics and O.R. University of Vigo.
Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.
library(npregfast)
data(barnacle)
localtest(DW ~ RC : F, data = barnacle, der = 1, seed = 130853, nboot = 100)

# localtest(height ~ s(age, by = sex), data = children, seed = 130853,
#           der = 1, smooth = "splines")
frfast objects with the base graphics
Useful for drawing the estimated regression function and its
first and second derivatives (for each factor level). Additionally, with the
diffwith argument it is possible to draw the differences between
two factor levels.
## S3 method for class 'frfast'
plot(
  x = model,
  y,
  fac = NULL,
  der = NULL,
  diffwith = NULL,
  points = TRUE,
  xlab = model$name[2],
  ylab = model$name[1],
  ylim = NULL,
  main = NULL,
  col = "black",
  CIcol = "black",
  pcol = "grey80",
  ablinecol = "red",
  abline = TRUE,
  type = "l",
  CItype = "l",
  lwd = 2,
  CIlwd = 1,
  lty = 1,
  CIlty = 2,
  cex = 0.6,
  ...
)
x | A frfast object. |
y | NULL. |
fac | Vector which determines the level(s) to take into account in the plot. By default it is NULL. |
der | Number or vector which determines any inference process. By default it is NULL. |
diffwith | Factor level used for drawing the differences with respect to the level specified in the fac argument. |
points | Draw the original data into the plot. By default it is TRUE. |
xlab | A title for the x axis. |
ylab | A title for the y axis. |
ylim | The y limits of the plot. |
main | An overall title for the plot. |
col | A specification for the default plotting color. |
CIcol | A specification for the default confidence intervals plotting color. |
pcol | A specification for the points color. |
ablinecol | The color to be used for abline. |
abline | Draw a horizontal line into the plot of the second derivative of the model. |
type | What type of plot should be drawn. The default is "l" (lines). |
CItype | What type of plot should be drawn for confidence intervals. The default is "l" (lines). |
lwd | The line width, a positive number, defaulting to 2. See details in par. |
CIlwd | The line width for confidence intervals, a positive number, defaulting to 1. |
lty | The line type. Line types can be specified as an integer (0 = blank, 1 = solid (default), 2 = dashed, 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodash). See details in par. |
CIlty | The line type for confidence intervals, defaulting to 2 (dashed). Line types can be specified as an integer (0 = blank, 1 = solid, 2 = dashed, 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodash). |
cex | A numerical value giving the amount by which plotting symbols should be magnified relative to the default. See details in par. |
... | Other options. |
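To make the graphical arguments more concrete, here is a rough base graphics sketch on synthetic data. It is not the internal code of the plot method, only an assumption about how arguments such as col, lwd, lty, CIcol, CIlwd, CIlty and ablinecol could map onto base calls. The curve and the bands are invented for illustration.

```r
# Synthetic estimate with hypothetical pointwise bands (not npregfast output).
x   <- seq(0, 10, length.out = 50)
est <- sin(x)
lwr <- est - 0.3
upr <- est + 0.3

pdf(NULL)  # null device so nothing is written; remove to draw on screen

# Main curve: corresponds to type = "l", col = "black", lwd = 2, lty = 1.
plot(x, est, type = "l", col = "black", lwd = 2, lty = 1,
     ylim = range(lwr, upr), xlab = "x", ylab = "estimate")

# Confidence bands: correspond to CIcol = "black", CIlwd = 1, CIlty = 2.
lines(x, lwr, col = "black", lwd = 1, lty = 2)
lines(x, upr, col = "black", lwd = 1, lty = 2)

# Horizontal reference line: corresponds to ablinecol = "red".
abline(h = 0, col = "red")

invisible(dev.off())
```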
Simply produces a plot.
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
library(npregfast)
data(barnacle)

# Nonparametric regression without interactions
fit <- frfast(DW ~ RC, data = barnacle, nboot = 100)
plot(fit)
plot(fit, der = 0)
plot(fit, der = 0, points = FALSE)
plot(fit, der = 1, col = "red", CIcol = "blue")

# Nonparametric regression with interactions
fit2 <- frfast(DW ~ RC : F, data = barnacle, nboot = 100)
plot(fit2)
plot(fit2, der = 0, fac = "lens")
plot(fit2, der = 1, col = "grey", CIcol = "red")
plot(fit2, der = c(0,1), fac = c("barca","lens"))

# Visualization of the differences between two factor's levels
plot(fit2, fac = "barca", diffwith = "lens")
plot(fit2, fac = "barca", diffwith = "lens", der = 1)
frfast model
Takes a fitted frfast object and produces predictions
(with their 95% confidence intervals) from a fitted model with
or without interactions.
## S3 method for class 'frfast'
predict(object = model, newdata, fac = NULL, der = NULL, seed = NULL, ...)
object | A fitted frfast object. |
newdata | A data frame containing the values of the model covariates at which predictions are required. If newdata is provided, it should contain all the variables needed for prediction; a warning is generated if not. |
fac | Factor level to take into account. By default it is NULL. |
der | Number which determines any inference process. By default it is NULL. |
seed | Seed to be used in the bootstrap procedure. |
... | Other options. |
predict.frfast computes and returns a list containing
predictions of the estimate and of its first and second derivatives,
with their 95% confidence intervals.
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
library(npregfast)
data(barnacle)

# Nonparametric regression without interactions
fit <- frfast(DW ~ RC, data = barnacle, nboot = 100)
nd <- data.frame(RC = c(10, 14, 18))
predict(fit, newdata = nd)

# Nonparametric regression with interactions
# fit2 <- frfast(DW ~ RC : F, data = barnacle, nboot = 100)
# nd2 <- data.frame(RC = c(10, 15, 20))
# predict(fit2, newdata = nd2)
# predict(fit2, newdata = nd2, der = 0, fac = "barca")
Launch a Shiny app that shows a demo of what can be done with the package.
runExample()
This example is also available online.
## Only run this example in interactive R sessions
if (interactive()) {
  runExample()
}
frfast class
Takes a fitted frfast object produced by frfast()
and produces various useful summaries from it.
## S3 method for class 'frfast'
summary(object = model, ...)
object | a fitted frfast object. |
... | additional arguments affecting the predictions produced. |
print.frfast tries to be smart about summary.frfast.
summary.frfast computes and returns a list of summary
information for a fitted frfast object.
model | type of model: nonparametric or allometric. |
smooth | type of smoother: kernel or splines. |
h | the kernel bandwidth smoothing parameter. |
dp | degree of the polynomial. |
nboot | number of bootstrap repeats. |
kbin | number of binning nodes over which the function is to be estimated. |
n | sample size. |
fmod | factor's levels. |
coef | if |
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
Sestelo, M. (2013). Development and computational implementation of estimation and inference methods in flexible regression models. Applications in Biology, Engineering and Environment. PhD Thesis, Department of Statistics and O.R. University of Vigo.
Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.
library(npregfast)
data(barnacle)

# Nonparametric regression without interactions
fit <- frfast(DW ~ RC, data = barnacle, nboot = 100)
fit
summary(fit)

# Nonparametric regression with interactions
fit2 <- frfast(DW ~ RC : F, data = barnacle, nboot = 100)
fit2
summary(fit2)

# Allometric model
fit3 <- frfast(DW ~ RC, data = barnacle, model = "allo", nboot = 100)
fit3
summary(fit3)