Title: | Ridge Regression with Automatic Selection of the Penalty Parameter |
---|---|
Description: | Linear and logistic ridge regression functions. Additionally includes special functions for genome-wide single-nucleotide polymorphism (SNP) data. More details can be found in <doi: 10.1002/gepi.21750> and <doi: 10.1186/1471-2105-12-372>. |
Authors: | Steffen Moritz [aut, cre], Erika Cule [aut], Dan Frankowski [aut] |
Maintainer: | Steffen Moritz <[email protected]> |
License: | GPL-2 |
Version: | 3.3 |
Built: | 2024-10-30 06:46:50 UTC |
Source: | CRAN |
R package for fitting linear and logistic ridge regression models.
This package contains functions for fitting linear and logistic ridge regression models, including functions for fitting linear and logistic ridge regression models for genome-wide SNP data supplied as file names when the data are too big to read into R.
For a complete list of functions, use help(package="ridge").
Steffen Moritz, Erika Cule
Simulated genetic data at 15 SNPs, together with simulated binary phenotypes
data(GenBin)
GenBin is a saved R matrix with 500 rows and 15 columns. The first column is the phenotypes and columns 2-15 contain the genotypes. Each row represents an individual. The same data are stored in flat text files in GenBin_genotypes and GenBin_phenotypes (in the directory extdata (in the installed package) or inst/extdata (in the source)).
Simulated using FREGENE
Fregene: Simulation of realistic sequence-level data in populations and ascertained samples Chadeau-Hyam, M. et al, 2008, BMC Bioinformatics 9:364
data(GenBin)
Simulated genetic data with continuous outcomes.
data(GenCont)
GenCont is a saved R matrix with 500 rows and 13 columns. The first column is the phenotypes and columns 2-13 contain the genotypes. Each row represents an individual. The same data are stored in flat text files in GenCont_genotypes and GenCont_phenotypes (in the directory extdata (in the installed package) or inst/extdata (in the source)).
Genotypes were simulated using FREGENE.
Fregene: Simulation of realistic sequence-level data in populations and ascertained samples Chadeau-Hyam, M. et al, 2008, BMC Bioinformatics 9:364
data(GenCont)
A ten-factor data set first described by Gorman and Toman (1966) and used by Hoerl and Kennard (1970) (and others) to investigate regression problems.
data(Gorman)
Numeric matrix.
The first column is the response on the log scale, the remaining columns are the predictors.
Selection of variables for fitting equations to data. Gorman, J. W. and Toman, R. J. (1966) Technometrics, 8:27.
Selection of variables for fitting equations to data. Gorman, J. W. and Toman, R. J. (1966) Technometrics, 8:27.
Ridge Regression: Biased estimators for nonorthogonal problems. Hoerl, A. E. and Kennard, R. W. (1970) Technometrics, 12:55.
data(Gorman)
The Hald data as used by Hoerl, Kennard and Baldwin (1975).
These data are also in package wle.
data(Hald)
Numeric matrix.
The first column is the response and the remaining four columns are the predictors.
Ridge Regression: some simulations, Hoerl, A. E. et al, 1975, Comm Stat Theor Method 4:105
data(Hald)
Fits a linear ridge regression model. Optionally, the ridge regression parameter is chosen automatically using the method proposed by Cule et al (2012).
linearRidge(formula, data, lambda = "automatic", nPCs = NULL,
            scaling = c("corrForm", "scale", "none"), ...)
## S3 method for class 'ridgeLinear'
coef(object, all.coef = FALSE, ...)
## S3 method for class 'ridgeLinear'
plot(x, y = NULL, ...)
## S3 method for class 'ridgeLinear'
predict(object, newdata, na.action = na.pass, all.coef = FALSE, ...)
## S3 method for class 'ridgeLinear'
print(x, all.coef = FALSE, ...)
## S3 method for class 'ridgeLinear'
summary(object, all.coef = FALSE, ...)
## S3 method for class 'summary.ridgeLinear'
print(x, digits = max(3, getOption("digits") - 3),
      signif.stars = getOption("show.signif.stars"), ...)
formula |
a formula expression as for regression models, of the form |
data |
an optional data frame in which to interpret the variables occurring in |
lambda |
A ridge regression parameter. May be a vector. If |
nPCs |
The number of principal components to use to choose the ridge regression parameter, following the method of
Cule et al (2012). It is not possible to specify both |
scaling |
The method to be used to scale the predictors. One of
|
object |
A ridgeLinear object, typically generated by a call to |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
na.action |
function determining what should be done with missing values
in |
all.coef |
Logical. Should results be returned for all ridge regression penalty
parameters ( |
x |
An object of class |
y |
Dummy argument for compatibility with the default |
digits |
minimum number of significant digits to be used for most numbers |
signif.stars |
logical; if |
... |
Additional arguments to be passed to or from other methods. |
If an intercept is present in the model, its coefficient is not penalised. If you want to penalise an intercept, put in your own constant term and remove the intercept.
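The note on the unpenalised intercept can be illustrated with a minimal base R sketch. It is not the package's implementation: ridge_unpenalised_intercept is a hypothetical helper, the data are simulated, and no predictor scaling is applied (unlike the scaling options of linearRidge). Centring the predictors and the response removes the intercept from the penalised problem, so the penalty only acts on the slopes.

```r
# Hypothetical helper (not part of the ridge package): ridge fit with an
# unpenalised intercept, obtained by centring the predictors and the response.
ridge_unpenalised_intercept <- function(X, y, lambda) {
  xm <- colMeans(X)                 # predictor means
  ym <- mean(y)                     # response mean
  Xc <- sweep(X, 2, xm)             # centred predictors
  yc <- y - ym                      # centred response
  p  <- ncol(X)
  # Penalised normal equations: (Xc'Xc + lambda * I) beta = Xc'yc
  beta <- drop(solve(crossprod(Xc) + lambda * diag(p), crossprod(Xc, yc)))
  names(beta) <- colnames(X)
  # The intercept is recovered from the means, so lambda never touches it
  c("(Intercept)" = ym - sum(xm * beta), beta)
}

# Simulated data (an assumption for illustration only)
set.seed(1)
X <- matrix(rnorm(200), nrow = 50, dimnames = list(NULL, paste0("x", 1:4)))
y <- drop(1 + X %*% c(2, -1, 0.5, 0) + rnorm(50))

fit0 <- ridge_unpenalised_intercept(X, y, lambda = 0)  # no shrinkage
fit5 <- ridge_unpenalised_intercept(X, y, lambda = 5)  # slopes are shrunk
print(rbind(lambda0 = fit0, lambda5 = fit5))
```

With lambda = 0 the sketch reduces to ordinary least squares with an intercept. To penalise a constant term as well, it would have to be added to X as an explicit column and the centring step dropped, which mirrors the advice above.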
An object of class "ridgeLinear", with components:
automatic |
Logical. Was |
call |
The matched call. |
coef |
A named vector of fitted coefficients. |
df |
A vector of degrees of freedom of the model fit, degrees of freedom for variance, and residual degrees of freedom of the fitted model. |
Inter |
Was an intercept included? |
isScaled |
Were the predictors scaled before the model was fitted? |
lambda |
The ridge regression parameter(s). |
scales |
The scales used to standardize the predictors. |
terms |
The |
x |
The scaled predictor matrix. |
xm |
A vector of means of the predictors. |
y |
The response. |
ym |
The mean of the response. |
And optionally the components
max.nPCs |
The maximum number of principal components for which a ridge regression parameter was computed. |
chosen.nPCs |
The number of principal components used to compute the ridge parameter. |
Erika Cule
A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]
data(GenCont)
mod <- linearRidge(Phenotypes ~ ., data = as.data.frame(GenCont))
summary(mod)
Fits linear ridge regression models for genome-wide SNP data. The SNP genotypes are not read into R but file names are passed to the code directly, enabling the analysis of genome-wide SNP data sets which are too big to be read into R.
linearRidgeGenotypes(genotypesfilename, phenotypesfilename, lambda = -1,
                     thinfilename = NULL, betafilename = NULL,
                     approxfilename = NULL, permfilename = NULL,
                     intercept = TRUE, verbose = FALSE)
genotypesfilename |
character string: path to file containing SNP genotypes coded 0, 1,
2. See |
phenotypesfilename |
character string: path to file containing phenotypes. See |
lambda |
(optional) shrinkage parameter. If not provided, the default denotes automatic choice of the shrinkage parameter using the method of Cule & De Iorio (2012). |
thinfilename |
(optional) character string: path to file containing three columns: SNP name, chromosome and SNP position. See |
betafilename |
(optional) character string: path to file where the output will be written. See |
approxfilename |
(optional) character string: path to file where the approximate test p-values will be written.
Approximate p-values are not computed unless this argument is given. Approximate p-values
are computed using the method of Cule et al (2011). See |
permfilename |
(optional) character string: path to file where the permutation test
p-values will be written.
Permutation test p-values are not computed unless this argument is
given. (See warning). See |
intercept |
Logical: Should the ridge regression model be fitted with an
intercept? (Defaults to |
verbose |
Logical: If |
If a file thin is supplied, and the shrinkage parameter lambda is being computed automatically based on the data, then this file is used to thin the SNP data by SNP position. If this file is not supplied, SNPs are thinned automatically based on number of SNPs.
The vector of fitted ridge regression coefficients. If betafilename is given, the fitted coefficients are written to this file as well as being returned. If approxfilename and/or permfilename are given, results of approximate test p-values and/or permutation test p-values are written to the files given in their arguments.
A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated. Invariant SNPs in the data cause an error; please remove these from the file before calling the function.
A single column of phenotypes with the individuals in the same order as those in the file genotypesfilename.
(optional) Three columns and the same number of rows as there are SNPs in the file genotypesfilename, one row per SNP. First column: SNP names (must match names in genotypesfilename); second column: chromosome; third column: SNP position in BP.
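The input layout described above can be prepared from data already in R along the following lines. This is only a sketch on hypothetical toy data: the space delimiter and the absence of a header in the phenotype file are assumptions, since the text only specifies the column layout. The checks mirror the stated requirements (0/1/2 coding, no missing values, no invariant SNPs).

```r
# Hypothetical toy data: 6 individuals, 4 SNPs coded as minor allele counts
geno <- cbind(
  snp1 = c(0, 1, 2, 0, 1, 0),
  snp2 = c(1, 1, 0, 2, 0, 1),
  snp3 = c(0, 0, 0, 0, 0, 0),  # invariant SNP: documented to cause an error
  snp4 = c(2, 1, 1, 0, 0, 1)
)
pheno <- c(1.2, 0.7, 2.9, -0.3, 0.4, 1.1)

# Checks mirroring the documented requirements
stopifnot(!anyNA(geno), all(geno %in% c(0, 1, 2)))
stopifnot(length(pheno) == nrow(geno))

# Drop invariant SNPs before writing the genotype file
is_variable <- apply(geno, 2, function(g) length(unique(g)) > 1)
geno <- geno[, is_variable, drop = FALSE]

genofile  <- tempfile(fileext = ".txt")
phenofile <- tempfile(fileext = ".txt")

# Assumption: space-delimited text, header row of SNP names, no row names
write.table(geno, genofile, quote = FALSE, row.names = FALSE, col.names = TRUE)
# Assumption: single column of phenotypes without a header,
# individuals in the same order as in the genotype file
write.table(pheno, phenofile, quote = FALSE, row.names = FALSE, col.names = FALSE)

cat(readLines(genofile), sep = "\n")
```

The bundled GenCont and GenBin files under extdata are the reference for the exact layout; comparing a generated file against them before a long run is a cheap safeguard.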
All output files are optional. Whether or not betafilename is provided, fitted coefficients are returned to the R workspace. If betafilename is provided, fitted coefficients are written to the file specified (in addition).
Two columns: First column is SNP names in same order as in genotypesfilename, second column is fitted coefficients. If intercept = TRUE (the default) then the first row is the fitted intercept (with the name Intercept in the first column).
Two columns: First column is SNP names in same order as in genotypesfilename, second column is approximate p-values.
Two columns: First column is SNP names in same order as in genotypesfilename, second column is permutation p-values.
When data are large, the permutation test p-values may take a very long time to compute. It is recommended not to request permutation test p-values (using the argument permfilename) when data are large.
Erika Cule
Significance testing in ridge regression for genetic data. Cule, E. et al (2011) BMC Bioinformatics, 12:372 A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]
linearRidge for fitting linear ridge regression models when the data are small enough to be read into R. logisticRidge and logisticRidgeGenotypes for fitting logistic ridge regression models.
## Not run:
genotypesfile <- system.file("extdata", "GenCont_genotypes.txt", package = "ridge")
phenotypesfile <- system.file("extdata", "GenCont_phenotypes.txt", package = "ridge")
beta_linearRidgeGenotypes <- linearRidgeGenotypes(genotypesfilename = genotypesfile,
                                                  phenotypesfilename = phenotypesfile)
## compare to output of linearRidge
data(GenCont) ## Same data as in GenCont_genotypes.txt and GenCont_phenotypes.txt
beta_linearRidge <- linearRidge(Phenotypes ~ ., data = as.data.frame(GenCont))
cbind(round(coef(beta_linearRidge), 6), beta_linearRidgeGenotypes)
## End(Not run)
Predict phenotypes from genome-wide SNP data based on a file of coefficients. Genotypes and fitted coefficients are provided as filenames, allowing the computation of predicted phenotypes when SNP data are too large to be read into R.
linearRidgeGenotypesPredict(genotypesfilename, betafilename, phenotypesfilename = NULL, verbose = FALSE)
genotypesfilename |
character string: path to file containing SNP genotypes coded 0, 1,
2. See |
betafilename |
character string: path to file containing fitted coefficients. See |
phenotypesfilename |
(optional) character string: path to file in which to write out the
predicted phenotypes. See |
verbose |
Logical: If |
A vector of fitted values, the same length as the number of individuals whose data are in genotypesfilename. If phenotypesfilename is supplied, the fitted values are also written there.
A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated.
Two columns: First column is SNP names in same order as in genotypesfilename, second column is fitted coefficients. If the coefficients include an intercept then the first row of betafilename should contain it with the name Intercept in the first column. An Intercept thus labelled will be used appropriately in predicting the phenotypes. SNP names must match those in genotypesfilename.
The format of betafilename is that of the output of linearRidgeGenotypes, meaning linearRidgeGenotypesPredict can be used to predict using coefficients fitted using linearRidgeGenotypes (see the example).
Whether or not phenotypesfilename is provided, predicted phenotypes are returned to the R workspace. If phenotypesfilename is provided, predicted phenotypes are written to the file specified (in addition).
One column, containing predicted phenotypes, one individual per row.
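How a coefficient table with an optional Intercept row turns into predicted phenotypes can be sketched in base R. The objects below are hypothetical in-memory stand-ins for the two files, and the arithmetic is the ordinary linear predictor rather than the package's own file-based code.

```r
# Hypothetical coefficient table in the documented two-column layout:
# names in the first column, coefficients in the second, Intercept first
beta_tab <- data.frame(
  name = c("Intercept", "snp1", "snp2", "snp3"),
  coef = c(0.5, 0.2, -0.1, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical genotypes: one row per individual, one SNP per column
geno <- cbind(
  snp1 = c(0, 1, 2),
  snp2 = c(2, 0, 1),
  snp3 = c(1, 1, 0)
)

# Separate the labelled intercept (if any) from the SNP coefficients
is_int    <- beta_tab$name == "Intercept"
intercept <- if (any(is_int)) beta_tab$coef[is_int] else 0
slopes    <- setNames(beta_tab$coef[!is_int], beta_tab$name[!is_int])

# SNP names must match those of the genotype data, in the same order
stopifnot(identical(names(slopes), colnames(geno)))

# Predicted phenotype = intercept + sum over SNPs of genotype * coefficient
pred <- drop(intercept + geno %*% slopes)
print(pred)
```

A table without an Intercept row falls through to an intercept of 0 in this sketch, which corresponds to coefficients fitted with intercept = FALSE.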
Erika Cule
A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]
linearRidgeGenotypes for model fitting. logisticRidgeGenotypes and logisticRidgeGenotypesPredict for corresponding functions to fit and predict on SNP data with binary outcomes.
## Not run:
genotypesfile <- system.file("extdata", "GenCont_genotypes.txt", package = "ridge")
phenotypesfile <- system.file("extdata", "GenCont_phenotypes.txt", package = "ridge")
betafile <- tempfile(pattern = "beta", fileext = ".dat")
beta_linearRidgeGenotypes <- linearRidgeGenotypes(genotypesfilename = genotypesfile,
                                                  phenotypesfilename = phenotypesfile,
                                                  betafilename = betafile)
pred_phen_geno <- linearRidgeGenotypesPredict(genotypesfilename = genotypesfile,
                                              betafilename = betafile)
## compare to output of linearRidge
data(GenCont) ## Same data as in GenCont_genotypes.txt and GenCont_phenotypes.txt
beta_linearRidge <- linearRidge(Phenotypes ~ ., data = as.data.frame(GenCont))
pred_phen <- predict(beta_linearRidge)
print(cbind(pred_phen_geno, pred_phen))
## Delete the temporary betafile
unlink(betafile)
## End(Not run)
Fits a logistic ridge regression model. Optionally, the ridge regression parameter is chosen automatically using the method proposed by Cule et al (2012).
logisticRidge(formula, data, lambda = "automatic", nPCs = NULL,
              scaling = c("corrForm", "scale", "none"), ...)
## S3 method for class 'ridgeLogistic'
coef(object, all.coef = FALSE, ...)
## S3 method for class 'ridgeLogistic'
plot(x, y = NULL, ...)
## S3 method for class 'ridgeLogistic'
predict(object, newdata = NULL, type = c("link", "response"),
        na.action = na.pass, all.coef = FALSE, ...)
## S3 method for class 'ridgeLogistic'
print(x, all.coef = FALSE, ...)
## S3 method for class 'ridgeLogistic'
summary(object, all.coef = FALSE, ...)
## S3 method for class 'summary.ridgeLogistic'
print(x, digits = max(3, getOption("digits") - 3),
      signif.stars = getOption("show.signif.stars"), ...)
formula |
a formula expression as for regression models, of the form |
data |
an optional data frame in which to interpret the variables occurring in |
lambda |
A ridge regression parameter. If |
nPCs |
The number of principal components to use to choose the ridge regression parameter, following the method of
Cule et al (2012). It is not possible to specify both |
scaling |
The method to be used to scale the predictors. One of
|
object |
A ridgeLogistic object, typically generated by a call to |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
type |
the type of prediction required. The default predictions are of log-odds
(probabilities on logit scale) and |
na.action |
function determining what should be done with missing values
in |
all.coef |
Logical. Should results be returned for all ridge regression penalty
parameters ( |
x |
An object of class |
y |
Dummy argument for compatibility with the default |
digits |
minimum number of significant digits to be used for most numbers |
signif.stars |
logical; if |
... |
Additional arguments to be passed to or from other methods. |
If an intercept is present in the model, its coefficient is not penalised. If you want to penalise an intercept, put in your own constant term and remove the intercept.
An object of class "ridgeLogistic", with components:
automatic |
Was |
call |
The matched call. |
coef |
A named vector of fitted coefficients. |
df |
A vector of degrees of freedom of the model fit and degrees of freedom for variance. |
Inter |
Was an intercept included? |
isScaled |
Were the predictors scaled before the model was fitted? |
lambda |
The ridge regression parameter. |
scales |
The scales used to standardize the predictors. |
terms |
The |
x |
The scaled predictor matrix. |
xm |
A vector of means of the predictors. |
y |
The response. |
And optionally the component
nPCs |
The number of principal components used to compute the ridge regression parameter. |
Erika Cule
A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]
data(GenBin)
mod <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
summary(mod)
Fits logistic ridge regression models for genome-wide SNP data. The SNP genotypes are not read into R but file names are passed to the code directly, enabling the analysis of genome-wide SNP data sets which are too big to be read into R.
logisticRidgeGenotypes(genotypesfilename, phenotypesfilename, lambda = -1,
                       thinfilename = NULL, betafilename = NULL,
                       approxfilename = NULL, permfilename = NULL,
                       intercept = TRUE, verbose = FALSE)
genotypesfilename |
character string: path to file containing SNP genotypes coded 0, 1,
2. See |
phenotypesfilename |
character string: path to file containing phenotypes. See |
lambda |
(optional) shrinkage parameter. If not provided, the default denotes automatic choice of the shrinkage parameter using the method of Cule & De Iorio (2012). |
thinfilename |
(optional) character string: path to file containing three columns: SNP name, chromosome and SNP position. See |
betafilename |
(optional) character string: path to file where the output will be written. See |
approxfilename |
(optional) character string: path to file where the approximate test p-values will be written.
Approximate p-values are not computed unless this argument is given. Approximate p-values
are computed using the method of Cule et al (2011). See |
permfilename |
(optional) character string: path to file where the permutation test
p-values will be written.
Permutation test p-values are not computed unless this argument is
given. (See warning). See |
intercept |
Logical: Should the ridge regression model be fitted with an
intercept? Defaults to |
verbose |
Logical: If |
If a file thin is supplied, and the shrinkage parameter lambda is being computed automatically based on the data, then this file is used to thin the SNP data by SNP position. If this file is not supplied, SNPs are thinned automatically based on number of SNPs.
The vector of fitted ridge regression coefficients. If betafilename is given, the fitted coefficients are written to this file as well as being returned. If approxfilename and/or permfilename are given, results of approximate test p-values and/or permutation test p-values are written to the files given in their arguments.
A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated. Invariant SNPs in the data cause an error; please remove these from the file before calling the function.
A single column of phenotypes with the individuals in the same order as those in the file genotypesfilename. Phenotypes must be coded as 0 or 1.
(optional) Three columns and the same number of rows as there are SNPs in the file genotypesfilename, one row per SNP. First column: SNP names (must match names in genotypesfilename); second column: chromosome; third column: SNP position in BP.
All output files are optional. Whether or not betafilename is provided, fitted coefficients are returned to the R workspace. If betafilename is provided, fitted coefficients are written to the file specified (in addition).
Two columns: First column is SNP names in same order as in genotypesfilename, second column is fitted coefficients. If intercept = TRUE (the default) then the first row is the fitted intercept (with the name Intercept in the first column).
Two columns: First column is SNP names in same order as in genotypesfilename, second column is approximate p-values.
Two columns: First column is SNP names in same order as in genotypesfilename, second column is permutation p-values.
When data are large, the permutation test p-values may take a very long time to compute. It is recommended not to request permutation test p-values (using the argument permfilename) when data are large.
Erika Cule
Significance testing in ridge regression for genetic data. Cule, E. et al (2011) BMC Bioinformatics, 12:372 A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]
logisticRidge for fitting logistic ridge regression models when the data are small enough to be read into R. linearRidge and linearRidgeGenotypes for fitting linear ridge regression models.
## Not run:
genotypesfile <- system.file("extdata", "GenBin_genotypes.txt", package = "ridge")
phenotypesfile <- system.file("extdata", "GenBin_phenotypes.txt", package = "ridge")
beta_logisticRidgeGenotypes <- logisticRidgeGenotypes(genotypesfilename = genotypesfile,
                                                      phenotypesfilename = phenotypesfile)
## compare to output of logisticRidge
data(GenBin) ## Same data as in GenBin_genotypes.txt and GenBin_phenotypes.txt
beta_logisticRidge <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
cbind(round(coef(beta_logisticRidge), 6), beta_logisticRidgeGenotypes)
## End(Not run)
Predict fitted probabilities from genome-wide SNP data based on a file of coefficients. Genotypes and fitted coefficients are provided as filenames, allowing the computation of fitted probabilities when SNP data are too large to be read into R.
logisticRidgeGenotypesPredict(genotypesfilename, betafilename, phenotypesfilename = NULL, verbose = FALSE)
genotypesfilename |
character string: path to file containing SNP genotypes coded 0, 1,
2. See |
betafilename |
character string: path to file containing fitted coefficients. See |
phenotypesfilename |
(optional) character string: path to file in which to write out the
fitted probabilities. See |
verbose |
Logical: If |
A vector of fitted probabilities, the same length as the number of individuals whose data are in genotypesfilename. If phenotypesfilename is supplied, the fitted probabilities are also written there.
A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated.
Two columns: First column is SNP names in same order as in genotypesfilename, second column is fitted coefficients. If the coefficients include an intercept then the first row of betafilename should contain it with the name Intercept in the first column. An Intercept thus labelled will be used appropriately in predicting the phenotypes. SNP names must match those in genotypesfilename.
The format of betafilename is that of the output of logisticRidgeGenotypes, meaning logisticRidgeGenotypesPredict can be used to predict using coefficients fitted using logisticRidgeGenotypes (see the example).
Whether or not phenotypesfilename is provided, fitted probabilities are returned to the R workspace. If phenotypesfilename is provided, fitted probabilities are written to the file specified (in addition).
One column, containing fitted probabilities, one individual per row.
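For a logistic model, the step from coefficients to fitted probabilities is the linear predictor (log-odds) passed through the inverse logit. The following base R sketch uses hypothetical coefficients and genotypes; it illustrates the standard relationship between the two scales, not the package's internal code.

```r
# Hypothetical coefficients, with the intercept labelled as in the beta file
beta <- c(Intercept = -1, snp1 = 0.8, snp2 = -0.4)

# Hypothetical genotypes: one row per individual, one SNP per column
geno <- cbind(snp1 = c(0, 1, 2, 1), snp2 = c(2, 1, 0, 0))

# Linear predictor on the log-odds (logit) scale
eta <- drop(beta[["Intercept"]] + geno %*% beta[colnames(geno)])

# Inverse logit maps log-odds to probabilities in (0, 1)
prob <- plogis(eta)

print(cbind(log_odds = eta, probability = prob))
```

The same mapping relates the "link" and "response" prediction types of the predict method for ridgeLogistic objects: qlogis(prob) recovers the log-odds.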
Erika Cule
A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]
logisticRidgeGenotypes for model fitting. linearRidgeGenotypes and linearRidgeGenotypesPredict for corresponding functions to fit and predict on SNP data with continuous outcomes.
## Not run:
genotypesfile <- system.file("extdata", "GenBin_genotypes.txt", package = "ridge")
phenotypesfile <- system.file("extdata", "GenBin_phenotypes.txt", package = "ridge")
betafile <- tempfile(pattern = "beta", fileext = ".dat")
beta_logisticRidgeGenotypes <- logisticRidgeGenotypes(genotypesfilename = genotypesfile,
                                                      phenotypesfilename = phenotypesfile,
                                                      betafilename = betafile)
pred_phen_geno <- logisticRidgeGenotypesPredict(genotypesfilename = genotypesfile,
                                                betafilename = betafile)
## compare to output of logisticRidge
data(GenBin) ## Same data as in GenBin_genotypes.txt and GenBin_phenotypes.txt
beta_logisticRidge <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
pred_phen <- predict(beta_logisticRidge, type = "response")
print(cbind(pred_phen_geno, pred_phen))
## Delete the temporary betafile
unlink(betafile)
## End(Not run)
Functions for computing, printing and plotting p-values for ridgeLinear and ridgeLogistic models. The p-values are computed using the significance test of Cule et al (2011).
pvals(x, ...)
## S3 method for class 'ridgeLinear'
pvals(x, ...)
## S3 method for class 'ridgeLogistic'
pvals(x, ...)
## S3 method for class 'pvalsRidgeLinear'
print(x, digits = max(3, getOption("digits") - 3),
      signif.stars = getOption("show.signif.stars"), all.coef = FALSE, ...)
## S3 method for class 'pvalsRidgeLogistic'
print(x, digits = max(3, getOption("digits") - 3),
      signif.stars = getOption("show.signif.stars"), all.coef = FALSE, ...)
## S3 method for class 'pvalsRidgeLinear'
plot(x, y = NULL, ...)
## S3 method for class 'pvalsRidgeLogistic'
plot(x, y = NULL, ...)
x |
For the pvals methods, an object of class "ridgeLinear" or "ridgeLogistic", typically from a call to "linearRidge" or "logisticRidge". For the print and plot methods, an object of class "pvalsRidgeLinear" or "pvalsRidgeLogistic", typically from a call to "pvals". |
digits |
minimum number of significant digits to be used for most numbers |
signif.stars |
logical; if |
all.coef |
Logical. Should p-values for all the ridge regression parameters be printed, or only the one from the ridge parameter chosen using the method of Cule et al (2012)? |
y |
Dummy argument for compatibility with the default |
... |
further arguments to be passed to or from other methods |
Standard errors, test statistics and p-values are computed using coefficients and data on the scale that was used to fit them. If the coefficients were standardized before the model was fitted, then the p-values relate to the scaled data.
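Why the fitting scale matters can be seen in a one-predictor base R sketch. It is illustrative only: ridge_slope is a hypothetical helper, the data are simulated, and standard-deviation scaling stands in for the package's scaling options. Without a penalty, rescaling the predictor only rescales the slope; with a penalty, the fits on the two scales are no longer equivalent, so quantities derived from them refer to the scale actually used.

```r
# Hypothetical helper: ridge slope for a single centred predictor,
# i.e. Sxy / (Sxx + lambda)
ridge_slope <- function(x, y, lambda) {
  xc <- x - mean(x)
  sum(xc * (y - mean(y))) / (sum(xc^2) + lambda)
}

# Simulated data (an assumption for illustration only)
set.seed(42)
x <- rnorm(100, mean = 10, sd = 4)
y <- 3 + 0.5 * x + rnorm(100)

s        <- sd(x)
x_scaled <- (x - mean(x)) / s

# lambda = 0: the scaled-data slope divided by s equals the raw-data slope
ols_raw    <- ridge_slope(x, y, lambda = 0)
ols_scaled <- ridge_slope(x_scaled, y, lambda = 0) / s

# lambda > 0: the same penalty shrinks differently on the two scales
b_raw    <- ridge_slope(x, y, lambda = 10)
b_scaled <- ridge_slope(x_scaled, y, lambda = 10) / s

print(c(ols_raw = ols_raw, ols_scaled = ols_scaled,
        ridge_raw = b_raw, ridge_scaled = b_scaled))
```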
For the pvals methods, an object of class "pvalsRidgeLinear" or "pvalsRidgeLogistic" which is a list with elements
coef |
The (scaled) regression coefficients |
se |
The standard errors of the regression coefficients |
tstat |
The test statistic of the regression coefficients |
pval |
The p-values of the regression coefficients |
isScaled |
Were the data scaled before the regression coefficients were fitted? |
For the print methods, the argument x is returned invisibly.
Erika Cule
Significance testing in ridge regression for genetic data. Cule, E. et al (2011) BMC Bioinformatics, 12:372
linearRidge, logisticRidge
data(GenBin)
mod <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
pvalsMod <- pvals(mod)
print(pvalsMod)
print(pvalsMod, all.coef = TRUE)
plot(pvalsMod)