Title: | Prediction with Less Overfitting and Robust to Noise |
---|---|
Description: | A method for quantitative prediction with many predictors. This package provides functions to construct a quantitative prediction model that is less prone to overfitting and robust to noise. |
Authors: | Takahiko Koizumi, Kenta Suzuki, Yasunori Ichihashi |
Maintainer: | Takahiko Koizumi |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2024-11-20 06:45:21 UTC |
Source: | CRAN |
Clean data by eliminating predictors with many missing values
p.clean(x, missing = 0.1, lowest = 10)
x | A data matrix (row: samples, col: predictors). |
missing | The maximum ratio of missing values allowed in a column for that predictor to be retained in the data (default: 0.1). |
lowest | The lowest value recognized in the data (default: 10). |
A data matrix (row: samples, col: qualified predictors).
Takahiko Koizumi
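library(plorn)
data(Pinus)
train.raw <- Pinus$train
ncol(train.raw)              # number of predictors before cleaning
train <- p.clean(train.raw)
ncol(train)                  # number of predictors retained after cleaning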
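The filtering idea can be illustrated with a minimal base-R sketch. `clean_sketch()` is a hypothetical helper, not the code inside `p.clean()`; in particular, counting entries below `lowest` as missing is an assumption about how the `lowest` argument is used.

```r
# Hypothetical sketch of missing-value filtering; not plorn's actual code.
# Assumption: an entry counts as missing if it is NA or below `lowest`.
clean_sketch <- function(x, missing = 0.1, lowest = 10) {
  # TRUE where an entry is NA or falls below `lowest`
  is_missing <- is.na(x) | x < lowest
  # Proportion of missing entries in each column (predictor)
  miss_ratio <- colMeans(is_missing)
  # Keep only predictors whose missing proportion does not exceed `missing`
  x[, miss_ratio <= missing, drop = FALSE]
}

# Toy matrix: 4 samples x 3 predictors
x <- matrix(c(50, 60,  5,  70,
              NA, 80, 90, 100,
              20, 30, 40,  55),
            nrow = 4, dimnames = list(NULL, c("g1", "g2", "g3")))
kept <- clean_sketch(x)
colnames(kept)  # "g3": g1 and g2 each have 1 of 4 entries missing (0.25 > 0.1)
```

Raising `missing` to 0.3 in this toy case keeps all three columns, since each has at most a 0.25 missing ratio.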
Estimate the optimal number of predictors to construct PLORN model
p.opt(x, y, range = 5:50, method = "linear", rep = 1)
x | A data matrix (row: samples, col: predictors). |
y | A vector of the environmental condition (e.g., temperature) under which each sample was collected. |
range | A sequence of numbers of predictors to be tested for MAE (mean absolute error) calculation (default: 5:50). |
method | A string specifying the regression model used to calculate R-squared values: "linear" (default), "quadratic" or "cubic". |
rep | The number of replications for each number of predictors in range (default: 1). |
A sample-MAE curve
Takahiko Koizumi
library(plorn)
data(Pinus)
train <- p.clean(Pinus$train)
target <- Pinus$target
p.opt(train[1:10, ], target[1:10], range = 5:15)  # first 10 samples, 5 to 15 predictors
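The search performed by `p.opt()` can be pictured as scoring each candidate number of predictors by its prediction error. The sketch below is hypothetical: `fit_sketch()`, `predict_sketch()` and `mae_curve_sketch()` are illustrative names, and both the predictor (inverting a per-predictor linear fit and taking the median) and the leave-one-out scoring are assumptions, not the package's documented algorithm. Leave-one-out is deterministic here, so the sketch has no counterpart to `rep`.

```r
# Hypothetical sketch of choosing the number of predictors by MAE; not plorn's code.
# Assumed model: each predictor is fitted as x_j = a_j + b_j * y, a new sample's
# environment is estimated per predictor as (x_j - a_j) / b_j, and the median over
# the top-k predictors (ranked by R-squared) is the prediction.
fit_sketch <- function(x, y) {
  slope <- as.vector(cov(x, y)) / var(y)
  list(intercept = colMeans(x) - slope * mean(y),
       slope = slope,
       r2 = as.vector(cor(x, y))^2)  # R-squared of a simple linear fit
}

predict_sketch <- function(x, y, newx, k) {
  fit <- fit_sketch(x, y)
  top <- order(fit$r2, decreasing = TRUE)[seq_len(k)]
  # Per-predictor inverse estimates, one column per selected predictor
  est <- sweep(newx[, top, drop = FALSE], 2, fit$intercept[top], "-")
  est <- sweep(est, 2, fit$slope[top], "/")
  apply(est, 1, median)
}

# Leave-one-out MAE for each candidate number of predictors in `range`
mae_curve_sketch <- function(x, y, range) {
  sapply(range, function(k) {
    errors <- sapply(seq_len(nrow(x)), function(i) {
      pred <- predict_sketch(x[-i, , drop = FALSE], y[-i], x[i, , drop = FALSE], k)
      abs(pred - y[i])
    })
    mean(errors)
  })
}

# Simulated data: 10 predictors related to y, 20 unrelated ones
set.seed(42)
y <- rep(c(8, 13, 18, 23, 28), each = 4)
informative <- sapply(1:10, function(j) 5 + 0.5 * j * y + rnorm(length(y), sd = 2))
noise <- matrix(rnorm(length(y) * 20, mean = 50, sd = 5), nrow = length(y))
x <- cbind(informative, noise)

ks <- 2:20
mae <- mae_curve_sketch(x, y, ks)
ks[which.min(mae)]  # number of predictors with the lowest MAE in this simulation
```

Plotting `mae` against `ks` gives a curve of error versus the number of predictors.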
Visualize predictors using principal coordinate analysis
p.pca(x, y, method = "linear", lower.thr = 0, n.pred = ncol(x), size = 1)
x | A data matrix (row: samples, col: predictors). |
y | A vector of the environmental condition (e.g., temperature) under which each sample was collected. |
method | A string specifying the regression model used to calculate R-squared values: "linear" (default), "quadratic" or "cubic". |
lower.thr | The lower threshold of R-squared values for predictors to be indicated in the PCA plot (default: 0). |
n.pred | The number of candidate predictors for the PLORN model to be indicated in the PCA plot (default: ncol(x)). |
size | The size of symbols in the PCA plot (default: 1). |
A PCA plot
Takahiko Koizumi
library(plorn)
data(Pinus)
train <- p.clean(Pinus$train)
target <- Pinus$target
p.pca(train, target)         # PCA plot of the predictors
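One way to place predictors on a two-dimensional map is classical multidimensional scaling, which is principal coordinate analysis and, on Euclidean distances, yields the same configuration as PCA up to sign. The hypothetical `pcoa_sketch()` below standardizes each predictor, embeds the predictors with `cmdscale()`, and flags those whose linear R-squared reaches `lower.thr`. How `p.pca()` actually computes and draws its plot is an assumption here.

```r
# Hypothetical sketch: map predictors (not samples) with principal coordinate
# analysis and flag those whose linear R-squared reaches lower.thr.
pcoa_sketch <- function(x, y, lower.thr = 0) {
  z <- scale(x)                 # standardize each predictor (column)
  d <- dist(t(z))               # Euclidean distances between predictors
  coords <- cmdscale(d, k = 2)  # classical scaling = principal coordinate analysis
  r2 <- as.vector(cor(x, y))^2  # linear R-squared of each predictor
  data.frame(predictor = colnames(x), PCo1 = coords[, 1], PCo2 = coords[, 2],
             r2 = r2, shown = r2 >= lower.thr, row.names = NULL)
}

set.seed(7)
y <- rep(c(8, 13, 18, 23, 28), each = 6)
x <- cbind(sapply(1:8, function(j) y + rnorm(30, sd = 3)),  # related to y
           matrix(rnorm(30 * 8), nrow = 30))                 # unrelated to y
colnames(x) <- paste0("g", 1:16)

res <- pcoa_sketch(x, y, lower.thr = 0.5)
if (interactive()) {
  # Filled symbols mark predictors with R-squared >= lower.thr
  plot(res$PCo1, res$PCo2, pch = ifelse(res$shown, 19, 1),
       xlab = "PCo1", ylab = "PCo2")
}
```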
Visualize the distribution of R-squared values for predictor-environment interactions
p.rank( x, y, method = "linear", lower.thr = 0, n.pred = ncol(x), upper.xlim = ncol(x) )
x | A data matrix (row: samples, col: predictors). |
y | A vector of the environmental condition (e.g., temperature) under which each sample was collected. |
method | A string specifying the regression model used to calculate R-squared values: "linear" (default), "quadratic" or "cubic". |
lower.thr | The lower threshold of R-squared values for predictors to be included in the PLORN model (default: 0). |
n.pred | The number of predictors to be included in the PLORN model (default: ncol(x)). |
upper.xlim | The upper limit of the x axis (i.e., the number of predictors) in the resulting figure (default: ncol(x)). |
A rank order plot
Takahiko Koizumi
library(plorn)
data(Pinus)
train <- p.clean(Pinus$train)
target <- Pinus$target
train <- p.sort(train, target)  # sort predictors by interaction strength
p.rank(train, target)           # rank order plot of R-squared values
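The R-squared values behind a rank order plot can be sketched by regressing each predictor on the environment with a polynomial of degree 1, 2 or 3, matching the "linear", "quadratic" and "cubic" options. `r2_sketch()` is a hypothetical helper, and treating each predictor as the response (with the environment as the explanatory variable) is an assumption; for the linear case the direction does not change R-squared, but for higher degrees it does.

```r
# Hypothetical sketch: R-squared of each predictor regressed on the environment
# with a polynomial whose degree follows `method`; not plorn's actual code.
r2_sketch <- function(x, y, method = c("linear", "quadratic", "cubic")) {
  method <- match.arg(method)
  degree <- switch(method, linear = 1, quadratic = 2, cubic = 3)
  apply(x, 2, function(col) {
    summary(lm(col ~ poly(y, degree, raw = TRUE)))$r.squared
  })
}

set.seed(5)
y <- rep(c(8, 13, 18, 23, 28), each = 6)           # simulated temperatures
x <- cbind(up   = 2 * y + rnorm(30, sd = 3),        # monotonic response
           peak = -(y - 18)^2 + rnorm(30, sd = 3),  # optimum at mid temperature
           flat = rnorm(30, sd = 3))                # unrelated to y

r2_lin  <- r2_sketch(x, y, "linear")
r2_quad <- r2_sketch(x, y, "quadratic")
round(rbind(linear = r2_lin, quadratic = r2_quad), 3)

# Rank order plot: R-squared values sorted from the strongest predictor down
if (interactive()) {
  plot(sort(r2_quad, decreasing = TRUE), type = "b",
       xlab = "Rank of predictor", ylab = "R-squared")
}
```

The `peak` column shows why the method choice matters: a linear fit misses its hump-shaped response, while a quadratic fit captures it.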
Sort and truncate predictors according to the strength of predictor-environment interaction
p.sort(x, y, method = "linear", n.pred = ncol(x), trunc = 1)
x | A data matrix (row: samples, col: predictors). |
y | A vector of the environmental condition (e.g., temperature) under which each sample was collected. |
method | A string specifying the regression model used to calculate R-squared values: "linear" (default), "quadratic" or "cubic". |
n.pred | The number of predictors to be included in the PLORN model (default: ncol(x)). |
trunc | A threshold used for truncation (default: 1). |
A data matrix (row: samples, col: sorted predictors).
Takahiko Koizumi
library(plorn)
data(Pinus)
train <- p.clean(Pinus$train)
target <- Pinus$target
cor(target, train[, 1])                      # first predictor before sorting
train <- p.sort(train, target, trunc = 0.5)
cor(target, train[, 1])                      # first predictor after sorting
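Sorting and truncating to `n.pred` predictors amounts to ordering columns by their R-squared values. The hypothetical `sort_sketch()` covers the linear case only (squared Pearson correlation) and does not attempt to reproduce `trunc`.

```r
# Hypothetical sketch: sort predictors by linear R-squared and keep the first n.pred.
# The `trunc` argument of p.sort() is deliberately not reproduced here.
sort_sketch <- function(x, y, n.pred = ncol(x)) {
  r2 <- as.vector(cor(x, y))^2          # squared Pearson correlation = linear R-squared
  ord <- order(r2, decreasing = TRUE)   # strongest predictor first
  x[, ord[seq_len(n.pred)], drop = FALSE]
}

set.seed(11)
y <- rep(c(8, 13, 18, 23, 28), each = 6)
x <- cbind(noise  = rnorm(30),
           weak   = y + rnorm(30, sd = 10),
           strong = y + rnorm(30, sd = 1))

sorted <- sort_sketch(x, y, n.pred = 2)
colnames(sorted)
cor(y, x[, 1])       # first column before sorting
cor(y, sorted[, 1])  # first column after sorting
```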
This dataset gives the TPM values of 200 selected genes, obtained by RNA-seq from 60 Pinus root samples (30 samples each for the training and test data) collected under a temperature gradient.
Pinus
The first element of the list (train) is a gene expression data matrix of 30 root samples of P. thunbergii collected under five temperature conditions (8, 13, 18, 23 and 28 °C), with six biological replicates per condition.
The second element (test) is a gene expression data matrix of another 30 root samples of P. thunbergii collected under the same conditions.
The third element (target) gives the temperature conditions under which the 30 root samples in each data matrix were obtained.
Gene expression levels are normalized as TPM (transcripts per million).
original (not published)
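To prepare your own data in the same shape as `Pinus`, a list with `train` and `test` matrices (samples in rows, genes in columns) and a `target` vector can be assembled as below. The values are random placeholders, not the real dataset, and grouping the six replicates of each temperature together is an assumption about sample order.

```r
# Simulated stand-in with the same layout as Pinus: the values are random
# placeholders, NOT the real expression data.
set.seed(1)
temps <- rep(c(8, 13, 18, 23, 28), each = 6)  # assumed order: replicates grouped by temperature

make_expr <- function(n_samples, n_col) {
  m <- matrix(rexp(n_samples * n_col, rate = 1 / 50), nrow = n_samples)  # non-negative values
  colnames(m) <- paste0("gene", seq_len(n_col))
  m
}

mock <- list(train  = make_expr(30, 200),  # first element: samples x genes
             test   = make_expr(30, 200),  # second element: samples x genes
             target = temps)               # third element: temperature of each sample
str(mock, max.level = 1)
```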
Construct and apply the PLORN model with your own data
plorn(x, y, newx = x, method = "linear", lower.thr = 0, n.pred = 0)
x | A data matrix of training samples (row: samples, col: predictors). |
y | A vector of the environmental condition (e.g., temperature) under which each sample in x was collected. |
newx | A data matrix of samples whose environment is to be predicted (row: samples, col: predictors); its columns should match those of x (default: x). |
method | A string specifying the regression model used to calculate R-squared values: "linear" (default), "quadratic" or "cubic". |
lower.thr | The lower threshold of R-squared values for predictors to be used in the PLORN model (default: 0). |
n.pred | The number of candidate predictors to be used in the PLORN model (default: 30). |
A vector of the predicted environmental values for the samples in newx.
Takahiko Koizumi
library(plorn)
data(Pinus)
train <- p.clean(Pinus$train)
test <- Pinus$test
test <- test[, colnames(train)]  # keep the predictors retained in the training data
target <- Pinus$target
cor(target, plorn(train, target, newx = test, method = "cubic"))  # observed vs predicted
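The description above does not spell out how `plorn()` turns predictors into a prediction. The following is one plausible reading, offered as an assumption: fit each predictor as a polynomial function of the environment, keep the predictors with the highest R-squared, invert each fitted curve numerically for every new sample, and take the median of the per-predictor estimates as a noise-robust summary. `plorn_sketch()` is hypothetical and its defaults are illustrative.

```r
# Hypothetical sketch of an inverse-regression predictor in the spirit of PLORN;
# the steps are assumptions, not the package's documented algorithm.
plorn_sketch <- function(x, y, newx = x, method = "linear",
                         lower.thr = 0, n.pred = 30) {
  degree <- switch(method, linear = 1, quadratic = 2, cubic = 3,
                   stop("method must be \"linear\", \"quadratic\" or \"cubic\""))
  # 1. Fit each predictor as a polynomial function of the environment
  fits <- lapply(seq_len(ncol(x)), function(j) lm(x[, j] ~ poly(y, degree, raw = TRUE)))
  r2 <- vapply(fits, function(f) summary(f)$r.squared, numeric(1))
  # 2. Keep predictors with R-squared >= lower.thr, strongest first, at most n.pred
  keep <- which(r2 >= lower.thr)
  if (length(keep) == 0) stop("no predictor passes lower.thr")
  keep <- keep[order(r2[keep], decreasing = TRUE)]
  keep <- keep[seq_len(min(n.pred, length(keep)))]
  # 3. Tabulate each fitted curve over a fine grid spanning the observed environment
  grid <- seq(min(y), max(y), length.out = 1000)
  curves <- sapply(keep, function(j) predict(fits[[j]], newdata = data.frame(y = grid)))
  # 4. Invert: per new sample and predictor, take the grid value whose fitted
  #    expression is closest to the observed one, then summarize with the median
  apply(newx[, keep, drop = FALSE], 1, function(obs) {
    median(vapply(seq_along(keep),
                  function(m) grid[which.min(abs(curves[, m] - obs[m]))],
                  numeric(1)))
  })
}

# Simulated data: 10 predictors related to the environment, 10 unrelated ones
set.seed(3)
y <- rep(c(8, 13, 18, 23, 28), each = 6)
simulate <- function(env) {
  cbind(sapply(1:10, function(j) 20 + j * env + rnorm(length(env), sd = 5)),
        matrix(rnorm(length(env) * 10, mean = 100, sd = 10), nrow = length(env)))
}
train <- simulate(y)
test  <- simulate(y)  # new samples from the same temperature design
pred  <- plorn_sketch(train, y, newx = test, n.pred = 5)
cor(y, pred)
```

Because each estimate comes from a single predictor, an outlying value in one gene shifts only one of the estimates, and the median limits its influence on the final prediction.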