Package 'A3' reference manual

Package 'A3'

Title:	Accurate, Adaptable, and Accessible Error Metrics for Predictive Models
Description:	Supplies tools for tabulating and analyzing the results of predictive models. The methods employed are applicable to virtually any predictive model and make comparisons between different methodologies straightforward.
Authors:	Scott Fortmann-Roe
Maintainer:	Scott Fortmann-Roe <[email protected]>
License:	GPL (>= 2)
Version:	1.0.0
Built:	2024-10-31 06:22:45 UTC
Source:	CRAN

Title:

Accurate, Adaptable, and Accessible Error Metrics for Predictive Models

Description:

Supplies tools for tabulating and analyzing the results of predictive models. The methods employed are applicable to virtually any predictive model and make comparisons between different methodologies straightforward.

Authors:

Scott Fortmann-Roe

Maintainer:

Scott Fortmann-Roe <[email protected]>

License:

GPL (>= 2)

Version:

1.0.0

Built:

2024-10-31 06:22:45 UTC

Source:

CRAN

Help Index

A3 Error Metrics for Predictive Models

Description

A package for the generation of accurate, accessible, and adaptable error metrics for developing high quality predictions and inferences. The name A3 (pronounced "A-Cubed") comes from the combination of the first letters of these three primary adjectives.

Details

The overarching purpose of the outputs and tools in this package are to make the accurate assessment of model errors more accessible to a wider audience. Furthermore, a standardized set of reporting features are provided by this package which create consistent outputs for virtually any predictive model. This makes it straightforward to compare, for instance, a linear regression model to more exotic techniques such as Random forests or Support vector machines.

The standard outputs for each model fit provided by the A3 package include:

Average Slope: Equivalent to a linear regression coefficient.
Cross Validated $R^2$ : Robust calculation of $R^2$ (percent of squared error explained by the model compared to the null model) values adjusting for over-fitting.
p Values: Robust calculation of p-values requiring no parametric assumptions other than independence between observations (which may be violated if compensated for).

The primary functions that will be used are a3 for arbitrary modeling functions and a3.lm for linear models. This package also includes print.A3 and plot.A3 for outputting the A3 results.

Author(s)

Scott Fortmann-Roe [email protected] http://Scott.Fortmann-Roe.com

A3 Results for Arbitrary Model

Description

This function calculates the A3 results for an arbitrary model construction algorithm (e.g. Linear Regressions, Support Vector Machines or Random Forests). For linear regression models, you may use the a3.lm convenience function.

Usage

a3(formula, data, model.fn, model.args = list(), ...)
a3(formula, data, model.fn, model.args = list(), ...)

Arguments

`formula`	the regression formula.
`data`	a data frame containing the data to be used in the model fit.
`model.fn`	the function to be used to build the model.
`model.args`	a list of arguments passed to `model.fn`.
`...`	additional arguments passed to `a3.base`.

Value

S3 A3 object; see a3.base for details

References

Scott Fortmann-Roe (2015). Consistent and Clear Reporting of Results from Diverse Modeling Techniques: The A3 Method. Journal of Statistical Software, 66(7), 1-23. <http://www.jstatsoft.org/v66/i07/>

Examples


 ## Standard linear regression results:

 summary(lm(rating ~ ., attitude))

 ## A3 Results for a Linear Regression model:

 # In practice, p.acc should be <= 0.01 in order
 # to obtain finer grained p values.

 a3(rating ~ ., attitude, lm, p.acc = 0.1)


 ## A3 Results for a Random Forest model:

 # It is important to include the "+0" in the formula
 # to eliminate the constant term.

 require(randomForest)
 a3(rating ~ .+0, attitude, randomForest, p.acc = 0.1)

 # Set the ntrees argument of the randomForest function to 100

 a3(rating ~ .+0, attitude, randomForest, p.acc = 0.1, model.args = list(ntree = 100))

 # Speed up the calculation by doing 5-fold cross-validation.
 # This is faster and more conservative (i.e. it should over-estimate error)

 a3(rating ~ .+0, attitude, randomForest, n.folds = 5, p.acc = 0.1)

 # Use Leave One Out Cross Validation. The least biased approach,
 # but, for large data sets, potentially very slow.

 a3(rating ~ .+0, attitude, randomForest, n.folds = 0, p.acc = 0.1)

 ## Use a Support Vector Machine algorithm.

 # Just calculate the slopes and R^2 values, do not calculate p values.

 require(e1071)
 a3(rating ~ .+0, attitude, svm, p.acc = NULL)
 
## Standard linear regression results:

 summary(lm(rating ~ ., attitude))

 ## A3 Results for a Linear Regression model:

 # In practice, p.acc should be <= 0.01 in order
 # to obtain finer grained p values.

 a3(rating ~ ., attitude, lm, p.acc = 0.1)


 ## A3 Results for a Random Forest model:

 # It is important to include the "+0" in the formula
 # to eliminate the constant term.

 require(randomForest)
 a3(rating ~ .+0, attitude, randomForest, p.acc = 0.1)

 # Set the ntrees argument of the randomForest function to 100

 a3(rating ~ .+0, attitude, randomForest, p.acc = 0.1, model.args = list(ntree = 100))

 # Speed up the calculation by doing 5-fold cross-validation.
 # This is faster and more conservative (i.e. it should over-estimate error)

 a3(rating ~ .+0, attitude, randomForest, n.folds = 5, p.acc = 0.1)

 # Use Leave One Out Cross Validation. The least biased approach,
 # but, for large data sets, potentially very slow.

 a3(rating ~ .+0, attitude, randomForest, n.folds = 0, p.acc = 0.1)

 ## Use a Support Vector Machine algorithm.

 # Just calculate the slopes and R^2 values, do not calculate p values.

 require(e1071)
 a3(rating ~ .+0, attitude, svm, p.acc = NULL)

Base A3 Results Calculation

Description

This function calculates the A3 results. Generally this function is not called directly. It is simpler to use a3 (for arbitrary models) or a3.lm (specifically for linear regressions).

Usage

a3.base(formula, data, model.fn, simulate.fn, n.folds = 10,
  data.generating.fn = replicate(ncol(x), a3.gen.default), p.acc = 0.01,
  features = TRUE, slope.sample = NULL, slope.displacement = 1)
a3.base(formula, data, model.fn, simulate.fn, n.folds = 10,
  data.generating.fn = replicate(ncol(x), a3.gen.default), p.acc = 0.01,
  features = TRUE, slope.sample = NULL, slope.displacement = 1)

Arguments

`formula`	the regression formula.
`data`	a data frame containing the data to be used in the model fit.
`model.fn`	function used to generate a model.
`simulate.fn`	function used to create the model and generate predictions.
`n.folds`	the number of folds used for cross-validation. Set to 0 to use Leave One Out Cross Validation.
`data.generating.fn`	the function used to generate stochastic noise for calculation of exact p values.
`p.acc`	the desired accuracy for the calculation of exact p values. The entire calculation process will be repeated $1/p.acc$ times so this can have a dramatic affect on time required. Set to `NULL` to disable the calculation of p values.
`features`	whether to calculate the average slopes, added $R^2$ and p values for each of the features in addition to the overall model.
`slope.sample`	if not NULL the sample size for use to calculate the average slopes (useful for very large data sets).
`slope.displacement`	the amount of displacement to take in calculating the slopes. May be a single number in which case the same slope is applied to all features. May also be a named vector where there is a name for each feature.

Value

S3 A3 object containing:

`model.R2`	The cross validated $R^2$ for the entire model.
`feature.R2`	The cross validated $R^2$ 's for the features (if calculated).
`model.p`	The p value for the entire model (if calculated).
`feature.p`	The p value for the features (if calculated).
`all.R2`	The $R^2$ 's for the model features, and any stochastic simulations for calculating exact p values.
`observed`	The observed response for each observation.
`predicted`	The predicted response for each observation.
`slopes`	Average slopes for each of the features (if calculated).
`all.slopes`	Slopes for each of the observations for each of the features (if calculated).
`table`	The A3 results table.

Stochastic Data Generators

Description

The stochastic data generators generate stochastic noise with (if specified correctly) the same properties as the observed data. By replicating the stochastic properties of the original data, we are able to obtain the exact calculation of p values.

Usage

a3.gen.default(x, n.reps)
a3.gen.default(x, n.reps)

Arguments

`x`	the original (observed) data series.
`n.reps`	the number of stochastic repetitions to generate.

Details

Generally these will not be called directly but will instead be passed to the data.generating.fn argument of a3.base.

Value

A list of of length n.reps of vectors of stochastic noise. There are a number of different methods of generating noise:

`a3.gen.default`	The default data generator. Uses `a3.gen.bootstrap`.
`a3.gen.resample`	Reorders the original data series.
`a3.gen.bootstrap`	Resamples the original data series with replacement.
`a3.gen.normal`	Calculates the mean and standard deviation of the original series and generates a new series with that distribution.
`a3.gen.autocor`	Assumesa first order autocorrelation of the original series and generates a new series with the same properties.

Examples


 # Calculate the A3 results assuming an auto-correlated set of observations.
 # In usage p.acc should be <=0.01 in order to obtain more accurate p values.

 a3.lm(rating ~ ., attitude, p.acc = 0.1,
   data.generating.fn = replicate(ncol(attitude), a3.gen.autocor))
 

 ## A general illustration:

 # Take x as a sample set of observations for a feature
 x <- c(0.349, 1.845, 2.287, 1.921, 0.803, 0.855, 2.368, 3.023, 2.102, 4.648)

 # Generate three stochastic data series with the same autocorrelation properties as x
 rand.x <- a3.gen.autocor(x, 3)

 plot(x, type="l")
 for(i in 1:3) lines(rand.x[[i]], lwd = 0.2)
# Calculate the A3 results assuming an auto-correlated set of observations.
 # In usage p.acc should be <=0.01 in order to obtain more accurate p values.

 a3.lm(rating ~ ., attitude, p.acc = 0.1,
   data.generating.fn = replicate(ncol(attitude), a3.gen.autocor))
 

 ## A general illustration:

 # Take x as a sample set of observations for a feature
 x <- c(0.349, 1.845, 2.287, 1.921, 0.803, 0.855, 2.368, 3.023, 2.102, 4.648)

 # Generate three stochastic data series with the same autocorrelation properties as x
 rand.x <- a3.gen.autocor(x, 3)

 plot(x, type="l")
 for(i in 1:3) lines(rand.x[[i]], lwd = 0.2)

A3 for Linear Regressions

Description

This convenience function calculates the A3 results specifically for linear regressions. It uses R's glm function and so supports logistic regressions and other link functions using the family argument. For other forms of models you may use the more general a3 function.

Usage

a3.lm(formula, data, family = gaussian, ...)
a3.lm(formula, data, family = gaussian, ...)

Arguments

`formula`	the regression formula.
`data`	a data frame containing the data to be used in the model fit.
`family`	the regression family. Typically 'gaussian' for linear regressions.
`...`	additional arguments passed to `a3.base`.

Value

S3 A3 object; see a3.base for details

Examples


 ## Standard linear regression results:

 summary(lm(rating ~ ., attitude))

 ## A3 linear regression results:

 # In practice, p.acc should be <= 0.01 in order
 # to obtain fine grained p values.

 a3.lm(rating ~ ., attitude, p.acc = 0.1)

 # This is equivalent both to:

 a3(rating ~ ., attitude, glm, model.args = list(family = gaussian), p.acc = 0.1)

 # and also to:

 a3(rating ~ ., attitude, lm, p.acc = 0.1)
 
## Standard linear regression results:

 summary(lm(rating ~ ., attitude))

 ## A3 linear regression results:

 # In practice, p.acc should be <= 0.01 in order
 # to obtain fine grained p values.

 a3.lm(rating ~ ., attitude, p.acc = 0.1)

 # This is equivalent both to:

 a3(rating ~ ., attitude, glm, model.args = list(family = gaussian), p.acc = 0.1)

 # and also to:

 a3(rating ~ ., attitude, lm, p.acc = 0.1)

Cross-Validated $R^2$

Description

Applies cross validation to obtain the cross-validated $R^2$ for a model: the fraction of the squared error explained by the model compared to the null model (which is defined as the average response). A pseudo $R^2$ is implemented for classification.

Usage

a3.r2(y, x, simulate.fn, cv.folds)
a3.r2(y, x, simulate.fn, cv.folds)

Arguments

`y`	a vector or responses.
`x`	a matrix of features.
`simulate.fn`	a function object that creates a model and predicts y.
`cv.folds`	the cross-validation folds.

Value

A list comprising of the following elements:

`R2`	the cross-validated $R^2$
`predicted`	the predicted responses
`observed`	the observed responses

Boston Housing Prices

Description

A dataset containing the prices of houses in the Boston region and a number of features. The dataset and the following description is based on that provided by UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Housing).

Usage

data(housing)
data(housing)

Details

CRIME: Per capita crime rate by town
ZN: Proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS: Proportion of non-retail business acres per town
CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX: Nitrogen oxides pollutant concentration (parts per 10 million)
ROOMS: Average number of rooms per dwelling
AGE: Proportion of owner-occupied units built prior to 1940
DISTANCE: Weighted distances to five Boston employment centres
HIGHWAY: Index of accessibility to radial highways
TAX: Full-value property-tax rate per ten thousand dollar
PUPIL.TEACHER: Pupil-teacher ratio by town
MINORITY: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT: Percent lower status of the population
MED.VALUE: Median value of owner-occupied homes in thousands of dollars

References

Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Harrison, D. and Rubinfeld, D.L. Hedonic prices and the demand for clean air, J. Environ. Economics & Management, vol.5, 81-102, 1978.

Ecosystem Multifunctionality

Description

This dataset relates multifunctionality to a number of different biotic and abiotic features in a global survey of drylands. The dataset was obtained from (http://www.sciencemag.org/content/335/6065/214/suppl/DC1). The dataset contains the features listed below.

Usage

data(multifunctionality)
data(multifunctionality)

Details

ELE: Elevation of the site
LAT & LONG: Location of the site
SLO: Site slope
SAC: Soil sand content
PCA_C1, PCA_C2, PCA_C3, PCA_C4: Principal components of a set of 21 climatic features
SR: Species richness
MUL: Multifunctionality

References

Maestre, F. T., Quero, J. L., Gotelli, N. J., Escudero, A., Ochoa, V., Delgado-Baquerizo, M., et al. (2012). Plant Species Richness and Ecosystem Multifunctionality in Global Drylands. Science, 335(6065), 214-218. doi:10.1126/science.1215442

Plot A3 Results

Description

Plots an 'A3' object results. Displays predicted versus observed values for each observation along with the distribution of slopes measured for each feature.

Usage

## S3 method for class 'A3'
plot(x, ...)
## S3 method for class 'A3'
plot(x, ...)

Arguments

`x`	an A3 object.
`...`	additional options provided to `plotPredictions`, `plotSlopes` and `plot` functions.

Examples


 data(housing)
 res <- a3.lm(MED.VALUE ~ NOX + ROOMS + AGE + HIGHWAY + PUPIL.TEACHER, housing, p.acc = NULL)
 plot(res)
 
data(housing)
 res <- a3.lm(MED.VALUE ~ NOX + ROOMS + AGE + HIGHWAY + PUPIL.TEACHER, housing, p.acc = NULL)
 plot(res)

Plot Predicted versus Observed

Description

Plots an 'A3' object's values showing the predicted versus observed values for each observation.

Usage

plotPredictions(x, show.equality = TRUE, xlab = "Observed Value",
  ylab = "Predicted Value", main = "Predicted vs Observed", ...)
plotPredictions(x, show.equality = TRUE, xlab = "Observed Value",
  ylab = "Predicted Value", main = "Predicted vs Observed", ...)

Arguments

`x`	an A3 object,
`show.equality`	if true plot a line at 45-degrees.
`xlab`	the x-axis label.
`ylab`	the y-axis label.
`main`	the plot title.
`...`	additional options provided to the `plot` function.

Examples

data(multifunctionality)
 x <- a3.lm(MUL ~ ., multifunctionality, p.acc = NULL, features = FALSE)
 plotPredictions(x)
data(multifunctionality)
 x <- a3.lm(MUL ~ ., multifunctionality, p.acc = NULL, features = FALSE)
 plotPredictions(x)

Plot Distribution of Slopes

Description

Plots an 'A3' object's distribution of slopes for each feature and observation. Uses Kernel Density Estimation to create an estimate of the distribution of slopes for a feature.

Usage

plotSlopes(x, ...)
plotSlopes(x, ...)

Arguments

`x`	an A3 object.
`...`	additional options provided to the `plot` and `density` functions.

Examples


 require(randomForest)
 data(housing)

 x <- a3(MED.VALUE ~ NOX + PUPIL.TEACHER + ROOMS + AGE + HIGHWAY + 0,
   housing, randomForest, p.acc = NULL, n.folds = 2)

 plotSlopes(x)
 
require(randomForest)
 data(housing)

 x <- a3(MED.VALUE ~ NOX + PUPIL.TEACHER + ROOMS + AGE + HIGHWAY + 0,
   housing, randomForest, p.acc = NULL, n.folds = 2)

 plotSlopes(x)

Print Fit Results

Description

Prints an 'A3' object results table.

Usage

## S3 method for class 'A3'
print(x, ...)
## S3 method for class 'A3'
print(x, ...)

Arguments

`x`	an A3 object.
`...`	additional arguments passed to the `print` function.

Examples

x <- a3.lm(rating ~ ., attitude, p.acc = NULL)
 print(x)
x <- a3.lm(rating ~ ., attitude, p.acc = NULL)
 print(x)

Nicely Formatted Fit Results

Description

Creates a LaTeX table of results. Depends on the xtable package.

Usage

## S3 method for class 'A3'
xtable(x, ...)
## S3 method for class 'A3'
xtable(x, ...)

Arguments

`x`	an A3 object.
`...`	additional arguments passed to the `print.xtable` function.

Examples

x <- a3.lm(rating ~ ., attitude, p.acc = NULL)
 xtable(x)
x <- a3.lm(rating ~ ., attitude, p.acc = NULL)
 xtable(x)

Package 'A3'

Help Index

A3 Error Metrics for Predictive Models

Description

Details

Author(s)

A3 Results for Arbitrary Model

Description

Usage

Arguments

Value

References

Examples

Base A3 Results Calculation

Description

Usage

Arguments

Value

Stochastic Data Generators

Description

Usage

Arguments

Details

Value

Examples

A3 for Linear Regressions

Description

Usage

Arguments

Value

Examples

Cross-Validated R2R^2R2

Description

Usage

Arguments

Value

Boston Housing Prices

Description

Usage

Details

References

Ecosystem Multifunctionality

Description

Usage

Details

References

Plot A3 Results

Description

Usage

Arguments

Examples

Plot Predicted versus Observed

Description

Usage

Arguments

Examples

Plot Distribution of Slopes

Description

Usage

Arguments

Examples

Print Fit Results

Description

Usage

Arguments

Examples

Nicely Formatted Fit Results

Description

Usage

Arguments

Examples

Cross-Validated $R^2$