Package 'bigReg' reference manual

Title:	Generalized Linear Models (GLM) for Large Data Sets
Description:	Fits generalized linear models (GLMs) to very large data sets. The package provides data_frame and data_matrix objects that store large data on disk; an object is created with the data_frame() function and extended with object$append(data). The data is stored as doubles in binary format. Character columns are converted to factors and stored as numeric (binary) data, with a look-up table kept in a separate .meta_data file in the same folder. The data is stored in blocks, and the GLM regression algorithm is modified to fit the model with a MapReduce-like algorithm. The functions bglm(), summary() and bglm_predict() are available for creating and post-processing models. The library requires Armadillo to be installed on your system. It may not function on Windows, because multi-core processing is done using mclapply(), which forks R on Unix/Linux-type operating systems.
Authors:	Chibisi Chima-Okereke <chibisi@active-analytics.com>
Maintainer:	Chibisi Chima-Okereke <chibisi@active-analytics.com>
License:	GPL (>= 2)
Version:	0.1.5
Built:	2025-03-06 06:50:07 UTC
Source:	CRAN
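
The block-wise fitting idea in the Description can be pictured with a short base-R sketch. It is an illustration under stated assumptions, not bigReg's internal code: the data, the block size of 50 rows and the coefficient-based stopping rule are all made up for the example. Each block contributes its own weighted cross-products (the map step), the contributions are summed (the reduce step), and the summed system is solved once per iteration.

```r
# Base-R sketch of a block-wise (MapReduce-like) IRLS fit for logistic regression.
# Illustrative only: data, block size and stopping rule are assumptions.
set.seed(1)
n <- 200
X <- cbind(1, rnorm(n), rnorm(n))                      # design matrix with intercept
y <- rbinom(n, 1, plogis(X %*% c(-0.5, 1, 0.5)))
blocks <- split(seq_len(n), ceiling(seq_len(n) / 50))  # four blocks of 50 rows

fam <- binomial("logit")
beta <- rep(0, ncol(X))
for (iter in 1:25) {
  # Map: each block computes its own X'WX and X'Wz
  parts <- lapply(blocks, function(idx) {
    Xb <- X[idx, , drop = FALSE]
    eta <- drop(Xb %*% beta)
    mu <- fam$linkinv(eta)
    gprime <- 1 / fam$mu.eta(eta)                      # d eta / d mu
    z <- eta + (y[idx] - mu) * gprime                  # working response
    w <- 1 / (fam$variance(mu) * gprime^2)             # working weights
    list(XtWX = crossprod(Xb, w * Xb), XtWz = crossprod(Xb, w * z))
  })
  # Reduce: add the block contributions element by element
  total <- Reduce(function(a, b) Map(`+`, a, b), parts)
  beta_new <- drop(solve(total$XtWX, total$XtWz))
  converged <- max(abs(beta_new - beta)) < 1e-8
  beta <- beta_new
  if (converged) break
}

# Compare with an in-memory fit on the full data
fit <- glm(y ~ X - 1, family = binomial("logit"))
all.equal(unname(beta), unname(coef(fit)), tolerance = 1e-4)
```

Because the cross-products are plain sums over rows, the result does not depend on how the rows are split into blocks, which is what makes an on-disk, chunked fit possible.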

Function for creating control parameters for the GLM fit

Description

Function for creating control parameters for the GLM fit

Usage

.control(epsilon = 1e-08, maxit = 25, trace = TRUE)

Arguments

`epsilon`	convergence tolerance; defaults to 1e-08
`maxit`	maximum number of iterations; defaults to 25
`trace`	defaults to TRUE
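
For comparison, stats::glm.control() in base R takes the same three argument names (its trace default is FALSE rather than TRUE). The sketch below uses only the base-R function. Whether .control() applies the same stopping rule as stats::glm.fit is an assumption, and the deviance values are made up for the example.

```r
# stats::glm.control() is shown for comparison only; it is not part of bigReg.
ctrl <- stats::glm.control(epsilon = 1e-08, maxit = 25, trace = TRUE)
str(ctrl)   # a plain list with elements epsilon, maxit and trace

# In stats::glm.fit, iteration stops once the relative change in deviance
# falls below epsilon; the same rule is assumed here for illustration.
dev_old <- 100.0000004   # made-up deviance from the previous iteration
dev_new <- 100.0000001   # made-up deviance from the current iteration
stop_now <- abs(dev_new - dev_old) / (abs(dev_new) + 0.1) < ctrl$epsilon
stop_now
```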

converts numeric vector to integer

Description

converts numeric vector to integer

Usage

asInteger(x)

Arguments

`x`	numeric vector

Function to carry out generalized linear regression on a data_frame data object

Description

Function to carry out generalized linear regression on a data_frame data object

Usage

bglm(
  formula,
  family = gaussian_(),
  data,
  weights = NULL,
  offset = NULL,
  start = NULL,
  control = list(),
  etastart = NULL,
  mustart = NULL
)

Arguments

`formula`	formula that defines your regression model
`family`	family object created by one of the package's family functions, e.g. gaussian_(), binomial_(), poisson_(), quasipoisson_(), quasibinomial_(), Gamma_(), inverse.gaussian_(), quasi_()
`data`	data_frame object containing data for linear regression
`weights`	weights for the model
`offset`	offsets for the model
`start`	starting values for the coefficient estimates
`control`	list of parameters for .control() function
`etastart`	starting values for the linear predictor
`mustart`	starting values for vector of means

Examples

require(parallel)
data("plasma", package = "bigReg")
data_dir = tempdir()
plasma1 <- plasma
plasma1 <- data_frame(plasma1, 10, path = data_dir, nCores = 1)
plasma_glm <- bglm(ESR ~ fibrinogen + globulin, data = plasma1, family = binomial_("logit"))
summary(plasma_glm)

predict function for bglm object

Description

predict function for bglm object

Usage

bglm_predict(
  mf = stop("mf: model frame must be supplied"),
  object = stop("object: bglm object must be supplied"),
  type = stop("type: either \"link\", \"response\", \"terms\"")
)

Arguments

`mf`	model frame
`object`	a bglm object
`type`	one of c("link", "response", "terms")
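
The difference between the "link" and "response" scales can be shown with base R alone. This sketch does not call bglm_predict(); it uses stats::glm() on the built-in mtcars data purely to illustrate what the two types mean, on the assumption that they carry their usual GLM meaning here.

```r
# Base-R illustration of prediction scales; bglm_predict() itself is not called.
fit <- glm(am ~ wt, data = mtcars, family = binomial("logit"))
mf <- model.frame(am ~ wt, data = mtcars)       # model frame
X <- model.matrix(am ~ wt, data = mf)           # design matrix built from it
eta <- drop(X %*% coef(fit))                    # "link" scale: linear predictor
mu <- binomial("logit")$linkinv(eta)            # "response" scale: fitted means
all.equal(eta, predict(fit, type = "link"))
all.equal(mu, predict(fit, type = "response"))
```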

binomial family function

Description

binomial family function

Usage

binomial_(link = "logit")

Arguments

`link`	character string naming the link function

Function to carry out linear regression on a data_frame data object

Description

Function to carry out linear regression on a data_frame data object

Usage

blm(
  formula = stop("formula: not supplied"),
  data = stop("data: data not supplied"),
  control = list(),
  weights = NULL,
  offset = NULL
)

Arguments

`formula`	formula that defines your regression model
`data`	data_frame object containing data for linear regression
`control`	list of parameters for the .control() function
`weights`	weights for the model
`offset`	offsets for the model
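
For a linear model the block-wise idea needs no iteration. The following base-R sketch is illustrative only and is not blm()'s actual code: the split of mtcars into four blocks of eight rows is arbitrary. It accumulates X'X and X'y over blocks and solves the normal equations once.

```r
# Block-wise least squares in base R; illustrative, not blm()'s actual code.
chunks <- split(mtcars, rep(1:4, each = 8))     # four blocks of 8 rows
parts <- lapply(chunks, function(block) {
  X <- model.matrix(mpg ~ wt + hp, data = block)
  list(XtX = crossprod(X), Xty = crossprod(X, block$mpg))
})
total <- Reduce(function(a, b) Map(`+`, a, b), parts)   # sum over blocks
beta <- drop(solve(total$XtX, total$Xty))
all.equal(beta, coef(lm(mpg ~ wt + hp, data = mtcars)), tolerance = 1e-6)
```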

creates factor from numeric vector and character vector as levels

Description

The CreateFactor function creates a factor from a numeric vector and a character vector for levels

Usage

CreateFactor(x, levels)

Arguments

`x`	numeric vector containing the numeric indices of the levels
`levels`	character vector levels
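
A base-R analogue of this mapping is sketched below, for readers who want to see what "numeric indices of the levels" means. CreateFactor() itself is not called, and the variable names are made up for the example.

```r
# Base-R analogue of building a factor from indices and level labels.
x <- c(2, 1, 3, 2)                       # numeric indices into the levels
lev <- c("low", "mid", "high")           # character vector of levels
f <- factor(lev[x], levels = lev)
f
as.integer(f)                            # recovers the original indices
```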

function to create a data_frame object

Description

Function to create a data_frame object. A data_frame object is held on disk: it is written to a folder path, where the data is stored in blocks (chunks). A C++ function writes the data in binary format as purely numerical data, and a mapping to the table is held in a ".meta_data" file in the folder. The table object accommodates numeric, factor, and character (converted to factor) columns.

Usage

data_frame(
  data = stop("data must be supplied"),
  chunkSize = stop("chunkSize must be specified, a good number is 50000"),
  path = stop("path must be specified"),
  nCores = parallel::detectCores(),
  ...
)

Arguments

`data`	data.frame object to be converted into a data_frame object
`chunkSize`	number of rows to be used in each chunk
`path`	character to folder where the object will be created
`nCores`	the number of cores to use defaults to parallel::detectCores()
`...`	not currently used.

Details

Creates a data_frame object
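
The storage scheme described above can be imitated in base R. The sketch below is only a picture of the idea: the file names (block_1.bin, meta_sketch.rds), the column-major layout and the use of saveRDS() for the look-up table are hypothetical and are not bigReg's actual on-disk format.

```r
# Base-R sketch of chunked binary storage; file names and layout are hypothetical.
folder <- file.path(tempdir(), "iris_sketch")
dir.create(folder, showWarnings = FALSE)

# Factors are stored as their integer codes; labels go in a separate look-up file.
meta <- list(names = names(iris), factors = list(Species = levels(iris$Species)))
saveRDS(meta, file.path(folder, "meta_sketch.rds"))
num <- data.matrix(iris)                          # every column as doubles

chunk_size <- 60
starts <- seq(1, nrow(num), by = chunk_size)      # 1, 61, 121
for (i in seq_along(starts)) {
  rows <- starts[i]:min(starts[i] + chunk_size - 1, nrow(num))
  writeBin(as.double(num[rows, ]), file.path(folder, sprintf("block_%d.bin", i)))
}

# Read the third block (rows 121 to 150) back and restore the factor column.
vals <- readBin(file.path(folder, "block_3.bin"), what = "double", n = 30 * ncol(num))
block <- as.data.frame(matrix(vals, ncol = ncol(num), dimnames = list(NULL, meta$names)))
block$Species <- factor(meta$factors$Species[block$Species], levels = meta$factors$Species)
head(block)
unlink(folder, recursive = TRUE)                  # remove the temporary files
```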

Examples

irisA <- data_frame(iris[1:75,], 10, "irisA", nCores = 1)
irisA$append(iris[76:150,])
irisA$head()
irisA$tail(10)
irisA$delete(); rm(irisA)

function to create a data_matrix object

Description

function to create a data_matrix object. The data_matrix object is an object that is held on disk. It is written to a folder path on disk where the data is written to in blocks or chunks. The data is written in binary format using a C++ function in purely numerical data.

Usage

data_matrix(
  data = stop("data: matrix must be supplied"),
  chunkSize = stop("chunkSize must be specified, a good number is 50000"),
  path = stop("path must be specified"),
  nCores = parallel::detectCores(),
  ...
)

Arguments

`data`	object to be converted into a data_matrix object
`chunkSize`	number of rows to be used in each chunk
`path`	character to folder where the object will be created
`nCores`	the number of cores to use defaults to parallel::detectCores()
`...`	not used at the moment

Details

Creates a data_matrix object

family function

Description

family function

Usage

family_(distr, link)

Arguments

`distr`	character, one of "binomial", "poisson", "gaussian", "quasipoisson", "quasibinomial", "Gamma", "inverse.gaussian", "quasi"
`link`	character string naming the link function
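
As a point of reference, the sketch below shows what a link name stands for, using stats::make.link() and stats::poisson() from base R. family_() is not called; that the distribution and link names here mean the same as in the stats family functions is an assumption.

```r
# stats functions are used for illustration; bigReg's family_() is not called.
lnk <- stats::make.link("logit")
lnk$linkfun(0.5)      # g(mu) = log(mu / (1 - mu)), which is 0 at mu = 0.5
lnk$linkinv(0)        # the inverse link maps eta = 0 back to mu = 0.5

fam <- stats::poisson(link = "log")
c(fam$family, fam$link)
```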

Gamma family function

Description

Gamma family function

Usage

Gamma_(link = "inverse")

Arguments

`link`	character string naming the link function

gaussian family function

Description

gaussian family function

Usage

gaussian_(link = "identity")

Arguments

`link`	character string naming the link function

inverse.gaussian family function

Description

inverse.gaussian family function

Usage

inverse.gaussian_(link = "1/mu^2")

Arguments

`link`	character string naming the link function

function to load data_frame object

Description

function to load data_frame object

Usage

load_data_frame(path = stop("path: to data_frame folder must be supplied"))

Arguments

`path`	character path to the folder containing the object

function to load data_matrix object

Description

function to load data_matrix object

Usage

load_data_matrix(path = stop("path: to data_matrix folder must be supplied"))

Arguments

`path`	character path to the folder containing the object

finds whether x is in y

Description

finds whether x is in y

Usage

myIn(x, y)

Arguments

`x`	item to be sought
`y`	vector to be matched against

mySeq function to sequence integers

Description

a function to create a sequence of integers

Usage

mySeq(start, end)

Arguments

`start`	integer from where sequence should start
`end`	integer where sequence should end

plasma data from the HSAUR package

Description

Dataset from the HSAUR package

Usage

data(plasma)

Format

a data.frame

Details

...

Source

HSAUR package

References

HSAUR R package (HSAUR package)

Examples

data(plasma)
head(plasma)

poisson family function

Description

poisson family function

Usage

poisson_(link = "log")

Arguments

`link`	character string naming the link function

print function for the bglm object

Description

print function for the bglm object

Usage

## S3 method for class 'bglm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

Arguments

`x`	bglm object to be displayed
`digits`	number of significant digits to use
`...`	not yet used

print function for the blm object

Description

print function for the blm object

Usage

## S3 method for class 'blm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

Arguments

`x`	blm object to be displayed
`digits`	number of significant digits to use
`...`	not yet used

print function for a data_frame

Description

print function for a data_frame

Usage

## S3 method for class 'data_frame'
print(x, ...)

Arguments

`x`	data_frame object to print
`...`	not used

print function for a data_matrix

Description

print function for a data_matrix

Usage

## S3 method for class 'data_matrix'
print(x, ...)

Arguments

`x`	data_matrix object to print
`...`	not used

Function to print the summary object from the bglm object

Description

Function to print the summary object from the bglm object

Usage

## S3 method for class 'summary.bglm'
print(
  x,
  digits = max(3L, getOption("digits") - 3L),
  signif.stars = getOption("show.signif.stars"),
  ...
)

Arguments

`x`	summary bglm object
`digits`	the number of digits to be displayed
`signif.stars`	passed to printCoefmat
`...`	arguments passed to `printCoefmat()` function

Function to print the summary object from the blm object

Description

Function to print the summary object from the blm object

Usage

## S3 method for class 'summary.blm'
print(
  x,
  digits = max(3L, getOption("digits") - 3L),
  signif.stars = getOption("show.signif.stars"),
  ...
)

Arguments

`x`	summary blm object
`digits`	the number of digits to be displayed
`signif.stars`	passed to printCoefmat
`...`	arguments passed to `printCoefmat()` function

Function to process a single data block for the bglm fit

Description

Function to process a single data block for the bglm fit

Usage

process_bglm_block(
  mf,
  formula,
  mmCall,
  family,
  offset,
  weights,
  start,
  niter,
  etastart,
  mustart
)

Arguments

`mf`	the data block to be processed
`formula`	the formula for the model
`mmCall`	the call object of the model
`family`	the family object for the model
`offset`	the model offset
`weights`	the model weights
`start`	the starting coefficient estimates
`niter`	the current number of iterations
`etastart`	the start for eta
`mustart`	the start for mu

quasi family function

Description

quasi family function

Usage

quasi_(link = "identity", variance = "constant")

Arguments

`link`	character string naming the link function
`variance`	character string naming the variance function
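
The role of the variance choice can be seen with stats::quasi() from base R, which takes arguments of the same names. quasi_() is not called here, and which variance names it accepts beyond the documented default "constant" is an assumption.

```r
# stats::quasi() is used for illustration; bigReg's quasi_() is not called.
mu <- c(0.5, 2, 10)
q_const <- stats::quasi(link = "identity", variance = "constant")
q_const$variance(mu)          # the same variance for every mean
q_mu <- stats::quasi(link = "log", variance = "mu")
q_mu$variance(mu)             # variance proportional to the mean
```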

quasibinomial family function

Description

quasibinomial family function

Usage

quasibinomial_(link = "logit")

Arguments

`link`	character string naming the link function

quasipoisson family function

Description

quasipoisson family function

Usage

quasipoisson_(link = "log")

Arguments

`link`	character string naming the link function

row binding for benchmarking ...

Description

row binding for benchmarking

Usage

r_bind(x, y)

Arguments

`x`	first matrix to be bound together
`y`	second matrix to be bound together

read data frame block from file

Description

read data frame block from file

Usage

read_df_block(size, filePath, df, ncol, factors, factor_indices)

Arguments

`size`	number of elements in the block
`filePath`	path to where the block is stored
`df`	an empty list having the same number of elements as columns in the table
`ncol`	number of columns in the dataframe block
`factors`	list containing factors
`factor_indices`	numeric vector containing the indices that denote the factors

read multiple blocks of data frames from file

Description

read multiple blocks of data frames from file

Usage

read_df_blocks(size, filePaths, df, ncols, factors, factor_indices)

Arguments

`size`	number of elements in each block
`filePaths`	path to where the blocks are stored
`df`	an empty list having the same number of elements as columns in the table
`ncols`	number of columns in the dataframe block
`factors`	list containing factors
`factor_indices`	numeric vector containing the indices that denote the factors

read matrix block from file

Description

read matrix block from file

Usage

read_matrix_block(filePath, size, ncol)

Arguments

`filePath`	path to file where matrix should be read from
`size`	total number of elements to be read
`ncol`	number of columns in the matrix
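
The relationship between size and ncol can be illustrated with a base-R round trip. This sketch does not call read_matrix_block(): the column-major layout and the temporary file are assumptions made for the example.

```r
# Illustrative base-R round trip; a column-major layout is assumed.
m <- matrix(as.double(1:12), ncol = 3)
path <- tempfile(fileext = ".bin")
writeBin(as.double(m), path)                  # 12 doubles of 8 bytes each
size <- length(m)                             # total number of elements to read
vals <- readBin(path, what = "double", n = size)
m2 <- matrix(vals, ncol = 3)                  # number of rows follows as size / ncol
identical(m, m2)
unlink(path)
```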

read matrix blocks from file

Description

read matrix blocks from file

Usage

read_matrix_blocks(filePaths, size, ncols)

Arguments

`filePaths`	file paths from where the matrix blocks will be read
`size`	numeric vector containing the number of elements in each block
`ncols`	number of columns in the matrix

reads numeric vector from file

Description

reads numeric vector from file

Usage

readNumericVector(size, filePath)

Arguments

`size`	the length of the numeric vector
`filePath`	path to the file from which the numeric vector should be read

The reduction function for the algorithm

Description

The reduction function for the algorithm

Usage

sum_bglm_block(x1, x2)

Arguments

`x1`	the first list object to be reduced
`x2`	the second list object to be reduced

summary function for the bglm object

Description

summary function for the bglm object

Usage

## S3 method for class 'bglm'
summary(object, ...)

Arguments

`object`	bglm object to be summarized
`...`	not used

summary function for the blm object

Description

summary function for the blm object

Usage

## S3 method for class 'blm'
summary(object, ...)

Arguments

`object`	blm object to be summarized
`...`	not used

Singular value decomposition of the aggregated list from XWXMatrix(W) functions

Description

Singular value decomposition of the aggregated list from XWXMatrix(W) functions

Usage

SVD(out, epsilon)

Arguments

`out`	list containing requisite computed values
`epsilon`	either machine epsilon or user-determined epsilon
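
One common way to use an SVD with a tolerance is to solve the aggregated normal equations through a pseudo-inverse, dropping negligible singular values. The base-R sketch below illustrates that technique only: the layout of out (elements XtX and Xty) and the way epsilon is applied are hypothetical, not taken from the package.

```r
# Base-R sketch: solve (X'X) beta = X'y through an SVD with a tolerance cut-off.
# The structure of `out` and the use of `epsilon` are assumptions.
X <- model.matrix(mpg ~ wt + hp, data = mtcars)
out <- list(XtX = crossprod(X), Xty = crossprod(X, mtcars$mpg))
epsilon <- .Machine$double.eps

s <- svd(out$XtX)
keep <- s$d > epsilon * max(s$d)              # drop negligible singular values
# Pseudo-inverse V D^{-1} U' restricted to the retained singular values
pinv <- s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
beta <- drop(pinv %*% out$Xty)
all.equal(unname(beta), unname(coef(lm(mpg ~ wt + hp, data = mtcars))), tolerance = 1e-6)
```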

writes numeric vector to file

Description

writes numeric vector to file

Usage

write_numeric_vector(v, filePath)

Arguments

`v`	numeric vector to be written to file
`filePath`	path to file where the numeric vector should be written

writes numeric vector to file

Description

writes numeric vector to file

Usage

writeNumericVector(v, filePath)

Arguments

`v`	numeric vector
`filePath`	path to file where the numeric vector should be written

Calculation of iterative regression components

Description

Calculation of iterative regression components

Usage

XWXMatrix(X, y)

Arguments

`X`	design matrix
`y`	dependent variable

Calculation of iterative regression components

Description

Calculation of iterative regression components

Usage

XWXMatrixW(X, y, W)

Arguments

`X`	design matrix
`y`	dependent variable
`W`	weights
