Package 'dbglm' reference manual

Title:	Generalised Linear Models by Subsampling and One-Step Polishing
Description:	Fast fitting of generalised linear models on moderately large datasets, by taking an initial sample, fitting in memory, then evaluating the score function for the full data in the database. Thomas Lumley <doi:10.1080/10618600.2019.1610312>.
Authors:	Thomas Lumley [aut, cph], Shangqing Cao [ctb, cre]
Maintainer:	Shangqing Cao <caoalbert@g.ucla.edu>
License:	MIT + file LICENSE
Version:	1.0.0
Built:	2025-03-09 06:56:52 UTC
Source:	CRAN

Fast generalized linear model in a database

Description

Fast generalized linear model in a database

Usage

dbglm(formula, family = binomial(), tbl, sd = FALSE,
weights = .NotYetImplemented(), subset = .NotYetImplemented(), ...)

Arguments

`...`	This argument is required for S3 method extension.
`formula`	A model formula. It may contain interactions, but no transformations other than `factor`.
`family`	Model family
`tbl`	An object inheriting from `tbl`. Will typically be a database-backed lazy `tbl` from the `dbplyr` package.
`sd`	Experimental: compute the standard deviation of the score as well as the mean in the update and use it to improve the information matrix estimate
`weights`	Not yet implemented; weights are not supported.
`subset`	Not yet implemented. To analyze a subset, apply `filter()` to `tbl` before calling `dbglm()`.

Details

For a dataset of size N, the subsample is of size N^(5/9). The approximation is not very good unless N is large. With small N it is also quite likely that some factor levels, for example, will be missing from the subsample.
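
To give a feel for how small the subsample is relative to the full table, the following base R sketch evaluates the N^(5/9) rule for a few values of N. The helper name `subsample_size` is hypothetical, and rounding up with `ceiling()` is an assumption; the package may round differently.

```r
# Hypothetical helper: subsample size implied by the N^(5/9) rule.
# Rounding up with ceiling() is an assumption, not taken from the package.
subsample_size <- function(n) {
  stopifnot(is.numeric(n), all(n > 0))
  ceiling(n^(5 / 9))
}

sizes <- data.frame(N = c(1e4, 1e6, 1e8))
sizes$n_sub <- subsample_size(sizes$N)
sizes$fraction <- sizes$n_sub / sizes$N  # share of rows fitted in memory
print(sizes)
```

The sampled fraction shrinks as N grows, which is why the rule only pays off, and only approximates well, for large tables.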

Value

A list with elements

`tildebeta`	coefficients from subsample
`hatbeta`	final estimate
`tildeV`	variance matrix from subsample
`hatV`	final variance matrix estimate
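
To illustrate how the four elements relate, here is a minimal in-memory sketch of a generic one-step update for logistic regression in base R. It is not the package's implementation: the simulated data, the `ceiling()` rounding, and the rescaling of the subsample variance by n/N are all assumptions made for illustration, and the full-data score is computed in memory rather than in a database.

```r
set.seed(1)

# Simulated "full" data (an assumption; dbglm would read a database table).
N <- 100000
x <- rnorm(N)
y <- rbinom(N, 1, plogis(-1 + 0.5 * x))
dat <- data.frame(x = x, y = y)

# Step 1: fit on a subsample of size about N^(5/9).
n <- ceiling(N^(5 / 9))
sub <- dat[sample.int(N, n), ]
fit_sub <- glm(y ~ x, family = binomial(), data = sub)
tildebeta <- coef(fit_sub)   # coefficients from the subsample
tildeV <- vcov(fit_sub)      # variance matrix from the subsample

# Step 2: score of the full data at tildebeta.
# For the logit link the score is X'(y - mu).
X <- model.matrix(~ x, data = dat)
mu <- plogis(drop(X %*% tildebeta))
score <- crossprod(X, dat$y - mu)

# Step 3: one Newton-type step, with the subsample information
# rescaled to the full sample size (assumed scaling).
hatV <- tildeV * (n / N)
hatbeta <- tildebeta + drop(hatV %*% score)

print(rbind(tildebeta = tildebeta, hatbeta = hatbeta))
```

In this sketch `hatbeta` should land close to the coefficients that `glm()` would give on all N rows, even though only the subsample was fitted iteratively.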

References

http://notstatschat.tumblr.com/post/171570186286/faster-generalised-linear-models-in-largeish-data

Data of vehicles registered in New Zealand as of November 2017

Description

Data of vehicles registered in New Zealand as of November 2017

Usage

data(fleet1)

Format

A tibble with 10000 rows and 34 variables:

basic_colour: character colour of the car
power_rating: numeric horsepower of the car
gross_vehicle_mass: numeric mass of the vehicle in kg
number_of_seats: numeric number of seats in the car

Source

https://nzta.govt.nz/resources/new-zealand-motor-vehicle-register-statistics/new-zealand-vehicle-fleet-open-data-sets/
