Title: | Organizing Data in Hypercubes |
---|---|
Description: | Provides functions and methods for organizing data in hypercubes (i.e., a multi-dimensional cube). Cubes are generated from molten data frames. Each cube can be manipulated with five operations: rotation (change.dimensionOrder()), dicing and slicing (add.selection(), remove.selection()), drilling down (add.aggregation()), and rolling up (remove.aggregation()). |
Authors: | Michael Scholz |
Maintainer: | Michael Scholz <[email protected]> |
License: | GPL-3 |
Version: | 0.2.1 |
Built: | 2025-02-03 06:44:32 UTC |
Source: | CRAN |
This package provides methods for organizing data in a hypercube. Each cube can be manipulated with five operations: rotation (change.dimensionOrder), dicing and slicing (add.selection, remove.selection), drilling down (add.aggregation), and rolling up (remove.aggregation).
Package: | hypercube |
Type: | Package |
Version: | 0.2.1 |
Date: | 2020-02-27 |
License: | GPL-3 |
Depends: | R (>= 3.0), methods |
Michael Scholz [email protected]
# Simple example
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
cube

# More sophisticated example
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
cube = add.selection(cube, criteria = list(state = c("AL", "TX")))
cube = add.aggregation(cube, dimensions = c("month", "year"), fun = "sum")
cube
df = as.data.frame(cube)
df
This function adds a further aggregation to a hypercube. The cube itself will not be changed. The aggregation only affects the data that will be shown when printing the cube. Note that selection criteria will be applied before aggregating the data.
add.aggregation(
  x,
  dimensions,
  fun = c("sum", "min", "max", "prod", "mean", "median", "sd", "count")
)
x | Hypercube to which the aggregation will be added. |
dimensions | A vector of dimensions that are used in the aggregation. |
fun | The function that is used for aggregation. Possible functions are sum, prod, min, max, mean, median, sd, and count. |
Returns a Cube object with the added aggregation.
Michael Scholz [email protected]
Cube
remove.aggregation
add.selection
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
cube = add.aggregation(cube, dimensions = c("month", "year"), fun = "sum")
cube
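The note that selection criteria are applied before aggregation can be illustrated with a small base-R sketch. It does not use the package's internals; the `toy` data frame is made up, and the sketch assumes that an aggregation collapses the named dimension.

```r
# Hypothetical toy data (not the package's `sales` dataset).
toy <- data.frame(
  year   = c(2019, 2019, 2020, 2020),
  state  = c("AL", "TX", "AL", "TX"),
  amount = c(10, 20, 30, 40)
)

# A plain two-dimensional array: year x state.
arr <- tapply(toy$amount, toy[c("year", "state")], sum)

# Selection first: keep only the year 2020 (drop = FALSE keeps both dimensions).
selected <- arr["2020", , drop = FALSE]

# Aggregation second: collapse the year dimension by summing per state
# (margin 2 is the state dimension).
rolled_up <- apply(selected, 2, sum, na.rm = TRUE)
rolled_up

# Without the selection, the same aggregation covers both years.
apply(arr, 2, sum, na.rm = TRUE)
```

Because the selection runs first, the aggregated values only reflect the selected year; once the year dimension has been collapsed, it could no longer be filtered.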
This function adds further selection criteria to a hypercube. The cube itself will not be changed. The selection criteria only affect the data that will be shown when printing the cube. Note that selection criteria will be applied before aggregating the data.
add.selection(x, criteria)
x | Hypercube for which the selection criteria will be defined. |
criteria | A list of selection criteria. |
Returns a Cube object with the added selection criteria.
Michael Scholz [email protected]
Cube
remove.selection
add.aggregation
data("sales")
str(sales)
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
cube = add.selection(cube, criteria = list(state = c("CA", "FL")))
cube
cube = add.selection(cube, criteria = list(state = c("TX")))
cube
Converts the current view of a Cube object to a data frame. All added selections and aggregations are taken into account. Note that selection criteria will be applied before aggregating the data.
## S3 method for class 'Cube'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x | The Cube object. |
row.names | A character vector giving the row names for the data frame. |
optional | Should setting row names and converting column names be optional? |
... |
Further parameters that are passed to |
A molten data frame
Michael Scholz [email protected]
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
cube = change.dimensionOrder(cube, dimensions = c("product", "month", "year", "state"))
df = as.data.frame(cube)
df
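To show what a "molten" result looks like, here is a base-R sketch that flattens a small array back into one row per cell. It is only an analogy for the conversion described above, not the package's method; the `toy` data and the choice to drop empty cells are assumptions.

```r
# Hypothetical toy data with one missing year/state combination.
toy <- data.frame(
  year   = c(2019, 2019, 2020),
  state  = c("AL", "TX", "TX"),
  amount = c(10, 20, 40)
)
arr <- tapply(toy$amount, toy[c("year", "state")], sum)

# One row per cell: a column for each dimension plus a value column.
molten <- as.data.frame.table(arr, responseName = "amount",
                              stringsAsFactors = FALSE)

# Drop cells without data (assumption: empty cells are not of interest).
molten <- molten[!is.na(molten$amount), ]
molten
```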
Changes the order of the dimensions in a given cube.
change.dimensionOrder(x, dimensions)
x | Hypercube for which the dimensions should be re-ordered. |
dimensions | Vector of dimensions. The order of the dimensions in this vector defines the order of the dimensions in the cube. |
Returns a Cube object.
Michael Scholz [email protected]
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
cube = change.dimensionOrder(cube, dimensions = c("product", "month", "year", "state"))
cube
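Rotation only changes the order of the dimensions, not the cell values. The base-R sketch below shows the same idea on a plain array with `aperm()`; it is an analogy with made-up `toy` data, not a description of how `change.dimensionOrder()` is implemented.

```r
# Hypothetical toy data (not the package's `sales` dataset).
toy <- data.frame(
  year   = c(2019, 2019, 2020, 2020),
  state  = c("AL", "TX", "AL", "TX"),
  amount = c(10, 20, 30, 40)
)
arr <- tapply(toy$amount, toy[c("year", "state")], sum)
names(dimnames(arr))        # dimension order before rotation

# Swap the two dimensions: state becomes the first, year the second.
rotated <- aperm(arr, c(2, 1))
names(dimnames(rotated))    # dimension order after rotation

# The same cell is now addressed with the indices in the new order.
rotated["TX", "2020"] == arr["2020", "TX"]
```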
"Cube"
Class "Cube"
data | (array) The data that are represented as a hypercube. |
structure | (list) The structure of the dimensions of the hypercube. |
view | (list) Information about how to build a view for the hypercube. This information is stored in a list of Dimension-class objects. |
Objects can be created by calls of the form new("Cube", ...). This S4 class describes Cube objects.
Michael Scholz [email protected]
# show Cube definition
showClass("Cube")
Class "Dimension"
name | (character) The name of the dimension. |
values | (vector) A vector of selected values for this dimension. |
aggregation | (vector) A vector of aggregation functions that will be applied to this dimension. |
Objects can be created by calls of the form new("Dimension", ...). This S4 class describes Dimension objects.
Michael Scholz [email protected]
# show Dimension definition
showClass("Dimension")
This function generates a hypercube from a given data frame. The dimensions of the hypercube correspond to a set of selected columns from the data frame.
generateCube(
  data,
  columns,
  valueColumn,
  fun = c("sum", "min", "max", "prod", "mean", "median", "sd", "count")
)
data | A data frame that is used as the source for the hypercube. |
columns | A vector or named list of column names that will form the dimensions of the hypercube. In the examples, a named list groups the columns by dimension (e.g., time = c("month", "year")). |
valueColumn | The name of the column that provides the values for the cells of the hypercube. |
fun | Aggregation function for aggregating over those columns that do not correspond to any dimension of the hypercube. |
Returns a Cube object.
Michael Scholz [email protected]
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
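To make the idea of building a cube from a molten data frame concrete, the following base-R sketch builds a three-dimensional array with `tapply()`. It does not reproduce the internals of `generateCube()`; the `toy` data frame and its values are made up for illustration.

```r
# Hypothetical toy data (not the package's `sales` dataset).
toy <- data.frame(
  year    = c(2019, 2019, 2020, 2020, 2020),
  state   = c("AL", "TX", "AL", "TX", "TX"),
  product = c("A", "A", "B", "A", "A"),
  amount  = c(10, 20, 30, 40, 5),
  stringsAsFactors = FALSE
)

# One array dimension per selected column; rows that fall into the same
# cell are combined with the aggregation function (here: sum).
cube_array <- tapply(toy$amount, toy[c("year", "state", "product")], sum)

dim(cube_array)                 # one extent per dimension
cube_array["2020", "TX", "A"]   # the two 2020/TX/A rows are summed
cube_array["2019", "AL", "B"]   # NA: no row falls into this cell
```

Any column that is not used as a dimension simply disappears from the result, which is why an aggregation function is needed to combine the rows that land in the same cell.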
Calculates the importance values for all dimensions of the current view of a Cube object. All added selections and aggregations are taken into account. Note that selection criteria will be applied before aggregating the data.
importance(x)
x | The Cube object. |
An Importances object with the importance values of the dimensions
Michael Scholz [email protected]
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
importance(cube)
Generates a parallel coordinate plot for a given Cube object. All added selections and aggregations are taken into account.
## S4 method for signature 'Cube'
plot(x, color = NA, colorscale = "RdBu", ...)
x | The Cube object. |
color | The color of the lines in the parallel coordinate plot. If this parameter is NA or NULL, a colorscale rather than a single color will be used. |
colorscale | The colorscale for the lines in the parallel coordinate plot. Default is RdBu. All plotly colorscales (e.g., Blackbody, Earth, Jet) are possible. |
... | Further plot_ly parameters. |
Michael Scholz [email protected]
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
plot(cube)
Prints an Importances object.
## S3 method for class 'Importances'
print(x, ...)
x | The Importances object. |
... | Ignored parameters. |
Sparsity value
Michael Scholz [email protected]
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
importances = importance(cube)
print(importances)
This function removes aggregations from a hypercube. The cube itself will not be changed. The aggregation only affects the data that will be shown when printing the cube.
remove.aggregation(x, dimensions = NA, last = FALSE)
x | Hypercube from which the aggregation will be removed. |
dimensions | A vector of dimensions for which the aggregations will be removed. |
last | Should the last aggregation be removed? If this parameter is set to TRUE, the dimension vector will be ignored. |
Returns a Cube object with the aggregation removed.
Michael Scholz [email protected]
Cube
add.aggregation
remove.selection
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
cube = add.aggregation(cube, dimensions = c("month", "year"), fun = "sum")
cube
cube = add.aggregation(cube, dimensions = "year", fun = "sum")
cube
cube = remove.aggregation(cube, dimensions = "year")
cube
This function removes all selection criteria for the given dimensions. The cube itself will not be changed. The selection criteria only affect the data that will be shown when printing the cube.
remove.selection(x, dimensions)
x | Hypercube from which the selection criteria will be removed. |
dimensions | A vector of dimension names for which all selection criteria will be removed. |
Returns a Cube object with removed selection criteria.
Michael Scholz [email protected]
Cube
add.selection
remove.aggregation
data("sales")
str(sales)
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
cube = add.selection(cube, criteria = list(state = c("CA", "FL")))
cube
cube = remove.selection(cube, dimensions = c("state"))
cube
A dataset containing 2,500 sales of 4 books in different states and countries.
sales
A data frame with 2500 rows and 7 variables:
month as number
year as number
abbreviation of the state as character
country as character
name of the product as character
number of sold products
amount of sales
Synthetic dataset
Shows the current view of a Cube object. All added selections and aggregations are taken into account. Note that selection criteria will be applied before aggregating the data.
## S4 method for signature 'Cube'
show(object)
object | The Cube object. |
Michael Scholz [email protected]
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
cube
Shows a Dimension object.
## S4 method for signature 'Dimension'
show(object)
object | The Dimension object. |
Michael Scholz [email protected]
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
cube@view[[1]]
Calculates the sparsity of the current view of a Cube object. All added selections and aggregations are taken into account. Note that selection criteria will be applied before aggregating the data.
sparsity(x)
x | The Cube object. |
Sparsity value
Michael Scholz [email protected]
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
sparsity(cube)
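The documentation above does not spell out how sparsity is computed. One common reading is the share of cells that hold no value; the base-R sketch below illustrates that reading on a plain array. The definition and the `toy` data are assumptions, so the numbers need not match what `sparsity()` returns.

```r
# Hypothetical toy data with one missing year/state combination.
toy <- data.frame(
  year   = c(2019, 2019, 2020),
  state  = c("AL", "TX", "TX"),
  amount = c(10, 20, 40)
)
arr <- tapply(toy$amount, toy[c("year", "state")], sum)

# Assumed definition: fraction of cells without a value.
# Here 1 of the 4 cells (2020/AL) is empty.
sparsity_sketch <- mean(is.na(arr))
sparsity_sketch
```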
Shows the dimensions and the number of levels per dimension of the given cube. All added selections and aggregations are taken into account.
summary(x)
x | The Cube object. |
Michael Scholz [email protected]
data("sales")
cube = generateCube(
  sales,
  columns = list(time = c("month", "year"), location = c("state"), product = "product"),
  valueColumn = "amount"
)
summary(cube)