Title: | Spatially Smoothed MRCD Estimator |
---|---|
Description: | Estimation of the Spatially Smoothed Minimum Regularized Determinant (ssMRCD) estimator and its usage in an ssMRCD-based outlier detection method as described in Puchhammer and Filzmoser (2023) <doi:10.1080/10618600.2023.2277875> and for sparse robust PCA for multi-source data described in Puchhammer, Wilms and Filzmoser (2024) <doi:10.48550/arXiv.2407.16299>. Included are also complementary visualization and parameter tuning tools. |
Authors: | Patricia Puchhammer [aut, cre, cph], Peter Filzmoser [aut] |
Maintainer: | Patricia Puchhammer <[email protected]> |
License: | GPL-3 |
Version: | 1.1.0 |
Built: | 2024-11-22 06:58:19 UTC |
Source: | CRAN |
Aligns loadings per neighborhood for better visualization and comparison. Different options are available.
align_PC(PC, N, p, type = "largest", vec = NULL)
align_PC(PC, N, p, type = "largest", vec = NULL)
PC |
matrix of loadings of size Np x k |
N |
integer, number of groups/neighborhoods |
p |
integer, number of variables |
type |
character indicating how loadings are aligned (see details),
options are |
vec |
|
For input type
possible values are "largest", "maxvar","nonzero","mean","scalar"
.
For option "maxvar"
the variable with the highest absolute value in the loading
is scaled to be positive (per neighborhood, per loading).
For option "nonzero"
the variable with largest distance to zero in the entries is
scaled to be positive (per neighborhood, per loading).
For option "scalar"
the variable is scaled in a way, that the scalar product
between the loading and the respective part of vec
is positive (per neighborhood, per loading).
If vec
is of size p
times k
, the same vector is used for all neighborhoods.
Option "mean"
is option "scalar"
with vec
being the mean of the loadings per variable across neighborhoods.
Option "largest"
scales the largest absolute value to be positive per neighborhood and per PC.
Option "none"
does nothing and returns PC
.
Returns a matrix of loadings of size Np
times k
.
x = matrix(c(1, 0, 0, 0, sqrt(0.5), -sqrt(0.5), 0, 0, 0, sqrt(1/3), -sqrt(1/3), sqrt(1/3), sqrt(0.5), sqrt(0.5), 0, 0), ncol = 2) align_PC(PC = x, N = 2, p = 4, type = "largest") align_PC(PC = x, N = 2, p = 4, type = "mean")
x = matrix(c(1, 0, 0, 0, sqrt(0.5), -sqrt(0.5), 0, 0, 0, sqrt(1/3), -sqrt(1/3), sqrt(1/3), sqrt(0.5), sqrt(0.5), 0, 0), ncol = 2) align_PC(PC = x, N = 2, p = 4, type = "largest") align_PC(PC = x, N = 2, p = 4, type = "mean")
Biplot for PCAloc
## S3 method for class 'PCAloc' biplot(x, ...)
## S3 method for class 'PCAloc' biplot(x, ...)
x |
object of class PCAloc. |
... |
other input arguments, see details. |
Additional parameters that can be given to the function are:
shape |
point shape |
size |
point size |
alpha |
transparency |
color |
either "variable" or "groups"
indication how points should be coloured. |
Returns version of biplot for PCAloc object.
# set seed set.seed(236) # make data data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariance matrices covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # plot biplot biplot(pca, alpha = 0.4, shape = 16, size = 2, color = "variable")
# set seed set.seed(236) # make data data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariance matrices covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # plot biplot biplot(pca, alpha = 0.4, shape = 16, size = 2, color = "variable")
This function swaps observations completely random in order to introduce contamination
in the data. Used in parameter_tuning
.
contamination_random(cont, data)
contamination_random(cont, data)
cont |
numeric, amount of contamination in data. |
data |
data whose observations should be switched. |
A matrix with switched observations.
# set seed set.seed(1) # get data data(weatherAUT2021) # switch 5% of observations contamination_random(cont = 0.05, data = weatherAUT2021[,1:6])
# set seed set.seed(1) # get data data(weatherAUT2021) # switch 5% of observations contamination_random(cont = 0.05, data = weatherAUT2021[,1:6])
Objective function value for local sparse PCA
eval_objective(PC, eta, gamma, COVS)
eval_objective(PC, eta, gamma, COVS)
PC |
vectorised component to evaluate. |
eta |
degree of sparsity. |
gamma |
distribution of sparsity between groupwise ( |
COVS |
list of covariance matrices used for PCA |
Returns value of the objective function for given v
.
S1 = matrix(c(1, 0.9, 0.8, 0.5, 0.9, 1.1, 0.7, 0.4, 0.8, 0.7, 1.5, 0.2, 0.5, 0.4, 0.2, 1), ncol = 4) S2 = t(S1)%*% S1 S2 = S2/2 eval_objective(PC = c(1,0,0,0,sqrt(2),0,0,-sqrt(2)), eta = 1, gamma = 0.5, COVS = list(S1, S2))
S1 = matrix(c(1, 0.9, 0.8, 0.5, 0.9, 1.1, 0.7, 0.4, 0.8, 0.7, 1.5, 0.2, 0.5, 0.4, 0.2, 1), ncol = 4) S2 = t(S1)%*% S1 S2 = S2/2 eval_objective(PC = c(1,0,0,0,sqrt(2),0,0,-sqrt(2)), eta = 1, gamma = 0.5, COVS = list(S1, S2))
Explained Variance summarized over Groups
explained_var(COVS, PC, k, type = "scaled", cor = FALSE, gamma = 0.5)
explained_var(COVS, PC, k, type = "scaled", cor = FALSE, gamma = 0.5)
COVS |
list of covariance matrices |
PC |
matrix-like object holding the loadings of length np |
k |
which component should be evaluated |
type |
character, either |
cor |
logical, if |
gamma |
scalar between 0 and 1 indicatig distribution of sparsity. |
Returns scalar
S1 = matrix(c(1, 0.9, 0.8, 0.5, 0.9, 1.1, 0.7, 0.4, 0.8, 0.7, 1.5, 0.2, 0.5, 0.4, 0.2, 1), ncol = 4) S2 = t(S1)%*% S1 S2 = S2/2 explained_var(COVS = list(S1, S2), PC = c(1,0,0,0,sqrt(2),0,0,-sqrt(2)), k = 1, cor = FALSE, gamma = 0.5) explained_var(COVS = list(cov2cor(S1), cov2cor(S2)), PC = c(1,0,0,0,sqrt(2),0,0,-sqrt(2)), k = 1, cor = TRUE, gamma = 0.5)
S1 = matrix(c(1, 0.9, 0.8, 0.5, 0.9, 1.1, 0.7, 0.4, 0.8, 0.7, 1.5, 0.2, 0.5, 0.4, 0.2, 1), ncol = 4) S2 = t(S1)%*% S1 S2 = S2/2 explained_var(COVS = list(S1, S2), PC = c(1,0,0,0,sqrt(2),0,0,-sqrt(2)), k = 1, cor = FALSE, gamma = 0.5) explained_var(COVS = list(cov2cor(S1), cov2cor(S2)), PC = c(1,0,0,0,sqrt(2),0,0,-sqrt(2)), k = 1, cor = TRUE, gamma = 0.5)
Calculates a inverse-distance based weight matrix for the function ssMRCD
(see details).
geo_weights(coordinates, groups)
geo_weights(coordinates, groups)
coordinates |
matrix of coordinates of observations. |
groups |
vector of neighborhood groups. |
First, the centers (means of the coordinates given) of each neighborhood is calculated.
Then, the Euclidean distance between the centers is calculated and the weight is based on
the inverse distance between two neighborhoods,
It is scaled according to a weight matrix.
Returns a weighting matrix W
and the coordinates of the centers per neighborhood centersN
.
coordinates = matrix(rnorm(1000), ncol = 2, nrow = 500) groups = sample(1:5, 500, replace = TRUE) geo_weights(coordinates, groups)
coordinates = matrix(rnorm(1000), ncol = 2, nrow = 500) groups = sample(1:5, 500, replace = TRUE) geo_weights(coordinates, groups)
This function creates a grid-based neighborhood structure for the ssMRCD
function using cut-off values for two coordinate axis.
groups_gridbased(x, y, cutx, cuty)
groups_gridbased(x, y, cutx, cuty)
x |
vector of first coordinate of data set. |
y |
vector of second coordinate of data set. |
cutx |
cut-offs for first coordinate. |
cuty |
cut-offs for second coordinate. |
Returns a neighborhood assignment vector for the coordinates x
and y
.
# get data data(weatherAUT2021) # set cut-off values cut_lon = c(9:16, 18) cut_lat = c(46, 47, 47.5, 48, 49) # create neighborhood assignments groups_gridbased(weatherAUT2021$lon, weatherAUT2021$lat, cut_lon, cut_lat)
# get data data(weatherAUT2021) # set cut-off values cut_lon = c(9:16, 18) cut_lat = c(46, 47, 47.5, 48, 49) # create neighborhood assignments groups_gridbased(weatherAUT2021$lon, weatherAUT2021$lat, cut_lon, cut_lat)
This function applies the local outlier detection method based on the spatially smoothed MRCD estimator developed in Puchhammer and Filzmoser (2023).
local_outliers_ssMRCD( data, coords, groups, lambda, weights = NULL, k = NULL, dist = NULL )
local_outliers_ssMRCD( data, coords, groups, lambda, weights = NULL, k = NULL, dist = NULL )
data |
data matrix with measured values. |
coords |
matrix of coordinates of observations. |
groups |
vector of neighborhood assignments. |
lambda |
scalar used for spatial smoothing (see also |
weights |
weight matrix used in |
k |
integer, if given the |
dist |
scalar, if given the neighbors closer than given distance are used for next distances. If |
Returns an object of class "locOuts"
with following components:
outliers |
indices of found outliers. |
next_distance |
vector of next distances for all observations. |
cutoff |
upper fence of adjusted boxplot (see adjbox ) used as cutoff value for next distances. |
coords |
matrix of observation coordinates. |
data |
matrix of observation values. |
groups |
vector of neighborhood assignments. |
k, dist |
specifications regarding neighbor comparisons. |
centersN |
coordinates of centers of neighborhoods. |
matneighbor |
matrix storing information which observations where used to calculate next distance for each observation (per row). 1 indicates it is used. |
ssMRCD |
object of class "ssMRCD" and output of ssMRCD covariance estimation. |
Puchhammer P. and Filzmoser P. (2023): Spatially smoothed robust covariance estimation for local outlier detection. doi:10.48550/arXiv.2305.05371
See also functions ssMRCD, plot.locOuts, summary.locOuts
.
# data construction data = matrix(rnorm(2000), ncol = 4) coords = matrix(rnorm(1000), ncol = 2) groups = sample(1:10, 500, replace = TRUE) lambda = 0.3 # apply function outs = local_outliers_ssMRCD(data = data, coords = coords, groups = groups, lambda = lambda, k = 10) outs
# data construction data = matrix(rnorm(2000), ncol = 4) coords = matrix(rnorm(1000), ncol = 2) groups = sample(1:10, 500, replace = TRUE) lambda = 0.3 # apply function outs = local_outliers_ssMRCD(data = data, coords = coords, groups = groups, lambda = lambda, k = 10) outs
Calculation of the value of the objective function for the ssMRCD
for a given list of matrices,
lambda and a weighting matrix according to formula (3) in Puchhammer and Filzmoser (2023).
objective_matrix(matrix_list, lambda, weights)
objective_matrix(matrix_list, lambda, weights)
matrix_list |
a list of matrices |
lambda |
scalar smoothing parameter |
weights |
matrix of weights |
Returns the value of the objective function using matrices .
Puchhammer P. and Filzmoser P. (2023): Spatially smoothed robust covariance estimation for local outlier detection. doi:10.48550/arXiv.2305.05371
# construct matrices k1 = matrix(c(1,2,3,4), nrow = 2) k2 = matrix(c(1,3,5,7), nrow = 2) # construct weighting matrix W = matrix(c(0, 1, 1, 0), nrow = 2) objective_matrix(list(k1, k2), 0.5, W)
# construct matrices k1 = matrix(c(1,2,3,4), nrow = 2) k2 = matrix(c(1,3,5,7), nrow = 2) # construct weighting matrix W = matrix(c(0, 1, 1, 0), nrow = 2) objective_matrix(list(k1, k2), 0.5, W)
This function provides insight into the effects of different parameter settings.
parameter_tuning( data, coords, groups, lambda = c(0, 0.25, 0.5, 0.75, 0.9), weights = NULL, k = NULL, dist = NULL, cont = 0.05, repetitions = 5 )
parameter_tuning( data, coords, groups, lambda = c(0, 0.25, 0.5, 0.75, 0.9), weights = NULL, k = NULL, dist = NULL, cont = 0.05, repetitions = 5 )
data |
matrix with observations. |
coords |
matrix of coordinates of these observations. |
groups |
numeric vector, the neighborhood structure that should be used for |
lambda |
scalar, the smoothing parameter. |
weights |
weighting matrix used in |
k |
vector of possible k-values to evaluate. |
dist |
vector of possible dist-values to evaluate. |
cont |
level of contamination, between 0 and 1. |
repetitions |
number of repetitions wanted to have a good picture of the best parameter combination. |
Returns a matrix of average false-negative rate (FNR) values and the total number of outliers found by the method as aproxy for the false-positive rate. Be aware that the FNR does not take into account that there are also natural outliers included in the data set that might or might not be found. Also a plot is returned representing these average. The best parameter selection depends on the goal of the analysis.
# get data set data("weatherAUT2021") # make neighborhood assignments cut_lon = c(9:16, 18) cut_lat = c(46, 47, 47.5, 48, 49) N = ssMRCD::groups_gridbased(weatherAUT2021$lon, weatherAUT2021$lat, cut_lon, cut_lat) table(N) N[N == 2] = 1 N[N == 3] = 4 N[N == 5] = 4 N[N == 6] = 7 N[N == 11] = 15 N = as.numeric(as.factor(N)) # tune parameters set.seed(123) parameter_tuning(data = weatherAUT2021[, 1:6 ], coords = weatherAUT2021[, c("lon", "lat")], groups = N, lambda = c(0.5, 0.75), k = c(10), repetitions = 1)
# get data set data("weatherAUT2021") # make neighborhood assignments cut_lon = c(9:16, 18) cut_lat = c(46, 47, 47.5, 48, 49) N = ssMRCD::groups_gridbased(weatherAUT2021$lon, weatherAUT2021$lat, cut_lon, cut_lat) table(N) N[N == 2] = 1 N[N == 3] = 4 N[N == 5] = 4 N[N == 6] = 7 N[N == 11] = 15 N = as.numeric(as.factor(N)) # tune parameters set.seed(123) parameter_tuning(data = weatherAUT2021[, 1:6 ], coords = weatherAUT2021[, c("lon", "lat")], groups = N, lambda = c(0.5, 0.75), k = c(10), repetitions = 1)
Plots of loadings of PCAloc object
plot_loadings(object, ...)
plot_loadings(object, ...)
object |
object of class PCAloc |
... |
other input arguments, see details. |
Additional parameters that can be given to the function are:
text |
logical if values should be added as text. |
size |
point size. |
tolerance |
tolerance for rounding to zero. |
k |
integer, which component scores should be plotted. |
groupnames |
names of groups. |
varnames |
names of variables. |
textrotate |
angle of text rotation, if included. |
Returns loading heatmap for component k
.
# set seed set.seed(236) data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariance matrices covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # plot score distances plot_loadings(object = pca, k = 1, size = 2)
# set seed set.seed(236) data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariance matrices covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # plot score distances plot_loadings(object = pca, k = 1, size = 2)
Distance-distance plot of scores of PCA
plot_score_distances(X, PC, groups, ssMRCD, k, ...)
plot_score_distances(X, PC, groups, ssMRCD, k, ...)
X |
data matrix. |
PC |
loadings from PCA. |
groups |
vector containing group assignments. |
ssMRCD |
ssMRCD object. |
k |
integer of how many components should be used. |
... |
other input arguments, see details. |
Additional parameters that can be given to the function are:
shape |
point shape |
size |
point size |
alpha |
transparency |
Returns distance-distance plot of orthogonal and score distance.
# set seed set.seed(236) data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariance matrices covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # plot score distances plot_score_distances(PC = pca$PC, groups = groups, X = data, ssMRCD = covs, k = 2, alpha = 0.4, shape = 16, size = 2)
# set seed set.seed(236) data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariance matrices covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # plot score distances plot_score_distances(PC = pca$PC, groups = groups, X = data, ssMRCD = covs, k = 2, alpha = 0.4, shape = 16, size = 2)
Plots of score distribution
plot_scores(X, PC, groups, ssMRCD, ...)
plot_scores(X, PC, groups, ssMRCD, ...)
X |
data matrix. |
PC |
loadings from PCA. |
groups |
vector containing group assignments. |
ssMRCD |
ssMRCD object. |
... |
other input arguments, see details. |
Additional parameters that can be given to the function are:
shape |
point shape |
size |
point size |
alpha |
transparency |
k |
integer, which component scores should be plotted |
Returns histograms of scores for component k
.
# set seed set.seed(236) data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariance matrices covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # plot score distances plot_scores(PC = pca$PC, groups = groups, X = data, ssMRCD = covs, k = 1, alpha = 0.4, shape = 16, size = 2)
# set seed set.seed(236) data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariance matrices covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # plot score distances plot_scores(PC = pca$PC, groups = groups, X = data, ssMRCD = covs, k = 1, alpha = 0.4, shape = 16, size = 2)
This function plots different diagnostic plots for local outlier detection.
It can be applied to an object of class "locOuts"
which is the output of the function local_outliers_ssMRCD
.
## S3 method for class 'locOuts' plot( x, type = c("hist", "spatial", "lines", "3D"), colour = "all", focus = NULL, pos = NULL, alpha = 0.3, data = NULL, add_map = TRUE, ... )
## S3 method for class 'locOuts' plot( x, type = c("hist", "spatial", "lines", "3D"), colour = "all", focus = NULL, pos = NULL, alpha = 0.3, data = NULL, add_map = TRUE, ... )
x |
a locOuts object obtained by the function |
type |
vector containing the types of plots that should be plotted, possible values |
colour |
character specifying the color scheme (see details). Possible values |
focus |
an integer being the index of the observation whose neighborhood should be analysed more closely. |
pos |
integer specifying the position of the text "cut-off" in the histogram (see |
alpha |
scalar specifying the transparency level of the points plotted for plot type |
data |
optional data frame or matrix used for plot of type |
add_map |
TRUE if a map should be plotted along the line plot ( |
... |
further parameters passed on to base-R plotting functions. |
Regarding the parameter type
the value "hist"
corresponds to a plot of the
histogram of the next distances together with the used cutoff-value.
When using "spatial"
the coordinates of each observation are plotted and colorized according to the color setting.
The "lines"
plot is used with the index focus
of one observation whose out/inlyingness to its neighborhood
should by plotted. The whole data set is scaled to the range [0,1] and the scaled value of the selected observation and
its neighbors are plotted. Outliers are plotted in orange.
The "3D"
setting leads to a 3D-plot using the colour setting as height.
The view can be adapted using the parameters theta
and phi
.
For the colour
setting possible values are "all"
(all next distances are
used and colored in an orange palette), "onlyOuts"
(only outliers are
plotted in orange, inliers are plotted in grey) and "outScore"
(the next
distance divided by the cutoff value is used to colourize the points; inliers are colorized in blue, outliers in orange).
Returns plots regarding next distances and spatial context.
# set seed set.seed(1) # make locOuts object data = matrix(rnorm(2000), ncol = 4) coords = matrix(rnorm(1000), ncol = 2) groups = sample(1:10, 500, replace = TRUE) lambda = 0.3 # local outlier detection outs = local_outliers_ssMRCD(data = data, coords = coords, groups = groups, lambda = lambda, k = 10) # plot results plot(outs, type = "hist") plot(outs, type = "spatial", colour = "outScore") plot(outs, type = "3D", colour = "outScore", theta = 0) plot(outs, type ="lines", focus = outs$outliers[1])
# set seed set.seed(1) # make locOuts object data = matrix(rnorm(2000), ncol = 4) coords = matrix(rnorm(1000), ncol = 2) groups = sample(1:10, 500, replace = TRUE) lambda = 0.3 # local outlier detection outs = local_outliers_ssMRCD(data = data, coords = coords, groups = groups, lambda = lambda, k = 10) # plot results plot(outs, type = "hist") plot(outs, type = "spatial", colour = "outScore") plot(outs, type = "3D", colour = "outScore", theta = 0) plot(outs, type ="lines", focus = outs$outliers[1])
Plotting method PCAloc object
## S3 method for class 'PCAloc' plot( x, type = c("loadings", "screeplot", "scores", "score_distances", "biplot"), ... )
## S3 method for class 'PCAloc' plot( x, type = c("loadings", "screeplot", "scores", "score_distances", "biplot"), ... )
x |
object of class PCAloc |
type |
character indicating the type of plot, see details. |
... |
further arguments passed down. |
Returns plots in ggplot2.
# set seed set.seed(236) # create data and setup data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariances covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # calculate sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # align loadings pca$PC = align_PC(PC = pca$PC, N = pca$N, p = pca$p, type = "mean") # plot different PCA plots plot(x = pca, type = "score_distances", groups = groups, X = data, ssMRCD = covs, k = 2) plot(x = pca, type = "biplot", color = "variable") plot(x = pca, type = "scores", groups = groups, X = data, ssMRCD = covs, k = 1) plot(x = pca, type = "screeplot") plot(x = pca, type = "loadings", k = 1)
# set seed set.seed(236) # create data and setup data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariances covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # calculate sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # align loadings pca$PC = align_PC(PC = pca$PC, N = pca$N, p = pca$p, type = "mean") # plot different PCA plots plot(x = pca, type = "score_distances", groups = groups, X = data, ssMRCD = covs, k = 2) plot(x = pca, type = "biplot", color = "variable") plot(x = pca, type = "scores", groups = groups, X = data, ssMRCD = covs, k = 1) plot(x = pca, type = "screeplot") plot(x = pca, type = "loadings", k = 1)
Plots diagnostics for function output of ssMRCD
regarding convergence behavior
and the resulting covariances matrices.
## S3 method for class 'ssMRCD' plot( x, type = c("convergence", "ellipses"), centersN = NULL, colour_scheme = "none", xlim_upper = 9, manual_rescale = 1, legend = TRUE, xlim = NULL, ylim = NULL, ... )
## S3 method for class 'ssMRCD' plot( x, type = c("convergence", "ellipses"), centersN = NULL, colour_scheme = "none", xlim_upper = 9, manual_rescale = 1, legend = TRUE, xlim = NULL, ylim = NULL, ... )
x |
object of class |
type |
type of plot, possible values are |
centersN |
for plot type |
colour_scheme |
coloring scheme used for plot type |
xlim_upper |
numeric giving the upper x limit for plot type |
manual_rescale |
for plot type |
legend |
logical, if color legend should be included. |
xlim |
vector of xlim (see |
ylim |
vector of ylim (see |
... |
further plotting parameters. |
For type = "convergence"
a plot is produced displaying the convergence behaviour.
Each line represents a different initial value used for the c-step iteration. On the x-axis the
iteration step is plotted with the corresponding value of the objective function. Not monotonically
lines are plotted in red.
For type = "ellipses"
and more than a 2-dimensional data setting plotting the exact tolerance ellipse is
not possible anymore. Instead the two eigenvectors with highest eigenvalue from the
MCD used on the full data set without neighborhood assignments are taken and used as axis for
the tolerance ellipses of the ssMRCD covariance estimators. The tolerance ellipse for the global MCD
covariance is plotted in grey in the upper left corner. It is possible to set the colour scheme
to "trace"
to see the overall amount of variabilty and compare the plotted covariance and
the real trace to see how much variance is not plotted. For "regularity"
the regularization of each
covariance is shown.
Returns plots of the ssMRCD methodology and results.
ssMRCD, summary.ssMRCD,
local_outliers_ssMRCD, plot.locOuts
# set seed set.seed(1) # create data set data = matrix(rnorm(2000), ncol = 4) coords = matrix(rnorm(1000), ncol = 2) groups = sample(1:10, 500, replace = TRUE) lambda = 0.3 # calculate ssMRCD by using the local outlier detection method outs = local_outliers_ssMRCD(data = data, coords = coords, groups = groups, lambda = lambda, k = 10) # plot ssMRCD object included in outs plot(x = outs$ssMRCD, centersN = outs$centersN, colour_scheme = "trace", legend = FALSE)
# set seed set.seed(1) # create data set data = matrix(rnorm(2000), ncol = 4) coords = matrix(rnorm(1000), ncol = 2) groups = sample(1:10, 500, replace = TRUE) lambda = 0.3 # calculate ssMRCD by using the local outlier detection method outs = local_outliers_ssMRCD(data = data, coords = coords, groups = groups, lambda = lambda, k = 10) # plot ssMRCD object included in outs plot(x = outs$ssMRCD, centersN = outs$centersN, colour_scheme = "trace", legend = FALSE)
Given a matrix with values for neighborhood influences the function rescales
the matrix in order to get an appropriate weight matrix used for the function ssMRCD
.
rescale_weights(W)
rescale_weights(W)
W |
weight matrix with diagonals equal to zero and at least one positive entry per row. |
An appropriately scaled weight matrix.
ssMRCD, local_outliers_ssMRCD, geo_weights
W = matrix(c(0, 1, 2, 1, 0, 1, 2, 1, 0), nrow = 3) rescale_weights(W)
W = matrix(c(0, 1, 2, 1, 0, 1, 2, 1, 0), nrow = 3) rescale_weights(W)
Extracting Residuals from Local Fit
## S3 method for class 'ssMRCD' residuals(object, ...)
## S3 method for class 'ssMRCD' residuals(object, ...)
object |
|
... |
see details |
Other input variables are:
remove_outliers |
logical (default FALSE ). If TRUE, only residuals
from not outlying observations are calculated. If FALSE, trimmed residuals are used (see alpha ). |
X |
matrix of new data, if data from the ssMRCD object is used. |
groups |
vector of groups for new data, if NULL data from the ssMRCD object is used. |
mean |
logical (default FALSE ), specifying if mean of trimmed
observations is returned or all residuals. |
If X
and groups
are provided, alpha
is set to one and all residuals are used.
If remove_outliers
is TRUE, alpha
is set to 1 automatically.
Returns either all residuals or the mean of the residual norms lower than the alpha
- Quantile.
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD localCovs = ssMRCD(x, weights = W, lambda = 0.5) # residuals of model residuals(localCovs, remove_outliers = TRUE, mean = FALSE) # residuals of new data residuals(localCovs, X = matrix(rnorm(20), ncol = 2, nrow = 10), groups = rep(2, 10), mean =TRUE)
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD localCovs = ssMRCD(x, weights = W, lambda = 0.5) # residuals of model residuals(localCovs, remove_outliers = TRUE, mean = FALSE) # residuals of new data residuals(localCovs, X = matrix(rnorm(20), ncol = 2, nrow = 10), groups = rep(2, 10), mean =TRUE)
This function restructures neighborhood information given by a data matrix
containing all information and one neighborhood assignment vector. It returns a list
of data matrices used in ssMRCD
.
restructure_as_list(data, groups)
restructure_as_list(data, groups)
data |
data matrix with all observations. |
groups |
numeric neighborhood assignment vector. |
Returns a list containing the observations per neighborhood assignment. The list is sorted according to the order of the first appearance in the groups vector.
# data matrix data = matrix(rnorm(n = 3000), ncol = 3) N_assign = sample(x = 1:10, size = 1000, replace = TRUE) restructure_as_list(data, N_assign)
# data matrix data = matrix(rnorm(n = 3000), ncol = 3) N_assign = sample(x = 1:10, size = 1000, replace = TRUE) restructure_as_list(data, N_assign)
Scale Data Locally
scale_ssMRCD( ssMRCD, X = NULL, groups = NULL, multivariate = FALSE, center_only = FALSE )
scale_ssMRCD( ssMRCD, X = NULL, groups = NULL, multivariate = FALSE, center_only = FALSE )
ssMRCD |
|
X |
matrix, new data to scale with ssMRCD estimation. |
groups |
vector, group assignments of new data |
multivariate |
logical, |
center_only |
logical, if |
Returns matrix of observations. If X = NULL
X from the ssMRCD object is
used and sorted according to group numbering.
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD localCovs = ssMRCD(x, weights = W, lambda = 0.5) # scale used data scale_ssMRCD(localCovs, multivariate = TRUE) # scale new data scale_ssMRCD(localCovs, X = matrix(rnorm(20), ncol = 2, nrow = 10), groups = rep(2, 10), multivariate =TRUE)
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD localCovs = ssMRCD(x, weights = W, lambda = 0.5) # scale used data scale_ssMRCD(localCovs, multivariate = TRUE) # scale new data scale_ssMRCD(localCovs, X = matrix(rnorm(20), ncol = 2, nrow = 10), groups = rep(2, 10), multivariate =TRUE)
Calculate Scores for local sparse PCA
scores(X, PC, groups, ssMRCD = NULL)
scores(X, PC, groups, ssMRCD = NULL)
X |
data set as matrix. |
PC |
loading matrix. |
groups |
vector of grouping structure (numeric). |
ssMRCD |
ssMRCD object used for scaling |
Returns a list with scores and univariately and locally centered and scaled observations.
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD loccovs = ssMRCD(x, weights = W, lambda = 0.5) # calculate PCA pca = sparsePCAloc(eta = 1, gamma = 0.5, cor = FALSE, COVS = loccovs$MRCDcov, increase_rho = list(FALSE, 20, 1)) # calculate scores scores(X = rbind(x1, x2), PC = pca$PC, groups = rep(c(1,2), each = 100), ssMRCD = loccovs)
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD loccovs = ssMRCD(x, weights = W, lambda = 0.5) # calculate PCA pca = sparsePCAloc(eta = 1, gamma = 0.5, cor = FALSE, COVS = loccovs$MRCDcov, increase_rho = list(FALSE, 20, 1)) # calculate scores scores(X = rbind(x1, x2), PC = pca$PC, groups = rep(c(1,2), each = 100), ssMRCD = loccovs)
Orthogonal Distances for PCAloc
scores.OD(X, PC, groups, ssMRCD)
scores.OD(X, PC, groups, ssMRCD)
X |
data matrix of observations. |
PC |
loadings of sparse local PCA. |
groups |
grouping vector for locality. |
ssMRCD |
ssMRCD object used for PCA calculation. |
Returns vector of orthogonal distances of observations.
scores
, scores.SD
, sparsePCAloc
, scale_ssMRCD
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD loccovs = ssMRCD(x, weights = W, lambda = 0.5) # calculate PCA pca = sparsePCAloc(eta = 1, gamma = 0.5, cor = FALSE, COVS = loccovs$MRCDcov, increase_rho = list(FALSE, 20, 1)) # calculate scores scores.OD(X = rbind(x1, x2), PC = pca$PC, groups = rep(c(1,2), each = 100), ssMRCD = loccovs)
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD loccovs = ssMRCD(x, weights = W, lambda = 0.5) # calculate PCA pca = sparsePCAloc(eta = 1, gamma = 0.5, cor = FALSE, COVS = loccovs$MRCDcov, increase_rho = list(FALSE, 20, 1)) # calculate scores scores.OD(X = rbind(x1, x2), PC = pca$PC, groups = rep(c(1,2), each = 100), ssMRCD = loccovs)
Score Distances for PCAloc
scores.SD(X, PC, groups, ssMRCD)
scores.SD(X, PC, groups, ssMRCD)
X |
data matrix of observations. |
PC |
loadings of sparse local PCA. |
groups |
grouping vector for locality. |
ssMRCD |
ssMRCD object used for PCA calculation. |
Returns vector of score distances of observations.
scores
, scores.OD
, sparsePCAloc
, scale_ssMRCD
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD loccovs = ssMRCD(x, weights = W, lambda = 0.5) # calculate PCA pca = sparsePCAloc(eta = 1, gamma = 0.5, cor = FALSE, COVS = loccovs$MRCDcov, increase_rho = list(FALSE, 20, 1)) # calculate scores scores.SD(X = rbind(x1, x2), PC = pca$PC, groups = rep(c(1,2), each = 100), ssMRCD = loccovs)
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD loccovs = ssMRCD(x, weights = W, lambda = 0.5) # calculate PCA pca = sparsePCAloc(eta = 1, gamma = 0.5, cor = FALSE, COVS = loccovs$MRCDcov, increase_rho = list(FALSE, 20, 1)) # calculate scores scores.SD(X = rbind(x1, x2), PC = pca$PC, groups = rep(c(1,2), each = 100), ssMRCD = loccovs)
Screeplot for PCAloc
## S3 method for class 'PCAloc' screeplot(x, ...)
## S3 method for class 'PCAloc' screeplot(x, ...)
x |
object of class PCAloc. |
... |
other input arguments, see details. |
Additional parameters that can be given to the function are:
text |
logical if text should be plotted |
size |
text size |
cutoff |
cutoff line for scree plot |
groupnames |
name of groups |
textrotate |
angle of text, if text is plotted. |
Returns version of scree plot and cumulative explained variance per group for PCAloc object.
# set seed set.seed(236) data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariance matrices covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # plot biplot screeplot(pca, text = TRUE, cutoff = 0.8, size = 2)
# set seed set.seed(236) data = matrix(rnorm(2000), ncol = 4) groups = sample(1:10, 500, replace = TRUE) W = time_weights(N = 10, c(3,2,1)) # calculate covariance matrices covs = ssMRCD(data, groups = groups, weights = W, lambda = 0.3) # sparse PCA pca = sparsePCAloc(eta = 0.3, gamma = 0.7, cor = FALSE, COVS = covs$MRCDcov, n_max = 1000, increase_rho = list(TRUE, 50, 1), trace = FALSE) # plot biplot screeplot(pca, text = TRUE, cutoff = 0.8, size = 2)
The optimal smoothing value for the ssMRCD estimator is based on the residuals and the trimmed mean of the norm.
select_smoothing( X, groups, weights, lambda = seq(0, 1, 0.1), TM = NULL, alpha = 0.75, seed = 123436, return_all = TRUE, cores = 1 )
select_smoothing( X, groups, weights, lambda = seq(0, 1, 0.1), TM = NULL, alpha = 0.75, seed = 123436, return_all = TRUE, cores = 1 )
X |
data matrix containing observations. |
groups |
grouping vector corresponding to |
weights |
weight matrix for groups, see |
lambda |
vector of parameter values for smoothing, between 0 and 1. |
TM |
target matrix, if not given MCD (or MRCD if non regular) is used with default values and |
alpha |
percentage of outliers to be expected. |
seed |
seed for ssMRCD calculations. |
return_all |
logical, if FALSE the function returns only the optimal lambda. |
cores |
integer, number of cores used for parallel computing. |
lambda_opt |
optimal lambda for smoothing. |
COVS |
ssMRCD object with optimal parameter setting. |
plot |
plot for optimal parameter setting. |
residuals |
mean of norm of residuals for varying lambda. |
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) select_smoothing (X = rbind(x1, x2), groups = rep(c(1,2), each = 100), weights = W, lambda = seq(0, 1, 0.1), return_all = TRUE, cores = 1)
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) select_smoothing (X = rbind(x1, x2), groups = rep(c(1,2), each = 100), weights = W, lambda = seq(0, 1, 0.1), return_all = TRUE, cores = 1)
Optimal Sparsity Parameter Selection for PCA
select_sparsity( COVS, k = 1, rho = NULL, cor = FALSE, eta = seq(0, 5, by = 0.2), gamma = seq(0, 1, 0.05), eps_threshold = 0.001, eps_root = 0.1, eps_ADMM = 1e-04, n_max = 300, adjust_eta = FALSE, cores = 1, increase_rho = list(TRUE, 100, 1), convergence_plot = FALSE, trace = FALSE, stop.sparse = TRUE )
select_sparsity( COVS, k = 1, rho = NULL, cor = FALSE, eta = seq(0, 5, by = 0.2), gamma = seq(0, 1, 0.05), eps_threshold = 0.001, eps_root = 0.1, eps_ADMM = 1e-04, n_max = 300, adjust_eta = FALSE, cores = 1, increase_rho = list(TRUE, 100, 1), convergence_plot = FALSE, trace = FALSE, stop.sparse = TRUE )
COVS |
list of covariance or correlation matrices. |
k |
number of components to be returned. |
rho |
penalty parameter for ADMM. |
cor |
logical, if starting values for covariances or correlation matrices should be used. |
eta |
vector of possible values for degree of sparsity. |
gamma |
vector of possible values for distribution of sparsity. If only one value is provided, the optimal eta is calculated. |
eps_threshold |
tolerance for thresholding. |
eps_root |
tolerance for root finder. |
eps_ADMM |
tolerance for ADMM iterations. |
n_max |
maximal number of ADMM iterations. |
adjust_eta |
if eta should be adjusted for further components. |
cores |
number of cores for parallel computing. |
increase_rho |
list of settings for improved automated calculation and convergence. See Details. |
convergence_plot |
logical, if convergence plot should be plotted. Not applicable for |
trace |
logical, if messages should be displayed. Not applicable for |
stop.sparse |
calculate if AUC should be calculated for PCAs until full sparsity is reached ( |
The input increase_rho
consists of a logical indicating if rho should be adjusted
if algorithm did not converged within the given maximal number of iterations. Two integers specify the
maximal rho
that is allowed and the step size.
Returns list with
PCA |
object of type PCAloc. |
PC |
local loadings of PCA |
gamma |
optimal value for gamma. |
eta |
optimal value for eta. |
eta_tpo |
values of Trade-Off-Product for eta from optimization process. |
auc |
area under the curve for varying gamma values. |
pars |
parameters and respective sparsity entrywise and mixed and explained variance. |
plot |
ggplot object for optimal parameter selection. |
plot_info |
additional data for plotting functions. |
C1 = matrix(c(1,0,0,0.9), ncol = 2) C2 = matrix(c(1.1, 0.1, 0.1, 1), ncol = 2) C3 = matrix(c(1.2, 0.2, 0.2, 1), ncol = 2) select_sparsity(COVS = list(C1, C2, C3), k = 1, rho = 5, eta = c(0, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5 ,0.75, 1), gamma =c(0, 0.25, 0.5, 0.75, 1), eps_threshold = 0.005, increase_rho = list(FALSE, 20, 5))
C1 = matrix(c(1,0,0,0.9), ncol = 2) C2 = matrix(c(1.1, 0.1, 0.1, 1), ncol = 2) C3 = matrix(c(1.2, 0.2, 0.2, 1), ncol = 2) select_sparsity(COVS = list(C1, C2, C3), k = 1, rho = 5, eta = c(0, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5 ,0.75, 1), gamma =c(0, 0.25, 0.5, 0.75, 1), eps_threshold = 0.005, increase_rho = list(FALSE, 20, 5))
Calculate Sparse Principle Components
sparsePCAloc( eta, gamma, COVS, cor = FALSE, rho = NULL, k = NULL, eps_threshold = NULL, eps_ADMM = 1e-04, n_max = 200, eps_root = 0.1, maxiter_root = 50, increase_rho = list(TRUE, 100, 1), convergence_plot = TRUE, starting_value = NULL, adjust_eta = TRUE, trace = TRUE )
sparsePCAloc( eta, gamma, COVS, cor = FALSE, rho = NULL, k = NULL, eps_threshold = NULL, eps_ADMM = 1e-04, n_max = 200, eps_root = 0.1, maxiter_root = 50, increase_rho = list(TRUE, 100, 1), convergence_plot = TRUE, starting_value = NULL, adjust_eta = TRUE, trace = TRUE )
eta |
numeric, degree of sparsity. |
gamma |
numeric, distribution of sparsity. |
COVS |
list of covariance or correlation matrices. |
cor |
logical, if starting value for correlation or covariance matrices should be used. |
rho |
numeric bigger than zero, penalty for ADMM. |
k |
number of components to calculate. |
eps_threshold |
tolerance for thresholding. |
eps_ADMM |
tolerance for ADMM convergence. |
n_max |
number of maximal iterations. |
eps_root |
tolerance for root finder. |
maxiter_root |
maximal number of iterations for root finder. |
increase_rho |
list with entries for stable convergence. See Details. |
convergence_plot |
logical, if convergence plot should be displayed. |
starting_value |
optional given starting value. |
adjust_eta |
logical, if eta should be adjusted by the variance. |
trace |
logical, if messages should be displayed. |
The input increase_rho
consists of a logical indicating if rho should be adjusted
if algorithm did not converged within the given maximal number of iterations. Two integers specify the
maximal rho
that is allowed and the step size.
An object of class "PCAloc"
containing the following elements:
PC |
Matrix of dimension Np x k of stacked loading vectors. |
p |
Number of variables. |
N |
Number of neighborhoods. |
k |
Number of components. |
COVS |
List of covariance matrices sorted by neighborhood. |
gamma |
Sparsity distribution. |
eta |
Amount of sparsity. |
converged |
Logical, if ADMM converged with given specifications. |
n_steps |
Number of steps used. |
summary |
Description of result per component. |
residuals |
Primary and secondary residuals. |
C1 = diag(c(1.1, 0.9, 0.6)) C2 = matrix(c(1.1, 0.1, -0.1, 0.1, 1.0, -0.2, -0.1, -0.2, 0.7), ncol = 3) C3 = (C1 + C2)/2 sparsePCAloc(eta = 1, gamma = 0.5, cor = FALSE, COVS = list(C1, C2, C3), n_max = 100, increase_rho = list(FALSE, 100, 1))
C1 = diag(c(1.1, 0.9, 0.6)) C2 = matrix(c(1.1, 0.1, -0.1, 0.1, 1.0, -0.2, -0.1, -0.2, 0.7), ncol = 3) C3 = (C1 + C2)/2 sparsePCAloc(eta = 1, gamma = 0.5, cor = FALSE, COVS = list(C1, C2, C3), n_max = 100, increase_rho = list(FALSE, 100, 1))
Entry-wise Sparsity in the Loadings
sparsity_entries(PC, N, p, tolerance = 0, k = 1, scaled = TRUE)
sparsity_entries(PC, N, p, tolerance = 0, k = 1, scaled = TRUE)
PC |
matrix-like object of PCs. |
N |
integer, number of groups. |
p |
integer, number of variables. |
tolerance |
tolerance for sparsity. |
k |
integer or integer vector of which component should be used. |
scaled |
logical, if total number or percentage of possible sparse entries should be returned. |
Returns either a percentage (scaled = TRUE
) or the amount of zero-values entries (scaled = FALSE
).
PC = matrix(c(1,0,2,3,0,7,0,1,0,1,0.001,0), ncol = 2) sparsity_entries(PC, N = 2, p = 3, tolerance = 0, k = 1, scaled = FALSE) sparsity_entries(PC, N = 2, p = 3, tolerance = 0.001, k = 2, scaled = TRUE)
PC = matrix(c(1,0,2,3,0,7,0,1,0,1,0.001,0), ncol = 2) sparsity_entries(PC, N = 2, p = 3, tolerance = 0, k = 1, scaled = FALSE) sparsity_entries(PC, N = 2, p = 3, tolerance = 0.001, k = 2, scaled = TRUE)
Group-wise Sparsity in the Loadings
sparsity_group(PC, N, p, tolerance = 0, k = 1, scaled = TRUE)
sparsity_group(PC, N, p, tolerance = 0, k = 1, scaled = TRUE)
PC |
matrix-like object of PCs. |
N |
integer, number of groups. |
p |
integer, number of variables. |
tolerance |
tolerance for sparsity. |
k |
integer, which components should be used. Does not work for multiple PCs simultaneously. |
scaled |
logical, if total number or percentage of possible sparse entries should be returned. |
Returns either a matrix of percentages (scaled = TRUE
) or the amounts
of zero-values entries (scaled = FALSE
) for each group/neighborhood.
PC = matrix(c(1,0,2,3,0,7,0,1,0,1,0.001,0), ncol = 2) sparsity_group(PC, N = 2, p = 3, tolerance = 0, k = 1, scaled = FALSE) sparsity_group(PC, N = 2, p = 3, tolerance = 0.001, k = 2, scaled = TRUE)
PC = matrix(c(1,0,2,3,0,7,0,1,0,1,0.001,0), ncol = 2) sparsity_group(PC, N = 2, p = 3, tolerance = 0, k = 1, scaled = FALSE) sparsity_group(PC, N = 2, p = 3, tolerance = 0.001, k = 2, scaled = TRUE)
Mixed Sparsity of the Loadings
sparsity_mixed(PC, p, N, k = 1, tolerance = 0.001, mean = "arithmetic")
sparsity_mixed(PC, p, N, k = 1, tolerance = 0.001, mean = "arithmetic")
PC |
matrix-like object of PCs. |
p |
integer, number of variables. |
N |
integer, number of groups. |
k |
integer, which components should be used. Does not work for multiple PCs simultaneously. |
tolerance |
tolerance for sparsity. |
mean |
if |
Returns the geometric mean of the percentage of entry-wise and group-wise sparsity.
PC = matrix(c(1,0,2,3,0,7,0,1,0,1,0.001,0), ncol = 2) sparsity_mixed(PC, N = 2, p = 3, tolerance = 0, k = 1) sparsity_mixed(PC, N = 2, p = 3, tolerance = 0.001, k = 2, mean = "geometric")
PC = matrix(c(1,0,2,3,0,7,0,1,0,1,0.001,0), ncol = 2) sparsity_mixed(PC, N = 2, p = 3, tolerance = 0, k = 1) sparsity_mixed(PC, N = 2, p = 3, tolerance = 0.001, k = 2, mean = "geometric")
Entry-wise Sparsity in the Loadings per Group
sparsity_summary(PC, N, p, tolerance = 0, k = 1, scaled = FALSE)
sparsity_summary(PC, N, p, tolerance = 0, k = 1, scaled = FALSE)
PC |
matrix-like object of PCs. |
N |
integer, number of groups. |
p |
integer, number of variables. |
tolerance |
tolerance for sparsity. |
k |
integer or integer vector of which component should be used. |
scaled |
logical, if total number or percentage of possible sparse entries should be returned. |
Returns either a matrix of percentages (scaled = TRUE
) or the amounts
of zero-values entries (scaled = FALSE
) for each group/neighborhood.
PC = matrix(c(1,0,2,3,0,7,0,1,0,1,0.001,0), ncol = 2) sparsity_summary(PC, N = 2, p = 3, tolerance = 0, k = 1, scaled = FALSE) sparsity_summary(PC, N = 2, p = 3, tolerance = 0.001, k = 2, scaled = TRUE)
PC = matrix(c(1,0,2,3,0,7,0,1,0,1,0.001,0), ncol = 2) sparsity_summary(PC, N = 2, p = 3, tolerance = 0, k = 1, scaled = FALSE) sparsity_summary(PC, N = 2, p = 3, tolerance = 0.001, k = 2, scaled = TRUE)
The ssMRCD function calculates the spatially smoothed MRCD estimator from Puchhammer and Filzmoser (2023).
ssMRCD( x, groups = NULL, weights, lambda, TM = NULL, alpha = 0.75, maxcond = 50, maxcsteps = 200, n_initialhsets = NULL )
ssMRCD( x, groups = NULL, weights, lambda, TM = NULL, alpha = 0.75, maxcond = 50, maxcsteps = 200, n_initialhsets = NULL )
x |
a list of matrices containing the observations per neighborhood sorted which can be obtained by the function |
groups |
vector of neighborhood assignments |
weights |
weighting matrix, symmetrical, rows sum up to one and diagonals need to be zero (see also |
lambda |
numeric between 0 and 1. |
TM |
target matrix (optional), default value is the covMcd from robustbase. |
alpha |
numeric, proportion of values included, between 0.5 and 1. |
maxcond |
optional, maximal condition number used for rho-estimation. |
maxcsteps |
maximal number of c-steps before algorithm stops. |
n_initialhsets |
number of initial h-sets, default is 6 times number of neighborhoods. |
An object of class "ssMRCD"
containing the following elements:
MRCDcov |
List of ssMRCD-covariance matrices sorted by neighborhood. |
MRCDicov |
List of inverse ssMRCD-covariance matrices sorted by neighborhood. |
MRCDmu |
List of ssMRCD-mean vectors sorted by neighborhood. |
mX |
List of data matrices sorted by neighborhood. |
N |
Number of neighborhoods. |
mT |
Target matrix. |
rho |
Vector of regularization values sorted by neighborhood. |
alpha |
Scalar what percentage of observations should be used. |
h |
Vector of how many observations are used per neighborhood, sorted. |
numiter |
The number of iterations for the best initial h-set combination. |
c_alpha |
Consistency factor for normality. |
weights |
The weighting matrix. |
lambda |
Smoothing factor. |
obj_fun_values |
A matrix with objective function values for all initial h-set combinations (rows) and iterations (columns). |
best6pack |
initial h-set combinations with best objective function value after c-step iterations. |
Kcov |
returns MRCD-estimates without smoothing. |
Puchhammer P. and Filzmoser P. (2023): Spatially smoothed robust covariance estimation for local outlier detection. doi:10.48550/arXiv.2305.05371
plot.ssMRCD, summary.ssMRCD, restructure_as_list
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD ssMRCD(x, weights = W, lambda = 0.5)
# create data set x1 = matrix(runif(200), ncol = 2) x2 = matrix(rnorm(200), ncol = 2) x = list(x1, x2) # create weighting matrix W = matrix(c(0, 1, 1, 0), ncol = 2) # calculate ssMRCD ssMRCD(x, weights = W, lambda = 0.5)
Prints a summary of the locOuts object obtained by the function local_outliers_ssMRCD
.
## S3 method for class 'locOuts' summary(object, ...)
## S3 method for class 'locOuts' summary(object, ...)
object |
a locOuts object. |
... |
further parameters passed on. |
Prints a summary of the locOuts
object.
# set seed set.seed(1) # make locOuts object data = matrix(rnorm(2000), ncol = 4) coords = matrix(rnorm(1000), ncol = 2) groups = sample(1:10, 500, replace = TRUE) lambda = 0.3 # local outlier detection outs = local_outliers_ssMRCD(data = data, coords = coords, groups = groups, lambda = lambda, k = 10) # summary method summary(outs)
# set seed set.seed(1) # make locOuts object data = matrix(rnorm(2000), ncol = 4) coords = matrix(rnorm(1000), ncol = 2) groups = sample(1:10, 500, replace = TRUE) lambda = 0.3 # local outlier detection outs = local_outliers_ssMRCD(data = data, coords = coords, groups = groups, lambda = lambda, k = 10) # summary method summary(outs)
Summary method for PCAloc
## S3 method for class 'PCAloc' summary(object, ...)
## S3 method for class 'PCAloc' summary(object, ...)
object |
object of class PCAloc |
... |
other input variables. |
Summary for PCAloc
#' C1 = diag(c(1.1, 0.9, 0.6)) C2 = matrix(c(1.1, 0.1, -0.1, 0.1, 1.0, -0.2, -0.1, -0.2, 0.7), ncol = 3) C3 = (C1 + C2)/2 pca = sparsePCAloc(eta = 1, gamma = 0.5, cor = FALSE, COVS = list(C1, C2, C3), n_max = 100, increase_rho = list(FALSE, 100, 1), trace = FALSE) summary(pca)
#' C1 = diag(c(1.1, 0.9, 0.6)) C2 = matrix(c(1.1, 0.1, -0.1, 0.1, 1.0, -0.2, -0.1, -0.2, 0.7), ncol = 3) C3 = (C1 + C2)/2 pca = sparsePCAloc(eta = 1, gamma = 0.5, cor = FALSE, COVS = list(C1, C2, C3), n_max = 100, increase_rho = list(FALSE, 100, 1), trace = FALSE) summary(pca)
Summarises most important information of output ssMRCD
.
## S3 method for class 'ssMRCD' summary(object, ...)
## S3 method for class 'ssMRCD' summary(object, ...)
object |
object of class |
... |
further parameters. |
Prints a summary of the ssMRCD
object.
See also ssMRCD
, plot.ssMRCD
.
Band weight matrix for time series groupings
time_weights(N, off_diag)
time_weights(N, off_diag)
N |
number of groups. |
off_diag |
vector for off-diagonal values unequal to zero. |
Returns weight matrix for time series groups appropriate for ssMRCD
.
time_weights(N = 10, off_diag = c(2,1))
time_weights(N = 10, off_diag = c(2,1))
This data is a subset of the GeoSphere Austria monthly weather data of 2021 averaged using the median. Stations with missing values are removed.
weatherAUT2021
weatherAUT2021
A data frame with 183 rows and 10 columns:
Unique name of the weather station in German.
Longitude and latitude of the weather station.
Altitude of the weather station (meter).
Average air pressure (hPa).
Monthly sum of sunshine duration (hours).
Wind velocity (meter/second).
Air temperature in 2 meters above the ground in (°C).
Average daily sum of precipitation (mm).
Relative air humidity (percent).
The original data was downloaded here (December 2022): https://data.hub.geosphere.at/dataset/klima-v1-1m.
Data Source: GeoSphere Austria - https://data.hub.geosphere.at.
data(weatherAUT2021) summary(weatherAUT2021)
data(weatherAUT2021) summary(weatherAUT2021)
This data is a subset of the GeoSphere Austria daily weather data of the time 1960-2023 for the weather station Hohe Warte in Vienna.
weatherHoheWarte
weatherHoheWarte
A data frame with 23372 rows and 18 columns including 13 weather measurements:
Time of measurement in date format.
Daily mean of cloud coverage, values between 1 and 100.
Daily sum of global radiation (J/cm^2).
Daily mean of vapour pressuer (hPa).
Maximal wind speed (m/s).
Daily mean of air pressure (hPa).
Daily mean of relative humidity (percent).
Daily sum of precepitation (mm).
Sight distance at 1pm (m).
Daily sum of sunshine duration (h).
Daily maximum of temperature at 2m air height (°C).
Daily minimum of temperature at 2m air height (°C).
Daily mean of temperature at 2m air height (°C).
Daily mean of wind speed (m/s).
Year of measurement.
Month of measurement.
Day of the year of measurement.
Season of measuremen (1 = winter, 2 = spring, 3 = summer, 4 = fall).
The original data was downloaded here (April 2024): https://data.hub.geosphere.at/dataset/klima-v2-1d.
Data Source: GeoSphere Austria - https://data.hub.geosphere.at.
data(weatherHoheWarte) summary(weatherHoheWarte)
data(weatherHoheWarte) summary(weatherHoheWarte)