| Title: | Analysis of Interval DAta |
|---|---|
| Description: | Tools for the analysis of interval-valued data, including construction, visualization, and statistical modeling. The package provides the 'intData' class for representing interval-valued data, along with functions to aggregate microdata and to estimate parameters of latent distributions. Barycenter and covariance matrix estimation is implemented based on the Mallows distance (Oliveira et al. (2025) <doi:10.48550/arXiv.2407.05105>). Robust estimation of the symbolic covariance matrix is implemented via the Interval Minimum Covariance Determinant (IMCD) estimator, enabling outlier detection based on the robust squared Interval-Mahalanobis distance, as proposed by Loureiro et al. (2026) <doi:10.48550/arXiv.2604.26769>. |
| Authors: | Catarina P. Loureiro [aut, cre] (ORCID: <https://orcid.org/0009-0001-9464-5824>) |
| Maintainer: | Catarina P. Loureiro <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.5 |
| Built: | 2026-05-12 23:03:17 UTC |
| Source: | https://github.com/cran/AIDA |
Extract a subset of rows and columns from an intData object.
## S4 method for signature 'intData'
x[i, j, ..., drop = TRUE]

| Argument | Description |
|---|---|
| x | An intData object. |
| i | Row indices or names to subset. Defaults to all rows. |
| j | Column indices or names to subset. Defaults to all columns. |
| ... | Additional arguments (not used). |
| drop | Logical, passed to the underlying |
An intData object containing the specified subset of rows and columns.
Compare two intData objects for equality.
Compare two intData objects for inequality.
## S4 method for signature 'intData,intData'
e1 == e2

## S4 method for signature 'intData,intData'
e1 != e2

| Argument | Description |
|---|---|
| e1 | An intData object. |
| e2 | An intData object. |
A logical matrix indicating which elements are equal between the two intData objects.
A logical matrix indicating element-wise inequality of the two intData objects.
Computes the angle error between eigenvalues of the estimated covariance matrix and of the ground truth covariance matrix.
angle_error(est_cov, ground_truth_cov)

| Argument | Description |
|---|---|
| est_cov | Estimated covariance matrix. |
| ground_truth_cov | Ground truth covariance matrix. |
The angle error is given by:
where and are the eigenvalues of the estimated and ground truth covariance matrices, respectively.
Angle error between eigenvalues.
Centers Method for intData
Centers(Sdt)

## S4 method for signature 'intData'
Centers(Sdt)

| Argument | Description |
|---|---|
| Sdt | An object of class intData. |
A data.frame containing the centers of the intervals.
Column Names Method for intData
## S4 method for signature 'intData'
colnames(x)

| Argument | Description |
|---|---|
| x | An object of class intData. |
A character vector of column names.
This dataset contains interval data of credit card expenses, including min-max values, centers and ranges, microdata, and an intData object. It is composed of 5 variables: Food, Social, Travel, Gas, and Clothes. It was aggregated by person-month.
data(creditcard)

A list with the following components:

microdata: A data frame with 1000 rows and 7 columns. It contains the microdata, with individual measurements of each variable for all observations.

min_max: A data frame with 36 rows and 10 columns. Each row corresponds to a different observation, and each column gives the minimum and maximum values for each variable.

centers_ranges: A data frame with 36 rows and 10 columns. Each row corresponds to the centers and ranges of the interval data.

intData: An intData object with 36 interval-valued observations and 5 variables, constructed assuming the microdata follow symmetric triangular distributions.
This data was retrieved from Billard, L. and Diday, E. (2006). Symbolic Data Analysis: Conceptual Statistics and Data Mining. John Wiley & Sons. doi:10.1002/9780470090183.
data(creditcard)
head(creditcard$min_max)
head(creditcard$microdata)
head(creditcard$intData)
Dimensions Method for intData
## S4 method for signature 'intData'
dim(x)

| Argument | Description |
|---|---|
| x | An object of class intData. |
A vector of the number of rows and columns.
This dataset contains interval data of air pollutants' concentrations, including min-max values and microdata.
This air quality dataset was obtained from a monitoring station in Entrecampos, Lisbon.
It is composed of 9 pollutants' concentration measures in µg/m3 during the years 2019, 2020, and 2021: sulphur dioxide (SO2), particles < 10µm, ozone (O3), nitrogen dioxide (NO2), carbon monoxide (CO), benzene (C6H6), particles < 2.5µm, nitrogen oxides (NOx), and nitrogen monoxide (NO).
For the microdata_transformed, min_max, and intData, the pollutant "benzene" was removed due to a high number of missing values.
The aggregation of the microdata was done by day.
data(entrecampos_air_quality)

A list with the following components:

microdata_raw: A data frame with 26304 rows and 11 columns. It contains the raw microdata, with individual measurements of each variable for all observations.

microdata_transformed: A data frame with 26304 rows and 10 columns. It contains the microdata, with individual measurements of each variable for all observations. Logarithmic transformations were applied to all variables, and interpolation was used to handle missing values.

min_max: A data frame with 1096 rows and 17 columns. Each row corresponds to a different observation, and each column gives the minimum and maximum values for each variable. The first column corresponds to the day, the next 8 to the minimum values and the last 8 to the maximum values.

intData: An intData object, constructed using KDE for estimating the parameters of the latent distributions.
This data was retrieved from the Portuguese Environment Agency database available at https://qualar.apambiente.pt/.
data(entrecampos_air_quality)
head(entrecampos_air_quality$microdata_raw)
head(entrecampos_air_quality$microdata_transformed)
head(entrecampos_air_quality$min_max)
head(entrecampos_air_quality$intData)
Estimate farness from a distance vector in order to identify outlier observations.
farness(dist, cutoff_value = NULL)

| Argument | Description |
|---|---|
| dist | Vector of distances of each observation. |
| cutoff_value | Optional cutoff value between 0 and 1 to flag outliers. If provided, the function returns both the farness probabilities and the cutoff distance value in the original distance scale. |
Farness of each observation. Values between 0 and 1. If cutoff_value is provided, a list with the farness probabilities and the cutoff distance value in the original distance scale is returned.
J. Raymaekers and P.J. Rousseeuw (2021). Transforming variables to central normality. Machine Learning. doi:10.1007/s10994-021-05960-5
Based on the cellWise package: Raymaekers J, Rousseeuw P (2023). cellWise: Analyzing Data with Cellwise Outliers. R package version 2.5.3, https://CRAN.R-project.org/package=cellWise.
data(creditcard)
credit_card_int <- creditcard$intData
# Compute squared Interval-Mahalanobis distance
z <- rep(1, nrow(credit_card_int))
credit_card_dist <- IMah_dist(credit_card_int, z)
credit_card_farness <- farness(credit_card_dist, 0.9)
Computes the relative Frobenius error between an estimated covariance matrix and the ground truth.
frobenius_error(est_cov, ground_truth_cov)

| Argument | Description |
|---|---|
| est_cov | Estimated covariance matrix. |
| ground_truth_cov | Ground truth covariance matrix. |
The relative Frobenius error is given by:
where and are the estimated and ground truth covariance matrices, respectively.
Frobenius error between the two matrices.
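As a rough illustration of what a relative Frobenius error measures, the base R sketch below divides the Frobenius norm of the difference by the Frobenius norm of the ground truth. The normalization by the ground-truth norm and the helper name `rel_frobenius` are assumptions made for illustration; they are not taken from the package source.

```r
# Hypothetical helper (not part of AIDA): relative Frobenius error,
# assuming the error is normalised by the norm of the ground truth.
rel_frobenius <- function(est_cov, ground_truth_cov) {
  norm(est_cov - ground_truth_cov, type = "F") /
    norm(ground_truth_cov, type = "F")
}

truth <- diag(2)                                  # ground truth: identity
est   <- matrix(c(1.1, 0, 0, 0.9), nrow = 2)      # slightly perturbed estimate
err   <- rel_frobenius(est, truth)
err
```

An estimate identical to the ground truth gives an error of 0, and the value grows with the size of the entrywise deviations.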
Obtain the parameters of the latent variables inherent to the macrodata.
get_latent_param(
  LatentCase = c("U_id_symmetric", "U_id", "General"),
  LatentDist = c("Unif", "Triang", "TNorm", "InvTri", "Beta", "KDE", "Degenerated"),
  TriangParam = 0,
  BetaParam.a = 1,
  BetaParam.b = 1,
  Umicro = NULL,
  p = NULL,
  estimate.DistParam = FALSE
)

| Argument | Description |
|---|---|
| LatentCase | A string specifying which of the three scenarios applies to the latent variables: "General" (the latent variables do not have any nice properties), "U_id" (the latent variables are identically distributed), or "U_id_symmetric" (the latent variables are identically distributed and symmetric). Defaults to "U_id_symmetric". |
| LatentDist | A string or vector of strings specifying the distribution(s) of the latent variables. If the variables are identically distributed, it can be one of ("Unif", "Triang", "TNorm", "InvTri", "Beta", "KDE", "Degenerated"); if not, it is a vector with the distribution for each variable. |
| TriangParam | Mode of the triangular distribution. If the latent variables are identically distributed, a single number is enough; otherwise a vector is needed. The default is 0. |
| BetaParam.a | Parameter alpha of the Beta distribution. If the latent variables are identically distributed, a single number is enough; otherwise a vector is needed. The default is 1. |
| BetaParam.b | Parameter beta of the Beta distribution. If the latent variables are identically distributed, a single number is enough; otherwise a vector is needed. The default is 1. |
| Umicro | Latent microdata observations. Needed if |
| p | Number of variables. |
| estimate.DistParam | Logical parameter indicating whether the parameters of the latent distributions should be estimated. Can only be set to TRUE if |
The parameters of the latent variables inherent to the macrodata are defined according to the LatentCase:
"U_id_symmetric": The latent variables are identically distributed and symmetric, so its parameters are:
"U_id": The latent variables are identically distributed, so its parameters are:
"General": The latent variables do not have any nice properties, so its parameters are:
, , with , and ,
A list with the parameters of the latent variables.
Oliveira, M. R., Pinheiro, D., & Oliveira, L. (2025). Location and association measures for interval-valued data based on Mallows' distance. arXiv preprint arXiv:2407.05105. https://arxiv.org/abs/2407.05105
data(creditcard)
CreditCard_min_max <- creditcard$min_max
CreditCard_microdata <- creditcard$microdata
credit_agrby <- paste(CreditCard_microdata$Name, CreditCard_microdata$Month, sep = "_")
credit_card_U <- get_latent_var(CreditCard_microdata[, 3:7], CreditCard_min_max, credit_agrby,
  agrlevels = row.names(CreditCard_min_max), Seq = "LbUb_VarbyVar")
credit_card_param <- get_latent_param(LatentCase = "General", LatentDist = "KDE", Umicro = credit_card_U)
Obtain the latent variables inherent to the macrodata.
get_latent_var(
  microdata,
  macrodata,
  agrby,
  agrlevels,
  Seq = c("AllLb_AllUb", "AllCen_AllRng", "LbUb_VarbyVar", "CenRng_VarbyVar")
)

| Argument | Description |
|---|---|
| microdata | A matrix containing the microdata. |
| macrodata | A data frame, matrix or intData object containing the macrodata/interval data. |
| agrby | A factor used to specify the grouping of the microdata. |
| agrlevels | The categories/levels on which the microdata was aggregated. |
| Seq | Format of macrodata if it is a data frame or matrix. Available options are: "AllLb_AllUb", "AllCen_AllRng", "LbUb_VarbyVar", "CenRng_VarbyVar". |
The latent variables, , are defined according to the following model:
Let represent the macrodata and
the microdata with being random variables with support on , uncorrelated with .
A matrix with the same size as the microdata.
Oliveira, M. R., Azeitona, M., Pacheco, A., & Valadas, R. (2022). Association measures for interval variables. Advances in Data Analysis and Classification, 16, 491–520. doi:10.1007/s11634-021-00445-8
data(creditcard)
CreditCard_min_max <- creditcard$min_max
CreditCard_microdata <- creditcard$microdata
credit_agrby <- paste(CreditCard_microdata$Name, CreditCard_microdata$Month, sep = "_")
credit_card_U <- get_latent_var(CreditCard_microdata[, 3:7], CreditCard_min_max, credit_agrby,
  agrlevels = row.names(CreditCard_min_max), Seq = "LbUb_VarbyVar")
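To make the idea of a latent variable more concrete, the base R sketch below works on a single group of one variable. It assumes the model in which each microdata value equals the interval center plus a latent value in [-1, 1] times half the range, so the latent value is recovered as 2 * (x - center) / range. Both this parametrization and the helper name `latent_u` are assumptions for illustration and may differ from what `get_latent_var` does internally.

```r
# Hypothetical helper (not part of AIDA). Assumed model:
#   x = center + u * range / 2, with u in [-1, 1]
# so that u = 2 * (x - center) / range.
latent_u <- function(x, lower, upper) {
  center <- (lower + upper) / 2
  rng    <- upper - lower
  2 * (x - center) / rng
}

x <- c(10, 12, 15, 20)                   # microdata of one group
u <- latent_u(x, lower = min(x), upper = max(x))
u
```

Under this assumption, the smallest and largest observations of a group map to -1 and 1, and a value at the interval center maps to 0.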
Returns the first n rows of an intData object.
## S4 method for signature 'intData'
head(x, n = min(nrow(x), 6L))

| Argument | Description |
|---|---|
| x | An intData object. |
| n | The number of rows to return. |
A subset of the intData object.
Calculate the squared Interval-Mahalanobis distance between each row of the data and the barycenter.

IMah_dist(data, z = NULL, mean_c = NULL, mean_r = NULL, cov = NULL)

| Argument | Description |
|---|---|
| data | An intData object containing the macrodata/interval data. |
| z | A vector of 0 and 1, indicating which observations should be considered for the calculation. You must provide either |
| mean_c | The mean vector of the centers. |
| mean_r | The mean vector of the ranges. |
| cov | The symbolic covariance matrix. |
The squared Interval-Mahalanobis distance is defined according to the LatentCase:
"U_id_symmetric": The latent variables are identically distributed and symmetric:
where is the parameter of the latent variables.
"U_id": The latent variables are identically distributed:
where and are the parameter of the latent variables.
"General": The latent variables do not have any nice properties:
where:
,
, , with ,
, ,
denotes the Schur (or entrywise) product of matrices.
A vector with the squared Interval-Mahalanobis distance of each observation.
Loureiro, C. P., Oliveira, M. R., Brito, P., & Oliveira, L. (2026). Minimum Covariance Determinant Estimator and Outlier Detection for Interval-valued Data. arXiv preprint arXiv:2604.26769. https://arxiv.org/abs/2604.26769
data(creditcard)
credit_card_int <- creditcard$intData
z <- rep(1, nrow(credit_card_int))
credit_card_dist <- IMah_dist(credit_card_int, z)
Calculate the squared Interval-Mahalanobis distance of all pairs of observations in the data.
IMah_dist_pairs(data, cov = NULL)

| Argument | Description |
|---|---|
| data | An intData object containing the macrodata/interval data. |
| cov | The symbolic covariance matrix. |
The squared Interval-Mahalanobis distance is defined according to the LatentCase:
"U_id_symmetric": The latent variables are identically distributed and symmetric:
where is the parameter of the latent variables.
"U_id": The latent variables are identically distributed:
where and are the parameter of the latent variables.
"General": The latent variables do not have any nice properties:
where:
,
, , with ,
, ,
denotes the Schur (or entrywise) product of matrices.
A matrix with the squared Interval-Mahalanobis distance of each pair of observations.
Loureiro, C. P., Oliveira, M. R., Brito, P., & Oliveira, L. (2026). Minimum Covariance Determinant Estimator and Outlier Detection for Interval-valued Data. arXiv preprint arXiv:2604.26769. https://arxiv.org/abs/2604.26769
data(creditcard)
credit_card_int <- creditcard$intData
credit_card_dist <- IMah_dist_pairs(credit_card_int)
Applies an adaptation of the FAST-MCD algorithm to estimate location and scatter for interval-valued data.
IMCD(
  data,
  m = 0,
  cutoff = c("farness", "adjbox", "chi-squared", "F-dist", "raw"),
  cutoff_lvl = NULL
)

| Argument | Description |
|---|---|
| data | An intData object containing the interval-valued dataset (macrodata). |
| m | An integer specifying the subset size to use for the estimation. Defaults to |
| cutoff | Indicates which cutoff should be considered for reweighting the estimates: "farness", "adjbox", "chi-squared", "F-dist", or "raw". Defaults to |
| cutoff_lvl | A numeric value specifying the level of the cutoff to be used. If no value is provided, the function uses the default values associated with each cutoff method. |
A list containing the robustly estimated parameters:
| Component | Description |
|---|---|
| mean_IMCD_c | Estimated mean of the centers of the interval data. |
| mean_IMCD_r | Estimated mean of the ranges of the interval data. |
| cov_IMCD | Estimated covariance (scatter) matrix ( |
| final_z | Binary vector indicating the inclusion of each observation in the reweighted subset. |
| cutoff | The cutoff method used for reweighting. |
| cutoff_value | Cutoff value used for reweighting. |
| robust_dist | Robust distances ( |
| farness_probs | Farness probabilities (if |
Loureiro, C. P., Oliveira, M. R., Brito, P., & Oliveira, L. (2026). Minimum Covariance Determinant Estimator and Outlier Detection for Interval-valued Data. arXiv preprint arXiv:2604.26769. https://arxiv.org/abs/2604.26769
Adapted from https://github.com/frankp-0/fastMCD.
The case cutoff=="F-dist" is adapted from package CerioliOutlierDetection (https://cran.r-project.org/package=CerioliOutlierDetection).
# Example using creditcard dataset
data(creditcard)
credit_card_int <- creditcard$intData
credit_card_IMCD <- IMCD(credit_card_int, floor(0.75 * credit_card_int@NObs), "farness", 0.9)
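For readers unfamiliar with FAST-MCD, the base R sketch below shows its core building block, the concentration step (C-step), on classical (non-interval) multivariate data. It is only meant to convey the idea that IMCD adapts: the interval-valued version replaces the classical mean, covariance and Mahalanobis distance with their interval counterparts, and the helper name `c_step` is hypothetical rather than part of the package.

```r
# Hypothetical helper (not part of AIDA): one C-step of FAST-MCD on
# classical data. Given a subset of h observations, estimate location and
# scatter from it, then keep the h observations closest to that estimate.
c_step <- function(X, idx) {
  h  <- length(idx)
  mu <- colMeans(X[idx, , drop = FALSE])
  S  <- cov(X[idx, , drop = FALSE])
  d2 <- mahalanobis(X, center = mu, cov = S)   # squared distances
  order(d2)[seq_len(h)]
}

set.seed(1)
X <- matrix(rnorm(100 * 3), ncol = 3)
X[1:10, ] <- X[1:10, ] + 8                     # contaminate 10 observations
h    <- 75
idx0 <- sample(nrow(X), h)                     # random starting subset
idx1 <- c_step(X, idx0)

det0 <- det(cov(X[idx0, ]))
det1 <- det(cov(X[idx1, ]))
c(before = det0, after = det1)
```

A C-step never increases the determinant of the subset covariance, which is why iterating it from several starting subsets and keeping the smallest determinant yields a robust estimate.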
Calculate the interval-valued covariance matrix, either from an intData object or from the covariance matrices of the centers and ranges.

int_cov(
  data = NULL,
  sigma_cc = NULL,
  sigma_rr = NULL,
  sigma_cr = NULL,
  LatentParam = NULL,
  LatentCase = c("U_id_symmetric", "U_id", "General")
)

| Argument | Description |
|---|---|
| data | An intData object containing the macrodata/interval data. |
| sigma_cc | Covariance matrix of the centers. |
| sigma_rr | Covariance matrix of the ranges. |
| sigma_cr | Covariance matrix between the centers and ranges. |
| LatentParam | A list with the parameters of the latent variables. |
| LatentCase | A string specifying which of the three scenarios applies to the latent variables: "General" (the latent variables do not have any nice properties), "U_id" (the latent variables are identically distributed), or "U_id_symmetric" (the latent variables are identically distributed and symmetric). Defaults to "U_id_symmetric". |
This function calculates the interval-valued covariance matrix from the covariance matrix of the centers (sigma_cc), the covariance matrix of the ranges (sigma_rr), and the covariance matrix between the centers and ranges (sigma_cr).
The covariance matrix is defined according to the LatentCase:
"U_id_symmetric": The latent variables are identically distributed and symmetric:
where is the parameter of the latent variables.
"U_id": The latent variables are identically distributed:
where and are the parameters of the latent variables.
"General": The latent variables do not have any nice properties:
where:
,
, , with ,
, ,
denotes the Schur (or entrywise) product of matrices.
The symbolic covariance matrix.
Oliveira, M. R., Pinheiro, D., & Oliveira, L. (2025). Location and association measures for interval-valued data based on Mallows' distance. arXiv preprint arXiv:2407.05105. https://arxiv.org/abs/2407.05105
data(creditcard)
credit_card_int <- creditcard$intData
credit_card_cov <- int_cov(credit_card_int)
Calculate the interval-valued covariance matrix as a function of z.

int_cov_z(z, data)

| Argument | Description |
|---|---|
| z | A vector of 0 and 1, indicating which observations should be considered for the calculation. |
| data | An intData object containing the macrodata/interval data. |
Let z be a vector indicating which observations are “active”. This function calculates the sample interval-valued covariance matrix as a function of z.
Let , be the matrices of centers and ranges, respectively. Additionally, set:
The sample interval-valued covariance matrix is obtained according to the LatentCase:
"U_id_symmetric": The latent variables are identically distributed and symmetric:
where is the parameter of the latent variables.
"U_id": The latent variables are identically distributed:
where and are the parameters of the latent variables.
"General": The latent variables do not have any nice properties:
where:
,
, , with ,
, ,
denotes the Schur (or entrywise) product of matrices.
The symbolic covariance matrix
Oliveira, M. R., Pinheiro, D., & Oliveira, L. (2025). Location and association measures for interval-valued data based on Mallows' distance. arXiv preprint arXiv:2407.05105. https://arxiv.org/abs/2407.05105
Loureiro, C. P., Oliveira, M. R., Brito, P., & Oliveira, L. (2026). Minimum Covariance Determinant Estimator and Outlier Detection for Interval-valued Data. arXiv preprint arXiv:2604.26769. https://arxiv.org/abs/2604.26769
data(creditcard)
credit_card_int <- creditcard$intData
z <- rep(1, nrow(credit_card_int))
credit_card_cov <- int_cov_z(z, credit_card_int)
Calculate the mean of X as a function of z.

int_mean_z(z, X)

| Argument | Description |
|---|---|
| z | A vector of 0 and 1, indicating which observations should be considered for the calculation. |
| X | A matrix where the rows correspond to observations and the columns to variables. |
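This function calculates the mean of X as a function of z. If z is a vector of 0s and 1s, the mean is calculated over the observations for which z equals 1:

$$\bar{x}(z) = \frac{\sum_{i=1}^{n} z_i x_i}{\sum_{i=1}^{n} z_i},$$

where $x_i$ denotes the $i$-th row of X.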
A vector where each element is the mean for each variable
n <- 100
p <- 4
X <- matrix(rnorm(n * p), ncol = p)
# If we consider all the observations, the result is the same as colMeans()
z <- c(rep(1, n))
int_mean_z(z, X)
colMeans(X)
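The same computation can be sketched in base R without the package, which may help when checking results on a subset of observations. The helper name `mean_z` is hypothetical, and the sketch only assumes that the mean is taken over the rows with z equal to 1.

```r
# Hypothetical helper (not part of AIDA): column means over the rows of X
# selected by the 0/1 vector z.
mean_z <- function(z, X) {
  # z is recycled down each column, so row i is multiplied by z[i]
  colSums(z * X) / sum(z)
}

set.seed(123)
X <- matrix(rnorm(20 * 3), ncol = 3)
z <- rep(c(1, 0), each = 10)         # keep only the first 10 observations

m_sub <- mean_z(z, X)
m_ref <- colMeans(X[z == 1, , drop = FALSE])
rbind(m_sub, m_ref)
```

Writing the mean as a weighted sum, rather than subsetting rows, is what makes it convenient inside algorithms that repeatedly switch observations on and off.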
Identifies potential outliers in interval-valued data using robust distance-based methods with customizable cutoff criteria.
int_outliers(
  robust_dist,
  cutoff = c("farness", "adjbox", "chi-squared", "F-dist"),
  cutoff_lvl = NULL,
  p = NULL,
  z = NULL
)

| Argument | Description |
|---|---|
| robust_dist | A numeric vector containing the robust distances for each observation. |
| cutoff | A character string specifying the method for setting the outlier cutoff threshold. Options include: "farness", "adjbox", "chi-squared", "F-dist". Default is |
| cutoff_lvl | A numeric value specifying the level of the cutoff to be used. If no value is provided, the function uses the default values associated with each cutoff method. |
| p | The number of variables in the data. Required for |
| z | A binary vector indicating the subset of observations used for initial robust estimation. Required for the |
This function classifies observations as outliers based on robust distances and user-defined cutoff methods. It supports various approaches, including Chi-Squared quantiles, adjusted boxplots, F distribution quantiles, and farness probabilities.
A list with the following components:
| Component | Description |
|---|---|
| outliers_names | Character vector of names for observations classified as outliers. |
| is_outlier | Logical vector indicating whether each observation is an outlier (TRUE) or not (FALSE). |
| cutoff | The cutoff method used for detecting outliers. |
| cutoff_value | Cutoff value used for detecting outliers. |
| farness_probs | Numeric vector of farness probabilities for each observation (only if |
Loureiro, C. P., Oliveira, M. R., Brito, P., & Oliveira, L. (2026). Minimum Covariance Determinant Estimator and Outlier Detection for Interval-valued Data. arXiv preprint arXiv:2604.26769. https://arxiv.org/abs/2604.26769
Case cutoff=="F-dist" is adapted from package CerioliOutlierDetection (https://cran.r-project.org/package=CerioliOutlierDetection).
# Example of detecting outliers using robust distances
set.seed(42)
robust_dist <- abs(rnorm(100))
result <- int_outliers(robust_dist, cutoff = "chi-squared", p = 5)

# Example using creditcard dataset
data(creditcard)
credit_card_int <- creditcard$intData
credit_card_IMCD <- IMCD(credit_card_int, floor(0.75 * credit_card_int@NObs), "farness", 0.9)
credit_card_outliers <- int_outliers(credit_card_IMCD$robust_dist, "farness", 0.9)
This dataset contains interval data of car specifications, including min-max values. It is composed of 5 variables: Engine Capacity, Top Speed, Acceleration, Price and Class. The aggregation of the microdata was done by car model.
data(intCars)

A list with the following components:

min_max: A data frame with 27 rows and 9 columns. It contains the lower and upper bounds for each variable.

intData: An intData object with 27 interval-valued observations and 4 variables. The variable "Price" was log-transformed into "lnPrice". The microdata are not available, thus the default parameters of the latent distributions were used assuming a uniform distribution.
This data was retrieved from the MAINT.Data package, available at https://cran.r-project.org/package=MAINT.Data.
data(intCars)
head(intCars$min_max)
head(intCars$intData)
Constructs an interval data object.
intData(
  Data,
  Seq = c("AllLb_AllUb", "AllCen_AllRng", "LbUb_VarbyVar", "CenRng_VarbyVar"),
  LatentParam = NULL,
  LatentCase = c("U_id_symmetric", "U_id", "General"),
  LatentDist = c("Unif", "Triang", "TNorm", "InvTri", "Beta", "KDE", "Degenerated"),
  TriangParam = 0,
  BetaParam.a = 1,
  BetaParam.b = 1,
  Umicro = NULL,
  estimate.DistParam = FALSE,
  VarNames = NULL,
  ObsNames = row.names(Data),
  NbMicroUnits = integer(0)
)

| Argument | Description |
|---|---|
| Data | A data frame or matrix containing the data. |
| Seq | Format of macrodata if it is a data frame or matrix. Available options are: "AllLb_AllUb", "AllCen_AllRng", "LbUb_VarbyVar", "CenRng_VarbyVar". |
| LatentParam | A list with the parameters of the latent variables. |
| LatentCase | A string specifying which of the three scenarios applies to the latent variables: "General" (the latent variables do not have any nice properties), "U_id" (the latent variables are identically distributed), or "U_id_symmetric" (the latent variables are identically distributed and symmetric). Defaults to "U_id_symmetric". |
| LatentDist | A string or vector of strings specifying the distribution(s) of the latent variables. If the variables are identically distributed, it can be one of ("Unif", "Triang", "TNorm", "InvTri", "Beta", "KDE", "Degenerated"); if not, it is a vector with the distribution for each variable. |
| TriangParam | Mode of the triangular distribution. If the latent variables are identically distributed, a single number is enough; otherwise a vector is needed. The default is 0. |
| BetaParam.a | Parameter alpha of the Beta distribution. If the latent variables are identically distributed, a single number is enough; otherwise a vector is needed. The default is 1. |
| BetaParam.b | Parameter beta of the Beta distribution. If the latent variables are identically distributed, a single number is enough; otherwise a vector is needed. The default is 1. |
| Umicro | Latent microdata observations. Needed if |
| estimate.DistParam | Logical parameter indicating whether the parameters of the latent distributions should be estimated. Can only be set to TRUE if |
| VarNames | A character vector of variable names. |
| ObsNames | A character vector of observation names. |
| NbMicroUnits | An integer specifying the number of micro units. |
An object of class intData.
Oliveira, M. R., Pinheiro, D., & Oliveira, L. (2025). Location and association measures for interval-valued data based on Mallows' distance. arXiv preprint arXiv:2407.05105. https://arxiv.org/abs/2407.05105
Adapted from package MAINT.Data (https://cran.r-project.org/package=MAINT.Data).
A class to represent interval data.
Centers: A data frame of centers of the intervals.

Ranges: A data frame of ranges of the intervals.

LatentParam: A list with the parameters of the latent variables.

LatentCase: A string specifying which of the three scenarios applies to the latent variables:
  "General": The case where the latent variables do not have any nice properties.
  "U_id": The case where the latent variables are identically distributed.
  "U_id_symmetric": The case where the latent variables are identically distributed and symmetric.
  Defaults to "U_id_symmetric".

LatentDist: A string or vector of strings specifying the distribution(s) of the latent variables. If the variables are identically distributed, it can be one of ("Unif", "Triang", "TNorm", "InvTri", "Beta", "KDE", "Degenerated"); if not, it is a vector with the distribution for each variable.

ObsNames: A character vector of observation names.

VarNames: A character vector of variable names.

NObs: A numeric value indicating the number of observations.

NIVar: A numeric value indicating the number of interval variables.

NbMicroUnits: An integer indicating the number of micro units.
Oliveira, M. R., Pinheiro, D., & Oliveira, L. (2025). Location and association measures for interval-valued data based on Mallows' distance. arXiv preprint arXiv:2407.05105. https://arxiv.org/abs/2407.05105
Adapted from package MAINT.Data (https://cran.r-project.org/package=MAINT.Data).
Computes the Kullback-Leibler (KL) divergence between an estimated covariance matrix and the ground truth. Assumes multivariate normal distributions.

KL_divergence(est_cov, ground_truth_cov)

| Argument | Description |
|---|---|
| est_cov | Estimated covariance matrix. |
| ground_truth_cov | Ground truth covariance matrix. |
The KL divergence between two -dimensional Gaussians and is given by:
where and are the estimated and ground truth covariance matrices, respectively.
KL divergence between the two matrices.
Yufeng Zhang, Wanwei Liu, Zhenbang Chen, Ji Wang, and Kenli Li. On the properties of Kullback-Leibler divergence between multivariate gaussian distributions, 2023. https://arxiv.org/abs/2102.05485
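For orientation, the base R sketch below evaluates the standard closed form of the KL divergence between two zero-mean p-dimensional Gaussians, KL(N(0, S1) || N(0, S2)) = 0.5 * (tr(S2^{-1} S1) - p + log det S2 - log det S1). The zero-mean assumption, the argument order, and the helper name `kl_gauss0` are choices made for this illustration; which matrix `KL_divergence` treats as the first distribution is not stated here.

```r
# Hypothetical helper (not part of AIDA): KL( N(0, S1) || N(0, S2) )
# for symmetric positive definite covariance matrices S1 and S2.
kl_gauss0 <- function(S1, S2) {
  p <- nrow(S1)
  # log-determinants computed on the log scale for numerical stability
  logdet1 <- as.numeric(determinant(S1, logarithm = TRUE)$modulus)
  logdet2 <- as.numeric(determinant(S2, logarithm = TRUE)$modulus)
  trace_term <- sum(diag(solve(S2, S1)))       # tr(S2^{-1} S1)
  0.5 * (trace_term - p + logdet2 - logdet1)
}

S1 <- 2 * diag(2)
S2 <- diag(2)
kl_12 <- kl_gauss0(S1, S2)
kl_21 <- kl_gauss0(S2, S1)
c(kl_12 = kl_12, kl_21 = kl_21)
```

The two values differ because the KL divergence is not symmetric, so the roles of the estimated and ground-truth matrices matter when interpreting the result.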
Latent Case Method for intData
LatentCase(Sdt)

## S4 method for signature 'intData'
LatentCase(Sdt)

| Argument | Description |
|---|---|
| Sdt | An object of class intData. |
A character with the latent case.
Latent Distribution Method for intData
LatentDist(Sdt)

## S4 method for signature 'intData'
LatentDist(Sdt)

| Argument | Description |
|---|---|
| Sdt | An object of class intData. |
A character with the latent distribution(s).
Latent Parameters Method for intData
LatentParam(Sdt)

## S4 method for signature 'intData'
LatentParam(Sdt)

| Argument | Description |
|---|---|
| Sdt | An object of class intData. |
A list with the latent parameters.
LogRanges Method for intData
    LogRanges(Sdt)

    ## S4 method for signature 'intData'
    LogRanges(Sdt)

| Argument | Description |
|---|---|
| Sdt | An object of class intData. |
A data.frame containing the logarithms of the ranges.
Lower Bounds Method for intData
    LowerBounds(Sdt)

    ## S4 method for signature 'intData'
    LowerBounds(Sdt)

| Argument | Description |
|---|---|
| Sdt | An object of class intData. |
A data.frame containing the lower bounds of the intervals.
Calculate the squared Mallows distance between all rows in data and the barycenter.
    Mallows_dist(data, mean_c = NULL, mean_r = NULL)

| Argument | Description |
|---|---|
| data | An intData object containing the macrodata/interval data. |
| mean_c | The mean vector of the centers. |
| mean_r | The mean vector of the ranges. |
The squared Mallows distance is defined according to the LatentCase; the explicit expressions are given in Oliveira et al. (2025):
"U_id_symmetric": The latent variables are identically distributed and symmetric. The distance depends on a single parameter of the latent variables.
"U_id": The latent variables are identically distributed. The distance depends on two parameters of the latent variables.
"General": No simplifying assumptions are made about the latent variables.
A vector with the squared Mallows distance of each observation.
Oliveira, M. R., Pinheiro, D., & Oliveira, L. (2025). Location and association measures for interval-valued data based on Mallows' distance. arXiv preprint arXiv:2407.05105. https://arxiv.org/abs/2407.05105
    data(creditcard)
    credit_card_int <- creditcard$intData
    credit_card_dist <- Mallows_dist(credit_card_int)
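For intuition, the squared Mallows (L2 Wasserstein) distance between two one-dimensional distributions is the integral of the squared difference of their quantile functions. The sketch below assumes a uniform latent distribution on each interval, in which case the integral reduces to the squared difference of the centers plus one twelfth of the squared difference of the ranges. Both function names are hypothetical, and this is not necessarily the parameterization used by Mallows_dist.

```r
# Hypothetical helper: squared Mallows distance between two intervals,
# each given by its center and range, assuming a uniform latent distribution.
mallows_sq_unif <- function(c1, r1, c2, r2) {
  (c1 - c2)^2 + (r1 - r2)^2 / 12
}

# Numerical check through the quantile functions Q(t) = c + r * (t - 1/2)
mallows_sq_numeric <- function(c1, r1, c2, r2) {
  sq_diff <- function(t) ((c1 + r1 * (t - 0.5)) - (c2 + r2 * (t - 0.5)))^2
  integrate(sq_diff, lower = 0, upper = 1)$value
}

# Interval [-1, 1] (center 0, range 2) versus interval [-1, 3] (center 1, range 4)
closed_form <- mallows_sq_unif(0, 2, 1, 4)
numeric_val <- mallows_sq_numeric(0, 2, 1, 4)
print(c(closed_form = closed_form, numeric = numeric_val))
```

The closed form and the numerical integral agree, which shows how a location term (centers) and a spread term (ranges) combine in the distance.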
Aggregates microdata from a data frame into interval-valued data using various criteria and latent distribution settings.
    micro2intData(
      MicDtDF,
      agrby,
      agrcrt = "minmax",
      LatentParam = NULL,
      LatentCase = c("U_id_symmetric", "U_id", "General"),
      LatentDist = c("Unif", "Triang", "TNorm", "InvTri", "Beta", "KDE", "Degenerated"),
      TriangParam = 0,
      BetaParam.a = 1,
      BetaParam.b = 1,
      estimate.DistParam = FALSE
    )

| Argument | Description |
|---|---|
| MicDtDF | A data frame containing the microdata. All columns should be numeric. |
| agrby | A factor used to specify the grouping of the microdata for aggregation. |
| agrcrt | A string or numeric vector of length 2 specifying the aggregation criterion. The default is "minmax". |
| LatentParam | Optional latent parameter used for certain types of latent distributions. |
| LatentCase | A string specifying which of the three scenarios applies to the latent variables: "U_id_symmetric", "U_id", or "General". Defaults to "U_id_symmetric". |
| LatentDist | A string or vector of strings specifying the distribution(s) of the latent variables. If the variables are identically distributed, it can be one of ("Unif", "Triang", "TNorm", "InvTri", "Beta", "KDE", "Degenerated"); otherwise, it is a vector with the distribution for each variable. |
| TriangParam | Mode of the triangular distribution. If the latent variables are identically distributed, a single number suffices; otherwise, a vector is needed. The default is 0. |
| BetaParam.a | Parameter alpha of the Beta distribution. If the latent variables are identically distributed, a single number suffices; otherwise, a vector is needed. The default is 1. |
| BetaParam.b | Parameter beta of the Beta distribution. If the latent variables are identically distributed, a single number suffices; otherwise, a vector is needed. The default is 1. |
| estimate.DistParam | Logical parameter indicating if estimation of the parameters of the latent distributions should be performed. Can only be set to TRUE if |
This function processes a data frame of microdata and aggregates it into interval-valued data according to the specified grouping factor and aggregation criteria. It can handle different latent distribution cases and parameter settings.
If some rows contain invalid (non-finite or missing) values, those rows are removed before aggregation. If all rows in the resulting interval-valued data are degenerate (i.e., the lower bound equals the upper bound), the function will return NULL.
An intData object containing the aggregated interval-valued data, or NULL if all units lead to degenerate intervals.
Adapted from package MAINT.Data (https://cran.r-project.org/package=MAINT.Data).
    data(creditcard)
    CreditCard_microdata <- creditcard$microdata
    credit_agrby <- factor(paste(CreditCard_microdata$Name, CreditCard_microdata$Month, sep = "_"))
    credit_agr <- micro2intData(CreditCard_microdata[, 3:7], credit_agrby, LatentCase = "General")
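The idea of min-max aggregation can be pictured with base R alone. The sketch below uses a small made-up data frame and grouping factor (both hypothetical) and computes, for each group and variable, the lower and upper bounds, from which centers and ranges follow. It assumes that "minmax" takes the group minimum and maximum, and it only illustrates the idea: micro2intData additionally handles the latent distribution settings and returns an intData object.

```r
# Hypothetical microdata: two numeric variables, five micro units
micro <- data.frame(x = c(1, 3, 2, 10, 12), y = c(5, 4, 6, 0, 1))
grp <- factor(c("a", "a", "a", "b", "b"))

# One interval per group and variable: [group minimum, group maximum]
lower <- aggregate(micro, by = list(group = grp), FUN = min)
upper <- aggregate(micro, by = list(group = grp), FUN = max)

# Centers and ranges derived from the bounds (first column is the group label)
centers <- (lower[, -1] + upper[, -1]) / 2
ranges <- upper[, -1] - lower[, -1]

print(cbind(group = lower$group, centers, ranges))
```

A group whose micro units are all equal on some variable would give a zero range, which corresponds to the degenerate intervals mentioned in the details.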
Variable Names Method for intData
    ## S4 method for signature 'intData'
    names(x)

| Argument | Description |
|---|---|
| x | An object of class intData. |
A character vector of variable names.
Number of Micro Units Method for intData
    NbMicroUnits(x)

    ## S4 method for signature 'intData'
    NbMicroUnits(x)

| Argument | Description |
|---|---|
| x | An object of class intData. |
An integer specifying the number of micro units.
Number of Columns Method for intData
    ## S4 method for signature 'intData'
    ncol(x)

| Argument | Description |
|---|---|
| x | An object of class intData. |
The number of columns.
Number of Rows Method for intData
    ## S4 method for signature 'intData'
    nrow(x)

| Argument | Description |
|---|---|
| x | An object of class intData. |
The number of rows.
Distance-Distance plot for interval-valued data.
    plot_dist_dist(
      class_dist,
      class_cutoff = NULL,
      class_cutoff_label = NULL,
      rob_dist,
      rob_cutoff = NULL,
      rob_cutoff_label = NULL,
      obs_names = NULL,
      ggplotly = TRUE,
      color_class = NULL,
      color_label = NULL,
      palette = NULL,
      shape_class = NULL,
      shape_label = NULL,
      label_obs = NULL
    )

| Argument | Description |
|---|---|
| class_dist | A numeric vector containing the classical distances for each observation. |
| class_cutoff | Numeric. The cutoff value for the classical distances. |
| class_cutoff_label | Character. Label for the classical cutoff. If NULL (default), no legend for the classical cutoff is shown. |
| rob_dist | A numeric vector containing the robust distances for each observation. |
| rob_cutoff | Numeric. The cutoff value for the robust distances. |
| rob_cutoff_label | Character. Label for the robust cutoff. If NULL (default), no legend for the robust cutoff is shown. |
| obs_names | A character vector containing the names of the observations. If NULL (default), the names are taken from the names of class_dist. |
| ggplotly | Logical. If |
| color_class | A vector indicating the color class of each observation. If NULL (default), all points have the same color. |
| color_label | Character. Label for the color class. If NULL (default), no legend for the color class is shown. |
| palette | A vector with colors for each color class. If NULL (default), default ggplot2 colors are used. |
| shape_class | A vector indicating the shape class of each observation. If NULL (default), all points have the same shape. |
| shape_label | Character. Label for the shape class. If NULL (default), no legend for the shape class is shown. |
| label_obs | A vector with the names of the observations to be labeled in the plot when |
Returns a Distance-Distance plot that displays the classical distances against the robust distances for each observation, highlighting outliers.
    # Create intData object
    data(creditcard)
    credit_card_int <- creditcard$intData

    # Estimate the mean and covariance matrix
    credit_card_IMCD <- IMCD(credit_card_int, floor(nrow(credit_card_int) * 0.75), "farness", 0.9)
    credit_card_outliers <- int_outliers(credit_card_IMCD$robust_dist,
                                         p = credit_card_int@NIVar, cutoff_lvl = 0.9)

    # Plot Distance-Distance plot
    class_dist <- IMah_dist(credit_card_int, z = rep(1, credit_card_int@NObs))
    # p is the number of interval variables (it was left undefined in the flattened example)
    class_outliers <- int_outliers(class_dist, cutoff = "adjbox",
                                   p = credit_card_int@NIVar, cutoff_lvl = 1.5)
    credit_card_is_outliers <- as.character(credit_card_outliers$is_outlier)
    credit_card_is_outliers[credit_card_outliers$is_outlier] <- "Outlier"
    credit_card_is_outliers[!credit_card_outliers$is_outlier] <- "Inlier"
    plot_dist_dist(class_dist, class_outliers$cutoff_value[2], "1.5 adjusted boxplot",
                   credit_card_IMCD$robust_dist, credit_card_outliers$cutoff_value, "0.9 farness",
                   color_class = credit_card_is_outliers, palette = c("grey50", "red"))
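The logic behind a distance-distance plot can be sketched with base graphics: each observation is placed at its classical distance (x-axis) and robust distance (y-axis), and the two cutoffs split the plane into regions. The distances, cutoffs, and region labels below are made-up values chosen for illustration only; plot_dist_dist offers further options such as color classes, shape classes, and observation labels.

```r
# Hypothetical distances for four observations
class_dist <- c(a = 1.2, b = 0.8, c = 6.5, d = 2.0)
rob_dist <- c(a = 1.5, b = 0.9, c = 30.0, d = 12.0)
class_cutoff <- 5
rob_cutoff <- 8

# Classify each observation by which cutoffs it exceeds
region <- ifelse(rob_dist > rob_cutoff & class_dist > class_cutoff, "both",
          ifelse(rob_dist > rob_cutoff, "robust only",
          ifelse(class_dist > class_cutoff, "classical only", "inlier")))

# Classical distances against robust distances, with dashed cutoff lines
plot(class_dist, rob_dist, pch = 19,
     col = ifelse(region == "inlier", "grey50", "red"),
     xlab = "Classical distance", ylab = "Robust distance")
abline(v = class_cutoff, h = rob_cutoff, lty = 2)
text(class_dist, rob_dist, labels = names(class_dist), pos = 3)

print(region)
```

Observation d exceeds only the robust cutoff in this toy example, the kind of masked outlier that a distance-distance plot is meant to reveal.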
Interval-Mahalanobis distance plot for interval-valued data.
    plot_interval_dist(
      dist,
      cutoff = NULL,
      cutoff_label = NULL,
      obs_names = NULL,
      sort.obs = TRUE,
      color_class = NULL,
      color_label = NULL,
      palette = NULL,
      shape_class = NULL,
      shape_label = NULL,
      label_obs = NULL
    )

| Argument | Description |
|---|---|
| dist | A numeric vector containing the Interval-Mahalanobis distances for each observation. |
| cutoff | A numeric vector containing cutoff values to be displayed as horizontal lines. |
| cutoff_label | A character vector containing labels for each cutoff. If NULL (default), default labels are generated. |
| obs_names | A character vector containing the names of the observations. If NULL (default), the names are taken from the names of dist. |
| sort.obs | Logical. If |
| color_class | A vector indicating the color class of each observation. If NULL (default), all points have the same color. |
| color_label | Character. Label for the color class. If NULL (default), no legend for the color class is shown. |
| palette | A vector with colors for each color class. If NULL (default), default ggplot2 colors are used. |
| shape_class | A vector indicating the shape class of each observation. If NULL (default), all points have the same shape. |
| shape_label | Character. Label for the shape class. If NULL (default), no legend for the shape class is shown. |
| label_obs | A vector with the names of the observations to be labeled in the plot. If NULL (default), no labels are shown and x-axis labels are displayed. |
Returns a plot that displays the Interval-Mahalanobis distances for each observation, highlighting outliers based on specified cutoffs.
    # Create intData object
    data(creditcard)
    credit_card_int <- creditcard$intData

    # Estimate the mean and covariance matrix
    credit_card_IMCD <- IMCD(credit_card_int, floor(nrow(credit_card_int) * 0.75), "farness", 0.9)
    credit_card_outliers <- int_outliers(credit_card_IMCD$robust_dist,
                                         p = credit_card_int@NIVar, cutoff_lvl = 0.9)
    credit_card_is_outliers <- as.character(credit_card_outliers$is_outlier)
    credit_card_is_outliers[credit_card_outliers$is_outlier] <- "Outlier"
    credit_card_is_outliers[!credit_card_outliers$is_outlier] <- "Inlier"

    # Plot Interval-Mahalanobis distance plot
    plot_interval_dist(credit_card_IMCD$robust_dist,
                       cutoff = credit_card_outliers$cutoff_value,
                       cutoff_label = c("0.9 farness"),
                       obs_names = rownames(credit_card_int),
                       sort.obs = FALSE,
                       color_class = credit_card_is_outliers,
                       palette = c("grey50", "red"))
Plots one intData object against another, with options to visualize the intervals as crosses or rectangles.
Plots a single intData object, either in a vertical or horizontal layout.
    ## S4 method for signature 'intData,intData'
    plot(
      x,
      y,
      type = c("crosses", "rectangles", "crosses2"),
      append = FALSE,
      palette = rainbow(x@NObs),
      ...
    )

    ## S4 method for signature 'intData,missing'
    plot(
      x,
      casen = NULL,
      layout = c("vertical", "horizontal"),
      append = FALSE,
      ...
    )

| Argument | Description |
|---|---|
| x | An intData object. |
| y | An intData object to plot on the y-axis. |
| type | The type of plot to generate: "crosses", "rectangles", or "crosses2". Default is "crosses". |
| append | Logical, if |
| palette | A vector with colors for each observation. |
| ... | Additional graphical parameters. |
| casen | A vector specifying the case numbers to plot. Default is NULL. |
| layout | The layout of the plot: "vertical" or "horizontal". |
A plot showing the relationship between the two intData objects.
A plot showing the intervals of the intData object.
Print Method for Summary intData
    ## S4 method for signature 'summaryintData'
    print(x, ...)

| Argument | Description |
|---|---|
| x | An object of class summaryintData. |
| ... | Additional arguments passed to print. |
The object itself, returned invisibly. Called for its side effects (printing).
Ranges Method for intData
    Ranges(Sdt)

    ## S4 method for signature 'intData'
    Ranges(Sdt)

| Argument | Description |
|---|---|
| Sdt | An object of class intData. |
A data.frame containing the ranges of the intervals.
Row.Names Method for intData
    ## S4 method for signature 'intData'
    row.names(x)

| Argument | Description |
|---|---|
| x | An object of class intData. |
A character vector of row names.
Row Names Method for intData
    ## S4 method for signature 'intData'
    rownames(x)

| Argument | Description |
|---|---|
| x | An object of class intData. |
A character vector of row names.
Show Method for intData
Show Method for Summary intData
    ## S4 method for signature 'intData'
    show(object)

    ## S4 method for signature 'summaryintData'
    show(object)

| Argument | Description |
|---|---|
| object | An object of class intData or summaryintData. |
The object itself, returned invisibly. Called for its side effects (printing).
This dataset contains interval data of Spotify tracks' audio features, including min-max values and trimmed intervals, as well as the microdata. It is composed of 11 audio features: duration, danceability, energy, loudness, speechiness, acousticness, instrumentalness, liveness, valence, tempo, and popularity. The aggregation of the microdata was done by track genre.
    data(spotify_tracks)
A list with the following components:
microdata: A data frame with 81033 rows and 20 columns. It contains the microdata, with individual measurements of each variable for all observations.
microdata_transformed: A data frame with 81033 rows and 20 columns. It contains the transformed microdata, with individual measurements of each variable for all observations. Logarithmic transformations were applied to "loudness" and "tempo". "duration_ms" in milliseconds was converted to "duration" in minutes. "popularity" was scaled to the range [0,1].
intData_minmax: An intData object with 111 interval-valued observations and 11 variables, constructed using min-max aggregation based on the transformed microdata.
intData_trimmed: An intData object with 111 interval-valued observations and 11 variables, constructed using trimmed aggregation (1% trimming) based on the transformed microdata.
This data was retrieved from Kaggle, available at https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset.
    data(spotify_tracks)
    head(spotify_tracks$intData_minmax)
    head(spotify_tracks$intData_trimmed)
    head(spotify_tracks$microdata)
    head(spotify_tracks$microdata_transformed)
Summary Method for intData
    ## S4 method for signature 'intData'
    summary(object)

| Argument | Description |
|---|---|
| object | An object of class intData. |
An object of class summaryintData.
A class to represent the summary of interval data.
Centersumar: A table summarizing the centers.
Rngsumar: A table summarizing the ranges.
Create a biplot for interval-valued symbolic data, visualizing the symbolic data as rectangles or crosses, with the first two variables on the x and y axes. The function allows customization of colors, fill colors, and outlier representation.
    SYMB.biplot(
      data,
      type = c("rectangles", "crosses", "crosses2"),
      palette = rainbow(nrow(data)),
      fill_col = "gray50",
      is_outlier = NULL,
      ...
    )

| Argument | Description |
|---|---|
| data | An intData object containing the macrodata/interval data. The first two variables are used for the x and y axes. |
| type | The type of plot to generate: "rectangles", "crosses", or "crosses2". Default is "rectangles". |
| palette | A vector with colors for each observation. Default is rainbow(nrow(data)). |
| fill_col | If |
| is_outlier | A vector with logical values indicating whether each observation is an outlier. Outlying observations are drawn with a thicker line width. Default is NULL. |
| ... | Additional graphical parameters. |
A biplot is drawn in the graphic window. The biplot shows the symbolic data as rectangles or crosses, with the first two variables on the x and y axes.
    data(creditcard)
    credit_card_int <- creditcard$intData
    SYMB.biplot(credit_card_int[, c(3, 5)])

    # Highlight outliers in the biplot
    credit_card_IMCD <- IMCD(credit_card_int, floor(0.75 * credit_card_int@NObs), "farness", 0.9)
    credit_card_outliers <- int_outliers(credit_card_IMCD$robust_dist, "farness", 0.9)
    outliers_colors <- rep("gray50", credit_card_int@NObs)
    names(outliers_colors) <- rownames(credit_card_int)
    outliers_colors[credit_card_outliers$outliers_names] <- "red"
    SYMB.biplot(credit_card_int[, c(3, 5)], palette = outliers_colors,
                is_outlier = credit_card_outliers$is_outlier)
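To make the rectangle representation concrete: each observation with intervals on two variables becomes a rectangle spanned by its lower and upper bounds. The following is a minimal base graphics sketch using made-up bounds; all object names are hypothetical and unrelated to the package internals, and SYMB.biplot may differ in its drawing details.

```r
# Hypothetical lower and upper bounds for three observations on two variables
lower <- data.frame(x = c(1, 4, 6), y = c(2, 1, 5))
upper <- data.frame(x = c(3, 7, 8), y = c(4, 3, 9))
is_outlier <- c(FALSE, FALSE, TRUE)

# Axis limits must cover every interval on each variable
xlim <- range(lower$x, upper$x)
ylim <- range(lower$y, upper$y)

# Empty canvas, then one rectangle per observation
plot(c(lower$x, upper$x), c(lower$y, upper$y), type = "n",
     xlim = xlim, ylim = ylim, xlab = "x", ylab = "y")
rect(lower$x, lower$y, upper$x, upper$y,
     border = ifelse(is_outlier, "red", "gray50"),
     lwd = ifelse(is_outlier, 3, 1))  # thicker outline for flagged observations
```

Drawing flagged observations with a thicker border mirrors the role described for the is_outlier argument.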
Adapted from pairs.panels (R package "psych"). Shows a scatter plot matrix, with bivariate symbolic scatter plots below the diagonal, the variable names on the diagonal, and the symbolic correlations above the diagonal. Useful for descriptive statistics of symbolic objects described by interval variables.
    SYMB.pairs.panels(
      data,
      type = c("rectangles", "crosses", "crosses2"),
      cex.cor = 2,
      corr = NULL,
      palette = rainbow(nrow(data)),
      fill_col = "gray50",
      is_outlier = NULL,
      ...
    )

| Argument | Description |
|---|---|
| data | An intData object containing the macrodata/interval data. |
| type | The type of plot to generate: "rectangles", "crosses", or "crosses2". Default is "rectangles". |
| cex.cor | Character expansion factor. |
| corr | A matrix with the symbolic correlations; if not provided, the upper panel is omitted. |
| palette | A vector with colors for each observation. |
| fill_col | If |
| is_outlier | A vector with logical values indicating whether each observation is an outlier. Outlying observations are drawn with a thicker line width. Default is NULL. |
| ... | Additional graphical parameters. |
A scatter plot matrix is drawn in the graphics window. The panels below the diagonal show the scatter plots, the diagonal shows the variable names, and the panels above the diagonal report the symbolic correlations.
    data(creditcard)
    credit_card_int <- creditcard$intData
    credit_card_cov <- int_cov(credit_card_int)
    credit_card_cor <- cov2cor(credit_card_cov)
    SYMB.pairs.panels(credit_card_int, corr = credit_card_cor, labels = colnames(credit_card_int))

    # Highlight outliers in the scatter plot matrix
    credit_card_IMCD <- IMCD(credit_card_int, floor(0.75 * credit_card_int@NObs), "farness", 0.9)
    credit_card_outliers <- int_outliers(credit_card_IMCD$robust_dist, "farness", 0.9)
    outliers_colors <- rep("gray50", credit_card_int@NObs)
    names(outliers_colors) <- rownames(credit_card_int)
    outliers_colors[credit_card_outliers$outliers_names] <- "red"
    SYMB.pairs.panels(credit_card_int, corr = cov2cor(credit_card_IMCD$cov_IMCD),
                      palette = outliers_colors, labels = colnames(credit_card_int),
                      type = "rectangles", is_outlier = credit_card_outliers$is_outlier)
Returns the last n rows of an intData object.
    ## S4 method for signature 'intData'
    tail(x, n = min(nrow(x), 6L))

| Argument | Description |
|---|---|
| x | An intData object. |
| n | The number of rows to return. |
A subset of the intData object.
Upper Bounds Method for intData
    UpperBounds(Sdt)

    ## S4 method for signature 'intData'
    UpperBounds(Sdt)

| Argument | Description |
|---|---|
| Sdt | An object of class intData. |
A data.frame containing the upper bounds of the intervals.