Title: | Distances for Machine Learning and Statistics in the Context of Big Data |
---|---|
Description: | Functions to compute distances between probability measures or any other data object than can be posed in this way, entropy measures for samples of curves, distances and depth measures for functional data, and the Generalized Mahalanobis Kernel distance for high dimensional data. For further details about the metrics please refer to Martos et al (2014) <doi:10.3233/IDA-140706>; Martos et al (2018) <doi:10.3390/e20010033>; Hernandez et al (2018, submitted); Martos et al (2018, submitted). |
Authors: | Gabriel Martos [aut, cre], Nicolas Hernandez [aut] |
Maintainer: | Gabriel Martos <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1 |
Built: | 2024-11-20 06:39:57 UTC |
Source: | CRAN |
The data consist of set of measurements across years of male mortality rates
in Australia from package fds
.
Ausmale
Ausmale
A list with years in the first component and a 101 times 103 matrix, years in rows and cohorts in columns, in the second component.
fds
This function allows you to compute the family of alpha entropy as stated in Martos et al (2018).
entropy(X,alpha=2,k.neighbor,scale=FALSE)
entropy(X,alpha=2,k.neighbor,scale=FALSE)
X |
data in a matrix where variables are in columns and observations are in rows. |
alpha |
a parameter defining the entropy function. |
k.neighbor |
number of neighbour points to consider in the computation of entropy. |
scale |
logical variable indicating if scaling is required. |
The function computes the alpha entropy and the local alpha entropy (see reference for further details) of a data set using a non parametric density estimator.
local.entropy |
local entropy relative to each point in the sample. |
entropy |
estimated entropy. |
Martos, G. et al (2018): Entropy Measures for Stochastic Processes with Applications in Functional Anomaly Detection. Entropy 20(1): 33 (2018).
require(MASS); data = mvrnorm(100,c(0,0),diag(2)); entropy(data, alpha = 2, k.neighbor = 10, scale = FALSE)
require(MASS); data = mvrnorm(100,c(0,0),diag(2)); entropy(data, alpha = 2, k.neighbor = 10, scale = FALSE)
This function allows you to compute the family of alpha-Entropy for functional data as stated in Martos et al (2018).
entropy.fd(fdframe, gamma = 1, kerfunc="rbf", kerpar = list(sigma = 1, bias=0,degree=2), alpha=2,d=2,resol,k.neighbor)
entropy.fd(fdframe, gamma = 1, kerfunc="rbf", kerpar = list(sigma = 1, bias=0,degree=2), alpha=2,d=2,resol,k.neighbor)
fdframe |
functional data frame |
gamma |
regularization parameter. |
kerfunc |
kernel function ( |
kerpar |
a list of kernel parameters where sigma is the scale with both kernels. |
alpha |
Entropy parameter. |
d |
Dimension truncation in the Reproducing Kernel Hilbert Space representation. |
resol |
number of level sets used to compute the functional entropy. |
k.neighbor |
number of points to estimate the support of the distribution. |
This function estimates the entropy of a stochastic process. To this aim, the raw functional data is projected onto a Reproducing Kernel Hilbert Space, and the entropy is estimated using the coefficient of these functions.
local.entropy |
local entropy relative to each curve in the sample. |
entropy |
estimated entropy of the the set of functions. |
Hernandez and Martos
Martos, G. et al (2018). Entropy Measures for Stochastic Processes with Applications in Functional Anomaly Detection. Entropy 20(1), 33 (2018).
data(Ausmale); t <- Ausmale[[1]] t <- as.numeric(( t - min(t) ) / length(t)) raw.data <- fdframe(t=t, Y=Ausmale[[2]]) entropy.fd(raw.data,gamma=0.0001,kerfunc="rbf",kerpar=c(10), alpha=2, k.neighbor=15)
data(Ausmale); t <- Ausmale[[1]] t <- as.numeric(( t - min(t) ) / length(t)) raw.data <- fdframe(t=t, Y=Ausmale[[2]]) entropy.fd(raw.data,gamma=0.0001,kerfunc="rbf",kerpar=c(10), alpha=2, k.neighbor=15)
This function is used to create multivariate functional data frame
objects to be used in combination with the functions in the package
bigdatadist
.
fdframe(t, Y)
fdframe(t, Y)
t |
abscissa values at which observations took place. |
Y |
matrix with functions in columns and observations in rows. |
t = 1:10; Y = cbind(sin(t),cos(t)) fdata = fdframe(t,Y) plot(fdata, xlab='Time', ylab='')
t = 1:10; Y = cbind(sin(t),cos(t)) fdata = fdframe(t,Y) plot(fdata, xlab='Time', ylab='')
This function allows you to compute the Generalized Kernel Mahalanobis depth measure as stated in Hernandez et al (2018, submitted) and the Generalized Mahalanobis distance in Martos et al (2014).
gmdepth(A,b,resol,k.neighbor)
gmdepth(A,b,resol,k.neighbor)
A |
data matrix where variables in columns, observations in rows. |
b |
a new point in the support of the distribution to evaluate the depth. If omitted, the function compute the distances and depth between all points in the sample. |
resol |
resolution level, i.e. number of density level sets to consider. |
k.neighbor |
number of local neighbours to estimate the support. |
depth |
the generalized Mahalanobis depth measure. |
distance |
the generalized Mahalanobis distance measure. |
Hernandez and Martos
Hernandez N. et al (2018). Generalized Mahalanobis depth functions (submitted). Martos, G. et al (2014). Generalizing the Mahalanobis distance via density kernels. Inteligent Data Anal.
require(MASS) set.seed(1) A=mvrnorm(450,c(0,0),Sigma=diag(2)) b=mvrnorm(50,c(10,10),Sigma=diag(c(0.1,0.1))) C=rbind(A,b) plot(C, pch=20, col=c(rep('black',450),rep('red',50)), xlab='x1',ylab='x2') gmd.fit = gmdepth(A=C) depth = gmd.fit$depth distance = gmd.fit$distance plot(depth,distance, pch=20, col=c(rep('black',450),rep('red',50))) gmdepth(A=A,b=mvrnorm(1,c(0,0),Sigma=diag(2)))
require(MASS) set.seed(1) A=mvrnorm(450,c(0,0),Sigma=diag(2)) b=mvrnorm(50,c(10,10),Sigma=diag(c(0.1,0.1))) C=rbind(A,b) plot(C, pch=20, col=c(rep('black',450),rep('red',50)), xlab='x1',ylab='x2') gmd.fit = gmdepth(A=C) depth = gmd.fit$depth distance = gmd.fit$distance plot(depth,distance, pch=20, col=c(rep('black',450),rep('red',50))) gmdepth(A=A,b=mvrnorm(1,c(0,0),Sigma=diag(2)))
This function allows you to compute the Generalized Kernel Mahalanobis depth measure as stated in Hernandez et al (2018, submitted) and the Generalized Mahalanobis distance in Martos et al (2014), for functional data represented in a Reproducing Kernel Hilbert Space.
gmdepth.fd(fdframe, gamma = 1,kerfunc="rbf" , kerpar=list(sigma=1,bias=0,degree=2),d=2,resol,k.neighbor)
gmdepth.fd(fdframe, gamma = 1,kerfunc="rbf" , kerpar=list(sigma=1,bias=0,degree=2),d=2,resol,k.neighbor)
fdframe |
an |
gamma |
regularization parameter. |
kerfunc |
kernel function to be used. |
kerpar |
a list of kernel parameters where sigma is the scale with both kernels. |
d |
truncation parameter in the Reproducing Kernel Hilbert Space representation. |
resol |
resolution level to estimate the generalized Mahalanobis distance. |
k.neighbor |
number of neighbours to estimate the support of the disitribution. |
depth |
the generalized Mahalanobis depth measure for the curves in the sample. |
distance |
the generalized Mahalanobis distance for the curves in the sample. |
Hernandez and Martos
Hernandez N. et al (2018, submitted). Generalized Mahalanobis depth functions. Martos, G. et al (2014). Generalizing the Mahalanobis distance via density kernels. Inteligent Data Anal.
data(Ausmale); t <- Ausmale[[1]] t = as.numeric(( t - min(t) ) / length(t)) raw.data = fdframe(t=t, Y=Ausmale[[2]]) gmd.fit.fd = gmdepth.fd(raw.data,gamma=0.001, kerfunc="rbf",kerpar=list(sigma = 10)) gmd.fit.fd$distance gmd.fit.fd$depth rbPal <- colorRampPalette(c('red','black')) color = rbPal(5)[as.numeric(cut(gmd.fit.fd$depth,breaks = 5))] plot(rkhs(raw.data,gamma=0.0001,kerfunc="rbf",kerpar=list(sigma = 10)), col = color, xlab='time',ylab='')
data(Ausmale); t <- Ausmale[[1]] t = as.numeric(( t - min(t) ) / length(t)) raw.data = fdframe(t=t, Y=Ausmale[[2]]) gmd.fit.fd = gmdepth.fd(raw.data,gamma=0.001, kerfunc="rbf",kerpar=list(sigma = 10)) gmd.fit.fd$distance gmd.fit.fd$depth rbPal <- colorRampPalette(c('red','black')) color = rbPal(5)[as.numeric(cut(gmd.fit.fd$depth,breaks = 5))] plot(rkhs(raw.data,gamma=0.0001,kerfunc="rbf",kerpar=list(sigma = 10)), col = color, xlab='time',ylab='')
This function allows you to compute the Generalized Kernel Mahalanobis depth measure for a sample of functional data as stated in Hernandez et al (2018, submitted).
kmdepth.fd(fdframe, gamma = 1, kerfunc = "rbf" , kerpar = list(sigma = 1, bias = 0, degree = 2) , d = 2 , robust=TRUE , h=0.1 , nsamp=250)
kmdepth.fd(fdframe, gamma = 1, kerfunc = "rbf" , kerpar = list(sigma = 1, bias = 0, degree = 2) , d = 2 , robust=TRUE , h=0.1 , nsamp=250)
fdframe |
an |
gamma |
regularization parameter. |
kerfunc |
kernel function to be used. |
kerpar |
a list of kernel parameters where sigma is the scale with both kernels. |
d |
truncation parameter in the Reproducing Kernel Hilbert Space representation. |
robust |
TRUE if the covariance matrix is estimated through Robust Maximum Likelihood method. |
h |
numeric parameter controlling the a-prioir precentage of outliers in the sample (value between 0 and 1, by def = 0.1). |
nsamp |
number of subsets used for initial estimates (by def = 250). |
depth |
the kernel-mahalanobis depth measure for the curves in the sample. |
Hernandez and Martos
Hernandez N. et al (2018, submitted). Generalized Mahalanobis depth functions.
data(Ausmale); t <- Ausmale[[1]] t = as.numeric(( t - min(t) ) / length(t)) raw.data = fdframe(t=t, Y=Ausmale[[2]]) kmd.fit.fd = kmdepth.fd(raw.data, gamma = 0.0001, kerfunc = "rbf" , kerpar = list(sigma = 10) , d = 2 , robust=TRUE) kmd.fit.fd$depth rbPal <- colorRampPalette(c('red','black')) color = rbPal(5)[as.numeric(cut(kmd.fit.fd$depth,breaks = 5))] plot(rkhs(raw.data,gamma=0.0001,kerfunc="rbf",kerpar=list(sigma = 10)), col = color, xlab='time',ylab='')
data(Ausmale); t <- Ausmale[[1]] t = as.numeric(( t - min(t) ) / length(t)) raw.data = fdframe(t=t, Y=Ausmale[[2]]) kmd.fit.fd = kmdepth.fd(raw.data, gamma = 0.0001, kerfunc = "rbf" , kerpar = list(sigma = 10) , d = 2 , robust=TRUE) kmd.fit.fd$depth rbPal <- colorRampPalette(c('red','black')) color = rbPal(5)[as.numeric(cut(kmd.fit.fd$depth,breaks = 5))] plot(rkhs(raw.data,gamma=0.0001,kerfunc="rbf",kerpar=list(sigma = 10)), col = color, xlab='time',ylab='')
This function allows you to compute the alpha level set distances as proposed in Martos et al. Nomparametric distances for probability measures with applications, 2018 (submitted).
levelsetdist(A,B,n.level=10,k.neighbor=10)
levelsetdist(A,B,n.level=10,k.neighbor=10)
A |
data set in a matrix where variables are in columns and observations are in rows. |
B |
data set in a matrix where variables are in columns and observations are in rows. |
n.level |
the number of level sets to consider for distance computation. |
k.neighbor |
number of neighbour points to consider in the estimation of the support of the distribution on each class. |
The function computes the alpha level set distance between two (samples from) different probability measures. Details about the distance and the criterion to fix its hyperparameter can be found in Martos et al (2018, submitted).
distance |
distance estimation between the two data sets or distributions. |
Martos, G. et al (2018): Nomparametric distances for probability measures with applications in classification. J. of Calssification, 2018 (submitted).
require(MASS); set.seed(1) A = mvrnorm(100,c(0,0),diag(2)); B = mvrnorm(150,c(1,1),diag(2)); levelsetdist(A, B)
require(MASS); set.seed(1) A = mvrnorm(100,c(0,0),diag(2)); B = mvrnorm(150,c(1,1),diag(2)); levelsetdist(A, B)
The data consist of an low and high values of the Merval Index stock market from Argentina; the data were gathered from Yahoo Finance.
merval
merval
A dataframe with 5269 observations with daily minimum, maximum, open and close index values.
Yahoo Finance
This function allows you to fit discrete functional data (fdframe) as functions in RKHS solving a regularization problem as stated in Munoz (2010).
rkhs(fdframe, gamma=1, kerfunc='rbf', kerpar=list(sigma=1, bias=0, degree=2))
rkhs(fdframe, gamma=1, kerfunc='rbf', kerpar=list(sigma=1, bias=0, degree=2))
fdframe |
functional data |
gamma |
regularization parameter. |
kerfunc |
kernel function rbf or poly to be used. |
kerpar |
a list of kernel parameters where sigma is the scale with both kernels. |
fdframe |
raw data in an fdframe object. |
f |
estimated functional data |
alpha |
coefficients for the linear combination. |
lambda.star |
reduced coefficients for the linear combination. |
Hernandez and Martos
Munoz A. et al (2010). Representing functional data using support vector machines. Pattern recognition letters, 31(6).
data(merval); t <- as.Date(merval[1:100,1]) t <- as.numeric(( t - min(t) ) / 154) raw.data <-fdframe(t = t, Y = merval[1:100,2:5]) plot(raw.data, xlab='time', ylab='Merval raw data') f.data <- rkhs(raw.data, gamma = 0.001, kerpar = list(sigma = 8)) print(f.data) plot(f.data, xlab='time', ylab='Merval data')
data(merval); t <- as.Date(merval[1:100,1]) t <- as.numeric(( t - min(t) ) / 154) raw.data <-fdframe(t = t, Y = merval[1:100,2:5]) plot(raw.data, xlab='time', ylab='Merval raw data') f.data <- rkhs(raw.data, gamma = 0.001, kerpar = list(sigma = 8)) print(f.data) plot(f.data, xlab='time', ylab='Merval data')