Title: | A New Gini Correlation Between Quantitative and Qualitative Variables |
---|---|
Description: | An implementation of a new Gini covariance and correlation to measure dependence between categorical and numerical variables. Dang, X., Nguyen, D., Chen, Y. and Zhang, J. (2018) <arXiv:1809.09793>. |
Authors: | Dao Nguyen and Xin Dang |
Maintainer: | Dao Nguyen <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.1 |
Built: | 2024-11-26 06:25:44 UTC |
Source: | CRAN |
A new Gini correlation to measure dependence between categorical and numerical variables is implemented. Analogous to Pearson's $R^2$ in the ANOVA model, the Gini correlation is interpreted as the ratio of the between-group variation to the total variation, but it also characterizes independence (the Gini correlation is zero if and only if the variables are independent). The Gini correlation is closely related to the distance correlation but has a simpler formulation because it exploits the nature of the categorical variable. As a result, it has a lower computational cost than the distance correlation and inference is more straightforward. The dependence test and confidence interval are implemented, as are the corresponding kernelized dependence measures.
The details are described in the papers "A new Gini correlation between quantitative and qualitative variables" and "Estimating Feature-Label Dependence Using Gini Distance Statistics".
Dao Nguyen [email protected] and Xin Dang [email protected]
Dang, X., Nguyen, D., Chen, Y. and Zhang, J., (2019). A new Gini correlation between quantitative and qualitative variables, Journal of the American Statistical Association (submitted), https://arxiv.org/pdf/1809.09793.pdf
Zhang, S., Dang, X., Nguyen, D. and Chen, Y. (2019). Estimating feature - label dependence using Gini distance statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted), https://arXiv.org/pdf/1906.02171.pdf
Finds confidence intervals for dependence measures between quantitative data x and categorical labels y, using the jackknife method.
ConfidenceInterval(x, y, sigma, alpha, level, method)
x | data |
y | label of data or univariate response variable |
sigma | kernel parameter |
alpha | exponent on Euclidean distance, in (0,2] |
level | level of confidence, in [0,1] |
method | name of the dependence measure, chosen from "gCor", "gCov", "dCor", "dCov", "KgCor", "KgCov", "KdCor" and "KdCov" |
ConfidenceInterval computes the confidence interval of the selected dependence statistic.
It is a self-contained R function based on an estimate of the variance of the dependence statistic.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels. If alpha is missing it defaults to 1; otherwise it is the exponent on the Euclidean distance.
Suppose sample data $(x_i, y_i)$, $i = 1, \dots, n$, are available. The confidence interval is built upon the asymptotic normality of the sample dependence statistic, with the asymptotic variance estimated by the jackknife method.
For more details, refer to Shao and Tu (1996).
ConfidenceInterval returns the confidence interval of the selected dependence measure.
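The jackknife construction described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the names `jackknife_ci` and `stat_fun` are hypothetical, a delete-one jackknife is assumed, and the statistic is passed in as an arbitrary function of the data and labels.

```r
# Delete-one jackknife confidence interval for a generic statistic.
# `jackknife_ci` and `stat_fun` are hypothetical names, not package functions.
jackknife_ci <- function(x, y, stat_fun, level = 0.95) {
  x <- as.matrix(x)
  n <- nrow(x)
  theta_hat <- stat_fun(x, y)
  # Leave-one-out replicates of the statistic
  theta_loo <- vapply(seq_len(n), function(i) {
    stat_fun(x[-i, , drop = FALSE], y[-i])
  }, numeric(1))
  # Jackknife estimate of the variance of the statistic
  var_jack <- (n - 1) / n * sum((theta_loo - mean(theta_loo))^2)
  # Normal-approximation interval at the requested confidence level
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = theta_hat - z * sqrt(var_jack),
    estimate = theta_hat,
    upper = theta_hat + z * sqrt(var_jack))
}

# Toy statistic (mean of the first column) to show the calling convention
first_col_mean <- function(x, y) mean(x[, 1])
jackknife_ci(iris[, 1:4], iris[, 5], first_col_mean, level = 0.95)
```

For the sample mean, the jackknife variance reduces to the usual `var(x) / n`, which gives a quick sanity check of the sketch.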
Dang, X., Nguyen, D., Chen, Y. and Zhang, J. (2019). A new Gini correlation between quantitative and qualitative variables. Submitted.
Shao, J. and Tu, D. (1996). The Jackknife and Bootstrap. Springer, New York.
x <- iris[,1:4]
y <- unclass(iris[,5])
ConfidenceInterval(x, y, alpha=1, level=0.95, method='gCor')
Finds a critical value by a permutation test based on kernel (Gini) distance covariance or correlation statistics, where x is quantitative, y is categorical, sigma is the kernel standard deviation and alpha is an exponent on the Euclidean distance. Returns the critical value of the dependence measure.
CriticalValue(x, y, sigma, alpha, level, M = 1000, method)
x | data |
y | label of data or univariate response variable |
sigma | kernel standard deviation |
alpha | exponent on Euclidean distance, in (0,2] |
level | significance level of the test, the default value = 0.05 |
M | number of permutations |
method | string name of the method for the permutation test, e.g. gCov |
CriticalValue computes the critical value of a dependence test based on a kernel (Gini) distance covariance or correlation statistic.
It is a self-contained R function returning the critical value of the dependence statistic.
The critical value of the test at the given significance level is obtained by a permutation procedure.
Let $\nu = (1, \dots, n)$ be the vector of original sample indices for the y labels and $\hat{\rho}(\nu)$ the sample dependence measure computed on the original sample.
Let $\pi(\nu)$ denote a permutation of the elements of $\nu$, for which the corresponding $\hat{\rho}(\pi)$ is computed.
Under the null hypothesis $H_0$, $\hat{\rho}(\nu)$ and $\hat{\rho}(\pi)$ are identically distributed for every permutation $\pi$ of $\nu$.
Hence, based on $M$ permutations, the critical value is estimated by the $(1 - \text{level}) \times 100\%$ sample quantile of $\hat{\rho}(\pi_m)$, $m = 1, \dots, M$. A moderate number of permutations $M$ is usually sufficient for a good estimate of the critical value.
See PermutationTest for a test of multivariate independence based on the (Gini) distance statistic.
CriticalValue returns the critical value of the selected dependence measure for the permutation test.
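A permutation critical value of this kind can be sketched in a few lines of base R. The names `perm_critical_value`, `stat_fun` and `mean_gap` are hypothetical and the toy statistic stands in for a Gini or distance measure; this is an illustration of the procedure, not the package code.

```r
# Permutation estimate of a critical value: permute the labels M times,
# recompute the statistic, and take the (1 - level) sample quantile.
# All names here are hypothetical, chosen for illustration only.
perm_critical_value <- function(x, y, stat_fun, level = 0.05, M = 1000) {
  perm_stats <- replicate(M, stat_fun(x, sample(y)))
  unname(quantile(perm_stats, probs = 1 - level))
}

# Toy dependence statistic: absolute gap between the two group means
mean_gap <- function(x, y) unname(abs(diff(tapply(x, y, mean))))

set.seed(1)
n <- 50
x <- runif(n)
y <- c(rep(1, n / 2), rep(2, n / 2))
perm_critical_value(x, y, mean_gap, level = 0.05, M = 200)
```

A smaller significance level selects a higher quantile of the same permutation distribution, so the critical value cannot decrease as `level` shrinks.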
n <- 50
x <- runif(n)
y <- c(rep(1,n/2),rep(2,n/2))
CriticalValue(x, y, sigma=1, alpha=2, level=0.04, M = 1000, method='KgCov')
Computes distance covariance and correlation statistics, where x is quantitative and y is categorical, and returns the measures of dependence.
dCor(x, y, alpha)
x | data |
y | label of data or univariate response variable |
alpha | exponent on Euclidean distance, in (0,2] |
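The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels.
dCor calls the dcor function from the energy package, which computes the distance correlation between X and Y where both are numerical variables. If Y is categorical, the set difference metric on the support of $Y$ is used. That is,
$$|y - y'| := I(y \neq y'),$$
where $I(\cdot)$ is the indicator function. Then the sample distance correlation between data and labels is computed as follows.
Let $A = (A_{ij})$ be a symmetric, $n \times n$, centered distance matrix of the sample $x_1, \dots, x_n$. The $(i,j)$-th entry of $A$ is
$$A_{ij} = a_{ij} - \frac{1}{n-2} a_{i\cdot} - \frac{1}{n-2} a_{\cdot j} + \frac{1}{(n-1)(n-2)} a_{\cdot\cdot}$$
if $i \neq j$ and 0 if $i = j$, where $a_{ij} = \|x_i - x_j\|^{\alpha}$, $a_{i\cdot} = \sum_{j=1}^{n} a_{ij}$, $a_{\cdot j} = \sum_{i=1}^{n} a_{ij}$, and $a_{\cdot\cdot} = \sum_{i,j=1}^{n} a_{ij}$. Similarly, using the set difference metric, a symmetric, $n \times n$, centered distance matrix is calculated for the samples $y_1, \dots, y_n$ and denoted by $B = (B_{ij})$. Unbiased estimators of $\mathrm{dCov}(X,Y)$, $\mathrm{dCov}(X,X)$ and $\mathrm{dCov}(Y,Y)$ are given respectively as
$$\frac{1}{n(n-3)} \sum_{i \neq j} A_{ij} B_{ij}, \quad \frac{1}{n(n-3)} \sum_{i \neq j} A_{ij}^2 \quad \text{and} \quad \frac{1}{n(n-3)} \sum_{i \neq j} B_{ij}^2.$$
Then the distance correlation is
$$\mathrm{dCor}(X,Y) = \frac{\mathrm{dCov}(X,Y)}{\sqrt{\mathrm{dCov}(X,X)\,\mathrm{dCov}(Y,Y)}}.$$
dCor returns the sample distance variance of x, the distance variance of y, the distance covariance of x and y, and the distance correlation of x and y.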
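The centered-matrix computation for numerical data and categorical labels can be sketched in base R. This is a hedged illustration, not the energy package's code: it assumes the U-centering form of the centered distance matrix (each entry adjusted by row sums over n - 2 and the grand sum over (n - 1)(n - 2), with a zero diagonal) and the 1/(n(n - 3)) normalization, and the names `u_center` and `dcov_sketch` are hypothetical. It requires at least four observations.

```r
# U-centering of a symmetric distance matrix with zero diagonal
# (assumed form of the "centered distance matrix"; hypothetical helper).
u_center <- function(D) {
  n <- nrow(D)
  row_sums <- rowSums(D)
  A <- D - outer(row_sums, rep(1, n)) / (n - 2) -
    outer(rep(1, n), row_sums) / (n - 2) +
    sum(D) / ((n - 1) * (n - 2))
  diag(A) <- 0
  A
}

# Distance covariance and correlation between numerical x and labels y,
# using the set difference metric I(y_i != y_j) on the labels.
dcov_sketch <- function(x, y, alpha = 1) {
  a <- as.matrix(dist(x))^alpha
  labels <- as.integer(as.factor(y))
  b <- outer(labels, labels, FUN = "!=") * 1
  A <- u_center(a)
  B <- u_center(b)
  n <- nrow(a)
  dcov_xy <- sum(A * B) / (n * (n - 3))
  dvar_x <- sum(A * A) / (n * (n - 3))
  dvar_y <- sum(B * B) / (n * (n - 3))
  c(dCov = dcov_xy, dVarX = dvar_x, dVarY = dvar_y,
    dCor = dcov_xy / sqrt(dvar_x * dvar_y))
}

dcov_sketch(iris[, 1:4], iris[, 5], alpha = 1)
```

A useful property for checking the centering step is that every row of the U-centered matrix sums to zero.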
Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41 (5), 3284-3305.
Szekely, G. J., Rizzo, M. L. and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35 (6), 2769-2794.
Rizzo, M.L. and Szekely, G.J., (2017). Energy: E-Statistics: Multivariate Inference via the Energy of Data (R Package), Version 1.7-0.
x <- iris[,1:4]
y <- unclass(iris[,5])
dCor(x, y, alpha = 1)
Computes the distance covariance statistic, where x is quantitative and y is categorical, and returns the measure of dependence.
dCov(x, y, alpha)
x | data |
y | label of data or response variable |
alpha | exponent on Euclidean distance, in (0,2] |
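dCov calls the dcov function from the energy package to compute the distance covariance statistic.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels.
The distance covariance (Szekely07) is extended from Euclidean space to general metric spaces by Lyons (2013). Based on that idea, we define the discrete metric
$$|y - y'| := I(y \neq y'),$$
where $I(\cdot)$ is the indicator function. Equipped with this set difference metric on the support of $Y$ and the Euclidean distance on the support of $X$, the corresponding distance covariance and distance correlation for a numerical $X$ and a categorical $Y$ are as follows.
Let $A = (A_{ij})$ be a symmetric, $n \times n$, centered distance matrix of the sample $x_1, \dots, x_n$. The $(i,j)$-th entry of $A$ is
$$A_{ij} = a_{ij} - \frac{1}{n-2} a_{i\cdot} - \frac{1}{n-2} a_{\cdot j} + \frac{1}{(n-1)(n-2)} a_{\cdot\cdot}$$
if $i \neq j$ and 0 if $i = j$, where $a_{ij} = \|x_i - x_j\|^{\alpha}$, $a_{i\cdot} = \sum_{j=1}^{n} a_{ij}$, $a_{\cdot j} = \sum_{i=1}^{n} a_{ij}$, and $a_{\cdot\cdot} = \sum_{i,j=1}^{n} a_{ij}$. Similarly, using the set difference metric, a symmetric, $n \times n$, centered distance matrix is calculated for the samples $y_1, \dots, y_n$ and denoted by $B = (B_{ij})$. An unbiased estimator of $\mathrm{dCov}(X,Y)$ is
$$\frac{1}{n(n-3)} \sum_{i \neq j} A_{ij} B_{ij}.$$
dCov returns the sample distance covariance between data x and label y.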
Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41 (5), 3284-3305.
Rizzo, M.L. and Szekely, G.J., (2017). Energy: E-Statistics: Multivariate Inference via the Energy of Data (R Package), Version 1.7-0.
Szekely, G. J., Rizzo, M. L. and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35 (6), 2769-2794.
x <- iris[,1:4]
y <- unclass(iris[,5])
dCov(x, y, alpha = 1)
Computes Gini distance covariance and correlation statistics, where x is quantitative, y is categorical and alpha is the exponent on the Euclidean distance, and returns the measures of dependence.
gCor(x, y, alpha)
x | data |
y | label of data or univariate response variable |
alpha | exponent on Euclidean distance, in (0,2) |
gCor computes the Gini distance correlation statistic.
It is a self-contained R function returning a measure of dependence.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels. If alpha is missing it defaults to 1; otherwise it is the exponent on the Euclidean distance.
Suppose sample data $(x_i, y_i)$, $i = 1, \dots, n$, are available. The sample counterparts can be easily computed. Let $I_k$ be the index set of sample points with $y_i = L_k$, where $L_k$ is the $k$-th category, $k = 1, \dots, K$; then $p_k$ is estimated by the sample proportion of that category, that is, $\hat{p}_k = n_k / n$, where $n_k$ is the number of elements in $I_k$. With a given $\alpha \in (0,2)$, a point estimator of $\rho_g(\alpha)$ is given as follows.
$$\hat{\rho}_g(\alpha) = 1 - \frac{\sum_{k=1}^{K} \hat{p}_k \binom{n_k}{2}^{-1} \sum_{i<j \in I_k} \|x_i - x_j\|^{\alpha}}{\binom{n}{2}^{-1} \sum_{1 \le i<j \le n} \|x_i - x_j\|^{\alpha}}$$
gCor returns the sample Gini distance covariance and correlation between x and y.
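The sample Gini distance covariance and correlation can be sketched directly from pairwise distances in base R. This is an illustrative sketch, not the package's code: it assumes the estimator compares the overall mean pairwise distance with the proportion-weighted within-category mean pairwise distances, and that every category has at least two observations. The name `gini_cor_sketch` is hypothetical.

```r
# Sketch of sample Gini distance covariance and correlation.
# Assumption: gCov = total mean pairwise distance minus the weighted
# within-category mean pairwise distances; gCor = gCov / total.
gini_cor_sketch <- function(x, y, alpha = 1) {
  d <- as.matrix(dist(x))^alpha
  y <- droplevels(as.factor(y))
  n <- length(y)
  # Mean of d over pairs i < j within an index set (diagonal is zero)
  pair_mean <- function(idx) {
    m <- length(idx)
    sum(d[idx, idx]) / (m * (m - 1))
  }
  total <- pair_mean(seq_len(n))
  within <- sum(sapply(split(seq_len(n), y), function(idx) {
    length(idx) / n * pair_mean(idx)  # p_k-hat times within-group mean
  }))
  c(gCov = total - within, gCor = 1 - within / total)
}

gini_cor_sketch(iris[, 1:4], iris[, 5], alpha = 1)
```

When each category collapses to a single point, the within-category term is zero and the sketch returns a correlation of 1.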
Dang, X., Nguyen, D., Chen, Y. and Zhang, J. (2019). A new Gini correlation between quantitative and qualitative variables. Submitted to Journal of the American Statistical Association.
x <- iris[,1:4]
y <- unclass(iris[,5])
gCor(x, y, alpha = 1)
Computes the Gini distance covariance statistic, where x is quantitative, y is categorical and alpha is an exponent on the Euclidean distance, and returns the measure of dependence.
gCov(x, y, alpha)
x | data |
y | label of data or univariate response variable |
alpha | exponent on Euclidean distance, in (0,2] |
gCov computes the Gini distance covariance statistic.
It is a self-contained R function returning a measure of dependence.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels. If alpha is missing it defaults to 1; otherwise it is the exponent on the Euclidean distance.
Gini distance covariance is a new measure of dependence between a random vector and its label. For all distributions with finite first moments, the Gini distance covariance gCov has the following fundamental properties:
(1) gCov(X,Y) is defined for $X$ a quantitative variable in arbitrary dimension and $Y$ a univariate categorical variable.
(2) gCov(X,Y)=0 characterizes independence of $X$ and $Y$.
Gini distance covariance satisfies $\mathrm{gCov}(X,Y) \ge 0$, and $\mathrm{gCov}(X,Y) = 0$ only if $X$ and $Y$ are independent. Gini distance covariance gCov provides a new approach to the problem of testing the joint independence of random vectors. The formal definition of the population coefficient gCov is given in (DNCZ 2018). The empirical Gini distance covariance is the nonnegative number computed as follows.
Suppose sample data $(x_i, y_i)$, $i = 1, \dots, n$, are available. The sample counterparts can be easily computed. Let $I_k$ be the index set of sample points with $y_i = L_k$, where $L_k$ is the $k$-th category, $k = 1, \dots, K$; then $p_k$ is estimated by the sample proportion of that category, that is, $\hat{p}_k = n_k / n$, where $n_k$ is the number of elements in $I_k$. With a given $\alpha$, a point estimator of $\mathrm{gCov}(\alpha)$ is given as follows.
$$\widehat{\mathrm{gCov}}(\alpha) = \binom{n}{2}^{-1} \sum_{1 \le i<j \le n} \|x_i - x_j\|^{\alpha} - \sum_{k=1}^{K} \hat{p}_k \binom{n_k}{2}^{-1} \sum_{i<j \in I_k} \|x_i - x_j\|^{\alpha}$$
gCov returns the sample Gini distance covariance.
Dang, X., Nguyen, D., Chen, Y. and Zhang, J., (2019). A new Gini correlation between quantitative and qualitative variables, Journal of the American Statistical Association (submitted), https://arxiv.org/pdf/1809.09793.pdf
x <- iris[,1:4]
y <- unclass(iris[,5])
gCov(x, y, alpha = 1)
Computes the Gini mean difference of x, where alpha is an exponent on the Euclidean distance, and returns it. The default value for alpha is 1.
gmd(x, alpha)
x | data |
alpha | exponent on Euclidean distance, in (0,2) |
gmd computes the Gini mean difference of data.
It is a self-contained R function dealing with both univariate and multivariate data.
The samples must not contain missing values. If alpha is missing it defaults to 1; otherwise it is the exponent on the Euclidean distance.
Gini mean difference (GMD) was originally introduced as an alternative measure of variability to the usual standard deviation (Gini14, Yitzhaki13). Let $X_1$ and $X_2$ be independent random variables from a univariate distribution $F$ with finite first moment in $\mathbb{R}$. The GMD of $F$ is
$$\Delta(F) = E|X_1 - X_2|,$$
the expected distance between two independent random variables. If the sample data $x_1, \dots, x_n$ are available, the sample Gini mean difference is calculated by
$$\hat{\Delta} = \binom{n}{2}^{-1} \sum_{1 \le i<j \le n} |x_i - x_j| = \frac{2}{n(n-1)} \sum_{i=1}^{n} (2i - n - 1)\, x_{(i)},$$
where $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ are the order statistics of the sample (Schechtman87). The computational complexity of the univariate Gini mean difference is $O(n \log n)$.
Gini mean difference has been generalized to multivariate distributions (Koshevoy97). That is, the Gini mean difference of a distribution $F$ in $\mathbb{R}^d$ is
$$\Delta(F) = E\|X_1 - X_2\|,$$
or even more generally, for some $\alpha$, $0 < \alpha < 2$,
$$\Delta(F; \alpha) = E\|X_1 - X_2\|^{\alpha},$$
where $\|\cdot\|$ is the Euclidean norm. The sample Gini mean difference is computed by
$$\hat{\Delta}(\alpha) = \binom{n}{2}^{-1} \sum_{1 \le i<j \le n} \|x_i - x_j\|^{\alpha}.$$
Its computational complexity is $O(n^2)$.
gmd returns the sample Gini mean difference.
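The two ways of computing the sample Gini mean difference can be compared with a short base R sketch. The function names `gmd_sorted` and `gmd_pairwise` are hypothetical; the first uses the order-statistics shortcut for univariate data with alpha = 1, the second averages all pairwise distances and also handles multivariate data and other exponents.

```r
# Univariate GMD via order statistics: sorting dominates the cost.
gmd_sorted <- function(x) {
  n <- length(x)
  xs <- sort(x)
  2 / (n * (n - 1)) * sum((2 * seq_len(n) - n - 1) * xs)
}

# General GMD as the mean of all pairwise distances raised to alpha;
# quadratic in the sample size, works for vectors and matrices.
gmd_pairwise <- function(x, alpha = 1) {
  mean(as.vector(dist(x))^alpha)
}

x <- c(1, 2, 4)
gmd_sorted(x)            # pairwise gaps are 1, 3, 2
gmd_pairwise(x)          # same value from the pairwise definition
gmd_pairwise(matrix(runif(100), 50, 2), alpha = 0.5)
```

For univariate data and alpha = 1 the two functions agree, which is a convenient check before using the faster sorted form on large samples.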
Gini, C. (1914). Sulla misura della concentrazione e della variabilita dei caratteri. Atti del Reale Istituto Veneto di Scienze, Lettere ed Arti, 62, 1203-1248. English Translation: On the measurement of concentration and variability of characters (2005). Metron, LXIII(1), 3-38.
Koshevoy, G. and Mosler, K. (1997). Multivariate Gini indices. Journal of Multivariate Analysis, 60, 252-276.
Schechtman, E. and Yitzhaki, S. (1987). A measure of association based on Gini's mean difference. Communication in Statistics-Theory and Methods, 16 (1), 207-231.
Yitzhaki, S. and Schechtman, E. (2013). The Gini Methodology, Springer, New York.
n <- 100
x <- runif(n)
t0 <- proc.time()
gmd(x, alpha=1)
proc.time() - t0
t1 <- proc.time()
gmd(x, alpha=0.5)
proc.time() - t1
x <- matrix(runif(n), n/2, 2)
gmd(x, alpha=1)
Computes the kernel distance correlation statistic, where x is quantitative, y is categorical and sigma is the kernel standard deviation, and returns the measure of dependence.
KdCor(x, y, sigma)
x | data |
y | label of data or univariate response variable |
sigma | kernel standard deviation |
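KdCor computes the kernel distance correlation statistic.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels.
The kernel distance correlation is defined as follows.
$$\mathrm{KdCor}(X,Y) = \frac{\mathrm{KdCov}(X,Y)}{\sqrt{\mathrm{KdCov}(X,X)\,\mathrm{KdCov}(Y,Y)}},$$
where $\mathrm{KdCov}$ is the kernel distance covariance computed by KdCov.
KdCor returns the sample kernel distance correlation.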
Sejdinovic, D., Sriperumbudur, B., Gretton, A. and Fukumizu, K. (2013). Equivalence of Distance-based and RKHS-based Statistics in Hypothesis Testing, The Annals of Statistics, 41 (5), 2263-2291.
Zhang, S., Dang, X., Nguyen, D. and Chen, Y. (2019). Estimating feature - label dependence using Gini distance statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted).
x <- iris[,1:4]
y <- unclass(iris[,5])
KdCor(x, y, sigma=1)
Computes the kernel distance covariance statistic, where x is quantitative, y is categorical and sigma is the kernel standard deviation, and returns the measure of dependence.
KdCov(x, y, sigma)
x | data |
y | label of data or univariate response variable |
sigma | kernel standard deviation |
KdCov computes the kernel distance covariance statistic.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels.
Distance covariance was introduced in (Szekely07) as a dependence measure between random variables $X$ and $Y$. If $X$ and $Y$ are embedded into RKHS's induced by $\kappa_X$ and $\kappa_Y$, respectively, with induced distances $d_{\kappa_X}$ and $d_{\kappa_Y}$, the generalized distance covariance of $X$ and $Y$ is (Sejdinovic13):
$$\mathrm{dCov}_{\kappa_X,\kappa_Y}(X,Y) = E\, d_{\kappa_X}(X,X')\, d_{\kappa_Y}(Y,Y') + E\, d_{\kappa_X}(X,X')\, E\, d_{\kappa_Y}(Y,Y') - 2 E\left[ E_{X'} d_{\kappa_X}(X,X')\, E_{Y'} d_{\kappa_Y}(Y,Y') \right],$$
where $(X', Y')$ is an independent copy of $(X, Y)$.
In the case of $Y$ being categorical, one may embed it using a set difference kernel $\kappa_Y$,
$$\kappa_Y(y, y') = \tfrac{1}{2}\, I(y = y').$$
This is equivalent to embedding $Y$ as a simplex with edges of unit length (Lyons13), i.e., $L_k$ is represented by a $K$ dimensional vector of all zeros except its $k$-th dimension, which has the value $\sqrt{2}/2$.
The distance induced by $\kappa_Y$ is called the set distance, i.e., $d_{\kappa_Y}(y, y') = 0$ if $y = y'$ and $1$ otherwise. The set distance is used to compute the generalized distance covariance between a numerical and a categorical random variable.
KdCov returns the sample kernel distance covariance.
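The simplex embedding of a categorical variable can be checked numerically in base R. This is a small sketch under the assumption that each of K categories is mapped to a K-dimensional vector that is zero except for one coordinate equal to sqrt(2)/2; the variable names are illustrative only.

```r
# Embed K categories as scaled unit vectors and inspect the induced
# distances and inner products (names here are illustrative only).
K <- 3
embedding <- diag(sqrt(2) / 2, K)          # row k represents category k

# Pairwise Euclidean distances between embedded categories:
# 0 on the diagonal, 1 everywhere else (the set distance)
set_distance <- as.matrix(dist(embedding))

# Inner products of the embedded categories, i.e. the implied kernel:
# 1/2 for identical categories, 0 otherwise
kappa <- embedding %*% t(embedding)

set_distance
kappa
```

The unit off-diagonal distances confirm that distinct categories are all equally far apart, which is what makes the set distance independent of how the categories are coded.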
Sejdinovic, D., Sriperumbudur, B., Gretton, A. and Fukumizu, K. (2013). Equivalence of Distance-based and RKHS-based Statistics in Hypothesis Testing, The Annals of Statistics, 41 (5), 2263-2291.
Zhang, S., Dang, X., Nguyen, D. and Chen, Y. (2019). Estimating feature - label dependence using Gini distance statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted).
x <- iris[,1:4]
y <- unclass(iris[,5])
KdCov(x, y, sigma=1)
Computes the kernel Gini distance correlation statistic, where x is quantitative, y is categorical and sigma is the kernel standard deviation, and returns the kernel Gini distance correlation.
KgCor(x, y, sigma)
x | data |
y | label of data or univariate response variable |
sigma | kernel standard deviation |
KgCor computes the kernel Gini distance correlation statistic for data.
It is a self-contained R function dealing with both univariate and multivariate data.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels.
Gini distance correlation is generalized to a reproducing kernel Hilbert space (RKHS), $\mathcal{H}_\kappa$, by using the distance $d_\kappa$ induced by the kernel $\kappa$ in place of the Euclidean distance.
In this case, the default Gaussian distance function, induced by a weighted Gaussian kernel with standard deviation sigma, is used.
KgCor returns the sample kernel Gini distance correlation between x and y.
Zhang, S., Dang, X., Nguyen, D. and Chen, Y. (2019). Estimating feature - label dependence using Gini distance statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted), https://arXiv.org/pdf/1906.02171.pdf
x <- iris[,1:4]
y <- unclass(iris[,5])
KgCor(x, y, sigma=1)
Computes the kernel Gini distance covariance statistic, where x is quantitative, y is categorical and sigma is the kernel standard deviation, and returns the kernel Gini covariance.
KgCov(x, y, sigma)
x | data |
y | label of data or univariate response variable |
sigma | kernel standard deviation |
KgCov computes the kernel Gini distance covariance statistic for data.
It is a self-contained R function dealing with both univariate and multivariate data.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels.
Gini distance covariance is generalized to a reproducing kernel Hilbert space (RKHS), $\mathcal{H}_\kappa$, by using the distance $d_\kappa$ induced by the kernel $\kappa$ in place of the Euclidean distance.
In this case, the default Gaussian distance function, induced by a weighted Gaussian kernel with standard deviation sigma, is used.
KgCov returns the sample kernel Gini distance covariance of x and y.
Zhang, S., Dang, X., Nguyen, D. and Chen, Y. (2019). Estimating feature - label dependence using Gini distance statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted), https://arXiv.org/pdf/1906.02171.pdf
x <- iris[,1:4]
y <- unclass(iris[,5])
KgCov(x, y, sigma=1)
Computes the kernel Gini mean difference statistic, where x is quantitative and sigma is the kernel standard deviation, and returns the kernel Gini mean difference.
Kgmd(x, sigma)
x | data |
sigma | kernel standard deviation |
Kgmd computes the kernel Gini mean difference statistic for data.
It is a self-contained R function dealing with both univariate and multivariate data.
Samples must not contain missing values. Argument x is treated as data.
Energy distance based statistics naturally generalize from a Euclidean space to metric spaces (Lyons13). By using a positive definite kernel (Mercer kernel) (Mercer1909), distributions are mapped into an RKHS (Smola07) with a kernel induced distance. Hence one can extend energy distances to a much richer family of statistics defined in RKHS (Sejdinovic13). Let $\kappa: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a Mercer kernel (Mercer1909). There is an associated RKHS $\mathcal{H}_\kappa$ of real functions on $\mathcal{X}$ with reproducing kernel $\kappa$, where the function $d_\kappa: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ defines a distance in $\mathcal{H}_\kappa$,
$$d_\kappa(x, x') = \sqrt{\kappa(x,x) + \kappa(x',x') - 2\kappa(x,x')}.$$
Here KgCov is defined as the Gini distance covariance between $X$ and $Y$.
Kgmd returns the sample kernel Gini mean difference.
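A kernel-induced distance and the resulting kernel Gini mean difference can be sketched in base R. This is an illustration under explicit assumptions rather than the package's code: the kernel is taken to be an unweighted Gaussian, kappa(x, x') = exp(-||x - x'||^2 / sigma^2), which may differ from the package's weighting, and the names `kernel_distance` and `kgmd_sketch` are hypothetical.

```r
# Distance induced by a kernel: sqrt(k(x,x) + k(x',x') - 2 k(x,x')).
# Assumed kernel (not confirmed by the source): exp(-||x - x'||^2 / sigma^2).
kernel_distance <- function(x, sigma = 1) {
  sq_dist <- as.matrix(dist(x))^2
  K <- exp(-sq_dist / sigma^2)
  k_diag <- diag(K)
  # pmax guards against tiny negative values from rounding
  sqrt(pmax(outer(k_diag, k_diag, "+") - 2 * K, 0))
}

# Kernel Gini mean difference: average induced distance over pairs i != j
kgmd_sketch <- function(x, sigma = 1) {
  D <- kernel_distance(x, sigma)
  n <- nrow(D)
  sum(D) / (n * (n - 1))
}

kgmd_sketch(iris[, 1], sigma = 1)
```

With this kernel the induced distance is bounded by sqrt(2), so very distant points all contribute nearly the same amount, unlike the Euclidean Gini mean difference.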
Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41 (5), 3284-3305.
x <- iris[,1]
Kgmd(x, sigma=1)
Performs a permutation test using various dependence measures, where x is quantitative, y is categorical, alpha is an exponent on the Euclidean distance and sigma is the kernel parameter in kernel methods. Returns the test statistic, critical value, p-value and decision of the test.
PermutationTest(x, y, method, sigma, alpha, M = 200, level = 0.05)
x | data |
y | label of data or univariate response variable |
method | name of the permutation test method, chosen from: dCov, dCor, KdCov, KdCor, gCov, gCor, KgCov, KgCor |
sigma | kernel parameter for kernel methods |
alpha | exponent on Euclidean distance, in (0,2), the default value = 1 |
M | number of permutations |
level | significance level of the test, the default value = 0.05 |
The null hypothesis $H_0$ is that X and Y are independent.
PermutationTest computes the p-value of a permutation test based on a (Gini) distance covariance or correlation statistic.
It is a self-contained R function for the selected dependence statistic.
The p-value is obtained by a permutation procedure.
Let $\hat{\rho}(\nu)$ be the sample dependence measure based on the original sample indexed by $\nu = (1, \dots, n)$. Let $\pi(\nu)$ denote a permutation of the elements of $\nu$; the corresponding $\hat{\rho}(\pi)$ is computed for the data with permuted y labels.
Under $H_0$, $\hat{\rho}(\nu)$ and $\hat{\rho}(\pi)$ are identically distributed for every permutation $\pi$ of $\nu$.
Hence, based on $M$ permutations, the critical value is estimated by the $(1 - \text{level}) \times 100\%$ sample quantile of $\hat{\rho}(\pi_m)$, $m = 1, \dots, M$, and the p-value is estimated by the proportion of $\hat{\rho}(\pi_m)$ greater than $\hat{\rho}(\nu)$. A moderate number of permutations $M$ is usually sufficient for a good estimate of the critical value or p-value. The default value is $M = 200$.
PermutationTest returns the p-value, critical value and decision of the permutation test of a specified method.
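The p-value and decision of such a permutation test can be sketched in base R. The names `perm_test_sketch`, `stat_fun` and `mean_gap` are hypothetical, and the toy statistic merely stands in for one of the package's dependence measures.

```r
# Permutation p-value: the share of permuted-label statistics that
# exceed the statistic observed on the original labels.
# All names are hypothetical; this is not the package implementation.
perm_test_sketch <- function(x, y, stat_fun, M = 200, level = 0.05) {
  observed <- stat_fun(x, y)
  perm_stats <- replicate(M, stat_fun(x, sample(y)))
  p_value <- mean(perm_stats > observed)
  list(statistic = observed, p_value = p_value, reject = p_value < level)
}

# Toy dependence statistic: absolute gap between the two group means
mean_gap <- function(x, y) unname(abs(diff(tapply(x, y, mean))))

set.seed(1)
x <- c(rnorm(25), rnorm(25, mean = 10))   # two well separated groups
y <- rep(1:2, each = 25)
perm_test_sketch(x, y, mean_gap, M = 100)
```

Because the two groups are far apart, essentially no relabelling reproduces a gap as large as the observed one, so the test rejects independence.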
gCor
gCov
dCor
dCov
KgCov
KdCov
n <- 50
x <- runif(n)
y <- c(rep(1,n/2),rep(2,n/2))
PermutationTest(x, y, method = "gCor", alpha = 2, M = 50)
Computes the Gini distance correlation statistic, where x is quantitative, y is categorical and alpha is the exponent on the Euclidean distance, and returns the measure of dependence.
RcppgCor(x, y, alpha)
x | data |
y | label of data or univariate response variable |
alpha | exponent on Euclidean distance, in (0,2] |
RcppgCor computes the Gini distance correlation statistic between x and y.
It is an Rcpp version of gCor.
RcppgCor returns the sample Gini distance correlation.
x <- iris[,1:4]
y <- unclass(iris[,5])
RcppgCor(x, y, alpha=2)
Computes the Gini distance covariance statistic, where x is quantitative, y is categorical and alpha is an exponent on the Euclidean distance, and returns the measure of dependence.
RcppgCov(x, y, alpha)
x | data |
y | label of data or univariate response variable |
alpha | exponent on Euclidean distance, in (0,2] |
RcppgCov computes the Gini distance covariance statistic.
It is an Rcpp version of gCov.
RcppgCov returns the sample Gini distance covariance.
x <- iris[,1:4]
y <- unclass(iris[,5])
RcppgCov(x, y, alpha=2)
Computes the Gini mean difference of x, where alpha is an exponent on the Euclidean distance, and returns it. The default value for alpha is 1.
RcppGmd(x, alpha)
x | data |
alpha | exponent on Euclidean distance, in (0,2] |
RcppGmd computes the Gini mean difference statistic for data.
It is an Rcpp version of gmd.
RcppGmd returns the sample Gini mean difference of x.
n <- 1000
x <- runif(n)
RcppGmd(x, alpha=1)
Computes the kernel Gini distance correlation statistic, where x is quantitative, y is categorical and sigma is the kernel standard deviation, and returns the kernel Gini distance correlation.
RcppKgCor(x, y, sigma)
x | data |
y | label of data or univariate response variable |
sigma | kernel standard deviation |
RcppKgCor computes the kernel Gini distance correlation statistic for data.
It is an Rcpp version of KgCor.
RcppKgCor returns the sample kernel Gini distance correlation.
n <- 100
x <- runif(n)
y <- c(rep(1,n/2),rep(2,n/2))
RcppKgCor(x, y, sigma=1)
Computes the kernel Gini distance covariance statistic, where x is quantitative, y is categorical and sigma is the kernel standard deviation, and returns the kernel Gini distance covariance.
RcppKgCov(x, y, sigma)
x | data |
y | label of data or univariate response variable |
sigma | kernel standard deviation |
RcppKgCov computes the kernel Gini distance covariance statistic for data.
It is an Rcpp version of KgCov.
RcppKgCov returns the sample kernel Gini distance covariance.
n <- 100
x <- runif(n)
y <- c(rep(1,n/2),rep(2,n/2))
RcppKgCov(x, y, sigma=1)
Computes the kernel Gini mean difference of x, where sigma is the kernel parameter, and returns the kernel Gini mean difference.
RcppKGmd(x, sigma)
x | data |
sigma | kernel parameter for Gaussian kernel |
RcppKGmd computes the kernel Gini mean difference for data.
It is an Rcpp version of Kgmd.
RcppKGmd returns the sample kernel Gini mean difference.
x <- iris[,1]
RcppKGmd(x, sigma=1)