Title: | Data Preprocessing, Discretization for Classification |
---|---|
Description: | A collection of supervised discretization algorithms. These algorithms can also be grouped as top-down or bottom-up approaches to discretization. |
Authors: | HyunJi Kim |
Maintainer: | HyunJi Kim <[email protected]> |
License: | GPL |
Version: | 1.0-1.1 |
Built: | 2024-11-16 06:26:12 UTC |
Source: | CRAN |
This package is a collection of supervised discretization algorithms. These algorithms can also be grouped as top-down or bottom-up approaches to discretization.
Package: | discretization |
Type: | Package |
Version: | 1.0-1 |
Date: | 2010-12-02 |
License: | GPL |
LazyLoad: | yes |
Maintainer: | HyunJi Kim <[email protected]> |
Choi, B. S., Kim, H. J., Cha, W. O. (2011). A Comparative Study on Discretization Algorithms for Data Mining, Communications of the Korean Statistical Society, to be published.
Chmielewski, M. R. and Grzymala-Busse, J. W. (1996). Global Discretization of Continuous Attributes as Preprocessing for Machine Learning, International journal of approximate reasoning, Vol. 15, No. 4, 319–331.
Fayyad, U. M. and Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning, Artificial intelligence, 13, 1022–1027.
Gonzalez-Abril, L., Cuberos, F. J., Velasco, F. and Ortega, J. A. (2009). Ameva: An autonomous discretization algorithm, Expert Systems with Applications, 36, 5327–5332.
Kerber, R. (1992). ChiMerge: Discretization of numeric attributes, In Proceedings of the Tenth National Conference on Artificial Intelligence, 123–128.
Kurgan, L. A. and Cios, K. J. (2004). CAIM Discretization Algorithm, IEEE Transactions on knowledge and data engineering, 16, 145–153.
Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Tools with Artificial Intelligence, 388–391.
Liu, H. and Setiono, R. (1997). Feature selection and discretization, IEEE transactions on knowledge and data engineering, 9, 642–645.
Pawlak, Z. (1982). Rough Sets, International Journal of Computer and Information Sciences, vol.11, No.5, 341–356.
Su, C. T. and Hsu, J. H. (2005). An Extended Chi2 Algorithm for Discretization of Real Value Attributes, IEEE transactions on knowledge and data engineering, 17, 437–441.
Tay, F. E. H. and Shen, L. (2002). Modified Chi2 Algorithm for Discretization, IEEE Transactions on knowledge and data engineering, 14, 666–670.
Tsai, C. J., Lee, C. I. and Yang, W. P. (2008). A discretization algorithm based on Class-Attribute Contingency Coefficient, Information Sciences, 178, 714–731.
Ziarko, W. (1993). Variable Precision Rough Set Model, Journal of computer and system sciences, Vol. 46, No. 1, 39–59.
This function is required to compute the Ameva value for the Ameva discretization algorithm.
ameva(tb)
tb |
a vector of observed frequencies |
This function implements the Ameva criterion proposed in Gonzalez-Abril, Cuberos, Velasco and Ortega (2009) for discretization. The autonomous discretization algorithm (Ameva) is implemented in disc.Topdown(data, method=3).
It uses a measure based on χ² as the criterion for the optimal discretization, which has the minimum number of discrete intervals and the minimum loss of class-attribute interdependence. The algorithm uses the local maximum of the Ameva criterion as its stopping criterion.
The Ameva coefficient is defined as
Ameva(k) = χ²(k) / (k(l − 1))
for k, l ≥ 2, where k is the number of discrete intervals and l is the number of classes.
This value is calculated from the contingency table between the class variable and the discretized intervals, where each row represents a class and each column a discrete interval.
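As a rough sanity check of this definition (an illustrative sketch, not part of the package's documented examples; it assumes the reconstruction of the formula above), the Ameva value of a small quanta matrix can be compared with the package's chiSq() helper:
library(discretization)
#-- 2 classes (rows) x 3 intervals (columns) quanta matrix
m <- matrix(c(2, 5, 1,
              1, 3, 3), ncol = 3, byrow = TRUE)
k <- ncol(m)   # number of discrete intervals
l <- nrow(m)   # number of classes
ameva(m)                   # Ameva value as computed by the package
chiSq(m) / (k * (l - 1))   # should agree if Ameva = chi^2 / (k (l - 1))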
val |
numeric value of Ameva coefficient |
HyunJi Kim [email protected]
Gonzalez-Abril, L., Cuberos, F. J., Velasco, F. and Ortega, J. A. (2009) Ameva: An autonomous discretization algorithm, Expert Systems with Applications, 36, 5327–5332.
See also disc.Topdown, topdown, insert, findBest and chiSq.
#--Ameva criterion value
a=c(2,5,1,1,3,3)
m=matrix(a,ncol=3,byrow=TRUE)
ameva(m)
This function is required to compute the cacc value for the CACC discretization algorithm.
cacc(tb)
tb |
a vector of observed frequencies |
The Class-Attribute Contingency Coefficient (CACC) discretization algorithm is implemented in disc.Topdown(data, method=2).
The cacc value is defined as
cacc = sqrt( y / (y + M) ),  with  y = χ² / log(n),
where M is the total number of samples and n is the number of discretized intervals. This value is calculated from the contingency table between the class variable and the discretized intervals, where each row represents a class and each column a discrete interval.
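The relation to the χ² statistic can be made concrete with a short sketch (illustrative only, and assuming the reconstruction of the formula above):
library(discretization)
#-- 3 classes (rows) x 3 intervals (columns) quanta matrix from the example below
m <- matrix(c(3, 0, 3,
              0, 6, 0,
              0, 3, 0), ncol = 3, byrow = TRUE)
n <- ncol(m)            # number of discretized intervals
M <- sum(m)             # total number of samples
y <- chiSq(m) / log(n)
cacc(m)                 # cacc value as computed by the package
sqrt(y / (y + M))       # should agree if cacc = sqrt(y / (y + M))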
val |
numeric value of the cacc coefficient |
HyunJi Kim [email protected]
Tsai, C. J., Lee, C. I. and Yang, W. P. (2008). A discretization algorithm based on Class-Attribute Contingency Coefficient, Information Sciences, 178, 714–731.
See also disc.Topdown, topdown, insert, findBest and chiSq.
#----Calculating cacc value (Tsai, Lee, and Yang (2008))
a=c(3,0,3,0,6,0,0,3,0)
m=matrix(a,ncol=3,byrow=TRUE)
cacc(m)
This function is required to compute the CAIM value for the CAIM discretization algorithm.
caim(tb)
tb |
a vector of observed frequencies |
The Class-Attribute Interdependence Maximization (CAIM) discretization algorithm is implemented in disc.Topdown(data, method=1). The CAIM criterion measures the dependency between the class variable and the discretization scheme of an attribute, and is defined as
CAIM = (1/n) Σ_{r=1}^{n} max_r² / M_{+r}
for r = 1, …, n, where n is the number of intervals, max_r is the maximum value within the r-th column of the quanta matrix, and M_{+r} is the total number of continuous values of the attribute that are within the r-th interval (Kurgan and Cios (2004)).
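A direct computation of this criterion from a quanta matrix can be compared with caim() (an illustrative sketch of the definition above; the column maxima and column sums are taken over the interval columns):
library(discretization)
#-- 3 classes (rows) x 3 intervals (columns) quanta matrix from the example below
m <- matrix(c(3, 0, 3,
              0, 6, 0,
              0, 3, 0), ncol = 3, byrow = TRUE)
max_r <- apply(m, 2, max)   # maximum value within each interval column
M_r   <- colSums(m)         # number of values falling into each interval
n     <- ncol(m)            # number of intervals
caim(m)                     # CAIM value as computed by the package
sum(max_r^2 / M_r) / n      # should agree with the CAIM definition above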
HyunJi Kim [email protected]
Kurgan, L. A. and Cios, K. J. (2004). CAIM Discretization Algorithm, IEEE Transactions on knowledge and data engineering, 16, 145–153.
See also disc.Topdown, topdown, insert and findBest.
#----Calculating caim value
a=c(3,0,3,0,6,0,0,3,0)
m=matrix(a,ncol=3,byrow=TRUE)
caim(m)
This function performs the Chi2 discretization algorithm. The Chi2 algorithm automatically determines a proper χ² threshold that keeps the fidelity of the original numeric dataset.
chi2(data, alp = 0.5, del = 0.05)
data |
the dataset to be discretized |
alp |
significance level (α); default 0.5 |
del |
inconsistency threshold (δ); the process stops when the inconsistency rate of the discretized data exceeds this value (default 0.05) |
The Chi2 algorithm is based on the χ² statistic and consists of two phases. In the first phase, it begins with a high significance level (sigLevel) for all numeric attributes to be discretized. Each attribute is sorted according to its values, and then the following two steps are performed:
Step 1. Calculate the χ² value for every pair of adjacent intervals (at the beginning, each pattern is put into its own interval that contains only one value of an attribute).
Step 2. Merge the pair of adjacent intervals with the lowest χ² value. Merging continues until all pairs of intervals have χ² values exceeding the threshold determined by sigLevel.
The above process is repeated with a decreased sigLevel until the inconsistency rate (δ), computed by incon(), is exceeded in the discretized data (Liu and Setiono (1995)).
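Because the stopping rule is driven by the inconsistency rate, the result can be inspected with incon() (a small illustration using only the documented functions; del is left at its 0.05 default):
library(discretization)
data(iris)
disc <- chi2(iris, alp = 0.5, del = 0.05)
#-- inconsistency rate of the discretized data, to be checked against del
incon(disc$Disc.data)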
cutp |
list of cut-points for each variable |
Disc.data |
discretized data matrix |
HyunJi Kim [email protected]
Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Tools with Artificial Intelligence, 388–391.
Liu, H. and Setiono, R. (1997). Feature selection and discretization, IEEE transactions on knowledge and data engineering, Vol.9, no.4, 642–645.
data(iris)
#---cut-points
chi2(iris,0.5,0.05)$cutp
#--discretized dataset using Chi2 algorithm
chi2(iris,0.5,0.05)$Disc.data
This function implements the ChiMerge discretization algorithm.
chiM(data, alpha = 0.05)
data |
numeric data matrix to be discretized |
alpha |
significance level (α); default 0.05 |
The ChiMerge algorithm is a bottom-up method. It uses the χ² statistic to determine whether the relative class frequencies of adjacent intervals are distinctly different or whether they are similar enough to justify merging them into a single interval (Kerber (1992)).
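The χ² threshold implied by alpha can be made explicit with qchisq() (an illustrative sketch; it assumes, as stated under chiSq(), that the degrees of freedom are one less than the number of classes):
library(discretization)
data(iris)
alpha <- 0.05
k <- length(unique(iris$Species))   # number of classes
qchisq(1 - alpha, df = k - 1)       # chi^2 value a pair of adjacent intervals must exceed to stay split
disc <- chiM(iris, alpha = alpha)
disc$cutp[[1]]                      # cut-points kept for the first attribute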
cutp |
list of cut-points for each variable |
Disc.data |
discretized data matrix |
HyunJi Kim [email protected]
Kerber, R. (1992). ChiMerge : Discretization of numeric attributes, In Proceedings of the Tenth National Conference on Artificial Intelligence, 123–128.
#--Discretization using the ChiMerge method
data(iris)
disc=chiM(iris,alpha=0.05)
#--cut-points
disc$cutp
#--discretized data matrix
disc$Disc.data
This function is required to perform the discretizations based on the χ² statistic (CACC, Ameva, ChiMerge, Chi2, Modified Chi2 and Extended Chi2).
chiSq(tb)
tb |
a vector of observed frequencies |
The formula for computing the χ² value is
χ² = Σ_{i=1}^{m} Σ_{j=1}^{k} (A_ij − E_ij)² / E_ij
where m is the number of intervals (m = 2 when comparing a pair of adjacent intervals), k is the number of classes, A_ij is the number of patterns in the i-th interval and j-th class, R_i is the number of patterns in the i-th interval (R_i = Σ_{j=1}^{k} A_ij), C_j is the number of patterns in the j-th class (C_j = Σ_{i=1}^{m} A_ij), N is the total number of patterns (N = Σ_{i=1}^{m} R_i), and E_ij is the expected frequency of A_ij (E_ij = R_i × C_j / N).
If either R_i or C_j is 0, E_ij is set to 0.1. The degrees of freedom of the χ² statistic is one less than the number of classes.
val |
the χ² statistic value |
HyunJi Kim [email protected]
Kerber, R. (1992). ChiMerge : Discretization of numeric attributes, In Proceedings of the Tenth National Conference on Artificial Intelligence, 123–128.
See also cacc, ameva, chiM, chi2, modChi2 and extendChi2.
#----Calculate Chi-Square
b=c(2,4,1,2,5,3)
m=matrix(b,ncol=3)
chiSq(m)
chisq.test(m)$statistic
This function is required to perform the Minimum Description Length Principle discretization, mdlp().
cutIndex(x, y)
x |
a vector of numeric values |
y |
class variable vector |
This function computes the best cut index using the entropy criterion.
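An illustrative direct call on one attribute (a sketch only; it assumes cutIndex() is exported, as its help page suggests, and sorts the attribute first since the cut index refers to a position in the ordered values):
library(discretization)
data(iris)
x <- iris[, 1]   # a continuous attribute (Sepal.Length)
y <- iris[, 5]   # the class variable (Species)
od <- order(x)
cutIndex(x[od], y[od])   # best entropy-based cut index for this attribute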
HyunJi Kim [email protected]
See also cutPoints, ent, mergeCols, mdlStop, mylog and mdlp.
This function is required to perform the Minimum Description Length Principle discretization, mdlp().
cutPoints(x, y)
x |
a vector of numeric values |
y |
class variable vector |
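An illustrative call on a single attribute (a sketch only, assuming cutPoints() is exported as its help page suggests):
library(discretization)
data(iris)
#-- entropy/MDLP cut-points for the first attribute of iris
cutPoints(iris[, 1], iris[, 5])
#-- for comparison: the cut-points mdlp() reports for the same attribute
mdlp(iris)$cutp[[1]]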
HyunJi Kim [email protected]
See also cutIndex, ent, mergeCols, mdlStop, mylog and mdlp.
This function implements three top-down discretization algorithms (CAIM, CACC and Ameva).
disc.Topdown(data, method = 1)
data |
numeric data matrix to be discretized |
method |
discretization method: 1 for CAIM, 2 for CACC, 3 for Ameva |
cutp |
list of cut-points for each variable (minimum value, cut-points and maximum value) |
Disc.data |
discretized data matrix |
HyunJi Kim [email protected]
Gonzalez-Abril, L., Cuberos, F. J., Velasco, F. and Ortega, J. A. (2009) Ameva: An autonomous discretization algorithm, Expert Systems with Applications, 36, 5327–5332.
Kurgan, L. A. and Cios, K. J. (2004). CAIM Discretization Algorithm, IEEE Transactions on knowledge and data engineering, 16, 145–153.
Tsai, C. J., Lee, C. I. and Yang, W. P. (2008). A discretization algorithm based on Class-Attribute Contingency Coefficient, Information Sciences, 178, 714–731.
See also topdown, insert, findBest, findInterval, caim, cacc and ameva.
##---- CAIM discretization ----
data(iris)
##----cut-points
cm=disc.Topdown(iris, method=1)
cm$cutp
##----discretized data matrix
cm$Disc.data
##---- CACC discretization ----
disc.Topdown(iris, method=2)
##---- Ameva discretization ----
disc.Topdown(iris, method=3)
This function is required to perform the Minimum Description Length Principle discretization, mdlp().
ent(y)
y |
class variable vector |
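A minimal illustrative call (a sketch only, assuming ent() is exported as its help page suggests; mdlp() uses it to compute class entropies):
library(discretization)
data(iris)
ent(iris$Species)   # entropy of the class variable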
HyunJi Kim [email protected]
See also cutIndex, cutPoints, mergeCols, mdlStop, mylog and mdlp.
This function implements the Extended Chi2 discretization algorithm.
extendChi2(data, alp = 0.5)
data |
data matrix to be discretized |
alp |
significance level (α); default 0.5 |
In the Extended Chi2 algorithm, the inconsistency check (δ) of the Chi2 algorithm is replaced by the least upper bound ξ, computed by Xi(), after each step of discretization. This least upper bound is used as the stopping criterion.
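The quantity driving the stopping rule can be inspected with Xi() (an illustrative sketch using only the documented functions, applied before and after discretization):
library(discretization)
data(iris)
ext <- extendChi2(iris, alp = 0.5)
Xi(iris)            # least upper bound of the original data set
Xi(ext$Disc.data)   # least upper bound after discretization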
cutp |
list of cut-points for each variable |
Disc.data |
discretized data matrix |
HyunJi Kim [email protected]
Su, C. T. and Hsu, J. H. (2005). An Extended Chi2 Algorithm for Discretization of Real Value Attributes, IEEE transactions on knowledge and data engineering, 17, 437–441.
data(iris)
ext=extendChi2(iris,0.5)
ext$cutp
ext$Disc.data
This function is required to perform the top-down discretization, disc.Topdown().
findBest(x, y, bd, di, method)
x |
a vector of numeric values |
y |
class variable vector |
bd |
current cut points |
di |
candidate cut-points |
method |
discretization criterion: 1 for CAIM, 2 for CACC, 3 for Ameva |
HyunJi Kim [email protected]
See also topdown, insert and disc.Topdown.
This function computes the inconsistency rate of a dataset.
incon(data)
data |
dataset matrix |
The inconsistency rate of a dataset is calculated as follows: (1) two instances are considered inconsistent if they match except for their class labels; (2) for each group of matching instances (ignoring class labels), the inconsistency count is the number of instances in the group minus the largest number of instances with the same class label; (3) the inconsistency rate is the sum of all the inconsistency counts divided by the total number of instances.
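A tiny worked example of this rule (an illustrative sketch; it assumes, as in the package's other functions, that the class label occupies the last column):
library(discretization)
#-- the two instances with x = 1 match except for their class labels, so that
#-- group contributes an inconsistency count of 2 - 1 = 1; the x = 2 group is
#-- consistent.  Expected rate: 1 / 4 = 0.25.
toy <- data.frame(x = c(1, 1, 2, 2),
                  class = c("A", "B", "A", "A"))
incon(toy)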
inConRate |
the inconsistency rate of the dataset |
HyunJi Kim [email protected]
Liu, H. and Setiono, R. (1995), Chi2: Feature selection and discretization of numeric attributes , Tools with Artificial Intelligence, 388–391.
Liu, H. and Setiono, R. (1997), Feature selection and discretization, IEEE transactions on knowledge and data engineering, Vol.9, no.4, 642–645.
##---- Calculating Inconsistency ----
data(iris)
disiris=chiM(iris,alpha=0.05)$Disc.data
incon(disiris)
This function is required to perform the top-down discretization, disc.Topdown().
insert(x, a)
x |
cut-point |
a |
a vector of the minimum and maximum values |
HyunJi Kim [email protected]
See also topdown, findBest and disc.Topdown.
This function computes the level of consistency and is required to perform the Modified Chi2 discretization algorithm.
LevCon(data)
data |
discretized data matrix |
LevelConsis |
Level of Consistency value |
HyunJi Kim [email protected]
Tay, F. E. H. and Shen, L. (2002). Modified Chi2 Algorithm for Discretization, IEEE Transactions on knowledge and data engineering, Vol. 14, No. 3, 666–670.
Pawlak, Z. (1982). Rough Sets, International Journal of Computer and Information Sciences, vol.11, No.5, 341–356.
Chmielewski, M. R. and Grzymala-Busse, J. W. (1996). Global Discretization of Continuous Attributes as Preprocessing for Machine Learning, International journal of approximate reasoning, Vol. 15, No. 4, 319–331.
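An illustrative call on a discretized data set (a sketch using only the documented functions; the class label is assumed to be the last column, as elsewhere in the package):
library(discretization)
data(iris)
disc <- chiM(iris, alpha = 0.05)$Disc.data
LevCon(disc)   # level of consistency of the discretized data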
This function discretizes the continuous attributes of a data matrix using the entropy criterion, with the Minimum Description Length Principle as the stopping rule.
mdlp(data)
data |
data matrix to be discretized |
Minimum Description Length Principle.
cutp |
list of cut-points for each variable |
Disc.data |
discretized data matrix |
HyunJi Kim [email protected]
Fayyad, U. M. and Irani, K. B.(1993). Multi-interval discretization of continuous-valued attributes for classification learning, Artificial intelligence, 13, 1022–1027.
See also cutIndex, cutPoints, ent, mergeCols, mdlStop and mylog.
data(iris)
mdlp(iris)$Disc.data
This function determines the cut criterion based on the Fayyad and Irani criterion and is required to perform the Minimum Description Length Principle.
mdlStop(ci, y, entropy)
ci |
cut index |
y |
class variable |
entropy |
the entropy value of the candidate cut |
Minimum Description Length Principle criterion.
gain |
numeric value |
HyunJi Kim [email protected]
Fayyad, U. M. and Irani, K. B.(1993). Multi-interval discretization of continuous-valued attributes for classification learning, Artificial intelligence, 13, 1022–1027.
See also cutPoints, ent, mergeCols, cutIndex, mylog and mdlp.
This function merges columns having observation counts equal to 0 and is required to perform the Minimum Description Length Principle.
mergeCols(n, minimum = 2)
n |
a table whose columns are intervals and whose rows are variables |
minimum |
minimum number of observations in a column or row to merge (default 2) |
HyunJi Kim [email protected]
See also cutPoints, ent, cutIndex, mdlStop, mylog and mdlp.
This function implements the Modified Chi2 discretization algorithm.
modChi2(data, alp = 0.5)
data |
numeric data matrix to be discretized |
alp |
significance level (α); default 0.5 |
In the Modified Chi2 algorithm, the inconsistency check (δ) of the Chi2 algorithm is replaced by maintaining the level of consistency after each step of discretization. This level of consistency, rather than an inconsistency rate, is used as the stopping criterion.
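The level of consistency being maintained can be inspected with LevCon() (an illustrative sketch using only the documented functions):
library(discretization)
data(iris)
mod <- modChi2(iris, alp = 0.5)
LevCon(mod$Disc.data)   # level of consistency of the discretized result, used as the stopping criterion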
cutp |
list of cut-points for each variable |
Disc.data |
discretized data matrix |
HyunJi Kim [email protected]
Tay, F. E. H. and Shen, L. (2002). Modified Chi2 Algorithm for Discretization, IEEE Transactions on knowledge and data engineering, 14, 666–670.
data(iris)
modChi2(iris, alp=0.5)$Disc.data
This function is required to perform the Minimum Description Length Principle, mdlp().
mylog(x)
x |
a vector of numeric values |
HyunJi Kim [email protected]
Fayyad, U. M. and Irani, K. B.(1993). Multi-interval discretization of continuous-valued attributes for classification learning, Artificial intelligence, Vol. 13, 1022–1027.
See also mergeCols, ent, cutIndex, cutPoints, mdlStop and mdlp.
This function is required to perform the top-down discretization, disc.Topdown().
topdown(data, method = 1)
data |
numeric data matrix to be discretized |
method |
discretization method: 1 for CAIM, 2 for CACC, 3 for Ameva |
HyunJi Kim [email protected]
Gonzalez-Abril, L., Cuberos, F. J., Velasco, F. and Ortega, J. A. (2009) Ameva: An autonomous discretization algorithm, Expert Systems with Applications, 36, 5327–5332.
Kurgan, L. A. and Cios, K. J. (2004). CAIM Discretization Algorithm, IEEE Transactions on knowledge and data engineering, 16, 145–153.
Tsai, C. J., Lee, C. I. and Yang, W. P. (2008). A discretization algorithm based on Class-Attribute Contingency Coefficient, Information Sciences, 178, 714–731.
See also insert, findBest and disc.Topdown.
This function is called by the ChiMerge discretization function, chiM().
value(i, data, alpha)
i |
index of the attribute (column) to be discretized |
data |
numeric data matrix |
alpha |
significance level (α) |
cuts |
cut-points for the i-th variable |
disc |
discretized values of the i-th variable |
HyunJi Kim [email protected]
Kerber, R. (1992). ChiMerge : Discretization of numeric attributes, In Proceedings of the Tenth National Conference on Artificial Intelligence, 123–128.
See also chiM.
data(iris)
value(1,iris,0.05)
This function computes the least upper bound (ξ) and is required to perform the Extended Chi2 discretization algorithm.
Xi(data)
data |
data matrix |
The least upper bound ξ of the data set is calculated via an equality from the variable precision rough set model (Su and Hsu (2005)). In that equality, R* is the equivalence relation set, D is the decision set, E_i are the equivalence classes, and |·| denotes set cardinality.
Xi |
numeric value, the least upper bound (ξ) of the data set |
HyunJi Kim [email protected]
Su, C. T. and Hsu, J. H. (2005). An Extended Chi2 Algorithm for Discretization of Real Value Attributes, IEEE transactions on knowledge and data engineering, Vol. 17, No. 3, 437–441.
Ziarko, W. (1993). Variable Precision Rough Set Model, Journal of computer and system sciences, Vol. 46, No. 1, 39–59.
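A minimal illustrative call (a sketch; the class label is assumed to be in the last column of the data matrix, as elsewhere in the package):
library(discretization)
data(iris)
Xi(iris)   # least upper bound of the iris data set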