---
title: "Introduction to the noisemodel package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to the noisemodel package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
getInfo <- function(what = "Suggests") {
  text <- packageDescription("noisemodel")[what][[1]]
  text <- gsub("\n", ", ", text, fixed = TRUE)
  text <- gsub(">=", "$\\\\ge$", text, fixed = TRUE)
  eachPkg <- strsplit(text, ", ", fixed = TRUE)[[1]]
  eachPkg <- gsub(",", "", eachPkg, fixed = TRUE)
  #out <- paste("\\\**", eachPkg[order(tolower(eachPkg))], "}", sep = "")
  #paste(out, collapse = ", ")
  length(eachPkg)
}
```

The **noisemodel** package contains the first extensive implementation of noise models for classification datasets. It provides 72 noise models found in the specialized literature that allow errors to be introduced in different ways in class labels, attributes or both in combination. Each model is documented and referenced, and all of them return their results through a common S3 class, which provides customized `print`, `summary` and `plot` methods.

## Installation

The **noisemodel** package can be installed in R from **CRAN** servers using the command:

```{r install1}
# install.packages("noisemodel")
```

This command installs all the dependencies of the package that are necessary for the operation of the noise models. In order to access all the functions of the package, it is necessary to use the R command:

```{r install2}
library(noisemodel)
```

## Documentation

All the information corresponding to each noise model can be consulted from the **CRAN** website. Additionally, the `help()` command can be used. For example, in order to check the documentation of the Symmetric uniform label noise model, we can use the command:

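```{r help}
# help(sym_uni_ln)
```

## Usage of noise models

For introducing noise in a dataset, each noise model in the **noisemodel** package provides two standard ways of use: the default method, which receives the attributes and the class labels through the arguments `x` and `y`, and the method for class formula, which receives a `formula` together with the data frame `data`.
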
An example of how to use these two methods for introducing noise in the `iris2D` dataset with the `sym_uni_ln` model is shown below:

```{r example 1}
# load the dataset
data(iris2D)

# usage of the default method
set.seed(9)
outdef <- sym_uni_ln(x = iris2D[,-ncol(iris2D)], y = iris2D[,ncol(iris2D)], level = 0.1)

# show results
summary(outdef, showid = TRUE)
plot(outdef)

# usage of the method for class formula
set.seed(9)
outfrm <- sym_uni_ln(formula = Species ~ ., data = iris2D, level = 0.1)

# check the match of noisy indices
identical(outdef$idnoise, outfrm$idnoise)
```

Note that the `$` operator is used to access the elements returned by the noise model in the objects `outdef` and `outfrm`.
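
To give an intuition of what a symmetric uniform label noise model does, the following is a minimal base-R sketch written under stated assumptions; it is not the implementation used by **noisemodel**. It assumes that a fraction `level` of the samples is selected regardless of class and that each selected label is replaced by a different class chosen uniformly at random. The helper `add_label_noise` is hypothetical, and the built-in `iris` dataset is used so that the block runs without the package.

```r
# Hypothetical helper (NOT part of noisemodel): illustrative sketch of
# symmetric uniform label noise on a factor of class labels.
add_label_noise <- function(y, level = 0.1) {
  stopifnot(is.factor(y), nlevels(y) >= 2, level >= 0, level <= 1)
  n_noisy <- round(level * length(y))
  # choose the samples to corrupt, regardless of their class
  idnoise <- sort(sample.int(length(y), n_noisy))
  ynoise <- y
  for (i in idnoise) {
    # pick a new label uniformly among the classes other than the current one
    candidates <- setdiff(levels(y), as.character(y[i]))
    ynoise[i] <- candidates[sample.int(length(candidates), 1)]
  }
  list(ynoise = ynoise, idnoise = idnoise)
}

set.seed(9)
res <- add_label_noise(iris$Species, level = 0.1)

# number of corrupted samples
length(res$idnoise)
# every corrupted sample received a label different from the original one
all(res$ynoise[res$idnoise] != iris$Species[res$idnoise])
```

The element names `ynoise` and `idnoise` in the returned list are chosen here only to mirror the names used in this vignette.
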
## Output values

All noise models return an object of class `ndmodel`. It is designed to unify the output value of the methods included in the **noisemodel** package. The class `ndmodel` is a list of elements with the most relevant information of the noise introduction process, such as the indices of the noisy samples (`idnoise`) and the class labels after noise introduction (`ynoise`).

In order to display the results of the class `ndmodel` in a friendly way in the R console, specific `print`, `summary` and `plot` functions are implemented. The `print` function presents the basic information about the noise introduction process contained in an object of class `ndmodel`:
```{r example 3}
print(outdef)
```

The information offered by `print` is as follows:

On the other hand, the `summary` function displays a summary containing information about the noise introduction process contained in an object of class `ndmodel`, with other additional details. This function can be called by typing the following R command:
```{r example 4}
summary(outdef, showid = TRUE)
```

The information offered by this function is as follows:

Finally, the `plot` function displays a representation of the dataset contained in an object of class `ndmodel` after the application of a noise introduction model.
```{r example 5}
plot(outdef)
```

This function performs a two-dimensional representation using the **ggplot2** package of the dataset contained in the object *x* of class `ndmodel`. Each of the classes in the dataset (available in `x$ynoise`) is represented by a different color.
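
As a rough illustration of this kind of representation, the sketch below draws a two-dimensional scatter plot colored by class label with **ggplot2**. It is not the code behind the `plot` method of `ndmodel`: the noisy labels are simulated by moving 15 randomly chosen samples of the built-in `iris` dataset to the next class level, and the names `df` and `idnoise` are used only for this example. The plot is drawn only if **ggplot2** is available.

```r
# Illustrative sketch only: the noisy labels below are simulated and are
# NOT produced by noisemodel.
set.seed(9)
df <- iris[, c("Petal.Length", "Petal.Width", "Species")]

# hypothetical indices of the corrupted samples
idnoise <- sample.int(nrow(df), 15)

# move each selected sample to the next class level (wrapping around),
# which guarantees that its label changes
df$ynoise <- df$Species
next_level <- as.integer(df$Species[idnoise]) %% nlevels(df$Species) + 1
df$ynoise[idnoise] <- levels(df$Species)[next_level]

# one color per class, taken from the noisy labels
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    df,
    ggplot2::aes(x = Petal.Length, y = Petal.Width, colour = ynoise)
  ) +
    ggplot2::geom_point(size = 2) +
    ggplot2::labs(colour = "Class (noisy labels)")
  print(p)
}
```

Samples whose label was changed appear inside the point cloud of another class, which is what makes noisy samples easy to spot in a plot of this type.
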