---
title: "imanr: Identify the Racial Complex of Native Corns from Mexico"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Identify the Racial Complex of Native Corns from Mexico}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
library(imanr)
```

# imanr: Identificador de Maíz Nativo en R

## Introduction

|                                                        |                |
|--------------------------------------------------------|----------------|
| **imanr** is a novel machine learning package that helps producers and researchers identify the racial complex of native corn from Mexico, a step that is fundamental to understanding the distribution and characteristics of native corn in Mexico's agriculture. The package was developed with the information collected by CONABIO for the [Proyecto Global de Maíces Nativos de México](https://www.biodiversidad.gob.mx/diversidad/proyectoMaices), whose goal was to update the available information on the many corn varieties, their geographic origin, and the implications for maize genetic diversity. Many documents have been published with this information; this package aims to extend the reach of that national project by pinpointing, with a high level of accuracy, the most plausible racial complex for a corn sample. | ![](images/hex-imanr.png){width="250"} |

## Usage

The package is composed of two functions: `find_racial_complex()` and `impute_data()`.

### find_racial_complex()

This is the main function in the package. It loads the machine learning model that classifies the corn samples passed to the function.

The function takes a single argument: a data frame with the qualitative and quantitative characteristics of the corn samples, as shown in the included dataset:

```{r show_data_template}
data("data31")

# Necessary fields
names(data31)
```

These are the fields the model requires to work properly. Future versions of the package will aim to be more flexible in what can be done and how.

Once the data is loaded, it can be passed to the model, and the result shows the racial complex to which each sample belongs.

```{r using_find_racial_complex}
# test for racial complexes
find_racial_complex(data31)
#> [1] Tropicales tardíos Dentados tropicales Dentados tropicales Dentados tropicales
#> [5] Dentados tropicales Dentados tropicales Dentados tropicales Dentados tropicales
#> [9] Dentados tropicales Dentados tropicales Dentados tropicales Dentados tropicales
#> [13] Dentados tropicales Dentados tropicales Dentados tropicales Dentados tropicales
#> [17] Dentados tropicales Dentados tropicales Tropicales tardíos Dentados tropicales
#> [21] Dentados tropicales Dentados tropicales Dentados tropicales Dentados tropicales
#> [25] Dentados tropicales Dentados tropicales Dentados tropicales Dentados tropicales
#> [29] Dentados tropicales Dentados tropicales Dentados tropicales
#> 7 Levels: Chapalote Cónico Dentados tropicales Ocho hileras ... Tropicales tardíos
```

### impute_data()

This complementary function helps the user impute missing data. It compares the incomplete fields with the full information in the Proyecto Nacional de Maíz Nativo database and fills the gaps with values computed through a random forest approach.

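Before imputing, it can help to see which fields are actually incomplete. The following is a minimal sketch in base R and is not part of **imanr**; the `toy_samples` data frame and its column names are hypothetical and only stand in for a dataset with gaps.

```{r inspect_missing_sketch}
# Hypothetical toy data: the column names are illustrative,
# not the fields that imanr requires
toy_samples <- data.frame(
  ear_length   = c(15.2, NA, 13.8, 16.1),
  kernel_rows  = c(12, 14, NA, NA),
  kernel_color = c("white", "yellow", NA, "white"),
  stringsAsFactors = TRUE
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy_samples))
missing_per_column

# Indices of the rows that have at least one missing value
incomplete_rows <- which(!complete.cases(toy_samples))
incomplete_rows
```

A check like this only reports where the gaps are; filling them is left to `impute_data()`.
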
The function takes two arguments: (1) `data`, the dataset with missing information, which should have the same columns as the data used with `find_racial_complex()`, and (2) `useParallel`, which enables parallel computing. Imputation can be computationally intensive, so this option was added to reduce computation time.

```{r using_impute_data, eval=FALSE}
# testing the function
imputed_data24 <- impute_data(data24, useParallel = TRUE)

# test for racial complexes
find_racial_complex(imputed_data24)
```

## Contact

- Rafael Nieves-Alvarez ([email](mailto:nievesalvarez1618@gmail.com))
- Arturo Sanchez-Porras ([email](mailto:sp.arturo@gmail.com))
- Aline Romero-Natale ([email](mailto:ronatale@gmail.com))
- Otilio Arturo Acevedo-Sandoval ([email](mailto:acevedo@uaeh.edu.mx))

## imanr package installation

Install from GitHub (a CRAN release is planned but not yet available):

```{r installation, eval = FALSE}
# From GitHub
install.packages("devtools")
library(devtools)
install_github(repo = "rafa6174/imanr", build_vignettes = TRUE)

# From CRAN (recommended)
# install.packages("imanr") # not just yet...
```

Load the **imanr** package:

```{r loading_the_package, eval = FALSE}
library(imanr)
```

------------------------------------------------------------------------

## References

- Báez Vergara, K. J. Estimación de datos faltantes a través de redes neuronales, una comparación con métodos simples y múltiples (Doctoral dissertation, Universidad Santo Tomás).
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: With applications in R (1st ed.). Springer.
- Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (1st ed.). Springer.
- Monroy, L. G. D. (2007). Estadística Multivariada: Inferencia y Métodos. Univ. Nacional.