imanr: Identify the Racial Complex of Native Corns from Mexico

imanr: Identificador de Maíz Nativo en R (Native Corn Identifier in R)

Introduction

imanr is a novel machine learning package that helps producers and researchers identify the racial complex of native corn from Mexico, a step that is fundamental to understanding the distribution and characteristics of native corn in Mexico's agriculture. The package was developed using the information collected by CONABIO for the Proyecto Global de Maíces Nativos de México, whose goal was to update the available information on the many corn varieties, their geographic origin, and the implications for maize genetic diversity. Many documents have been published with this information; this package aims to extend the reach of that national project by pinpointing, with a high level of accuracy, the most plausible racial complex for a corn sample.

Usage

The package is composed of two functions: find_racial_complex() and impute_data().

find_racial_complex()

This is the main function in the package. It loads the machine learning model that classifies the corn samples passed to it. The function takes a single argument: a data frame containing qualitative and quantitative characteristics of the corn, as in the data included with the package:

data("data31")

# Necessary fields
names(data31)
#>  [1] "Altitud"                                    
#>  [2] "Longitud"                                   
#>  [3] "Latitud"                                    
#>  [4] "Color.de.grano.crema"                       
#>  [5] "Color.de.grano..blanco.puro..H."            
#>  [6] "Color.de.grano.amarillo..B."                
#>  [7] "Color.de.grano.morado..C."                  
#>  [8] "Color.de.grano.jaspeado..D."                
#>  [9] "Color.de.grano.amarillo.claro"              
#> [10] "Color.de.grano.amarillo.medio"              
#> [11] "Color.de.grano.amarillo.naranja..F."        
#> [12] "Color.de.grano.azul..K."                    
#> [13] "Color.de.grano.azul.oscuro..L."             
#> [14] "Color.de.grano.blanco..A."                  
#> [15] "Color.de.grano.blanco.cremoso"              
#> [16] "Color.de.grano.café..E."                    
#> [17] "Color.de.grano.naranja"                     
#> [18] "Color.de.grano.negro"                       
#> [19] "Color.de.grano.rojo..I."                    
#> [20] "Color.de.grano.rojo.naranja..J."            
#> [21] "Color.de.grano.rojo.oscuro"                 
#> [22] "Color.de.grano.rosa"                        
#> [23] "Color.de.olote.amarillo.claro"              
#> [24] "Color.de.olote.amarillo.medio"              
#> [25] "Color.de.olote.amarillo.naranja"            
#> [26] "Color.de.olote.azul"                        
#> [27] "Color.de.olote.azul.oscuro"                 
#> [28] "Color.de.olote.blanco"                      
#> [29] "Color.de.olote.blanco.cremoso"              
#> [30] "Color.de.olote.café"                        
#> [31] "Color.de.olote.naranja"                     
#> [32] "Color.de.olote.negro"                       
#> [33] "Color.de.olote.rojo"                        
#> [34] "Color.de.olote.rojo.naranja"                
#> [35] "Color.de.olote.rojo.oscuro"                 
#> [36] "Color.de.tallo"                             
#> [37] "Color.de.tallo.verde"                       
#> [38] "Color.de.tallo.morado"                      
#> [39] "Color.de.tallo.rojo"                        
#> [40] "Disposición.de.hileras.en.espiral"          
#> [41] "Disposición.de.hileras.irregular"           
#> [42] "Disposición.de.hileras.recta"               
#> [43] "Disposición.de.hileras.semirecta"           
#> [44] "Disposición.de.hileras.regular"             
#> [45] "Forma.de.mazorca.cilíndrica"                
#> [46] "Forma.de.mazorca.cónica"                    
#> [47] "Forma.de.mazorca.cónica.cilíndrica"         
#> [48] "Forma.de.mazorca.esférica"                  
#> [49] "Tipo.de.grano.ceroso"                       
#> [50] "Tipo.de.grano.cristalino..F."               
#> [51] "Tipo.de.grano.dentado...C."                 
#> [52] "Tipo.de.grano.dulce..H."                    
#> [53] "Tipo.de.grano.harinoso..A."                 
#> [54] "Tipo.de.grano.reventador..G."               
#> [55] "Tipo.de.grano.semi.cristalino..E."          
#> [56] "Tipo.de.grano.semi.dentado..D."             
#> [57] "Tipo.de.grano.semi.harinoso"                
#> [58] "Longitud.de.mazorca"                        
#> [59] "Diametro.de.mazorca"                        
#> [60] "Hileras.por.mazorca"                        
#> [61] "Diámetro.longitud.de.la.mazorca_recalculado"

These are the required fields for the model to work properly. Future versions of the package will aim to make the accepted input more flexible.
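
Because the model expects exactly these fields, it can help to compare the columns of your own data frame against the required names before classifying. The following is a minimal sketch in base R; check_required_fields() is a hypothetical helper, not part of imanr, and the sample values are invented for illustration.

```r
# Hypothetical helper (not part of imanr): compare a data frame's columns
# against a vector of required column names.
check_required_fields <- function(df, required) {
  list(
    missing = setdiff(required, names(df)),  # required but absent
    extra   = setdiff(names(df), required)   # present but not expected
  )
}

# Toy illustration using a few of the field names listed above.
# With the package loaded, `required` would instead be names(data31).
required <- c("Altitud", "Longitud", "Latitud", "Longitud.de.mazorca")
sample_df <- data.frame(Altitud = 1200, Longitud = -98.2, Latitud = 19.0,
                        Notas = "x")  # invented values

report <- check_required_fields(sample_df, required)
report$missing  # required columns absent from sample_df
report$extra    # columns in sample_df that are not in `required`
```

An empty `missing` vector suggests the data frame has every required column; whether extra columns are tolerated by find_racial_complex() is not stated here, so dropping them first is the cautious choice.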

Once the data is loaded, it can be passed to the model, and the result shows the racial complex to which each sample belongs.

# test for racial complexes
find_racial_complex(data31)
#>  [1] Tropicales tardíos  Dentados tropicales Dentados tropicales
#>  [4] Dentados tropicales Dentados tropicales Dentados tropicales
#>  [7] Dentados tropicales Dentados tropicales Dentados tropicales
#> [10] Dentados tropicales Dentados tropicales Dentados tropicales
#> [13] Dentados tropicales Dentados tropicales Dentados tropicales
#> [16] Dentados tropicales Dentados tropicales Dentados tropicales
#> [19] Tropicales tardíos  Dentados tropicales Dentados tropicales
#> [22] Dentados tropicales Dentados tropicales Dentados tropicales
#> [25] Dentados tropicales Dentados tropicales Dentados tropicales
#> [28] Dentados tropicales Dentados tropicales Dentados tropicales
#> [31] Dentados tropicales
#> 7 Levels: Chapalote Cónico Dentados tropicales ... Tropicales tardíos
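
The printed result has levels, which suggests it is an ordinary factor, so the usual base R tools for factors should apply to it. The sketch below summarizes such a result; so that it runs without the package, it rebuilds a stand-in factor from the output shown above (only the two observed levels are reproduced). In practice `pred` would be the value returned by find_racial_complex(data31), assuming that value is a plain factor.

```r
# Stand-in for the factor printed above: samples 1 and 19 are
# "Tropicales tardíos", the other 29 are "Dentados tropicales".
labels <- rep("Dentados tropicales", 31)
labels[c(1, 19)] <- "Tropicales tardíos"
pred <- factor(labels)

# Count samples per predicted racial complex
counts <- table(pred)
counts

# Row positions of the samples assigned to a given complex
late_tropical <- which(pred == "Tropicales tardíos")
late_tropical
```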


impute_data()

This complementary function helps the user impute missing data. It compares the incomplete fields against the full information in the Proyecto Nacional de Maíz Nativo database and fills the gaps with values computed through a random forest approach. The function takes two arguments: (1) data, the dataset with missing information, which must have the same columns as the data used with find_racial_complex(); and (2) useParallel, which enables parallel computing, since the imputation can be computationally intensive.

# testing the function
imputed_data24 <- impute_data(data24, useParallel = TRUE)

# test for racial complexes
find_racial_complex(imputed_data24)
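
Before imputing, it may be useful to see how much data is actually missing and where. The following base R sketch counts missing values per column and lists the incomplete rows; the toy data frame uses three of the field names from the list above with invented values, and is only a stand-in for a real dataset such as data24.

```r
# Toy data frame with invented values; column names come from the field list.
toy <- data.frame(
  Altitud             = c(1200, NA, 850),
  Longitud.de.mazorca = c(15.2, 17.0, NA),
  Hileras.por.mazorca = c(12, 14, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
na_per_column

# Rows that contain at least one missing value
incomplete_rows <- which(!complete.cases(toy))
incomplete_rows
```

Running the same two checks on the output of impute_data() is a quick way to confirm that no gaps remain before calling find_racial_complex().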

Contact

  • Rafael Nieves-Alvarez (email)
  • Arturo Sanchez-Porras (email)
  • Aline Romero-Natale (email)
  • Otilio Arturo Acevedo-Sandoval (email)

imanr package installation

Install from GitHub or CRAN:

# From GitHub
install.packages("devtools")
library(devtools)
install_github(repo = "rafa6174/imanr", build_vignettes = TRUE)

# From CRAN (recommended)
# install.packages("imanr") # not just yet...

Load imanr package:

library(imanr)

References

  • Báez Vergara, K. J. Estimación de datos faltantes a través de redes neuronales, una comparación con métodos simples y múltiples (Doctoral dissertation, Universidad Santo Tomás).
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: With applications in R (1st ed.). Springer.
  • Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (1st ed.). Springer.
  • Monroy, L. G. D. (2007). Estadística Multivariada: Inferencia y Métodos. Univ. Nacional.