---
title: "Using Customized Models"
output: rmarkdown::html_vignette
author: Fangzhou Xie
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteIndexEntry{Using Customized Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(rethnicity)
```

# Design of the Package

I built this package to help applied researchers who study ethnic equality and inequality. More specifically, the package provides a method for predicting race from names.

I designed the package so that the method is powered by deep learning models, without requiring users to install deep learning libraries, which is usually a daunting task. As a consequence, the methods in this package are not designed to be updated, fine-tuned, or trained on custom datasets. This is the trade-off for ease of use.

That said, from version `0.2.0` onward, the package provides two additional lower-level functions, `predict_fullname` and `predict_lastname`, which allow users to supply their own customized models. (Prior to `v0.2.0`, there was only one function, `predict_ethnicity`. It is still the RECOMMENDED function for most users.)

# Using Customized Models

Since the package does not support training by design, you need to train your own model in Keras and then convert the trained model to `.json` format with the [frugally-deep](https://github.com/Dobiasd/frugally-deep) project.

## Train the Model in Keras

If you are reading this vignette, you most likely know what you are doing and have heard of `Keras`. Otherwise, you should stick to the default method, `predict_ethnicity`.
To see how I trained the models, and to create your own versions, refer to the following notebooks: [fullname model](https://github.com/fangzhou-xie/rethnicity/blob/main/data-raw/rethnicity_singlechar_distill_fullname_aligned.ipynb), [lastname model](https://github.com/fangzhou-xie/rethnicity/blob/main/data-raw/rethnicity_singlechar_distill_lastname.ipynb).

Before training the model, you need to preprocess your dataset: encode the outcome variable as integers, then use `keras.utils.to_categorical()` to convert those integers into one-hot vectors. Keep track of the mapping between the integers and the categories. For example, `0, 1, 2, 3` refer to `asian, black, hispanic, white`, respectively. You will need this mapping later, in the same order, and we will call it `labels = c("asian", "black", "hispanic", "white")`.

Remember to save the model without the optimizer (see the [`frugally-deep` website](https://github.com/Dobiasd/frugally-deep) for details):

```python
model.save('keras_model.h5', include_optimizer=False)
```

## Convert the Model to `.json`

Then, use the [`convert_model.py` script](https://github.com/Dobiasd/frugally-deep/tree/master/keras_export) to convert your model into `.json` format, as I did for the default models. The conversion will fail with an error if the saved model includes the optimizer.

```sh
python convert_model.py keras_model.h5 keras_model.json
```

## Predict with Your Own Model

Now that the model is trained and converted, all you need is the path to the `.json` model file. For illustration, I load one of the default models shipped with the package instead of training a new one.

```{r}
# remember the list of labels we mentioned?
# (same order as the integer encoding used during training)
labels <- c("asian", "black", "hispanic", "white")

# change to your own model file path
model_path <- system.file("models", "fullname_aligned_distill.json",
                          package = "rethnicity", mustWork = TRUE)

# run the prediction
predict_fullname(firstnames = "Alan", lastnames = "Turing",
                 labels = labels, model_path = model_path)
```

This also works for other outcomes: for example, if you adapt the training code to predict gender from names, you can use the resulting model and its labels in the same way.
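Whichever outcome you model, the `labels` vector passed in R has to list the categories in the same order as the integer encoding used during training; otherwise the predicted probabilities would likely be attached to the wrong categories. Below is a minimal Python sketch, assuming your outcome column is a list of category strings. It fixes the order by sorting the categories (an assumption; any fixed order works as long as both sides agree), one-hot encodes the integers the way `keras.utils.to_categorical()` does for 1-D input, and prints the matching R vector. The helper names are hypothetical and are not part of `rethnicity` or Keras.

```python
# Hypothetical helpers (not part of rethnicity or Keras): keep the training-time
# integer encoding and the R `labels` vector in the same order.


def build_label_mapping(outcomes):
    """Return the sorted category list and a category -> integer dict."""
    labels = sorted(set(outcomes))
    return labels, {label: i for i, label in enumerate(labels)}


def one_hot(indices, num_classes):
    """Pure-Python stand-in for keras.utils.to_categorical on 1-D integer input."""
    return [[1.0 if j == i else 0.0 for j in range(num_classes)] for i in indices]


def r_labels_vector(labels):
    """Format the category list as an R character vector to paste into R."""
    return "labels <- c(" + ", ".join(f'"{label}"' for label in labels) + ")"


# Hypothetical outcome column from a training dataset.
outcomes = ["white", "asian", "hispanic", "black", "white"]

labels, label_to_index = build_label_mapping(outcomes)
y_int = [label_to_index[o] for o in outcomes]  # [3, 0, 2, 1, 3]
y_onehot = one_hot(y_int, len(labels))         # one row per observation

print(label_to_index)           # {'asian': 0, 'black': 1, 'hispanic': 2, 'white': 3}
print(r_labels_vector(labels))  # labels <- c("asian", "black", "hispanic", "white")
```

In an actual training script you would call `keras.utils.to_categorical(y_int, num_classes=len(labels))` rather than the pure-Python `one_hot` helper, which is written out here only to make the encoding explicit. Sorting is simply one convenient way to make the order reproducible across runs.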