Title: | Neural Network Weights Transformation into Polynomial Coefficients |
---|---|
Description: | Implements a method that builds the coefficients of a polynomial model that behaves almost equivalently to a given densely connected neural network. This is achieved using Taylor expansion of the activation functions. The obtained polynomial coefficients can be used to explain the importance of features (and their interactions) in the neural network, therefore working as a tool for interpretability or eXplainable Artificial Intelligence (XAI). See Morala et al. 2021 <doi:10.1016/j.neunet.2021.04.036>, and 2023 <doi:10.1109/TNNLS.2023.3330328>. |
Authors: | Pablo Morala [aut, cre] , Iñaki Ucar [aut] , Jose Ignacio Diez [ctr] |
Maintainer: | Pablo Morala <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2 |
Built: | 2024-12-25 07:11:18 UTC |
Source: | CRAN |
This function sets up a neural network object with the constraints required by the nn2poly algorithm. Currently supported neural network frameworks are keras/tensorflow and luz/torch.
add_constraints(object, type = c("l1_norm", "l2_norm"), ...)
object |
A neural network object in sequential form from one of the supported frameworks. |
type |
Constraint type. Currently, l1_norm and l2_norm are supported. |
... |
Additional arguments (unused). |
Constraints are added to the model object using callbacks in their specific framework. These callbacks are used during training when calling fit on the model. Specifically, the callbacks are applied at the end of each training batch.
Models in luz/torch need to use the luz_model_sequential helper in order to have a sequential model in the appropriate form.
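As an illustration of what a norm constraint on the weights could look like, the following base R sketch rescales each column of a weight matrix (one column per neuron) so that its L1 norm is at most 1. This is an assumption about the kind of constraint involved, not the callback code used by the package, and the helper name clip_l1_columns is hypothetical.

```r
# Hypothetical helper (not part of nn2poly): rescale every column of a
# weight matrix so that its L1 norm does not exceed max_norm.
# Assumption: one column per neuron, bias weight included in the column.
clip_l1_columns <- function(W, max_norm = 1) {
  norms <- colSums(abs(W))             # L1 norm of each neuron's weight vector
  scale <- pmin(1, max_norm / norms)   # shrink only the columns that exceed it
  sweep(W, 2, scale, `*`)              # multiply column j by scale[j]
}

set.seed(1)
W <- matrix(rnorm(12), nrow = 3, ncol = 4)  # 2 inputs + bias, 4 neurons
W_clipped <- clip_l1_columns(W)
colSums(abs(W_clipped))                     # every value is at most 1
```

Applying such a rescaling after each training batch would keep the weights inside the constrained region, which matches the point in training where the callbacks described above are applied.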
A nn2poly neural network object.
## Not run:
if (requireNamespace("keras", quietly=TRUE)) {
  # ---- Example with a keras/tensorflow network ----
  # Build a small nn:
  nn <- keras::keras_model_sequential()
  nn <- keras::layer_dense(nn, units = 10, activation = "tanh", input_shape = 2)
  nn <- keras::layer_dense(nn, units = 1, activation = "linear")

  # Add constraints (the argument is named `type` in the usage above)
  nn_constrained <- add_constraints(nn, type = "l1_norm")

  # Check that class of the constrained nn is "nn2poly"
  class(nn_constrained)[1]
}

if (requireNamespace("luz", quietly=TRUE)) {
  # ---- Example with a luz/torch network ----
  # Build a small nn
  nn <- luz_model_sequential(
    torch::nn_linear(2,10),
    torch::nn_tanh(),
    torch::nn_linear(10,1)
  )

  # With luz/torch we need to setup the nn before adding the constraints
  nn <- luz::setup(module = nn,
                   loss = torch::nn_mse_loss(),
                   optimizer = torch::optim_adam)

  # Add constraints
  nn <- add_constraints(nn)

  # Check that class of the constrained nn is "nn2poly"
  class(nn)[1]
}
## End(Not run)
Evaluates one or several polynomials on the given data.
eval_poly(poly, newdata)
poly |
List containing 2 items:
Example: If |
newdata |
Input data as matrix, vector or dataframe. Number of columns (or elements in vector) should be the number of variables in the polynomial (dimension p). Response variable to be predicted should not be included. |
Note that this function is unstable and subject to change. Therefore it is not exported, but this documentation is left available so users can use it if needed to simulate data by using nn2poly:::eval_poly().
Returns a matrix containing the evaluation of the polynomials. Each column corresponds to each polynomial used and each row to each observation, meaning that each column vector corresponds to the results of evaluating all the given data for each polynomial.
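To make the shape of the returned matrix concrete, here is a minimal base R sketch of evaluating polynomials stored as a list with labels and values items. It is not the package's eval_poly() implementation. The helper name eval_poly_sketch is hypothetical, and the sketch assumes newdata is a matrix or data frame with one column per variable.

```r
# Hypothetical helper (not the package's eval_poly): evaluate polynomials
# stored as a list with `labels` and `values` on a data matrix.
eval_poly_sketch <- function(poly, newdata) {
  X <- as.matrix(newdata)
  # One column per monomial: product of the variables named in each label,
  # with label 0 standing for the intercept (a column of ones).
  monomials <- sapply(poly$labels, function(label) {
    if (identical(as.numeric(label), 0)) return(rep(1, nrow(X)))
    apply(X[, label, drop = FALSE], 1, prod)
  })
  monomials <- matrix(monomials, nrow = nrow(X))
  # Rows are observations, columns are polynomials.
  monomials %*% as.matrix(poly$values)
}

# 1 + 2*x1 + 3*x1*x2 evaluated at (x1, x2) = (1, 2) and (0, 5)
poly <- list(labels = list(0, 1, c(1, 2)),
             values = matrix(c(1, 2, 3), ncol = 1))
newdata <- matrix(c(1, 0, 2, 5), ncol = 2)
eval_poly_sketch(poly, newdata)
```

With a single polynomial the result has one column. The two rows hold the values 9 and 1, one per observation.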
eval_poly() is also used in predict.nn2poly().
luz model composed of a linear stack of layers
Helper function to build luz models as a sequential model, by feeding it a stack of luz layers.
luz_model_sequential(...)
... |
Sequence of modules to be added. |
This step is needed so we can easily get the activation functions and the architecture of layers and neurons with nn2poly:::get_parameters(). Furthermore, this step is also needed to be able to impose the needed constraints when using the luz/torch framework.
A nn_sequential module.
## Not run:
if (requireNamespace("luz", quietly=TRUE)) {
  # Create a NN using luz/torch as a sequential model
  # with 3 fully connected linear layers,
  # the first one with input = 5 variables,
  # 100 neurons and tanh activation function, the second
  # one with 50 neurons and softplus activation function
  # and the last one with 1 linear output.
  nn <- luz_model_sequential(
    torch::nn_linear(5,100),
    torch::nn_tanh(),
    torch::nn_linear(100,50),
    torch::nn_softplus(),
    torch::nn_linear(50,1)
  )

  nn

  # Check that the nn is of class nn_sequential
  class(nn)
}
## End(Not run)
Implements the main NN2Poly algorithm to obtain a polynomial representation of a trained neural network using its weights and Taylor expansion of its activation functions.
nn2poly( object, max_order = 2, keep_layers = FALSE, taylor_orders = 8, ..., all_partitions = NULL )
object |
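An object for which the computation of the NN2Poly algorithm is desired. Currently supports models from the following deep learning frameworks: keras/tensorflow and luz/torch.
It also supports a named list of weight matrices, with the activation function of each layer as names. At any layer, the weight matrix has one row for the bias plus one row per neuron in the previous layer, and one column per neuron in the current layer (see the examples below). |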
max_order |
|
keep_layers |
Boolean that determines if all polynomials computed in
the internal layers have to be stored and given in the output ( |
taylor_orders |
|
... |
Ignored. |
all_partitions |
Optional argument containing the needed multipartitions
as list of lists of lists. If set to |
Returns an object of class nn2poly.
If keep_layers = FALSE (default case), it returns a list with two items:
An item named labels that is a list of integer vectors. Those vectors represent each monomial in the polynomial, where each integer in the vector represents each time one of the original variables appears in that term. As an example, vector c(1,1,2) represents the term x_1^2*x_2. Note that the variables are numbered from 1 to p, with the intercept represented by zero.
An item named values which contains a matrix in which each column contains the coefficients of the polynomial associated with an output neuron. That is, if the neural network has a single output unit, the matrix values will have a single column and if it has multiple output units, the matrix values will have several columns. Each row will be the coefficient associated with the label in the same position in the labels list.
If keep_layers = TRUE, it returns a list whose length is the number of layers (represented by layer_i), where each one is another list with input and output elements. Each of those elements contains an item as explained before. The last layer output item will be the same element as if keep_layers = FALSE.
The polynomials obtained at the hidden layers are not needed to represent the NN but can be used to explore other insights from the NN.
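The labels encoding described above can be made easier to read with a small formatting helper. The following base R sketch is not part of the package, and the name label_to_term is hypothetical. It turns each integer vector into a monomial string by counting how many times each variable index appears.

```r
# Hypothetical helper (not part of nn2poly): turn a label such as c(1, 1, 2)
# into a readable monomial string.
label_to_term <- function(label) {
  if (identical(as.numeric(label), 0)) return("1")   # intercept
  counts <- table(label)          # how many times each variable appears
  vars <- names(counts)
  n <- as.vector(counts)
  powers <- ifelse(n > 1,
                   paste0("x", vars, "^", n),
                   paste0("x", vars))
  paste(powers, collapse = "*")
}

labels <- list(0, 1, 2, c(1, 1), c(1, 2), c(1, 1, 2))
vapply(labels, label_to_term, character(1))
```

Pairing these strings with the rows of the values matrix gives a readable table of terms and coefficients for each output unit.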
Predict method for nn2poly output: predict.nn2poly().
# Build a NN structure with random weights, with 2 (+ bias) inputs,
# 4 (+bias) neurons in the first hidden layer with "tanh" activation
# function, 4 (+bias) neurons in the second hidden layer with "softplus",
# and 1 "linear" output unit
weights_layer_1 <- matrix(rnorm(12), nrow = 3, ncol = 4)
weights_layer_2 <- matrix(rnorm(20), nrow = 5, ncol = 4)
weights_layer_3 <- matrix(rnorm(5), nrow = 5, ncol = 1)

# Set it as a list with activation functions as names
nn_object <- list("tanh" = weights_layer_1,
                  "softplus" = weights_layer_2,
                  "linear" = weights_layer_3)

# Obtain the polynomial representation (order = 3) of that neural network
final_poly <- nn2poly(nn_object, max_order = 3)

# Change the last layer to have 3 outputs (as in a multiclass classification
# problem): 5 rows (4 neurons + bias) and 3 columns
weights_layer_4 <- matrix(rnorm(15), nrow = 5, ncol = 3)

# Set it as a list with activation functions as names
nn_object <- list("tanh" = weights_layer_1,
                  "softplus" = weights_layer_2,
                  "linear" = weights_layer_4)

# Obtain the polynomial representation of that neural network
# In this case the output is formed by several polynomials with the same
# structure but different coefficient values
final_poly <- nn2poly(nn_object, max_order = 3)

# Polynomial representation of each hidden neuron is given by
final_poly <- nn2poly(nn_object, max_order = 3, keep_layers = TRUE)
If the points come from the predictions of an NN and a PM and the line (plot.line = TRUE) is displayed, the points should not fall on the line in case the method exhibits asymptotic behavior.
plot_diagonal( x_axis, y_axis, xlab = NULL, ylab = NULL, title = NULL, plot.line = TRUE )
x_axis |
Values to plot in the |
y_axis |
Values to plot in the |
xlab |
Lab of the |
ylab |
Lab of the |
title |
Title of the plot. |
plot.line |
If a red line with |
Plot (ggplot object).
Function that takes a NN and the input data values and plots the distribution of the data activation potentials (sum of input values * weights) at all neurons together at each layer, along with the Taylor expansion used in the activation functions. If any layer is 'linear' (usually the output layer), then that layer will not be an approximation, as Taylor expansion is not needed.
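The quantity being plotted can be sketched in base R as follows. The example computes the activation potentials of one hidden layer and compares tanh with a Taylor polynomial over those potentials. The degree 3 polynomial, the weight values and the data are assumptions chosen for brevity (the default taylor_orders in the usage is 8), and tanh_taylor3 is a hypothetical name.

```r
# Degree-3 Taylor polynomial of tanh around 0: x - x^3 / 3.
tanh_taylor3 <- function(x) x - x^3 / 3

set.seed(42)
X <- matrix(runif(200, -1, 1), ncol = 2)   # 100 observations, 2 inputs

# Illustrative weights: one row for the bias plus one per input,
# one column per neuron.
W <- matrix(c(0.1, 0.3, -0.2,
              -0.1, 0.25, 0.4), nrow = 3)

# Activation potentials: bias plus weighted sum of the inputs, per neuron.
potentials <- cbind(1, X) %*% W

# Largest gap between tanh and its Taylor polynomial over those potentials.
max_error <- max(abs(tanh(potentials) - tanh_taylor3(potentials)))
range(potentials)
max_error
```

When the potentials stay close to zero, as they do here because the weights are small, the Taylor polynomial tracks the activation function closely. Potentials that spread far from zero are where the approximation degrades.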
plot_taylor_and_activation_potentials( object, data, max_order, taylor_orders = 8, constraints, taylor_interval = 1.5, ... )
object |
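An object for which the computation of the NN2Poly algorithm is desired. Currently supports models from the following deep learning frameworks: keras/tensorflow and luz/torch.
It also supports a named list of weight matrices, with the activation function of each layer as names. At any layer, the weight matrix has one row for the bias plus one row per neuron in the previous layer, and one column per neuron in the current layer (see the examples for nn2poly()). |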
data |
Matrix or data frame containing the predictor variables (X) to be used as input to compute their activation potentials. The response variable column should not be included. |
max_order |
|
taylor_orders |
|
constraints |
Boolean parameter determining if the NN is constrained (TRUE) or not (FALSE). This only modifies the plot title to show "constrained" or "unconstrained" respectively. |
taylor_interval |
Optional parameter determining the interval in which the Taylor expansion is represented. Default is 1.5. |
... |
Additional parameters. |
A list of plots.
nn2poly objects.
A function that takes a polynomial (or several ones) as given by the nn2poly algorithm, and then plots their absolute magnitude as barplots to be able to compare the most important coefficients.
## S3 method for class 'nn2poly'
plot(x, ..., n = NULL)
x |
A |
... |
Ignored. |
n |
An integer denoting the number of coefficients to be plotted, after ordering them by absolute magnitude. |
The plot method represents only the polynomials at the final layer, even if x is generated using nn2poly() with keep_layers=TRUE.
A plot showing the n most important coefficients.
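The selection of the n most important coefficients can be sketched in base R as follows. This is not the plot method's code. The helper name top_coefficients and its column argument are hypothetical, and the sketch assumes a polynomial in the labels and values list form returned by nn2poly().

```r
# Hypothetical helper (not the package's plot method): keep the n
# coefficients with the largest absolute value for one polynomial
# (one column of `values`).
top_coefficients <- function(poly, n, column = 1) {
  coefs <- as.matrix(poly$values)[, column]
  n_keep <- min(n, length(coefs))
  keep <- order(abs(coefs), decreasing = TRUE)[seq_len(n_keep)]
  list(labels = poly$labels[keep], values = coefs[keep])
}

poly <- list(labels = list(0, 1, 2, c(1, 2)),
             values = matrix(c(0.5, -2, 0.1, 1.2), ncol = 1))
top_coefficients(poly, n = 2)
```

Ordering by absolute value rather than by signed value keeps large negative coefficients, which matter as much for interpretation as large positive ones.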
# --- Single polynomial output ---
# Build a NN structure with random weights, with 2 (+ bias) inputs,
# 4 (+bias) neurons in the first hidden layer with "tanh" activation
# function, 4 (+bias) neurons in the second hidden layer with "softplus",
# and 1 "linear" output unit
weights_layer_1 <- matrix(rnorm(12), nrow = 3, ncol = 4)
weights_layer_2 <- matrix(rnorm(20), nrow = 5, ncol = 4)
weights_layer_3 <- matrix(rnorm(5), nrow = 5, ncol = 1)

# Set it as a list with activation functions as names
nn_object <- list("tanh" = weights_layer_1,
                  "softplus" = weights_layer_2,
                  "linear" = weights_layer_3)

# Obtain the polynomial representation (order = 3) of that neural network
final_poly <- nn2poly(nn_object, max_order = 3)

# Plot all the coefficients, one plot per output unit
plot(final_poly)

# Plot only the 5 most important coefficients (by absolute magnitude),
# one plot per output unit
plot(final_poly, n = 5)

# --- Multiple output polynomials ---
# Build a NN structure with random weights, with 2 (+ bias) inputs,
# 4 (+bias) neurons in the first hidden layer with "tanh" activation
# function, 4 (+bias) neurons in the second hidden layer with "softplus",
# and 2 "linear" output units
weights_layer_1 <- matrix(rnorm(12), nrow = 3, ncol = 4)
weights_layer_2 <- matrix(rnorm(20), nrow = 5, ncol = 4)
weights_layer_3 <- matrix(rnorm(10), nrow = 5, ncol = 2)

# Set it as a list with activation functions as names
nn_object <- list("tanh" = weights_layer_1,
                  "softplus" = weights_layer_2,
                  "linear" = weights_layer_3)

# Obtain the polynomial representation (order = 3) of that neural network
final_poly <- nn2poly(nn_object, max_order = 3)

# Plot all the coefficients, one plot per output unit
plot(final_poly)

# Plot only the 5 most important coefficients (by absolute magnitude),
# one plot per output unit
plot(final_poly, n = 5)
nn2poly objects.
Predicted values obtained with a nn2poly object on given data.
## S3 method for class 'nn2poly'
predict(object, newdata, layers = NULL, ...)
object |
Object of class inheriting from 'nn2poly'. |
newdata |
Input data as matrix, vector or dataframe. Number of columns (or elements in vector) should be the number of variables in the polynomial (dimension p). Response variable to be predicted should not be included. |
layers |
Vector containing the chosen layers from |
... |
Further arguments passed to or from other methods. |
Internally uses eval_poly() to obtain the predictions. However, this only works with objects of class nn2poly, while eval_poly() can be used with a manually created polynomial in list form.
When object also contains all the internal polynomials, as given by nn2poly(object, keep_layers = TRUE), it is important to note that there are two polynomial items per layer (input/output). These polynomial items will also contain several polynomials of the same structure, one per neuron in the layer, stored as matrix rows in $values. Please see the NN2Poly original paper for more details.
Note also that "linear" layers will contain the same input and output results as Taylor expansion is not used and thus the polynomials are also the same. Because of this, in the situation of evaluating multiple layers we provide the final layer with "input" and "output" even if they are the same, for consistency.
Returns a matrix or list of matrices with the evaluation of each polynomial at each layer as given by the provided object of class nn2poly.
If object contains the polynomials of the last layer, as given by nn2poly(object, keep_layers = FALSE), then the output is a matrix with the evaluation of each data point on each polynomial. In this matrix, each column represents the evaluation of a polynomial and each row corresponds to each point in the new data to be evaluated.
If object also contains all the internal polynomials, as given by nn2poly(object, keep_layers = TRUE), then the output is a list of layers (represented by layer_i), where each one is another list with input and output elements, where each one contains a matrix with the evaluation of the "input" or "output" polynomial at the given layer, as explained in the case without internal polynomials.
nn2poly(): function that obtains the nn2poly polynomial object, eval_poly(): function that can evaluate polynomials in general, stats::predict(): generic predict function.
# Build a NN structure with random weights, with 2 (+ bias) inputs,
# 4 (+bias) neurons in the first hidden layer with "tanh" activation
# function, 4 (+bias) neurons in the second hidden layer with "softplus",
# and 1 "linear" output unit
weights_layer_1 <- matrix(rnorm(12), nrow = 3, ncol = 4)
weights_layer_2 <- matrix(rnorm(20), nrow = 5, ncol = 4)
weights_layer_3 <- matrix(rnorm(5), nrow = 5, ncol = 1)

# Set it as a list with activation functions as names
nn_object <- list("tanh" = weights_layer_1,
                  "softplus" = weights_layer_2,
                  "linear" = weights_layer_3)

# Obtain the polynomial representation (order = 3) of that neural network
final_poly <- nn2poly(nn_object, max_order = 3)

# Define some new data, it can be vector, matrix or dataframe
newdata <- matrix(rnorm(10), ncol = 2, nrow = 5)

# Predict using the obtained polynomial
predict(object = final_poly, newdata = newdata)

# Change the last layer to have 3 outputs (as in a multiclass classification
# problem): 5 rows (4 neurons + bias) and 3 columns
weights_layer_4 <- matrix(rnorm(15), nrow = 5, ncol = 3)

# Set it as a list with activation functions as names
nn_object <- list("tanh" = weights_layer_1,
                  "softplus" = weights_layer_2,
                  "linear" = weights_layer_4)

# Obtain the polynomial representation of that neural network
# Polynomial representation of each hidden neuron is given by
final_poly <- nn2poly(nn_object, max_order = 3, keep_layers = TRUE)

# Define some new data, it can be vector, matrix or dataframe
newdata <- matrix(rnorm(10), ncol = 2, nrow = 5)

# Predict using the obtained polynomials (for all layers)
predict(object = final_poly, newdata = newdata)

# Predict using the obtained polynomials (for chosen layers)
predict(object = final_poly, newdata = newdata, layers = c(2,3))