A Sequential
model is appropriate for a plain
stack of layers where each layer has exactly one input
tensor and one output tensor.
Schematically, the following Sequential
model:
model <- keras_model_sequential() |>
layer_dense(units = 2, activation = "relu", name = "layer1") |>
layer_dense(units = 3, activation = "relu", name = "layer2") |>
layer_dense(units = 4, name = "layer3")
# Call model on a test input
x <- op_ones(c(3, 3))
y <- model(x)
is equivalent to this function:
# Create 3 layers
layer1 <- layer_dense(units = 2, activation="relu", name="layer1")
layer2 <- layer_dense(units = 3, activation="relu", name="layer2")
layer3 <- layer_dense(units = 4, name="layer3")
# Call layers on a test input
x <- op_ones(c(3, 3))
y <- x |> layer1() |> layer2() |> layer3()
A Sequential model is not appropriate when:
You can create a Sequential model by piping layers into the
keras_model_sequential()
object:
model <- keras_model_sequential() |>
layer_dense(units = 2, activation = "relu") |>
layer_dense(units = 3, activation = "relu") |>
layer_dense(units = 4)
or by passing a list of layers to
keras_model_sequential()
:
model <- keras_model_sequential(layers = list(
layer_dense(units = 2, activation = "relu"),
layer_dense(units = 3, activation = "relu"),
layer_dense(units = 4)
))
Its layers are accessible via the layers
attribute:
## [[1]]
## <Dense name=dense_3, built=False>
## signature: (*args, **kwargs)
##
## [[2]]
## <Dense name=dense_4, built=False>
## signature: (*args, **kwargs)
##
## [[3]]
## <Dense name=dense_5, built=False>
## signature: (*args, **kwargs)
You can also create a Sequential model incrementally:
model <- keras_model_sequential()
model |> layer_dense(units = 2, activation="relu")
model |> layer_dense(units = 3, activation="relu")
model |> layer_dense(units = 4)
Note that there’s also a corresponding pop_layer()
method to remove layers: a Sequential model behaves very much like a
stack of layers.
## [1] 2
Also note that the Sequential constructor accepts a name
argument, just like any layer or model in Keras. This is useful to
annotate TensorBoard graphs with semantically meaningful names.
Generally, all layers in Keras need to know the shape of their inputs in order to be able to create their weights. So when you create a layer like this, initially, it has no weights:
## list()
It creates its weights the first time it is called on an input, since the shape of the weights depends on the shape of the inputs:
# Call layer on a test input
x <- op_ones(c(1, 4))
y <- layer(x)
layer$weights # Now it has weights, of shape (4, 3) and (3,)
## [[1]]
## <KerasVariable shape=(4, 3), dtype=float32, path=dense_9/kernel>
##
## [[2]]
## <KerasVariable shape=(3), dtype=float32, path=dense_9/bias>
Naturally, this also applies to Sequential models. When you
instantiate a Sequential model without an input shape, it isn’t “built”:
it has no weights (and calling model$weights
results in an
error stating just this). The weights are created when the model first
sees some input data:
model <- keras_model_sequential() |>
layer_dense(units = 2, activation = "relu") |>
layer_dense(units = 3, activation = "relu") |>
layer_dense(units = 4)
# No weights at this stage!
# At this point, you can't do this:
# model$weights
# Call the model on a test input
x <- op_ones(c(1, 4))
y <- model(x)
length(model$weights)
## [1] 6
Once a model is “built”, you can call its summary()
method to display its contents:
## [1mModel: "sequential_4"[0m
## ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
## ┃[1m [0m[1mLayer (type) [0m[1m [0m┃[1m [0m[1mOutput Shape [0m[1m [0m┃[1m [0m[1m Param #[0m[1m [0m┃
## ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
## │ dense_10 ([38;5;33mDense[0m) │ ([38;5;34m1[0m, [38;5;34m2[0m) │ [38;5;34m10[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ dense_11 ([38;5;33mDense[0m) │ ([38;5;34m1[0m, [38;5;34m3[0m) │ [38;5;34m9[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ dense_12 ([38;5;33mDense[0m) │ ([38;5;34m1[0m, [38;5;34m4[0m) │ [38;5;34m16[0m │
## └─────────────────────────────────┴────────────────────────┴───────────────┘
## [1m Total params: [0m[38;5;34m35[0m (140.00 B)
## [1m Trainable params: [0m[38;5;34m35[0m (140.00 B)
## [1m Non-trainable params: [0m[38;5;34m0[0m (0.00 B)
However, it can be very useful when building a Sequential model
incrementally to be able to display the summary of the model so far,
including the current output shape. In this case, you should start your
model by passing an input_shape
argument to your model, so
that it knows its input shape from the start:
model <- keras_model_sequential(input_shape = 4) |>
layer_dense(units = 2, activation = "relu")
summary(model)
## [1mModel: "sequential_5"[0m
## ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
## ┃[1m [0m[1mLayer (type) [0m[1m [0m┃[1m [0m[1mOutput Shape [0m[1m [0m┃[1m [0m[1m Param #[0m[1m [0m┃
## ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
## │ dense_13 ([38;5;33mDense[0m) │ ([38;5;45mNone[0m, [38;5;34m2[0m) │ [38;5;34m10[0m │
## └─────────────────────────────────┴────────────────────────┴───────────────┘
## [1m Total params: [0m[38;5;34m10[0m (40.00 B)
## [1m Trainable params: [0m[38;5;34m10[0m (40.00 B)
## [1m Non-trainable params: [0m[38;5;34m0[0m (0.00 B)
## [[1]]
## <Dense name=dense_13, built=True>
## signature: (*args, **kwargs)
Models built with a predefined input shape like this always have weights (even before seeing any data) and always have a defined output shape.
In general, it’s a recommended best practice to always specify the input shape of a Sequential model in advance if you know what it is.
summary()
When building a new Sequential architecture, it’s useful to
incrementally stack layers with |>
and frequently print
model summaries. For instance, this enables you to monitor how a stack
of Conv2D
and MaxPooling2D
layers is
downsampling image feature maps:
model <- keras_model_sequential(input_shape = c(250, 250, 3)) |>
layer_conv_2d(filters = 32, kernel_size = 5, strides = 2, activation = "relu") |>
layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu") |>
layer_max_pooling_2d(pool_size = c(3, 3))
# Can you guess what the current output shape is at this point? Probably not.
# Let's just print it:
summary(model)
## [1mModel: "sequential_6"[0m
## ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
## ┃[1m [0m[1mLayer (type) [0m[1m [0m┃[1m [0m[1mOutput Shape [0m[1m [0m┃[1m [0m[1m Param #[0m[1m [0m┃
## ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
## │ conv2d ([38;5;33mConv2D[0m) │ ([38;5;45mNone[0m, [38;5;34m123[0m, [38;5;34m123[0m, [38;5;34m32[0m) │ [38;5;34m2,432[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ conv2d_1 ([38;5;33mConv2D[0m) │ ([38;5;45mNone[0m, [38;5;34m121[0m, [38;5;34m121[0m, [38;5;34m32[0m) │ [38;5;34m9,248[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ max_pooling2d ([38;5;33mMaxPooling2D[0m) │ ([38;5;45mNone[0m, [38;5;34m40[0m, [38;5;34m40[0m, [38;5;34m32[0m) │ [38;5;34m0[0m │
## └─────────────────────────────────┴────────────────────────┴───────────────┘
## [1m Total params: [0m[38;5;34m11,680[0m (45.62 KB)
## [1m Trainable params: [0m[38;5;34m11,680[0m (45.62 KB)
## [1m Non-trainable params: [0m[38;5;34m0[0m (0.00 B)
# The answer was: (40, 40, 32), so we can keep downsampling...
model |>
layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu") |>
layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu") |>
layer_max_pooling_2d(pool_size = 3) |>
layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu") |>
layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu") |>
layer_max_pooling_2d(pool_size = 2)
# And now?
summary(model)
## [1mModel: "sequential_6"[0m
## ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
## ┃[1m [0m[1mLayer (type) [0m[1m [0m┃[1m [0m[1mOutput Shape [0m[1m [0m┃[1m [0m[1m Param #[0m[1m [0m┃
## ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
## │ conv2d ([38;5;33mConv2D[0m) │ ([38;5;45mNone[0m, [38;5;34m123[0m, [38;5;34m123[0m, [38;5;34m32[0m) │ [38;5;34m2,432[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ conv2d_1 ([38;5;33mConv2D[0m) │ ([38;5;45mNone[0m, [38;5;34m121[0m, [38;5;34m121[0m, [38;5;34m32[0m) │ [38;5;34m9,248[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ max_pooling2d ([38;5;33mMaxPooling2D[0m) │ ([38;5;45mNone[0m, [38;5;34m40[0m, [38;5;34m40[0m, [38;5;34m32[0m) │ [38;5;34m0[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ conv2d_2 ([38;5;33mConv2D[0m) │ ([38;5;45mNone[0m, [38;5;34m38[0m, [38;5;34m38[0m, [38;5;34m32[0m) │ [38;5;34m9,248[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ conv2d_3 ([38;5;33mConv2D[0m) │ ([38;5;45mNone[0m, [38;5;34m36[0m, [38;5;34m36[0m, [38;5;34m32[0m) │ [38;5;34m9,248[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ max_pooling2d_1 ([38;5;33mMaxPooling2D[0m) │ ([38;5;45mNone[0m, [38;5;34m12[0m, [38;5;34m12[0m, [38;5;34m32[0m) │ [38;5;34m0[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ conv2d_4 ([38;5;33mConv2D[0m) │ ([38;5;45mNone[0m, [38;5;34m10[0m, [38;5;34m10[0m, [38;5;34m32[0m) │ [38;5;34m9,248[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ conv2d_5 ([38;5;33mConv2D[0m) │ ([38;5;45mNone[0m, [38;5;34m8[0m, [38;5;34m8[0m, [38;5;34m32[0m) │ [38;5;34m9,248[0m │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ max_pooling2d_2 ([38;5;33mMaxPooling2D[0m) │ ([38;5;45mNone[0m, [38;5;34m4[0m, [38;5;34m4[0m, [38;5;34m32[0m) │ [38;5;34m0[0m │
## └─────────────────────────────────┴────────────────────────┴───────────────┘
## [1m Total params: [0m[38;5;34m48,672[0m (190.12 KB)
## [1m Trainable params: [0m[38;5;34m48,672[0m (190.12 KB)
## [1m Non-trainable params: [0m[38;5;34m0[0m (0.00 B)
# Now that we have 4x4 feature maps, time to apply global max pooling.
model |>
layer_global_max_pooling_2d()
# Finally, we add a classification layer.
model |>
layer_dense(units = 10, activation = "softmax")
Very practical, right?
Note that |>
is equivalent to calling
model$add()
, it modifies the model in-place, so you don’t
need to reassign the model
symbol at each step.
Once your model architecture is ready, you will want to:
Once a Sequential model has been built, it behaves like a Functional API model. This means that
every layer has an input
and output
attribute.
These attributes can be used to do neat things, like quickly creating a
model that extracts the outputs of all intermediate layers in a
Sequential model:
initial_model <- keras_model_sequential(input_shape = c(250, 250, 3)) |>
layer_conv_2d(filters = 32, kernel_size = 5, strides = 2, activation = "relu") |>
layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu") |>
layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu")
feature_extractor <- keras_model(
inputs = initial_model$inputs,
outputs = lapply(initial_model$layers, function(x) x$output),
)
# Call feature extractor on test input.
x <- op_ones(c(1, 250, 250, 3))
features <- feature_extractor(x)
Here’s a similar example that only extract features from one layer:
initial_model <-
keras_model_sequential(input_shape = c(250, 250, 3)) |>
layer_conv_2d(filters = 32, kernel_size = 5, strides = 2,
activation = "relu") |>
layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu",
name = "my_intermediate_layer") |>
layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu")
feature_extractor <- keras_model(
inputs = initial_model$inputs,
outputs = get_layer(initial_model, "my_intermediate_layer")$output,
)
# Call feature extractor on test input.
x <- op_ones(c(1, 250, 250, 3))
features <- feature_extractor(x)
Transfer learning consists of freezing the bottom layers in a model and only training the top layers. If you aren’t familiar with it, make sure to read our guide to transfer learning.
Here are two common transfer learning blueprint involving Sequential models.
First, let’s say that you have a Sequential model, and you want to
freeze all layers except the last one. In this case, you can call
freeze_weights()
. Alternatively, you can iterate over
model$layers
and set
layer$trainable <- FALSE
on each layer, except the last
one. Like this:
model <- keras_model_sequential(input_shape = 784) |>
layer_dense(units = 32, activation = "relu") |>
layer_dense(units = 32, activation = "relu") |>
layer_dense(units = 32, activation = "relu") |>
layer_dense(units = 10)
# Freeze all layers except the last one.
model |> freeze_weights(from = 1, to = -2)
model # note the "Trainable" column now visible in the summary table
## [1mModel: "sequential_9"[0m
## ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┓
## ┃[1m [0m[1mLayer (type) [0m[1m [0m┃[1m [0m[1mOutput Shape [0m[1m [0m┃[1m [0m[1m Param #[0m[1m [0m┃[1m [0m[1mTrai…[0m[1m [0m┃
## ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━┩
## │ dense_15 ([38;5;33mDense[0m) │ ([38;5;45mNone[0m, [38;5;34m32[0m) │ [38;5;34m25,120[0m │ [1;91mN[0m │
## ├─────────────────────────────┼───────────────────────┼────────────┼───────┤
## │ dense_16 ([38;5;33mDense[0m) │ ([38;5;45mNone[0m, [38;5;34m32[0m) │ [38;5;34m1,056[0m │ [1;91mN[0m │
## ├─────────────────────────────┼───────────────────────┼────────────┼───────┤
## │ dense_17 ([38;5;33mDense[0m) │ ([38;5;45mNone[0m, [38;5;34m32[0m) │ [38;5;34m1,056[0m │ [1;91mN[0m │
## ├─────────────────────────────┼───────────────────────┼────────────┼───────┤
## │ dense_18 ([38;5;33mDense[0m) │ ([38;5;45mNone[0m, [38;5;34m10[0m) │ [38;5;34m330[0m │ [1;38;5;34mY[0m │
## └─────────────────────────────┴───────────────────────┴────────────┴───────┘
## [1m Total params: [0m[38;5;34m27,562[0m (107.66 KB)
## [1m Trainable params: [0m[38;5;34m330[0m (1.29 KB)
## [1m Non-trainable params: [0m[38;5;34m27,232[0m (106.38 KB)
# Another way to freeze all layers except the last one.
for (layer in model$layers[-length(model$layers)]) {
layer$trainable <- FALSE
}
# Recompile and train (this will only update the weights of the last layer).
model |> compile(...)
model |> fit(...)
Another common blueprint is to use a Sequential model to stack a pre-trained model and some freshly initialized classification layers. Like this:
# Load a convolutional base with pre-trained weights
base_model <- application_xception(weights = 'imagenet',
include_top = FALSE,
pooling = 'avg')
# Freeze the base model
freeze_weights(base_model)
# Use a Sequential model to add a trainable classifier on top
model <- keras_model_sequential() |>
base_model() |>
layer_dense(1000)
If you do transfer learning, you will probably find yourself frequently using these two patterns.
That’s about all you need to know about Sequential models!
To find out more about building models in Keras, see: