Keras
Manuel Sánchez-Montañés
Universidad Autónoma de Madrid
manuel.smontanes@uam.es
Construction of the model in Keras: phases
Shallow neural networks
Classifier model with 2 classes (input: 3 variables)
One hidden layer
x_train.shape: (N_train, 3); y_train.shape: (N_train, 2), one-hot

Sequential mode:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(4, input_shape=(3,), activation='relu'))  # hidden layer
model.add(Dense(2, activation='softmax'))                 # output layer: 2 classes
model.compile(loss='categorical_crossentropy', optimizer='adam')
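Once the model is compiled, the remaining phases are training and prediction. A minimal sketch with synthetic data (the array names, sizes and number of epochs are illustrative, not from the slides):

import numpy as np

N_train = 100
x_train = np.random.rand(N_train, 3)                   # (N_train, 3) inputs
y_train = np.eye(2)[np.random.randint(0, 2, N_train)]  # (N_train, 2) one-hot labels

model.fit(x_train, y_train, epochs=10, batch_size=32)  # training
probs = model.predict(x_train)                         # (N_train, 2) class probabilities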
Shallow neural networks
Classifier model with 2 classes (input: 3 variables)
One hidden layer
x_train.shape: (N_train, 3); y_train.shape: (N_train, 2), one-hot

Functional mode:
from keras.models import Model
from keras.layers import Input, Dense

inputs = Input(shape=(3,))
x = Dense(4, activation='relu')(inputs)
x = Dense(2, activation='softmax')(x)
model = Model(inputs, x)
model.compile(loss='categorical_crossentropy', optimizer='adam')
Shallow neural networks
Classifier model with 2 classes (input: 3 variables)
Two hidden layers
x_train.shape: (N_train, 3); y_train.shape: (N_train, 2), one-hot

Sequential mode:
model = Sequential()
model.add(Dense(4, input_shape=(3,), activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
Shallow neural networks
Classifier model with 2 classes (input: 3 variables)
Two hidden layers
x_train.shape: (N_train, 3); y_train.shape: (N_train, 2), one-hot

Functional mode:
inputs = Input(shape=(3,))
x = Dense(4, activation='relu')(inputs)
x = Dense(4, activation='relu')(x)
x = Dense(2, activation='softmax')(x)
model = Model(inputs, x)
model.compile(loss='categorical_crossentropy', optimizer='adam')
Shallow neural networks
Classifier model with 2 classes (input: 3 variables)
Two hidden layers, one output neuron
x_train.shape: (N_train, 3); y_train.shape: (N_train,), binary (0/1)

Sequential mode:
model = Sequential()
model.add(Dense(4, input_shape=(3,), activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
Shallow neural networks
Classifier model with 2 classes (input: 3 variables)
Two hidden layers, one output neuron
x_train.shape: (N_train, 3); y_train.shape: (N_train,), binary (0/1)

Functional mode:
inputs = Input(shape=(3,))
x = Dense(4, activation='relu')(inputs)
x = Dense(4, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs, x)
model.compile(loss='binary_crossentropy', optimizer='adam')
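Note the change with respect to the two-neuron version: the loss is binary_crossentropy and y_train holds plain 0/1 labels. The single sigmoid output gives the probability of class 1, typically thresholded at 0.5 (the threshold is a convention, not from the slides):

probs = model.predict(x_train)            # (N_train, 1), P(class 1)
y_pred = (probs[:, 0] > 0.5).astype(int)  # hard 0/1 predictions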
Creation of Convolutional Neural Networks (CNNs) in Keras
Deep Convolutional Neural Network (CNN)
[Figure: example of a deep CNN architecture]
https://developer.nvidia.com/discover/convolutional-neural-network
Image: Maurice Peemen
model = Sequential()
model.add(Conv2D(filters=4, input_shape=(32,32,1),
          kernel_size=(5,5), activation='relu'))          # output: 28x28x4
model.add(MaxPooling2D(pool_size=(2,2)))                   # output: 14x14x4
model.add(Conv2D(filters=10, kernel_size=(5,5), activation='relu'))  # output: 10x10x10
model.add(MaxPooling2D())                                  # default pool (2,2): 5x5x10
model.add(Flatten())                                       # output: vector of 250 values
model.add(Dense(100, activation='relu'))
model.add(Dense(8, activation='softmax'))                  # 8 classes
model.compile(loss='categorical_crossentropy', optimizer='adam')
Kernels (convolution). Example:

Input image (7x7):
0 0 0 0 0 0 0
0 0 1 1 1 0 0
0 0 0 1 0 0 0
0 0 0 1 0 0 0
0 0 0 1 0 0 0
0 0 0 1 0 0 0
0 0 0 0 0 0 0

Kernel (3x3):
-1 2 0
-1 2 0
-1 2 0

Convolved (filtered) image (5x5):
0 2 3  0 -1
0 2 5 -1 -1
0 0 6 -3  0
0 0 6 -3  0
0 0 4 -2  0

High activity pixels appear where the kernel pattern matches the image; the negative values become 0 if ReLU is applied.
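The example can be reproduced numerically; a minimal sketch (note that, despite the name, Conv2D in Keras computes exactly this cross-correlation, without flipping the kernel):

import numpy as np

image = np.array([[0,0,0,0,0,0,0],
                  [0,0,1,1,1,0,0],
                  [0,0,0,1,0,0,0],
                  [0,0,0,1,0,0,0],
                  [0,0,0,1,0,0,0],
                  [0,0,0,1,0,0,0],
                  [0,0,0,0,0,0,0]])
kernel = np.array([[-1,2,0]]*3)      # the 3x3 kernel above

out = np.zeros((5,5), dtype=int)     # valid convolution: 7-3+1 = 5
for i in range(5):
    for j in range(5):
        out[i,j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out)                 # reproduces the filtered image above
print(np.maximum(out, 0))  # negative values become 0 after ReLU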
Including imports, the code for implementing the architecture is:
Sequential mode:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
model = Sequential()
model.add(Conv2D(filters=4, input_shape=(32,32,1),
          kernel_size=(5,5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=10,kernel_size=(5,5),activation='relu'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(8, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
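The layer output shapes can be checked with the standard summary method:

model.summary()

For this architecture it reports 28x28x4 after the first convolution, 14x14x4 after the first pooling, 10x10x10 and 5x5x10 after the second block, and a 250-dimensional vector after Flatten().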
Functional mode:
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D
from keras.layers import Flatten, Dense
inputs = Input(shape=(32,32,1))
x = Conv2D(filters=4, kernel_size=(5,5),
activation='relu')(inputs)
x = MaxPooling2D(pool_size=(2,2))(x)
x = Conv2D(filters=10, kernel_size=(5,5), activation='relu')(x)
x = MaxPooling2D()(x)
x = Flatten()(x)
x = Dense(100, activation='relu')(x)
x = Dense(8, activation='softmax')(x)
model = Model(inputs, x)
model.compile(loss='categorical_crossentropy', optimizer='adam')
Transfer Learning
One key problem: overfitting
[Figure: model score (%) and model loss as a function of the epoch]
Strategies for avoiding overfitting
• Weights regularization
• Data augmentation
• Dropout (see the sketch after this list)
• Transfer Learning
• Batch normalization
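Of the strategies above, dropout is the most direct to add in Keras: a Dropout layer randomly zeroes a fraction of its inputs during training. A minimal sketch on the earlier classifier (the 0.5 rate is illustrative):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(4, input_shape=(3,), activation='relu'))
model.add(Dropout(0.5))   # randomly drops 50% of the activations during training
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')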
Strategies for avoiding overfitting
- Data Augmentation
[Figure: data augmentation examples]
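A minimal data-augmentation sketch with Keras' classic ImageDataGenerator (the chosen transformations and parameter values are illustrative):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=15,      # random rotations
                             width_shift_range=0.1,  # random horizontal shifts
                             height_shift_range=0.1, # random vertical shifts
                             horizontal_flip=True)   # random mirroring

# x_train: (N, height, width, channels) images; y_train: one-hot labels
model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=10)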
Strategies for avoiding overfitting
- Regularization
- The idea is to drive as many weights as possible to zero (or close to zero): "pruning"
- Usual mechanism: add l1 or l2 regularization to each layer where pruning is desired, and adjust the regularization factor (neither too large nor too small); see the sketch below
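In Keras this is done with the kernel_regularizer argument of each layer; a minimal sketch (the 0.01 factors are illustrative):

from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l1, l2

model = Sequential()
model.add(Dense(4, input_shape=(3,), activation='relu',
                kernel_regularizer=l1(0.01)))   # l1 pushes weights to exactly 0 (pruning)
model.add(Dense(2, activation='softmax',
                kernel_regularizer=l2(0.01)))   # l2 keeps weights small
model.compile(loss='categorical_crossentropy', optimizer='adam')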
Strategies for avoiding overfitting
- Transfer Learning
Strategies for avoiding overfitting
- Transfer Learning with images
Example: VGG16
[Figure: VGG16 architecture]
Pretrained models available in Keras (keras.applications):
- Xception
- VGG16, VGG19
- ResNet, ResNetV2
- InceptionV3, InceptionResNetV2
- MobileNet, MobileNetV2
- DenseNet
- NASNet
- …
Many more are available on the internet
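A minimal transfer-learning sketch with VGG16 as a frozen feature extractor (the input size, head layers and 8-class output are illustrative choices):

from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

base = VGG16(weights='imagenet', include_top=False,
             input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False            # freeze the pretrained convolutional base

x = Flatten()(base.output)
x = Dense(100, activation='relu')(x)   # new trainable head
x = Dense(8, activation='softmax')(x)
model = Model(base.input, x)
model.compile(loss='categorical_crossentropy', optimizer='adam')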
Creation of Deep Recurrent Neural Networks (DRNNs) in Keras
Deep Recurrent Neural Networks (DRNNs)
○ They are used for time series prediction, speech recognition, language modeling, translation, handwriting recognition, image captioning, etc.
○ They learn "programs" automatically
○ Almost any problem can be processed sequentially
○ Most popular networks: LSTM and GRU
DRNNs in Keras
Many to one
DRNNs in Keras
Many to one, regression (1 output neuron)
Example: the model takes the sales and the number of customers in the last 4 days and predicts the next day's sales.
[Diagram: LSTM layer with 10 neurons (initial state 0) and one linear output neuron]
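The slides show no code for this case; a minimal sketch consistent with the diagram (the mse loss is an assumption):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10, input_shape=(4, 2)))  # 4 days, 2 features (sales, customers)
model.add(Dense(1))                      # linear output: next day's sales
model.compile(loss='mse', optimizer='adam')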
DRNNs in Keras
Many to one, regression (2 output neurons)
Example: the model takes the sales and the number of customers in the last 4 days and predicts both the sales and the number of customers of the next day.
[Diagram: LSTM layer with 10 neurons (initial state 0) and two linear output neurons]
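Again no code is shown on the slide; the same sketch with two linear outputs (mse loss assumed):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10, input_shape=(4, 2)))
model.add(Dense(2))                      # next day's sales and number of customers
model.compile(loss='mse', optimizer='adam')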
DRNNs in Keras
Many to one, classification (2 output neurons)
Example: the model predicts whether the next day's sales are higher than average or not (2 values).
[Diagram: LSTM layer with 10 neurons (initial state 0) and a softmax output with 2 neurons]
Sequential mode:
from keras.models import Sequential
from keras.layers import LSTM, Dense
model = Sequential()
model.add(LSTM(10, input_shape=(4,2)))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam')
DRNNs in Keras
Many to one, classification (2 output neurons)
Example: the model predicts whether the next day's sales are higher than average or not (2 values).
Functional mode:
from keras.models import Model
from keras.layers import Input, LSTM, Dense
inputs = Input(shape=(4,2))
x = LSTM(10)(inputs)
x = Dense(2, activation='softmax')(x)
model = Model(inputs, x)
model.compile(loss='categorical_crossentropy',optimizer='adam')
DRNNs in Keras
Many to one, classification, two stacked LSTM layers (2 output neurons)
Example: the model predicts whether the next day's sales are higher than average or not (2 values).
[Diagram: two LSTM layers (10 and 20 neurons, initial state 0) and a softmax output with 2 neurons]
Sequential mode:
from keras.models import Sequential
from keras.layers import LSTM, Dense
model = Sequential()
model.add(LSTM(10, input_shape=(4,2), return_sequences=True))
model.add(LSTM(20))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam')
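Note: return_sequences=True makes the first LSTM return its output at every time step (a (4, 10) sequence per sample) instead of only the last one, which is exactly what the second LSTM needs as input.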
DRNNs in Keras
Many to one, classification, two stacked LSTM layers (2 output neurons)
Example: the model predicts whether the next day's sales are higher than average or not (2 values).
Functional mode:
from keras.models import Model
from keras.layers import Input, LSTM, Dense
inputs = Input(shape=(4,2))
x = LSTM(10, return_sequences=True)(inputs)
x = LSTM(20)(x)
x = Dense(2, activation='softmax')(x)
model = Model(inputs, x)
model.compile(loss='categorical_crossentropy',optimizer='adam')
DRNNs in Keras: NLP
NLP: Natural Language Processing
Each text is tokenized and every word is replaced by its integer index in the vocabulary; the special tokens "(start)" and "(oov)" (out of vocabulary) get their own indices (here 1 and 2):
(start) i love cheesy horror flicks i don't care if the acting is sub par or
whether the monsters look corny i liked this movie except for the (oov)
feeling all the way from the beginning of the film to the very end look i
don't need a 10 page (oov) or a sign with big letters explaining a plot to me
but dark (oov) takes the what is this movie about thing to a whole new
annoying level what is this movie about br br this isn't exceptionally scary
or thrilling but if you have an hour and a half to kill and or you want to
end up feeling frustrated and confused rent this winner
[1, 13, 119, 954, 189, 1554, 13, 92, 459, 48, 4, 116, 9, 1492, 2291, 42, 726,
4, 1939, 168, 2031, 13, 423, 14, 20, 549, 18, 4, 2, 547, 32, 4, 96, 39, 4,
454, 7, 4, 22, 8, 4, 55, 130, 168, 13, 92, 359, 6, 158, 1511, 2, 42, 6, 1913,
19, 194, 4455, 4121, 6, 114, 8, 72, 21, 465, 2, 304, 4, 51, 9, 14, 20, 44,
155, 8, 6, 226, 162, 616, 651, 51, 9, 14, 20, 44, 10, 10, 14, 218, 4843, 629,
42, 3017, 21, 48, 25, 28, 35, 534, 5, 6, 320, 8, 516, 5, 42, 25, 181, 8, 130,
56, 547, 3571, 5, 1471, 851, 14, 2286]
DRNNs in Keras: NLP
Preprocessing:
Padding (make all texts the same length). For example, 500 words:
[the same sequence as above, padded with 0s (by default at the beginning) up to a length of 500]
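A minimal padding sketch with the standard Keras utility (sequences, the list of integer-encoded texts, is an assumed variable name):

from keras.preprocessing.sequence import pad_sequences

# Pads (by default at the beginning) or truncates every sequence to length 500
x_train = pad_sequences(sequences, maxlen=500)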
DRNNs in Keras: NLP
Example: dataset with 20000 texts, vocabulary size = 5000, length of each text = 500 words (word ids 0-4999).
[Diagram: Embedding layer → LSTM with 10 neurons (initial state 0) → softmax output with 2 neurons]
Sequential mode:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Embedding
model = Sequential()
model.add(Embedding(input_dim=5000, output_dim=100,
input_length=500))
model.add(LSTM(10))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam')
DRNNs in Keras: NLP
Example: dataset with 20000 texts, vocabulary size = 5000, length of each text = 500 words (word ids 0-4999).
Functional mode:
from keras.models import Model
from keras.layers import Input, LSTM, Dense, Embedding
inputs = Input(shape=(500,))
x = Embedding(input_dim=5000, output_dim=100)(inputs)
x = LSTM(10)(x)
x = Dense(2, activation='softmax')(x)
model = Model(inputs, x)
model.compile(loss='categorical_crossentropy',optimizer='adam')
DRNNs in Keras: other types
Encoder-decoder
[Diagram: encoder-decoder architecture (initial state 0)]
One to many
[Diagram: one-to-many architecture (initial state 0)]
Projects with heterogeneous inputs
Example: Model for valuing real estate
Dataset proposed in: https://arxiv.org/pdf/1609.08399.pdf
Option 1: model trained with only the numerical data
[Diagram: numerical inputs → neural network → value estimation]
Option 2: model trained with only front pictures
[Diagram: front picture → CNN → value estimation]
Option 3: model trained with "macro pictures"
[Diagram: macro picture → CNN → value estimation]
Option 4 (I): we independently train 5 models.
[Diagram: the 5 independent models, one of them trained on the numerical data]
Option 4 (II): we then train a 6th model so that it learns to take into account the estimates of the previous 5 models and generate a more accurate estimate.
[Diagram: the 5 estimates feed Model 6, which produces the final estimation]
Option 4 (III): cons: each of the 5 models has been trained independently; they have not learned to "collaborate".
Option 5: train all models simultaneously
Implementation in Keras of a model with heterogeneous inputs
[Diagram: two input branches, inputA (128 variables) and inputB (32 variables), merged into a single network]
Image from https://www.pyimagesearch.com/2019/02/04/keras-
from keras.models import Model
from keras.layers import Input, Dense, concatenate

# Branch for inputB (32 variables)
inputB = Input(shape=(32,))
x = Dense(8, activation="relu")(inputB)
x = Dense(4, activation="relu")(x)
x = Model(inputs=inputB, outputs=x)

# Branch for inputA (128 variables)
inputA = Input(shape=(128,))
y = Dense(64, activation="relu")(inputA)
y = Dense(32, activation="relu")(y)
y = Dense(4, activation="relu")(y)
y = Model(inputs=inputA, outputs=y)

# Combination of x and y
combined = concatenate([x.output, y.output])
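The slide stops at the concatenation. A possible way to finish the model, following the pyimagesearch tutorial the figure comes from (the 2-unit head, the single linear output and the mse loss are assumptions of this sketch):

z = Dense(2, activation="relu")(combined)
z = Dense(1, activation="linear")(z)                 # single regression output (the value)
model = Model(inputs=[x.input, y.input], outputs=z)
model.compile(loss="mse", optimizer="adam")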
Material of interest
Chollet, F. (2017). Deep Learning with Python. Manning Publications.
Keras (examples):
https://github.com/fchollet/deep-learning-with-python-notebooks
https://github.com/fchollet/keras/tree/master/examples