
5-transfer-learning.md

---
title: "Transfer learning"
teaching: 20
exercises: 30
---

::: questions
- How do I apply a pre-trained model to my data?
:::

::: objectives
- Adapt a state-of-the-art pre-trained network to your own dataset
:::

## What is transfer learning?
Instead of training a model from scratch, transfer learning lets you make use of models that were trained on another machine learning task. The pre-trained network captures generic knowledge during pre-training, and is only 'fine-tuned' to the specifics of your dataset.

An example: Let's say that you want to train a model to classify images of different dog breeds. You could make use of a pre-trained network that learned how to classify images of dogs and cats. The pre-trained network will not know anything about different dog breeds, but it will have captured general knowledge of what dogs look like at a high level, and, at a low level, of all the different features (eyes, ears, paws, fur) that make up an image of a dog. Further training this model on your dog breed dataset is a much easier task than training from scratch, because the model can reuse the general knowledge captured in the pre-trained network.

![](episodes/fig/05-transfer_learning.png)
<!--
Edit this plot using the Mermaid live editor:

1. Open this link that includes the source code of the chart to open the live editor web interface:
https://mermaid.live/edit#pako:eNpVkE1vgzAMhv9K5MPUSrQKAWUlh0kr9NZetp02drAgUCRIqhC0dZT_vizso_PJb_zYr-MRCl1KEFC1-q04orFk_5Ar4uL-ZZHpuic3JEXbkwwtLl_JanVHLk8GG0UOrrO9kO3CJ-QKXs4T0tGBqq-kIXuJRjWqnubK1s9JZ5F5I7I1Upb_fL7rqRe7a8g7LiGATpoOm9J9YPyCc7BH2ckchEtLWeHQ2hxyNTkUB6sfz6oAYc0gAzB6qI8gKmx7p4ZTiVZmDdYGu9_XE6pnrf-0LBurzWE-mb-cZ0CM8A5iRdfUBeObmEZJzKOEJRHnUQBnECwK15zRMGJxzNkmoXwK4MMPD30bpSHjt5SHSfyzzs7bzQtPn9Xpf_E
2. Make changes to the chart as desired in the live editor
3. Download the newly created diagram from the live editor (Actions / PNG) and replace the existing image in the episode folder (episodes/fig/05-transfer_learning.png)
4. (optional) crop the image to remove the white space around the plot in a separate image editor
5. Update the URL in step 1 of this comment to the new URL of the live editor
-->

In this episode we will learn how to use Keras to adapt a state-of-the-art pre-trained model to the [Dollar Street Dataset](https://zenodo.org/records/10970014).

## 1. Formulate / Outline the problem

Just like in the previous episode, we use the Dollar Street 10 dataset.

We load the data in the same way as the previous episode:
```python
import pathlib
import numpy as np

DATA_FOLDER = pathlib.Path('data/dataset_dollarstreet/') # change to location where you stored the data
train_images = np.load(DATA_FOLDER / 'train_images.npy')
val_images = np.load(DATA_FOLDER / 'test_images.npy')
train_labels = np.load(DATA_FOLDER / 'train_labels.npy')
val_labels = np.load(DATA_FOLDER / 'test_labels.npy')
```
## 2. Identify inputs and outputs

As discussed in the previous episode, the inputs are images of dimension 64 x 64 pixels with 3 colour channels each.
The goal is to predict one out of 10 classes to which the image belongs.
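As a quick sanity check, you can confirm these dimensions by printing the array shapes (a minimal sketch; the exact number of samples depends on your copy of the dataset):
```python
print(train_images.shape)       # expected: (n_train, 64, 64, 3)
print(train_labels.shape)       # expected: (n_train,)
print(np.unique(train_labels))  # expected: the 10 class indices
```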

## 3. Prepare the data
We prepare the data as before, scaling the pixel values to lie between 0 and 1.
```python
train_images = train_images / 255.0
val_images = val_images / 255.0
```

## 4. Choose a pre-trained model or start building architecture from scratch
Let's define our model input layer using the shape of our training images:
```python
# input tensor
from tensorflow import keras

inputs = keras.Input(train_images.shape[1:])
```

Our images are 64 x 64 pixels, whereas the pre-trained model that we will use was trained on images of 160 x 160 pixels.
To adapt our data accordingly, we add an upscale layer that resizes the images to 160 x 160 pixels during training and prediction.

```python
# upscale layer
import tensorflow as tf

method = tf.image.ResizeMethod.BILINEAR
upscale = keras.layers.Lambda(
    lambda x: tf.image.resize_with_pad(x, 160, 160, method=method))(inputs)
```
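To convince yourself that the resizing does what we want, you can inspect the symbolic shape of the new tensor (a quick check; the first dimension is the batch size and stays unspecified):
```python
print(upscale.shape)  # expected: (None, 160, 160, 3)
```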

From the `keras.applications` module we use the `DenseNet121` architecture.
This architecture was proposed in the paper [Densely Connected Convolutional Networks (CVPR 2017)](https://arxiv.org/abs/1608.06993). It was trained on the [Imagenet](https://www.image-net.org/) dataset, which contains 14,197,122 images annotated according to the WordNet hierarchy, covering more than 20,000 classes.

We will have a look at the architecture later; for now it is enough to know that it is a convolutional neural network with 121 layers that was designed to work well on image classification tasks.

Let's configure the DenseNet121:
```python
base_model = keras.applications.DenseNet121(include_top=False,
                                            pooling='max',
                                            weights='imagenet',
                                            input_tensor=upscale,
                                            input_shape=(160, 160, 3),
                                            )
```

::: callout
## SSL: certificate verify failed error
If you get the following error message: `certificate verify failed: unable to get local issuer certificate`,
you can download [the weights of the model manually](https://storage.googleapis.com/tensorflow/keras-applications/densenet/densenet121_weights_tf_dim_ordering_tf_kernels_notop.h5)
and then load the weights from the downloaded file:

```python
base_model = keras.applications.DenseNet121(
    include_top=False,
    pooling='max',
    weights='densenet121_weights_tf_dim_ordering_tf_kernels_notop.h5', # this should refer to the weights file you downloaded
    input_tensor=upscale,
    input_shape=(160, 160, 3),
)
```
:::
By setting `include_top` to `False` we exclude the fully connected layer at the top of the network, i.e. the final output layer. This layer was used to predict the Imagenet classes, but is of no use for our Dollar Street dataset.
Note that this 'top layer' appears at the bottom of the output of `model.summary()`.

We add `pooling='max'` so that max pooling is applied to the output of the DenseNet121 network.

By setting `weights='imagenet'` we use the weights that resulted from training this network on the Imagenet data.

We connect the network to the `upscale` layer that we defined before.
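If you are curious what these arguments produce without scrolling through the full summary, you can list just the last few layers of the base model (a small sketch; exact layer names vary between Keras versions):
```python
# the last layer should be the global max pooling added by pooling='max'
for layer in base_model.layers[-3:]:
    print(layer.name, layer.output.shape)
```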

### Only train a 'head' network
Instead of fine-tuning all the weights of the DenseNet121 network on our dataset, we choose to freeze those weights and only train a so-called 'head network' that sits on top of the pre-trained network. You can think of the DenseNet121 network as extracting a meaningful feature representation from each image. The head network is then trained to decide to which of the 10 Dollar Street classes an image belongs.

We turn off the `trainable` property of the base model:
```python
base_model.trainable = False
```
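Note that freezing is recursive: setting `trainable` to `False` on the model also switches it off for every layer inside it. A one-line check (just a sketch) to confirm that nothing in the base network remains trainable:
```python
print(any(layer.trainable for layer in base_model.layers))  # expected: False
```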

Let's define our 'head' network:
```python
out = base_model.output
out = keras.layers.Flatten()(out)
out = keras.layers.BatchNormalization()(out)
out = keras.layers.Dense(50, activation='relu')(out)
out = keras.layers.Dropout(0.5)(out)
out = keras.layers.Dense(10)(out)
```

Finally we define our model:
```python
model = keras.models.Model(inputs=inputs, outputs=out)
```
::: challenge
## Inspect the DenseNet121 network
Have a look at the network architecture with `model.summary()`.
It is indeed a deep network, so expect a long summary!

### 1. Trainable parameters
How many parameters are there? How many of them are trainable?

Why is this, and how does it affect the time it takes to train the model?

### 2. Head and base
Can you see in the model summary which part is the base network and which part is the head network?

### 3. Max pooling
Which layer is added because we provided `pooling='max'` as an argument for `DenseNet121()`?

:::: solution
## Solutions
### 1. Trainable parameters
The total number of parameters is 7,093,360, out of which only 53,808 are trainable.

The 53,808 trainable parameters are the weights of the head network. All other parameters are 'frozen' because we set `base_model.trainable = False`. Because only a small proportion of the parameters have to be updated at each training step, this greatly speeds up training.
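You can verify these counts without reading through the whole summary (a small sketch; the exact totals can differ slightly between Keras versions):
```python
import numpy as np

# trainable parameters are those in the model's trainable weight variables
n_trainable = sum(int(np.prod(w.shape)) for w in model.trainable_weights)
print(n_trainable)           # expected: 53808
print(model.count_params())  # expected: 7093360
```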

### 2. Head and base
The head network starts at the `flatten` layer, 5 layers before the final layer.

### 3. Max pooling
The `max_pool` layer right before the `flatten` layer is added because we provided `pooling='max'`.
::::
:::

::: challenge
## Training and evaluating the pre-trained model

### 1. Compile the model
Compile the model:
- Use the `adam` optimizer
- Use the `SparseCategoricalCrossentropy` loss with `from_logits=True`.
- Use 'accuracy' as a metric.

### 2. Train the model
Train the model on the training dataset:
- Use a batch size of 32
- Train for 30 epochs, but use an early stopper with a patience of 5
- Pass the validation dataset as validation data so we can monitor performance on the validation data during training
- Store the result of training in a variable called `history`
- Training can take a while: it is a much larger model than what we have seen so far.

### 3. Inspect the results
Plot the training history and evaluate the trained model. What do you think of the results?

### 4. (Optional) Try out other pre-trained neural networks
Train and evaluate another pre-trained model from https://keras.io/api/applications/. How does it compare to DenseNet121?


:::: solution
## Solution

### 1. Compile the model
```python
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
```

### 2. Train the model
Define the early stopper:
```python
early_stopper = keras.callbacks.EarlyStopping(monitor='val_accuracy',
                                              patience=5)
```

Train the model:
```python
history = model.fit(x=train_images,
                    y=train_labels,
                    batch_size=32,
                    epochs=30,
                    callbacks=[early_stopper],
                    validation_data=(val_images, val_labels))
```

### 3. Inspect the results
```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_history(history, metrics):
    """
    Plot the training history.

    Args:
        history: Keras History object that is returned by model.fit()
        metrics (str, list): Metric or a list of metrics to plot
    """
    history_df = pd.DataFrame.from_dict(history.history)
    sns.lineplot(data=history_df[metrics])
    plt.xlabel("epochs")
    plt.ylabel("metric")

plot_history(history, ['accuracy', 'val_accuracy'])
```
![](fig/05_training_history_transfer_learning.png){alt='Training history for training the pre-trained model. The training accuracy slowly rises from 0.2 to 0.9 over 20 epochs. The validation accuracy starts higher at 0.25, but reaches a plateau around 0.64'}

The final validation accuracy reaches 64%, which is a huge improvement over the 30% accuracy we reached with the simple convolutional neural network that we built from scratch in the previous episode.
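### 4. (Optional) Try out other pre-trained neural networks
There is no single answer here. As a sketch of how to start, you could swap in `MobileNetV2` (just one of many options in `keras.applications`; this assumes the `upscale` tensor defined earlier):
```python
# a different pre-trained base; the head network, compilation and training
# then proceed exactly as for DenseNet121
base_model = keras.applications.MobileNetV2(include_top=False,
                                            pooling='max',
                                            weights='imagenet',
                                            input_tensor=upscale,
                                            input_shape=(160, 160, 3),
                                            )
base_model.trainable = False
```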

::::
:::

## Concluding: The power of transfer learning
In many domains, large networks are available that have been trained on vast amounts of data, such as in computer vision and natural language processing. Using transfer learning, you can benefit from the knowledge that was captured from another machine learning task. In many fields, transfer learning will outperform models trained from scratch, especially if your dataset is small or of poor quality.

::: keypoints
- Large pre-trained models capture generic knowledge about a domain
- Use the `keras.applications` module to easily use pre-trained models for your own datasets
:::
