<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Discovering Convolutions - Benjamin Ho</title>
<link rel="stylesheet" href="style.css">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Ubuntu:wght@400;500;700&display=swap" rel="stylesheet">
<script src="https://kit.fontawesome.com/8644701977.js" crossorigin="anonymous"></script>
</head>
<body>
<!--- Header Section--->
<section class="header">
<nav>
<h1>Benjamin Ho</h1>
<div class="nav-links" id="navLinks">
<i class="fas fa-times" onclick="hideMenu()"></i>
<ul>
<li><a href="index.html">home</a></li>
<li><a href="about.html">about</a></li>
<li><a href="">now</a></li>
<li><a href="">blog</a></li>
</ul>
</div>
<i class="fas fa-bars" onclick="showMenu()"></i>
</nav>
</section>
<section class="blog_post">
<div class="title">
<!--About-->
<h1>Learning About Convolutions Using PyTorch</h1>
<br>
<p>
I've been playing around with CNNs for about a year now, taking open-source projects, running their inference code and doing some training. Despite this experience,
I never really understood what was happening in the convolutional layers.
<br><br>
For example, what happens when I change the kernel size, or stride length, or add additional padding to my convolutional layers?
In order to truly master the art of deep learning, I felt that I had to go back to basics and analyse the convolution operation in isolation.
<br><br>
This blog post documents my attempt at understanding the convolution operation a little more.
</p>
</div>
</section>
<section class="blog_post">
<div class="blog_section">
<h2>Experiment Setup</h2>
<p>
For this experiment, I will be playing around with the <a href="https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html">Conv2d</a> operation from PyTorch.
<br><br>
I will be changing the following parameters and observing the difference each one makes to the convolution output for a sample image (shown below):
<ul>
<li>kernel size</li>
<li>stride length</li>
</ul>
<br>
</p>
<p>
To ensure that I get consistent results, I initialize every kernel weight to a constant value of 0.1 and set the bias to zero.
</p>
<img src="images/kitti_l_color.png">
<figcaption>Sample image taken from the KITTI driving dataset</figcaption>
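<p>
The setup described above could be sketched roughly as follows. This is not the exact code used for the experiments: the helper name make_constant_conv, the single output channel and the random stand-in image are all assumptions made for illustration.
</p>

```python
import torch
import torch.nn as nn


def make_constant_conv(kernel_size: int, stride: int = 1, in_channels: int = 3) -> nn.Conv2d:
    """Build a Conv2d layer whose weights are all 0.1 and whose bias is zero."""
    # One output channel is an assumption; the post does not state the channel count.
    conv = nn.Conv2d(in_channels, 1, kernel_size=kernel_size, stride=stride)
    nn.init.constant_(conv.weight, 0.1)  # constant kernel value
    nn.init.zeros_(conv.bias)            # zero bias
    return conv


# Stand-in for the sample image: a random RGB tensor shaped (batch, channels, height, width).
image = torch.rand(1, 3, 32, 64)
conv = make_constant_conv(kernel_size=3)
with torch.no_grad():
    output = conv(image)
print(output.shape)  # torch.Size([1, 1, 30, 62])
```

<p>
Because the weights are fixed, running the same image through the layer twice gives the same result, which is what makes the outputs comparable across kernel sizes and stride lengths.
</p>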
</div>
</section>
<section class="blog_post">
<div class="blog_section">
<h2>Experiment 1: Varying Kernel Size</h2>
<p>
For the first experiment, I ran the test image through the Conv2d operation using kernel sizes from 1 to 10. As the kernel size increased, the output image
became blurrier. The GIF below shows how the output image changes as the kernel size increases.
<br><br>
<img src="gifs/conv_kernels.gif">
<figcaption>Convolution output when varying the kernel size from 1 to 10. A small kernel produces a detailed output, while a large kernel produces a blurry image</figcaption>
<br><br>
</p>
<p>
This makes sense if we look more closely at what the 2D convolution operation does.
<br><br>
Essentially, convolution consists of six steps:
<br><br>
<h3><u>Step 1: Positioning of kernel</u></h3>
<p>
For an input image (or array, or matrix) of X columns by Y rows, the kernel starts at the top-left corner of the input.
</p>
<br>
<h3><u>Step 2: Element-wise Multiplication</u></h3>
<p>
At this position, every element in the kernel is multiplied by the input element that it overlaps.
For a kernel of size 2 (2 rows and 2 columns), this covers the top-left, top-right, bottom-left and bottom-right elements.
</p>
<br>
<h3><u>Step 3: Summation of Products</u></h3>
<p>
The products of the element-wise multiplications are added together to give a single output value.
</p>
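<p>
Steps 1 to 3 at a single kernel position can be sketched as below. The 3x3 input values are made up for illustration; the kernel uses the constant 0.1 from the experiment setup.
</p>

```python
import torch

# Hypothetical 3x3 input and a 2x2 kernel filled with 0.1.
image = torch.tensor([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0],
                      [7.0, 8.0, 9.0]])
kernel = torch.full((2, 2), 0.1)

# Step 1: place the kernel over the top-left corner of the input.
window = image[0:2, 0:2]          # [[1, 2], [4, 5]]

# Step 2: multiply each kernel element by the input element it overlaps.
products = window * kernel

# Step 3: add the products together to get one output value.
value = products.sum()
print(round(value.item(), 4))     # 1.2
```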
<br>
<h3><u>Step 4: Striding</u></h3>
<p>
The kernel takes a stride (or a step) along the row, by a distance set by the stride length. If the stride length is 2, then the kernel moves two pixels to the right.
</p>
<br>
<h3><u>Step 5: Rinse &amp; Repeat</u></h3>
<p>
The entire process is repeated until the kernel reaches the end of the row. It then moves down by the stride length to the start of the next row, and so on.
</p>
<br>
<h3><u>Step 6: Arrange Output</u></h3>
<p>
After the kernel reaches the end of the input, all the summation outputs are arranged in the order they were produced, grouped by the row of the original
input they came from, to form the final output.
</p>
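<p>
Putting the six steps together, a plain-loop version of the operation might look like the sketch below. The function manual_conv2d is a hypothetical helper written for this illustration (single channel, no padding); it is checked against torch.nn.functional.conv2d, which also slides the kernel without flipping it.
</p>

```python
import torch
import torch.nn.functional as F


def manual_conv2d(image: torch.Tensor, kernel: torch.Tensor, stride: int = 1) -> torch.Tensor:
    """Slide a 2D kernel over a 2D image (no padding), following the six steps."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    output = torch.zeros(out_h, out_w)
    for i in range(out_h):                # Step 5: move down to the next row of positions
        for j in range(out_w):            # Step 4: stride along the current row
            top, left = i * stride, j * stride
            window = image[top:top + k_h, left:left + k_w]  # Step 1: position the kernel
            output[i, j] = (window * kernel).sum()          # Steps 2 and 3: multiply, then sum
    return output                         # Step 6: results already sit in row/column order


# A 5x5 input and a 3x3 constant kernel, the same sizes as in the illustration.
image = torch.arange(25, dtype=torch.float32).reshape(5, 5)
kernel = torch.full((3, 3), 0.1)

result = manual_conv2d(image, kernel, stride=1)
reference = F.conv2d(image[None, None], kernel[None, None], stride=1)[0, 0]
print(result.shape)                                   # torch.Size([3, 3])
print(torch.allclose(result, reference, atol=1e-5))   # True
```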
<br>
<p>
For a visual illustration, refer to the GIF below.
</p>
<img src="gifs/conv_gif.gif">
<figcaption>
Illustration of a convolution operation using a kernel with size 3 (3x3) over a 5x5 image. The small red values in the yellow boxes
represent the kernel values.
</figcaption>
<br>
</p>
<h3><u>Explanation</u></h3>
<p>
What this means is that each time the kernel passes over a set of pixels, it aggregates the values of all the pixels in its area of influence into a single number.
<br><br>
The larger the kernel, the more the fine-grained details in an image are absorbed into the aggregate, whereas a small kernel will retain each pixel's
relative strength over neighbouring pixels, thus preserving the details of the image.
<br><br>
As such, we can conclude that a small kernel size is useful for extracting detailed features, while a large kernel size is good for getting
the general structure of an object.
<br><br>
On the application side, the visualised results above also show us how convolutions using a larger kernel size present a use case for blurring effects on images.
</p>
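<p>
The blurring can be seen on a toy example. The sketch below, which assumes a made-up 8x8 image with one sharp vertical edge, applies the constant 0.1 kernel at a few sizes and lists the distinct values found along one output row. The helper name edge_levels is hypothetical.
</p>

```python
import torch
import torch.nn.functional as F

# Hypothetical 8x8 image: left half 0, right half 1, so there is one sharp vertical edge.
image = torch.tensor([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]).repeat(8, 1)


def edge_levels(kernel_size: int) -> list:
    """Distinct values along the first output row for a constant 0.1 kernel."""
    kernel = torch.full((1, 1, kernel_size, kernel_size), 0.1)
    out = F.conv2d(image[None, None], kernel)[0, 0]
    return sorted({round(v, 3) for v in out[0].tolist()})


for k in (1, 2, 4):
    print(k, edge_levels(k))
# 1 [0.0, 0.1]
# 2 [0.0, 0.2, 0.4]
# 4 [0.0, 0.4, 0.8, 1.2, 1.6]
```

<p>
With a kernel size of 1 the edge is still a single jump. With larger kernels, windows that straddle the edge produce in-between values, so the jump turns into a ramp. The values also grow with kernel size because a constant 0.1 kernel sums the window rather than averaging it.
</p>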
</div>
</section>
<section class="blog_post">
<div class="blog_section">
<h2>Experiment 2: Varying Stride Length</h2>
<p>
For the second experiment, I kept the kernel size constant at 1. This time, I wanted to test the effect of changing the stride length on the convolution result.
For a stride length of 1 to 10, the convolution result is as shown in the GIF below.
<br><br>
From the GIF, we can see how increasing the stride length causes a severe drop in image resolution. It is also interesting to note that the image details become patchy
rather than smoothly blurred, unlike when the kernel size changes.
<br><br>
<img src="gifs/conv_stride.gif">
<figcaption>Convolution output when varying the stride length from 1 to 10. The blurring effect is more prominent when changing stride length than when changing kernel size</figcaption>
<br><br>
</p>
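<p>
The drop in resolution can be estimated before running anything. The sketch below uses the output-size formula from the PyTorch Conv2d documentation (with dilation left at 1). The 375 x 1242 image size is an assumption for illustration, not a measurement of the sample image.
</p>

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one dimension for a convolution with dilation 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Assumed image size (height, width); kernel size fixed at 1 as in this experiment.
height, width = 375, 1242
for stride in (1, 2, 10):
    out_h = conv_output_size(height, kernel_size=1, stride=stride)
    out_w = conv_output_size(width, kernel_size=1, stride=stride)
    print(stride, (out_h, out_w))
# 1 (375, 1242)
# 2 (188, 621)
# 10 (38, 125)
```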
<h3><u>Explanation</u></h3>
<p>
Similar to increasing the kernel size, increasing the stride length causes a loss of information from the input image. This is because the kernel advances by the stride length
at each step (Step 4 of the convolution operation summary above) and jumps over the pixels in between. For a kernel size of only 1, as used in my experiments, the skipped pixels are never read,
so their information is lost in the final output.
</p>
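<p>
One way to check this reasoning: with a kernel size of 1 and a constant weight of 0.1, a strided convolution should be the same as keeping every stride-th pixel and scaling it by 0.1. The sketch below tests that on a made-up 6x6 single-channel input.
</p>

```python
import torch
import torch.nn.functional as F

# Hypothetical 6x6 single-channel input shaped (batch, channels, height, width).
image = torch.arange(36, dtype=torch.float32).reshape(1, 1, 6, 6)
kernel = torch.full((1, 1, 1, 1), 0.1)   # kernel size 1, constant weight 0.1
stride = 3

strided = F.conv2d(image, kernel, stride=stride)

# Keep every third pixel along both axes, then apply the same 0.1 weight.
subsampled = image[:, :, ::stride, ::stride] * 0.1

print(strided.shape)                         # torch.Size([1, 1, 2, 2])
print(torch.allclose(strided, subsampled))   # True
```

<p>
Only 4 of the 36 input pixels contribute to the output here. The rest are dropped outright rather than averaged in, which fits the patchy look of the strided results.
</p>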
</div>
</section>
<!---Javascript for Toggle Menu--->
<script>
var navLinks = document.getElementById("navLinks");
function showMenu(){
navLinks.style.right = "0";
}
function hideMenu(){
navLinks.style.right = "-150px";
}
</script>
</body>
</html>