diff --git a/README.md b/README.md
index f044c821..409b94f5 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,265 @@ CUDA Denoiser For CUDA Path Tracer
 
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**
 
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Nick Moon
+  * [LinkedIn](https://www.linkedin.com/in/nick-moon1/), [personal website](https://nicholasmoon.github.io/)
+* Tested on: Windows 10, AMD Ryzen 9 5900HS @ 3.0GHz 32GB, NVIDIA RTX 3060 Laptop 6GB (Personal Laptop)
 
-### (TODO: Your README)
 
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+**This project is an implementation of the Edge-Avoiding À-Trous Wavelet Transform for Fast Global
+Illumination Filtering.
+This denoising algorithm uses a form of Gaussian blurring to smooth noisy parts of
+the render, while detecting edges with G-Buffer values stored during path tracing.
+This allows for segmented denoising that preserves object boundaries.**
 
+## RESULTS
+
+
+| Denoised 1 SPP   | Denoised 100 SPP | Denoised 1000 SPP |
+| ----------- | ----------- |  ----------- |
+| ![](img/results/render_denoised_1.PNG)      |   ![](img/results/render_denoised_100.PNG)     | ![](img/results/render_denoised_1000.PNG) |
+
+| Original 1 SPP   | Original 100 SPP | Original 1000 SPP |
+| ----------- | ----------- |  ----------- |
+| ![](img/results/render_1.PNG)      |   ![](img/results/render_100.PNG)     | ![](img/results/render_1000.PNG) |
+
+Adding denoising to these renders only incurred an additional constant 27ms of runtime, no
+matter how many path tracing iterations were used!
+
+
+## IMPLEMENTATION
+
+### Gaussian Blur and Filtering
+
+As a brief introduction, the core of the denoising algorithm is based on filters (kernels): small
+grids of values that describe how to weight the pixels around a center pixel ```p```.
+For example, with a 5x5
+kernel ```k``` centered on pixel ```p```, the middle element ```k[2][2]``` is multiplied
+by the value at ```p```, each of the other 24 elements is multiplied by the corresponding neighbor, and the 25 products are summed.
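+
+As a rough sketch of the paragraph above, the filtered value at pixel ```p = (x, y)``` can be written as a weighted sum. The symbols ```I``` (input image) and ```I'``` (filtered image), and the row/column convention for ```k```, are notation assumed here for illustration, not names taken from the project code:
+
+$$
+I'(x, y) = \sum_{r=0}^{4} \sum_{c=0}^{4} k[r][c] \; I(x + c - 2,\; y + r - 2)
+$$
+
+The term with ```r = c = 2``` is the center tap ```k[2][2]``` applied to ```p``` itself. If the entries of ```k``` sum to 1, as in a normalized Gaussian kernel, the overall brightness of the image is preserved.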
+
+Below is an example of a kernel generated with the Gaussian function (from Wikipedia):
+
+![](img/figures/gaussiankernel.PNG)
+
+Applying the kernel to every pixel in an image produces a blur like the one below (generated with the Krita art application):
+
+| Original   | Blurred |
+| ----------- | ----------- |
+| ![](img/results/iteration_1.PNG)      |   ![](img/figures/gaussian_blur.PNG)     |
+
+
+
+### À-Trous Wavelet Transform
+
+The À-Trous Wavelet Transform described in the paper is a filter similar to the Gaussian kernel,
+but optimized. Instead of using a kernel whose number of taps grows quadratically with the width of the
+neighborhood to be covered, the À-Trous Wavelet Transform reuses the same small kernel
+(5x5 in this project) and performs multiple iterations of filtering,
+with exponentially larger offsets between the sampled pixels each time. This allows a larger
+neighborhood of pixels to be sampled without significantly increasing the amount of computation
+required. An illustration of this is shown in the figure below:
+
+![](img/figures/kerneloffsets.PNG)
+
+Below is a demonstration of the À-Trous Wavelet Transform applied to a noisy path-traced
+Cornell box render, without the edge detection described in the next section:
+
+| Kernel Size 1 (1 iter)    | Kernel Size 4 (3 iter) | Kernel Size 16 (5 iter) | Kernel Size 64 (7 iter) |
+| ----------- | ----------- |  ----------- |  ----------- |
+| ![](img/results/no_edge_detection_filter1.PNG)      |   ![](img/results/no_edge_detection_filter3.PNG)     | ![](img/results/no_edge_detection_filter5.PNG) | ![](img/results/no_edge_detection_filter7.PNG) |
+
+As can be seen, this looks very similar to the pure Gaussian blur from the previous section.
+It simply blurs the entire image, and could hardly be described as "denoising".
+
+Specifically, the offset between sampled pixels for iteration ```i``` of the kernel (counting from 0) is ```2^i```.
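+
+To make the savings concrete, here is a back-of-the-envelope calculation, assuming a 5x5 kernel and ```N``` iterations numbered ```i = 0 ... N-1``` (the symbols ```N```, ```R```, and ```S``` are introduced here for illustration). Iteration ```i``` reaches ```2 * 2^i``` pixels from the center, so the combined reach ```R``` and the footprint span ```S``` are:
+
+$$
+R = \sum_{i=0}^{N-1} 2 \cdot 2^{i} = 2\,(2^{N} - 1), \qquad S = 2R + 1 = 2^{N+2} - 3
+$$
+
+For ```N = 5``` this gives a footprint 125 pixels wide using only ```25 * 5 = 125``` taps per pixel, whereas a dense kernel of the same width would need ```125^2 = 15625``` taps.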
+
+To implement this blurring operation, I needed the 5x5 blur filter,
+a 5x5 array of filter offsets, and two buffers of vec3s that store the color
+information between blur passes and are eventually written to the OpenGL PBO to be rendered.
+Two buffers are needed because each iteration of the denoising kernel reads from one and writes to
+the other, ping-ponging between them.
+
+### Edge Detection
+
+#### G-Buffer
+
+The edge detection process uses the position and normal at the intersection of the
+camera ray associated with each pixel. To make this information available
+during post-processing, we create a geometry buffer (G-Buffer) that stores the relevant
+information at each pixel.
+
+Below you can see a visualization of the data collected in this G-Buffer for a simple scene:
+
+| Position Buffer      | Normal Buffer | Depth Buffer |
+| ----------- | ----------- | ----------- |
+| ![](img/results/pos_buffer.PNG)     | ![](img/results/nor_buffer.PNG)       | ![](img/results/depth_buffer.PNG) |
+
+#### Edge Detecting with Weights
+
+Edge detection is performed using the source path-traced image (i.e. per-pixel color information),
+per-pixel intersection world space positions, and per-pixel intersection world space normals.
+At a given pixel ```p```, the squared distances between the position, normal, and
+color at ```p``` and those at one of its neighbors (the one selected by the current index into the filter)
+are calculated. Weighting terms for these three components (color, position, and normal)
+are then calculated using an exponential function. The three weights are multiplied together
+to get a combined weight for this pixel comparison. The combined weight is then
+multiplied by the filter value and the neighbor's color to get the contribution at that offset.
+The combined weight is also multiplied by the filter value and added to a
+running sum of weights. At the end, the accumulated color is divided by
+the accumulated weights to yield the final, denoised color at ```p```.
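+
+The weighting described above can be sketched in the notation of the referenced paper. This follows the general form given there rather than the exact code in this project, so details such as clamping or which color buffer is compared may differ. Here ```h``` is the filter value, ```c_i``` the color buffer at iteration ```i```, ```x``` and ```n``` the G-Buffer position and normal, and the ```σ``` terms are assumed to correspond to the user-adjustable values:
+
+$$
+w_{c} = \exp\left(-\frac{\lVert c_i(p) - c_i(q) \rVert^{2}}{\sigma_{c}^{2}}\right), \quad
+w_{x} = \exp\left(-\frac{\lVert x(p) - x(q) \rVert^{2}}{\sigma_{x}^{2}}\right), \quad
+w_{n} = \exp\left(-\frac{\lVert n(p) - n(q) \rVert^{2}}{\sigma_{n}^{2}}\right)
+$$
+
+$$
+c_{i+1}(p) = \frac{\sum_{q} h(q)\, w_{c}\, w_{x}\, w_{n}\, c_i(q)}{\sum_{q} h(q)\, w_{c}\, w_{x}\, w_{n}}
+$$
+
+Neighbors ```q``` that lie across an edge have a large position or normal distance, so their weight is close to zero and they contribute almost nothing to the result.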
+
+When computing the weights for position, normal, and color, user-adjustable bias values are also
+included. These allow the artist to scale the influence of
+each component individually. Increasing the color bias value causes a greater amount of blur.
+Increasing the normal bias causes more smoothing along object boundaries where the per-pixel
+normal values change sharply. Increasing the position bias causes more smoothing along
+object boundaries where one object is in front of another. Increasing the normal and position
+biases together with the color bias will cause the edge detection to fail (at least for the
+test scene), as the normal and position values then have almost no impact on the amount of blurring
+across object boundaries.
+
+## Visual Analysis
+
+
+### Filter Size
+Below is a visual comparison of different filter sizes with edge detection. The number of
+samples per pixel is only 20, with a very high color weight.
+
+| Kernel Offset = 1     | Kernel Offset = 2 | Kernel Offset = 4 |
+| ----------- | ----------- |  ----------- |
+| ![](img/results/kernel_size_1_iter_1.PNG)      |   ![](img/results/kernel_size_2_iter_1.PNG)     | ![](img/results/kernel_size_3_iter_1.PNG) |
+
+| Kernel Offset = 8     | Kernel Offset = 16 | Kernel Offset = 32 |
+| ----------- | ----------- |  ----------- |
+| ![](img/results/kernel_size_4_iter_1.PNG)      |   ![](img/results/kernel_size_5_iter_1.PNG)     | ![](img/results/kernel_size_6_iter_1.PNG) |
+
+| Kernel Offset = 64     | Kernel Offset = 128 | Kernel Offset = 256 |
+| ----------- | ----------- |  ----------- |
+| ![](img/results/kernel_size_7_iter_1.PNG)      |   ![](img/results/kernel_size_8_iter_1.PNG)     | ![](img/results/kernel_size_9_iter_1.PNG) |
+
+I would not say that the visual quality scales uniformly with filter size. A kernel offset
+of about 16 is where the visible changes slow down significantly, so much so that I cannot
+really make them out by eye. Additionally, changing the kernel offset from 1 to 2 does not
+make a large visual impact, but offsets 4, 8, and 16 produce large changes. This is due to a mixture of
+the large magnitude of the noise in the source image, our eyes' perception of that noise and
+how it changes from iteration to iteration, and the fact that the color weight of the denoising
+algorithm is cut in half on each denoising kernel call, which the authors used to help preserve
+small-scale detail.
+
+### Different Material Types
+
+| 1 Iteration Not Denoised     | 1 Iteration Denoised | 5000 Iterations Not Denoised |
+| ----------- | ----------- |  ----------- |
+| ![](img/results/matcom_1iter.PNG)      |   ![](img/results/matcomp_denoised.PNG)     | ![](img/results/matcomp_5000iter.PNG) |
+
+As can be seen from the comparison above, the denoising algorithm struggles somewhat with specular
+materials. While the edge detection handles the edges of the specular material quite well, the
+reflection on the surface appears rough, more like a microfacet material. This artifact does not
+go away completely until hundreds of path tracing iterations have been accumulated. Unlike the
+specular surface, the diffuse surface already scatters light randomly in all directions, so
+the smudging and blurring is much less apparent. Denoised at one iteration, the diffuse sphere
+almost looks good enough to consider converged.
+
+### Different Scenes
+
+|  | 1 Iteration  | 100 Iterations |
+| ----------- | ----------- | ----------- |
+| Smaller Ceiling Light     |   ![](img/results/cornell_1iter_denoised.PNG)     | ![](img/results/cornell_100iter_denoised.PNG) |
+| Larger Ceiling Light     |   ![](img/results/iteration_1_denoised.PNG)     | ![](img/results/iteration_100_denoised.PNG) |
+
+As can be seen above, the denoising algorithm struggles much more with the smaller-light scene
+than with the larger-light scene. This is because paths in the smaller-light scene naturally reach the
+light fewer times than in the scene with the larger light: a ray sent in a random direction is much more likely to hit the
+larger light. This of course impacts the denoising,
+because at low path-tracing iteration counts more pixels are black, and
+there is much more variance between pixels. Both of these are bad for blurring, because
+blurring needs at least some minimum amount of useful information to avoid
+looking splotchy (almost like low-iteration photon mapping).
+
+## Performance Analysis
+
+### Convergence
+Below are renders at different samples-per-pixel counts, before and after denoising:
+
+| 1 Iteration     | 5 Iterations  | 10 Iterations |
+| ----------- | ----------- |  ----------- |
+| Original     | Original  | Original |
+| ![](img/results/iteration_1.PNG)      |   ![](img/results/iteration_5.PNG)     | ![](img/results/iteration_10.PNG) |
+| Denoised     | Denoised  | Denoised |
+| ![](img/results/iteration_1_denoised.PNG)      |   ![](img/results/iteration_5_denoised.PNG)     | ![](img/results/iteration_10_denoised.PNG) |
+
+| 50 Iterations     | 100 Iterations  | 500 Iterations |
+| ----------- | ----------- |  ----------- |
+| Original     | Original  | Original |
+| ![](img/results/iteration_50.PNG)      |   ![](img/results/iteration_100.PNG)     | ![](img/results/iteration_500.PNG) |
+| Denoised     | Denoised  | Denoised |
+| ![](img/results/iteration_50_denoised.PNG)      |   ![](img/results/iteration_100_denoised.PNG)     | ![](img/results/iteration_500_denoised.PNG) |
+
+| 1000 Iterations     | 5000 Iterations  |
+| ----------- | ----------- |
+| Original     | Original  |
+| ![](img/results/iteration_1000.PNG)      |   ![](img/results/iteration_5000.PNG)     |
+| Denoised     | Denoised  |
+| ![](img/results/iteration_1000_denoised.PNG)      |   ![](img/results/iteration_5000_denoised.PNG)     |
+
+
+With denoising, I would say that iteration 500 is about where the results become "acceptably smooth".
+By that I mean that by iteration 500 not only have the background colors
+smoothed out to a near-converged look, but the specular sphere also no longer looks smudged.
+In comparison, I think the render without denoising
+looks "acceptably smooth" somewhere between 1000 and 5000 iterations. Anything before that shows the obvious path-tracing noise pattern.
+Here are the diff images for 500 iterations with and without denoising in comparison to
+the 5000 iteration result (with no denoising):
+
+| | Original     | Denoised  |
+| ----------- | ----------- | ----------- |
+| Render | ![](img/results/iteration_500.PNG)      |   ![](img/results/iteration_500_denoised.PNG)     |
+| Diff from 5000 iter | ![](img/results/diff_500.PNG)      |   ![](img/results/diff_500_denoised.PNG)     |
+
+### Varying Filter Size
+
+![](img/figures/denoise_runtime_vs_pt_iter.png)
+
+As can be seen from the graph above, the time taken by the denoising algorithm
+is the same regardless of the number of path tracing iterations; this is because the denoising
+runtime depends only on the resolution of the image to denoise and the size of the
+convolution filter. In addition, each increase in filter size only adds a constant amount
+of runtime, about equal to the runtime at filter size 1. This is because the number of
+pixels sampled by each additional iteration of the denoising kernel is the same,
+a consequence of the À-Trous Wavelet Transform. This means that the algorithm scales very well regardless
+of the number of samples taken.
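+
+A simple cost model consistent with these observations can be sketched as follows. The symbols are introduced here for illustration, and ```t_tap``` stands for an assumed, roughly constant cost per sampled neighbor:
+
+$$
+T_{\text{denoise}} \approx N \cdot (W \cdot H) \cdot 25 \cdot t_{\text{tap}}
+$$
+
+Here ```N``` is the number of À-Trous iterations, ```W x H``` is the image resolution, and 25 is the number of taps in the 5x5 kernel. The number of path tracing iterations does not appear, and each extra À-Trous iteration adds the same ```W * H * 25 * t_tap``` of work.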
+
+### Path Tracing vs Denoising
+
+![](img/figures/pathtracing_v_denoising.png)
+
+As can be seen from the figure above, and having already shown that the denoising
+runtime is independent of the number of path tracing iterations and constant for a
+given filter size, the percentage of total time spent on denoising rather than on
+path tracing falls off quickly, roughly in inverse proportion to the number of iterations. This is because each path tracing
+iteration takes about 7 seconds to run, about the same time as the denoising algorithm. So, after ```iter```
+iterations of path tracing, the fraction of time spent on denoising
+is around ```1 / (iter + 1)```.
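+
+The estimate at the end of the paragraph above can be derived as follows, writing ```t_pt``` for the time of one path tracing iteration, ```t_d``` for the denoising time, and ```n``` for the number of path tracing iterations (symbols introduced here), and assuming ```t_pt ≈ t_d``` as stated above:
+
+$$
+\frac{t_d}{n \cdot t_{pt} + t_d} \approx \frac{t_d}{n \cdot t_d + t_d} = \frac{1}{n + 1}
+$$
+
+So denoising accounts for about 50% of the total at ```n = 1```, 10% at ```n = 9```, and 1% at ```n = 99```.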
+
+### Render Resolution
+
+![](img/figures/resolution.png)
+
+As can be seen from the figure above, the runtime of the denoising algorithm increases roughly
+quadratically with the image side length (where width and height are the same). This is what we
+expect: although the kernel size is constant for all of these data points, the number of pixels
+the GPU needs to process grows quadratically with the side length.
+
+## References
+
+Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering:
+
+Paper: https://jo.dreggn.org/home/2010_atrous.pdf
+
+Presentation: https://www.highperformancegraphics.org/previous/www_2010/media/RayTracing_I/HPG2010_RayTracing_I_Dammertz.pdf
+
+Wikipedia Gaussian Blur: https://en.wikipedia.org/wiki/Gaussian_blur
+
+Filter used in the paper: https://www.eso.org/sci/software/esomidas/doc/user/18NOV/volb/node317.html
+
+Gaussian blur from Krita Application: https://krita.org/en/
\ No newline at end of file
diff --git a/img/figures/denoise_runtime_vs_pt_iter.png b/img/figures/denoise_runtime_vs_pt_iter.png
new file mode 100644
index 00000000..a3536514
Binary files /dev/null and b/img/figures/denoise_runtime_vs_pt_iter.png differ
diff --git a/img/figures/gaussian_blur.PNG b/img/figures/gaussian_blur.PNG
new file mode 100644
index 00000000..4dc7374b
Binary files /dev/null and b/img/figures/gaussian_blur.PNG differ
diff --git a/img/figures/gaussiankernel.PNG b/img/figures/gaussiankernel.PNG
new file mode 100644
index 00000000..ecce8bc0
Binary files /dev/null and b/img/figures/gaussiankernel.PNG differ
diff --git a/img/figures/kerneloffsets.PNG b/img/figures/kerneloffsets.PNG
new file mode 100644
index 00000000..aee82070
Binary files /dev/null and b/img/figures/kerneloffsets.PNG differ
diff --git a/img/figures/pathtracing_v_denoising.png b/img/figures/pathtracing_v_denoising.png
new file mode 100644
index 00000000..c1910102
Binary files /dev/null and b/img/figures/pathtracing_v_denoising.png differ
diff --git a/img/figures/resolution.png b/img/figures/resolution.png
new file mode 100644
index 00000000..e81cbf05
Binary files /dev/null and b/img/figures/resolution.png differ
diff --git a/img/results/ceiling_light.PNG b/img/results/ceiling_light.PNG
new file mode 100644
index 00000000..c47f0a9f
Binary files /dev/null and b/img/results/ceiling_light.PNG differ
diff --git a/img/results/cornell_100iter_denoised.PNG b/img/results/cornell_100iter_denoised.PNG
new file mode 100644
index 00000000..a46acdc0
Binary files /dev/null and b/img/results/cornell_100iter_denoised.PNG differ
diff --git a/img/results/cornell_1iter_denoised.PNG b/img/results/cornell_1iter_denoised.PNG
new file mode 100644
index 00000000..c46eec4f
Binary files /dev/null and b/img/results/cornell_1iter_denoised.PNG differ
diff --git a/img/results/depth_buffer.PNG b/img/results/depth_buffer.PNG
new file mode 100644
index 00000000..d5dc34fa
Binary files /dev/null and b/img/results/depth_buffer.PNG differ
diff --git a/img/results/diff_1000_denoised.PNG b/img/results/diff_1000_denoised.PNG
new file mode 100644
index 00000000..b239367a
Binary files /dev/null and b/img/results/diff_1000_denoised.PNG differ
diff --git a/img/results/diff_500.PNG b/img/results/diff_500.PNG
new file mode 100644
index 00000000..b57cb99a
Binary files /dev/null and b/img/results/diff_500.PNG differ
diff --git a/img/results/diff_500_denoised.PNG b/img/results/diff_500_denoised.PNG
new file mode 100644
index 00000000..dfbf879f
Binary files /dev/null and b/img/results/diff_500_denoised.PNG differ
diff --git a/img/results/iteration_1.PNG b/img/results/iteration_1.PNG
new file mode 100644
index 00000000..70437556
Binary files /dev/null and b/img/results/iteration_1.PNG differ
diff --git a/img/results/iteration_10.PNG b/img/results/iteration_10.PNG
new file mode 100644
index 00000000..fa35e631
Binary files /dev/null and b/img/results/iteration_10.PNG differ
diff --git a/img/results/iteration_100.PNG b/img/results/iteration_100.PNG
new file mode 100644
index 00000000..b0184b5f
Binary files /dev/null and b/img/results/iteration_100.PNG differ
diff --git a/img/results/iteration_1000.PNG b/img/results/iteration_1000.PNG
new file mode 100644
index 00000000..4cd2fd95
Binary files /dev/null and b/img/results/iteration_1000.PNG differ
diff --git a/img/results/iteration_1000_denoised.PNG b/img/results/iteration_1000_denoised.PNG
new file mode 100644
index 00000000..0dfe63ee
Binary files /dev/null and b/img/results/iteration_1000_denoised.PNG differ
diff --git a/img/results/iteration_100_denoised.PNG b/img/results/iteration_100_denoised.PNG
new file mode 100644
index 00000000..3b63df90
Binary files /dev/null and b/img/results/iteration_100_denoised.PNG differ
diff --git a/img/results/iteration_10_denoised.PNG b/img/results/iteration_10_denoised.PNG
new file mode 100644
index 00000000..ff02db13
Binary files /dev/null and b/img/results/iteration_10_denoised.PNG differ
diff --git a/img/results/iteration_1_denoised.PNG b/img/results/iteration_1_denoised.PNG
new file mode 100644
index 00000000..168b9d3c
Binary files /dev/null and b/img/results/iteration_1_denoised.PNG differ
diff --git a/img/results/iteration_5.PNG b/img/results/iteration_5.PNG
new file mode 100644
index 00000000..f188122c
Binary files /dev/null and b/img/results/iteration_5.PNG differ
diff --git a/img/results/iteration_50.PNG b/img/results/iteration_50.PNG
new file mode 100644
index 00000000..ee3d5076
Binary files /dev/null and b/img/results/iteration_50.PNG differ
diff --git a/img/results/iteration_500.PNG b/img/results/iteration_500.PNG
new file mode 100644
index 00000000..a4fea651
Binary files /dev/null and b/img/results/iteration_500.PNG differ
diff --git a/img/results/iteration_5000.PNG b/img/results/iteration_5000.PNG
new file mode 100644
index 00000000..49151218
Binary files /dev/null and b/img/results/iteration_5000.PNG differ
diff --git a/img/results/iteration_5000_denoised.PNG b/img/results/iteration_5000_denoised.PNG
new file mode 100644
index 00000000..0bfe99f8
Binary files /dev/null and b/img/results/iteration_5000_denoised.PNG differ
diff --git a/img/results/iteration_500_denoised.PNG b/img/results/iteration_500_denoised.PNG
new file mode 100644
index 00000000..1ade5705
Binary files /dev/null and b/img/results/iteration_500_denoised.PNG differ
diff --git a/img/results/iteration_50_denoised.PNG b/img/results/iteration_50_denoised.PNG
new file mode 100644
index 00000000..5f431f98
Binary files /dev/null and b/img/results/iteration_50_denoised.PNG differ
diff --git a/img/results/iteration_5_denoised.PNG b/img/results/iteration_5_denoised.PNG
new file mode 100644
index 00000000..85feefaa
Binary files /dev/null and b/img/results/iteration_5_denoised.PNG differ
diff --git a/img/results/kernel_size_10_iter_20.PNG b/img/results/kernel_size_10_iter_20.PNG
new file mode 100644
index 00000000..105d39ee
Binary files /dev/null and b/img/results/kernel_size_10_iter_20.PNG differ
diff --git a/img/results/kernel_size_1_iter_1.PNG b/img/results/kernel_size_1_iter_1.PNG
new file mode 100644
index 00000000..9490b1e2
Binary files /dev/null and b/img/results/kernel_size_1_iter_1.PNG differ
diff --git a/img/results/kernel_size_1_iter_20.PNG b/img/results/kernel_size_1_iter_20.PNG
new file mode 100644
index 00000000..414fda94
Binary files /dev/null and b/img/results/kernel_size_1_iter_20.PNG differ
diff --git a/img/results/kernel_size_2_iter_1.PNG b/img/results/kernel_size_2_iter_1.PNG
new file mode 100644
index 00000000..f27c5d1a
Binary files /dev/null and b/img/results/kernel_size_2_iter_1.PNG differ
diff --git a/img/results/kernel_size_2_iter_20.PNG b/img/results/kernel_size_2_iter_20.PNG
new file mode 100644
index 00000000..d2cb21a6
Binary files /dev/null and b/img/results/kernel_size_2_iter_20.PNG differ
diff --git a/img/results/kernel_size_3_iter_1.PNG b/img/results/kernel_size_3_iter_1.PNG
new file mode 100644
index 00000000..c1d78b0c
Binary files /dev/null and b/img/results/kernel_size_3_iter_1.PNG differ
diff --git a/img/results/kernel_size_3_iter_20.PNG b/img/results/kernel_size_3_iter_20.PNG
new file mode 100644
index 00000000..ef3cb04d
Binary files /dev/null and b/img/results/kernel_size_3_iter_20.PNG differ
diff --git a/img/results/kernel_size_4_iter_1.PNG b/img/results/kernel_size_4_iter_1.PNG
new file mode 100644
index 00000000..3cef2719
Binary files /dev/null and b/img/results/kernel_size_4_iter_1.PNG differ
diff --git a/img/results/kernel_size_4_iter_20.PNG b/img/results/kernel_size_4_iter_20.PNG
new file mode 100644
index 00000000..07c80904
Binary files /dev/null and b/img/results/kernel_size_4_iter_20.PNG differ
diff --git a/img/results/kernel_size_5_iter_1.PNG b/img/results/kernel_size_5_iter_1.PNG
new file mode 100644
index 00000000..b510903b
Binary files /dev/null and b/img/results/kernel_size_5_iter_1.PNG differ
diff --git a/img/results/kernel_size_5_iter_20.PNG b/img/results/kernel_size_5_iter_20.PNG
new file mode 100644
index 00000000..a76a948b
Binary files /dev/null and b/img/results/kernel_size_5_iter_20.PNG differ
diff --git a/img/results/kernel_size_6_iter_1.PNG b/img/results/kernel_size_6_iter_1.PNG
new file mode 100644
index 00000000..0f7b29fe
Binary files /dev/null and b/img/results/kernel_size_6_iter_1.PNG differ
diff --git a/img/results/kernel_size_6_iter_20.PNG b/img/results/kernel_size_6_iter_20.PNG
new file mode 100644
index 00000000..a47e5983
Binary files /dev/null and b/img/results/kernel_size_6_iter_20.PNG differ
diff --git a/img/results/kernel_size_7_iter_1.PNG b/img/results/kernel_size_7_iter_1.PNG
new file mode 100644
index 00000000..d22e8fea
Binary files /dev/null and b/img/results/kernel_size_7_iter_1.PNG differ
diff --git a/img/results/kernel_size_7_iter_20.PNG b/img/results/kernel_size_7_iter_20.PNG
new file mode 100644
index 00000000..50d361e2
Binary files /dev/null and b/img/results/kernel_size_7_iter_20.PNG differ
diff --git a/img/results/kernel_size_8_iter_1.PNG b/img/results/kernel_size_8_iter_1.PNG
new file mode 100644
index 00000000..2ae699e6
Binary files /dev/null and b/img/results/kernel_size_8_iter_1.PNG differ
diff --git a/img/results/kernel_size_8_iter_20.PNG b/img/results/kernel_size_8_iter_20.PNG
new file mode 100644
index 00000000..d102b278
Binary files /dev/null and b/img/results/kernel_size_8_iter_20.PNG differ
diff --git a/img/results/kernel_size_9_iter_1.PNG b/img/results/kernel_size_9_iter_1.PNG
new file mode 100644
index 00000000..22af5013
Binary files /dev/null and b/img/results/kernel_size_9_iter_1.PNG differ
diff --git a/img/results/kernel_size_9_iter_20.PNG b/img/results/kernel_size_9_iter_20.PNG
new file mode 100644
index 00000000..f82ae4a2
Binary files /dev/null and b/img/results/kernel_size_9_iter_20.PNG differ
diff --git a/img/results/matcom_1iter.PNG b/img/results/matcom_1iter.PNG
new file mode 100644
index 00000000..a382d1bb
Binary files /dev/null and b/img/results/matcom_1iter.PNG differ
diff --git a/img/results/matcomp_5000iter.PNG b/img/results/matcomp_5000iter.PNG
new file mode 100644
index 00000000..677e73d6
Binary files /dev/null and b/img/results/matcomp_5000iter.PNG differ
diff --git a/img/results/matcomp_denoised.PNG b/img/results/matcomp_denoised.PNG
new file mode 100644
index 00000000..7989ddce
Binary files /dev/null and b/img/results/matcomp_denoised.PNG differ
diff --git a/img/results/no_edge_detection_filter1.PNG b/img/results/no_edge_detection_filter1.PNG
new file mode 100644
index 00000000..2c1a10df
Binary files /dev/null and b/img/results/no_edge_detection_filter1.PNG differ
diff --git a/img/results/no_edge_detection_filter3.PNG b/img/results/no_edge_detection_filter3.PNG
new file mode 100644
index 00000000..74200615
Binary files /dev/null and b/img/results/no_edge_detection_filter3.PNG differ
diff --git a/img/results/no_edge_detection_filter5.PNG b/img/results/no_edge_detection_filter5.PNG
new file mode 100644
index 00000000..27bddeb7
Binary files /dev/null and b/img/results/no_edge_detection_filter5.PNG differ
diff --git a/img/results/no_edge_detection_filter7.PNG b/img/results/no_edge_detection_filter7.PNG
new file mode 100644
index 00000000..c8c4d47d
Binary files /dev/null and b/img/results/no_edge_detection_filter7.PNG differ
diff --git a/img/results/nor_buffer.PNG b/img/results/nor_buffer.PNG
new file mode 100644
index 00000000..4a5624e6
Binary files /dev/null and b/img/results/nor_buffer.PNG differ
diff --git a/img/results/plain_blur_i1.PNG b/img/results/plain_blur_i1.PNG
new file mode 100644
index 00000000..4019d3f2
Binary files /dev/null and b/img/results/plain_blur_i1.PNG differ
diff --git a/img/results/plain_blur_i10.PNG b/img/results/plain_blur_i10.PNG
new file mode 100644
index 00000000..6b8224f4
Binary files /dev/null and b/img/results/plain_blur_i10.PNG differ
diff --git a/img/results/plain_blur_i2.PNG b/img/results/plain_blur_i2.PNG
new file mode 100644
index 00000000..12e10b30
Binary files /dev/null and b/img/results/plain_blur_i2.PNG differ
diff --git a/img/results/plain_blur_i3.PNG b/img/results/plain_blur_i3.PNG
new file mode 100644
index 00000000..f8489112
Binary files /dev/null and b/img/results/plain_blur_i3.PNG differ
diff --git a/img/results/plain_blur_i4.PNG b/img/results/plain_blur_i4.PNG
new file mode 100644
index 00000000..2fce8302
Binary files /dev/null and b/img/results/plain_blur_i4.PNG differ
diff --git a/img/results/plain_blur_i5.PNG b/img/results/plain_blur_i5.PNG
new file mode 100644
index 00000000..3abbf06a
Binary files /dev/null and b/img/results/plain_blur_i5.PNG differ
diff --git a/img/results/plain_blur_i6.PNG b/img/results/plain_blur_i6.PNG
new file mode 100644
index 00000000..634e9f96
Binary files /dev/null and b/img/results/plain_blur_i6.PNG differ
diff --git a/img/results/plain_blur_i7.PNG b/img/results/plain_blur_i7.PNG
new file mode 100644
index 00000000..93259cf4
Binary files /dev/null and b/img/results/plain_blur_i7.PNG differ
diff --git a/img/results/plain_blur_i8.PNG b/img/results/plain_blur_i8.PNG
new file mode 100644
index 00000000..e27ccc80
Binary files /dev/null and b/img/results/plain_blur_i8.PNG differ
diff --git a/img/results/plain_blur_i9.PNG b/img/results/plain_blur_i9.PNG
new file mode 100644
index 00000000..5fe5c30e
Binary files /dev/null and b/img/results/plain_blur_i9.PNG differ
diff --git a/img/results/pos_buffer.PNG b/img/results/pos_buffer.PNG
new file mode 100644
index 00000000..3f99ac62
Binary files /dev/null and b/img/results/pos_buffer.PNG differ
diff --git a/img/results/render_1.PNG b/img/results/render_1.PNG
new file mode 100644
index 00000000..ae2a1945
Binary files /dev/null and b/img/results/render_1.PNG differ
diff --git a/img/results/render_100.PNG b/img/results/render_100.PNG
new file mode 100644
index 00000000..d9df5606
Binary files /dev/null and b/img/results/render_100.PNG differ
diff --git a/img/results/render_1000.PNG b/img/results/render_1000.PNG
new file mode 100644
index 00000000..3e83bf69
Binary files /dev/null and b/img/results/render_1000.PNG differ
diff --git a/img/results/render_denoised_1.PNG b/img/results/render_denoised_1.PNG
new file mode 100644
index 00000000..3a8ffe98
Binary files /dev/null and b/img/results/render_denoised_1.PNG differ
diff --git a/img/results/render_denoised_100.PNG b/img/results/render_denoised_100.PNG
new file mode 100644
index 00000000..73d6782b
Binary files /dev/null and b/img/results/render_denoised_100.PNG differ
diff --git a/img/results/render_denoised_1000.PNG b/img/results/render_denoised_1000.PNG
new file mode 100644
index 00000000..94bfec82
Binary files /dev/null and b/img/results/render_denoised_1000.PNG differ
diff --git a/scenes/cornell.txt b/scenes/cornell.txt
index 83ff8202..097cc68d 100644
--- a/scenes/cornell.txt
+++ b/scenes/cornell.txt
@@ -51,11 +51,11 @@ EMITTANCE   0
 // Camera
 CAMERA
 RES         800 800
-FOVY        45
-ITERATIONS  5000
+FOVY        19.5
+ITERATIONS  100
 DEPTH       8
 FILE        cornell
-EYE         0.0 5 10.5
+EYE         0.0 5 19.0
 LOOKAT      0 5 0
 UP          0 1 0
 
diff --git a/scenes/cornell_ceiling_light.txt b/scenes/cornell_ceiling_light.txt
index 15af5f19..bcdbe242 100644
--- a/scenes/cornell_ceiling_light.txt
+++ b/scenes/cornell_ceiling_light.txt
@@ -51,11 +51,11 @@ EMITTANCE   0
 // Camera
 CAMERA
 RES         800 800
-FOVY        45
-ITERATIONS  10
+FOVY        19.5
+ITERATIONS  5000
 DEPTH       8
 FILE        cornell
-EYE         0.0 5 10.5
+EYE         0.0 5 19.0
 LOOKAT      0 5 0
 UP          0 1 0
 
diff --git a/scenes/custom_scene.txt b/scenes/custom_scene.txt
new file mode 100644
index 00000000..cc7c74ad
--- /dev/null
+++ b/scenes/custom_scene.txt
@@ -0,0 +1,183 @@
+// Emissive material (light)
+MATERIAL 0
+RGB         1 1 1
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   5
+
+// Diffuse tan (floor, ceiling, back wall)
+MATERIAL 1
+RGB         .85 .55 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse white (left wall)
+MATERIAL 2
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse white (right wall)
+MATERIAL 3
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Specular white
+MATERIAL 4
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     .98 .98 .98
+REFL        1
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse light blue
+MATERIAL 5
+RGB         .73 .81 .98
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Camera
+CAMERA
+RES         1920 1080
+FOVY        19.5
+ITERATIONS  1000
+DEPTH       8
+FILE        cornell
+EYE         0.0 5 19.0
+LOOKAT      0 5 0
+UP          0 1 0
+
+
+// Light (rotated cube)
+OBJECT 0
+cube
+material 0
+TRANS       6.5 7 -3
+ROTAT       35 24 73
+SCALE       2 2 2
+
+// Floor
+OBJECT 1
+cube
+material 1
+TRANS       0 0 0
+ROTAT       0 0 0
+SCALE       20 .01 10
+
+// Ceiling
+OBJECT 2
+cube
+material 1
+TRANS       0 10 0
+ROTAT       0 0 90
+SCALE       .01 20 10
+
+// Back wall
+OBJECT 3
+cube
+material 1
+TRANS       0 5 -5
+ROTAT       0 90 0
+SCALE       .01 10 20
+
+// Left wall
+OBJECT 4
+cube
+material 2
+TRANS       -10 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+
+// Right wall
+OBJECT 5
+cube
+material 3
+TRANS       10 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+
+// Sphere
+OBJECT 6
+sphere
+material 4
+TRANS       -3 4 1
+ROTAT       0 0 0
+SCALE       3 3 3
+
+// Sphere
+OBJECT 7
+sphere
+material 5
+TRANS       7 2 -1
+ROTAT       0 0 0
+SCALE       4 4 4
+
+// Sphere
+OBJECT 8
+sphere
+material 0
+TRANS       -6 0 -3
+ROTAT       0 0 0
+SCALE       3 3 3
+
+// Cube (specular)
+OBJECT 9
+cube
+material 4
+TRANS       -9 2 3
+ROTAT       0 0 0
+SCALE       1 4 1
+
+// Light (cube on floor)
+OBJECT 10
+cube
+material 0
+TRANS       3 0 2
+ROTAT       0 0 0
+SCALE       3 1 3
+
+// Ceiling light
+OBJECT 11
+sphere
+material 0
+TRANS       0 10 0
+ROTAT       0 0 0
+SCALE       5 1 5
+
+// Cube (specular, rotated)
+OBJECT 12
+cube
+material 4
+TRANS       3 2 2
+ROTAT       45 45 0
+SCALE       2 2 2
+
+// Sphere
+OBJECT 13
+sphere
+material 5
+TRANS       -2 0 3
+ROTAT       0 0 0
+SCALE       6 6 6
\ No newline at end of file
diff --git a/src/main.cpp b/src/main.cpp
index 4092ae4a..f7dbb803 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -24,10 +24,14 @@ int startupIterations = 0;
 int lastLoopIterations = 0;
 bool ui_showGbuffer = false;
 bool ui_denoise = false;
-int ui_filterSize = 80;
-float ui_colorWeight = 0.45f;
-float ui_normalWeight = 0.35f;
-float ui_positionWeight = 0.2f;
+int ui_filterIterations = 10;
+int lastFilterIterations = 10;
+float ui_colorWeight = 150.0f;
+float lastColorWeight = 150.0f;
+float ui_normalWeight = 0.061f;
+float lastNormalWeight = 0.061f;
+float ui_positionWeight = 3.049f;
+float lastPositionWeight = 3.049f;
 bool ui_saveAndExit = false;
 
 static bool camchanged = true;
@@ -45,6 +49,10 @@ int iteration;
 int width;
 int height;
 
+bool denoise = true;
+float time_taken_pathtrace = 0.0f;
+float time_taken_denoise = 0.0f;
+
 //-------------------------------
 //-------------MAIN--------------
 //-------------------------------
@@ -126,6 +134,23 @@ void runCuda() {
       camchanged = true;
     }
 
+    if (lastColorWeight != ui_colorWeight) {
+        lastColorWeight = ui_colorWeight;
+        denoise = true;
+    }
+    if (lastNormalWeight != ui_normalWeight) {
+        lastNormalWeight = ui_normalWeight;
+        denoise = true;
+    }
+    if (lastPositionWeight != ui_positionWeight) {
+        lastPositionWeight = ui_positionWeight;
+        denoise = true;
+    }
+    if (lastFilterIterations != ui_filterIterations) {
+        lastFilterIterations = ui_filterIterations;
+        denoise = true;
+    }
+
     if (camchanged) {
         iteration = 0;
         Camera &cam = renderState->camera;
@@ -144,6 +169,8 @@ void runCuda() {
         cameraPosition += cam.lookAt;
         cam.position = cameraPosition;
         camchanged = false;
+        denoise = true;
+        time_taken_pathtrace = 0.0f;
       }
 
     // Map OpenGL buffer object for writing from CUDA on a single GPU
@@ -154,6 +181,7 @@ void runCuda() {
         pathtraceInit(scene);
     }
 
+    
     uchar4 *pbo_dptr = NULL;
     cudaGLMapBufferObject((void**)&pbo_dptr, pbo);
 
@@ -162,12 +190,35 @@ void runCuda() {
 
         // execute the kernel
         int frame = 0;
+        PerformanceTimer perf_timer;
+        perf_timer.startGpuTimer();
         pathtrace(frame, iteration);
+        perf_timer.endGpuTimer();
+        time_taken_pathtrace += perf_timer.getGpuElapsedTimeForPreviousOperation();
+
     }
 
     if (ui_showGbuffer) {
       showGBuffer(pbo_dptr);
-    } else {
+    }
+    else if (iteration == ui_iterations) {
+        
+        PerformanceTimer perf_timer;
+        perf_timer.startGpuTimer();
+        DenoiseParams denoise_params{ denoise, ui_positionWeight, ui_normalWeight, ui_colorWeight, ui_filterIterations };
+        denoiseAndShowImage(pbo_dptr, iteration, denoise_params);
+        perf_timer.endGpuTimer();
+        if (denoise) {
+            std::cout << "path trace (ms): " << time_taken_pathtrace << std::endl;
+            time_taken_denoise = perf_timer.getGpuElapsedTimeForPreviousOperation();
+            std::cout << "denoise (ms): " << time_taken_denoise << std::endl;
+        }
+        
+        //showImage(pbo_dptr, iteration);
+        denoise = false;
+        //iteration++;
+    }
+    else {
       showImage(pbo_dptr, iteration);
     }
 
diff --git a/src/main.h b/src/main.h
index 06d311a8..dd0c9d9e 100644
--- a/src/main.h
+++ b/src/main.h
@@ -36,7 +36,7 @@ extern int ui_iterations;
 extern int startupIterations;
 extern bool ui_showGbuffer;
 extern bool ui_denoise;
-extern int ui_filterSize;
+extern int ui_filterIterations;
 extern float ui_colorWeight;
 extern float ui_normalWeight;
 extern float ui_positionWeight;
diff --git a/src/pathtrace.cu b/src/pathtrace.cu
index 23e5f909..6fbb18ad 100644
--- a/src/pathtrace.cu
+++ b/src/pathtrace.cu
@@ -16,6 +16,12 @@
 
 #define ERRORCHECK 1
 
+//#define DISPLAY_POSITION
+//#define DISPLAY_NORMAL
+//#define DISPLAY_DEPTH
+
+#define WAVELET
+
 #define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
 #define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__)
 void checkCUDAErrorFn(const char *msg, const char *file, int line) {
@@ -73,17 +79,33 @@ __global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* g
 
     if (x < resolution.x && y < resolution.y) {
         int index = x + (y * resolution.x);
-        float timeToIntersect = gBuffer[index].t * 256.0;
+        //float timeToIntersect = gBuffer[index].t * 256.0;
 
         pbo[index].w = 0;
-        pbo[index].x = timeToIntersect;
-        pbo[index].y = timeToIntersect;
-        pbo[index].z = timeToIntersect;
+#ifdef DISPLAY_NORMAL
+        pbo[index].x = ((gBuffer[index].nor.x + 1.0f) * 0.5f) * 255.0f;
+        pbo[index].y = ((gBuffer[index].nor.y + 1.0f) * 0.5f) * 255.0f;
+        pbo[index].z = ((gBuffer[index].nor.z + 1.0f) * 0.5f) * 255.0f;
+#endif
+
+#ifdef DISPLAY_POSITION
+        pbo[index].x = glm::abs(gBuffer[index].pos.x) * 0.1f * 255.0f;
+        pbo[index].y = glm::abs(gBuffer[index].pos.y) * 0.1f * 255.0f;
+        pbo[index].z = glm::abs(gBuffer[index].pos.z) * 0.1f * 255.0f;
+#endif
+
+#ifdef DISPLAY_DEPTH
+        pbo[index].x = glm::abs(gBuffer[index].t) * 0.1f * 255.0f;
+        pbo[index].y = glm::abs(gBuffer[index].t) * 0.1f * 255.0f;
+        pbo[index].z = glm::abs(gBuffer[index].t) * 0.1f * 255.0f;
+#endif
     }
 }
 
 static Scene * hst_scene = NULL;
 static glm::vec3 * dev_image = NULL;
+static glm::vec3* dev_image_denoised_a = NULL;
+static glm::vec3* dev_image_denoised_b = NULL;
 static Geom * dev_geoms = NULL;
 static Material * dev_materials = NULL;
 static PathSegment * dev_paths = NULL;
@@ -92,6 +114,25 @@ static GBufferPixel* dev_gBuffer = NULL;
 // TODO: static variables for device memory, any extra info you need, etc
 // ...
 
+// 5x5 B3-spline filter kernel (row-major); values from
+// https://www.eso.org/sci/software/esomidas/doc/user/18NOV/volb/node317.html
+
+const float h_kernel[25] = {    1.0f / 256.0f, 1.0f / 64.0f, 3.0f / 128.0f, 1.0f / 64.0f, 1.0f / 256.0f,
+                                1.0f / 64.0f, 1.0f / 16.0f, 3.0f / 32.0f, 1.0f / 16.0f, 1.0f / 64.0f,
+                                3.0f / 128.0f, 3.0f / 32.0f, 9.0f / 64.0f, 3.0f / 32.0f, 3.0f / 128.0f,
+                                1.0f / 64.0f, 1.0f / 16.0f, 3.0f / 32.0f, 1.0f / 16.0f, 1.0f / 64.0f,
+                                1.0f / 256.0f, 1.0f / 64.0f, 3.0f / 128.0f, 1.0f / 64.0f, 1.0f / 256.0f};
+
+static float* dev_h_kernel = NULL;
+
+const glm::ivec2 h_kernel_offsets[25] = {   glm::ivec2(-2,-2), glm::ivec2(-1,-2), glm::ivec2(0,-2), glm::ivec2(1,-2), glm::ivec2(2,-2),
+                                            glm::ivec2(-2,-1), glm::ivec2(-1,-1), glm::ivec2(0,-1), glm::ivec2(1,-1), glm::ivec2(2,-1),
+                                            glm::ivec2(-2,0), glm::ivec2(-1,0), glm::ivec2(0,0), glm::ivec2(1,0), glm::ivec2(2,0),
+                                            glm::ivec2(-2,1), glm::ivec2(-1,1), glm::ivec2(0,1), glm::ivec2(1,1), glm::ivec2(2,1),
+                                            glm::ivec2(-2,2), glm::ivec2(-1,2), glm::ivec2(0,2), glm::ivec2(1,2), glm::ivec2(2,2) };
+
+static glm::ivec2* dev_h_kernel_offsets = NULL;
+
 void pathtraceInit(Scene *scene) {
     hst_scene = scene;
     const Camera &cam = hst_scene->state.camera;
@@ -100,6 +141,12 @@ void pathtraceInit(Scene *scene) {
     cudaMalloc(&dev_image, pixelcount * sizeof(glm::vec3));
     cudaMemset(dev_image, 0, pixelcount * sizeof(glm::vec3));
 
+    cudaMalloc(&dev_image_denoised_a, pixelcount * sizeof(glm::vec3));
+    cudaMemset(dev_image_denoised_a, 0, pixelcount * sizeof(glm::vec3));
+
+    cudaMalloc(&dev_image_denoised_b, pixelcount * sizeof(glm::vec3));
+    cudaMemset(dev_image_denoised_b, 0, pixelcount * sizeof(glm::vec3));
+
   	cudaMalloc(&dev_paths, pixelcount * sizeof(PathSegment));
 
   	cudaMalloc(&dev_geoms, scene->geoms.size() * sizeof(Geom));
@@ -113,6 +160,12 @@ void pathtraceInit(Scene *scene) {
 
     cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel));
 
+    cudaMalloc(&dev_h_kernel, 25 * sizeof(float));
+    cudaMemcpy(dev_h_kernel, h_kernel, 25 * sizeof(float), cudaMemcpyHostToDevice);
+
+    cudaMalloc(&dev_h_kernel_offsets, 25 * sizeof(glm::ivec2));
+    cudaMemcpy(dev_h_kernel_offsets, h_kernel_offsets, 25 * sizeof(glm::ivec2), cudaMemcpyHostToDevice);
+
     // TODO: initialize any extra device memeory you need
 
     checkCUDAError("pathtraceInit");
@@ -120,11 +173,15 @@ void pathtraceInit(Scene *scene) {
 
 void pathtraceFree() {
     cudaFree(dev_image);  // no-op if dev_image is null
+    cudaFree(dev_image_denoised_a);
+    cudaFree(dev_image_denoised_b);
   	cudaFree(dev_paths);
   	cudaFree(dev_geoms);
   	cudaFree(dev_materials);
   	cudaFree(dev_intersections);
     cudaFree(dev_gBuffer);
+    cudaFree(dev_h_kernel);
+    cudaFree(dev_h_kernel_offsets);
     // TODO: clean up any extra device memory you created
 
     checkCUDAError("pathtraceFree");
@@ -282,6 +339,8 @@ __global__ void generateGBuffer (
   if (idx < num_paths)
   {
     gBuffer[idx].t = shadeableIntersections[idx].t;
+    gBuffer[idx].nor = shadeableIntersections[idx].surfaceNormal;
+    gBuffer[idx].pos = pathSegments[idx].ray.origin + pathSegments[idx].ray.direction * shadeableIntersections[idx].t;
   }
 }
 
@@ -417,6 +476,8 @@ void showGBuffer(uchar4* pbo) {
             (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
             (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
 
+
+
     // CHECKITOUT: process the gbuffer results and send them to OpenGL buffer for visualization
     gbufferToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer);
 }
@@ -431,3 +492,153 @@ const Camera &cam = hst_scene->state.camera;
     // Send results to OpenGL buffer for rendering
     sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image);
 }
+
+#ifdef WAVELET
+
+// One pass of the edge-avoiding A-Trous (ratatouille) filter: 5x5 kernel with taps spaced iter_offset pixels apart
+__global__ void denoiseRatatouille(glm::ivec2 resolution, int iter, float* h_kernel, glm::ivec2* h_kernel_offsets, int iter_offset, float sigma_p_sq, float sigma_n_sq, float sigma_rt_sq, GBufferPixel* gBuffer, glm::vec3* image, glm::vec3* image_denoised) {
+    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+
+    if (x < resolution.x && y < resolution.y) {
+        glm::vec3 color_accum = glm::vec3(0, 0, 0);
+        int x_image;
+        int y_image;
+        int kernel_index;
+        glm::ivec2 kernel_offset;
+        float kernel_val;
+        
+        glm::vec3 input_color = image[x + (y * resolution.x)];
+        GBufferPixel input_gBuffer = gBuffer[x + (y * resolution.x)];
+        glm::vec3 buffer_diff;
+        float squared_distance;
+        float w_rt;
+        float w_n;
+        float w_p;
+        float total_weight;
+        float w_accum = 0.0f;
+
+
+        for (int i = 0; i < 25; ++i) {
+            kernel_offset = h_kernel_offsets[i] * iter_offset;
+            kernel_val = h_kernel[i];
+
+            x_image = glm::clamp(x + kernel_offset.x, 0, resolution.x - 1);
+            y_image = glm::clamp(y + kernel_offset.y, 0, resolution.y - 1);
+
+            buffer_diff = input_gBuffer.pos - gBuffer[x_image + (y_image * resolution.x)].pos;
+            squared_distance = glm::dot(buffer_diff, buffer_diff);
+            w_p = glm::min(glm::exp(-(squared_distance) / sigma_p_sq), 1.0f);
+
+            buffer_diff = input_gBuffer.nor - gBuffer[x_image + (y_image * resolution.x)].nor;
+            squared_distance = glm::max(glm::dot(buffer_diff, buffer_diff) / ((float)(iter_offset * iter_offset)), 0.0f);
+            w_n = glm::min(glm::exp(-(squared_distance) / sigma_n_sq), 1.0f);
+
+            buffer_diff = input_color - image[x_image + (y_image * resolution.x)];
+            squared_distance = glm::dot(buffer_diff, buffer_diff);
+            w_rt = glm::min(glm::exp(-(squared_distance) / sigma_rt_sq), 1.0f);
+
+            
+
+            total_weight = w_p * w_n * w_rt;
+
+            color_accum += total_weight * kernel_val * image[x_image + (y_image * resolution.x)];
+            w_accum += total_weight * kernel_val;
+        }
+        
+        image_denoised[x + (y * resolution.x)] = color_accum / w_accum;
+    }
+}
+
+
+void denoiseAndShowImage(uchar4* pbo, int iter, DenoiseParams denoise_params) {
+    if (denoise_params.denoise) {
+        const Camera& cam = hst_scene->state.camera;
+        const dim3 blockSize2d(8, 8);
+        const dim3 blocksPerGrid2d(
+            (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
+            (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
+
+        
+        
+        cudaMemcpy(dev_image_denoised_a, dev_image, cam.resolution.x * cam.resolution.y * sizeof(glm::vec3), cudaMemcpyDeviceToDevice);
+        
+        int iter_offset = 1;
+        
+        for (int i = 0; i < denoise_params.denoise_iterations; ++i) {
+            denoiseRatatouille<<<blocksPerGrid2d, blockSize2d>>>(cam.resolution, iter, dev_h_kernel, dev_h_kernel_offsets, iter_offset, denoise_params.sigma_p * denoise_params.sigma_p, denoise_params.sigma_n * denoise_params.sigma_n, denoise_params.sigma_rt * denoise_params.sigma_rt, dev_gBuffer, dev_image_denoised_a, dev_image_denoised_b);
+            if (i < denoise_params.denoise_iterations - 1) {
+                glm::vec3* temp = dev_image_denoised_b;
+                dev_image_denoised_b = dev_image_denoised_a;
+                dev_image_denoised_a = temp;
+            }
+            iter_offset *= 2;
+            denoise_params.sigma_rt /= (float)iter_offset; // cumulative: divides by 2, then 4, then 8, ... on successive passes
+        }
+
+
+        // Send results to OpenGL buffer for rendering
+        sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image_denoised_b);
+        
+    }
+    
+}
+
+#else
+// Plain A-Trous blur without edge-stopping weights (used when WAVELET is not defined)
+__global__ void denoiseRatatouille(glm::ivec2 resolution, int iter, float* h_kernel, glm::ivec2* h_kernel_offsets, int iter_offset, GBufferPixel* gBuffer, glm::vec3* image, glm::vec3* image_denoised) {
+    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+
+    if (x < resolution.x && y < resolution.y) {
+        glm::vec3 color_accum = glm::vec3(0, 0, 0);
+        int x_image;
+        int y_image;
+        int kernel_index;
+        glm::ivec2 kernel_offset;
+#pragma unroll
+        for (int i = 0; i < 25; ++i) {
+            kernel_offset = h_kernel_offsets[i] * iter_offset;
+
+            x_image = glm::clamp(x + kernel_offset.x, 0, resolution.x - 1);
+            y_image = glm::clamp(y + kernel_offset.y, 0, resolution.y - 1);
+            color_accum += h_kernel[i] * image[x_image + (y_image * resolution.x)];
+        }
+
+        image_denoised[x + (y * resolution.x)] = color_accum;
+    }
+}
+
+void denoiseAndShowImage(uchar4* pbo, int iter, DenoiseParams denoise_params) {
+    if (denoise_params.denoise) {
+        const Camera& cam = hst_scene->state.camera;
+        const dim3 blockSize2d(8, 8);
+        const dim3 blocksPerGrid2d(
+            (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
+            (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
+
+        PerformanceTimer perf_timer;
+        perf_timer.startGpuTimer();
+        cudaMemcpy(dev_image_denoised_a, dev_image, cam.resolution.x * cam.resolution.y * sizeof(glm::vec3), cudaMemcpyDeviceToDevice);
+
+        int iter_offset = 1;
+        for (int i = 0; i < denoise_params.denoise_iterations; ++i) {
+            denoiseRatatouille<<<blocksPerGrid2d, blockSize2d>>>(cam.resolution, iter, dev_h_kernel, dev_h_kernel_offsets, iter_offset, dev_gBuffer, dev_image_denoised_a, dev_image_denoised_b);
+            if (i < denoise_params.denoise_iterations - 1) {
+                glm::vec3* temp = dev_image_denoised_b;
+                dev_image_denoised_b = dev_image_denoised_a;
+                dev_image_denoised_a = temp;
+            }
+            iter_offset *= 2;
+
+        }
+
+
+        // Send results to OpenGL buffer for rendering
+        sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image_denoised_b);
+        perf_timer.endGpuTimer();
+        std::cout << perf_timer.getGpuElapsedTimeForPreviousOperation() << std::endl;
+    }
+
+}
+#endif
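Note on the filter kernel and host loop in `src/pathtrace.cu`: the 25 weights in `h_kernel` equal the outer product of the 1D B3-spline taps (1/16, 1/4, 3/8, 1/4, 1/16), and the loop in `denoiseAndShowImage` doubles the tap spacing after every pass while dividing `sigma_rt` by the new spacing. The host-only C++ sketch below reproduces both so they can be checked without a GPU. The names `kB3`, `makeKernel`, `footprintWidth`, and `sigmaRtAtPass` are hypothetical and not part of this change.

```cpp
#include <array>
#include <cassert>

// 1D B3-spline taps; their outer product gives the 5x5 kernel stored in h_kernel.
constexpr std::array<float, 5> kB3 = {1.0f / 16.0f, 1.0f / 4.0f, 3.0f / 8.0f,
                                      1.0f / 4.0f, 1.0f / 16.0f};

// Build the 25-entry kernel in the same row-major order as h_kernel.
std::array<float, 25> makeKernel() {
    std::array<float, 25> k{};
    for (int row = 0; row < 5; ++row) {
        for (int col = 0; col < 5; ++col) {
            k[row * 5 + col] = kB3[row] * kB3[col];
        }
    }
    return k;
}

// Width in pixels covered after `iterations` passes when the tap spacing
// doubles each pass (1, 2, 4, ...), ignoring clamping at the image border.
int footprintWidth(int iterations) {
    assert(iterations >= 0);
    int radius = 0;
    int spacing = 1;
    for (int i = 0; i < iterations; ++i) {
        radius += 2 * spacing;  // outermost tap sits 2 * spacing from the center
        spacing *= 2;
    }
    return 2 * radius + 1;
}

// sigma_rt used by pass `pass` (0-based) when, as in the host loop, it is
// divided by the already-doubled spacing after every pass.
float sigmaRtAtPass(float sigma_rt, int pass) {
    assert(pass >= 0);
    int spacing = 1;
    for (int i = 0; i < pass; ++i) {
        spacing *= 2;
        sigma_rt /= static_cast<float>(spacing);
    }
    return sigma_rt;
}
```

Under these assumptions, 10 passes cover a footprint 4093 pixels wide, and a starting `sigma_rt` of 150 becomes 75, 18.75, and 2.34375 on the following three passes, so the color weight tightens much faster than a plain halving per pass would.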
diff --git a/src/pathtrace.h b/src/pathtrace.h
index 9e12f440..5ca547d3 100644
--- a/src/pathtrace.h
+++ b/src/pathtrace.h
@@ -2,9 +2,112 @@
 
 #include <vector>
 #include "scene.h"
+#include <chrono>
+
+struct DenoiseParams {
+	bool denoise;
+	float sigma_p;
+	float sigma_n;
+	float sigma_rt;
+	int denoise_iterations;
+};
 
 void pathtraceInit(Scene *scene);
 void pathtraceFree();
 void pathtrace(int frame, int iteration);
 void showGBuffer(uchar4 *pbo);
 void showImage(uchar4 *pbo, int iter);
+void denoiseAndShowImage(uchar4* pbo, int iter, DenoiseParams denoise_params);
+
+/**
+        * This class is used for timing the performance
+        * Uncopyable and unmovable
+        *
+        * Adapted from WindyDarian(https://github.com/WindyDarian)
+        */
+
+class PerformanceTimer
+{
+public:
+    PerformanceTimer()
+    {
+        cudaEventCreate(&event_start);
+        cudaEventCreate(&event_end);
+    }
+
+    ~PerformanceTimer()
+    {
+        cudaEventDestroy(event_start);
+        cudaEventDestroy(event_end);
+    }
+
+    void startCpuTimer()
+    {
+        if (cpu_timer_started) { throw std::runtime_error("CPU timer already started"); }
+        cpu_timer_started = true;
+
+        time_start_cpu = std::chrono::high_resolution_clock::now();
+    }
+
+    void endCpuTimer()
+    {
+        time_end_cpu = std::chrono::high_resolution_clock::now();
+
+        if (!cpu_timer_started) { throw std::runtime_error("CPU timer not started"); }
+
+        std::chrono::duration<double, std::milli> duro = time_end_cpu - time_start_cpu;
+        prev_elapsed_time_cpu_milliseconds =
+            static_cast<decltype(prev_elapsed_time_cpu_milliseconds)>(duro.count());
+
+        cpu_timer_started = false;
+    }
+
+    void startGpuTimer()
+    {
+        if (gpu_timer_started) { throw std::runtime_error("GPU timer already started"); }
+        gpu_timer_started = true;
+
+        cudaEventRecord(event_start);
+    }
+
+    void endGpuTimer()
+    {
+        cudaEventRecord(event_end);
+        cudaEventSynchronize(event_end);
+
+        if (!gpu_timer_started) { throw std::runtime_error("GPU timer not started"); }
+
+        cudaEventElapsedTime(&prev_elapsed_time_gpu_milliseconds, event_start, event_end);
+        gpu_timer_started = false;
+    }
+
+    float getCpuElapsedTimeForPreviousOperation() //noexcept (needs VS 2015)
+    {
+        return prev_elapsed_time_cpu_milliseconds;
+    }
+
+    float getGpuElapsedTimeForPreviousOperation() //noexcept
+    {
+        return prev_elapsed_time_gpu_milliseconds;
+    }
+
+    // remove copy and move functions
+    PerformanceTimer(const PerformanceTimer&) = delete;
+    PerformanceTimer(PerformanceTimer&&) = delete;
+    PerformanceTimer& operator=(const PerformanceTimer&) = delete;
+    PerformanceTimer& operator=(PerformanceTimer&&) = delete;
+
+private:
+    cudaEvent_t event_start = nullptr;
+    cudaEvent_t event_end = nullptr;
+
+    using time_point_t = std::chrono::high_resolution_clock::time_point;
+    time_point_t time_start_cpu;
+    time_point_t time_end_cpu;
+
+    bool cpu_timer_started = false;
+    bool gpu_timer_started = false;
+
+    float prev_elapsed_time_cpu_milliseconds = 0.f;
+    float prev_elapsed_time_gpu_milliseconds = 0.f;
+};
diff --git a/src/preview.cpp b/src/preview.cpp
index 3ca27180..713c0041 100644
--- a/src/preview.cpp
+++ b/src/preview.cpp
@@ -214,10 +214,10 @@ void drawGui(int windowWidth, int windowHeight) {
 
     ImGui::Checkbox("Denoise", &ui_denoise);
 
-    ImGui::SliderInt("Filter Size", &ui_filterSize, 0, 100);
-    ImGui::SliderFloat("Color Weight", &ui_colorWeight, 0.0f, 10.0f);
-    ImGui::SliderFloat("Normal Weight", &ui_normalWeight, 0.0f, 10.0f);
-    ImGui::SliderFloat("Position Weight", &ui_positionWeight, 0.0f, 10.0f);
+    ImGui::SliderInt("Filter Iterations", &ui_filterIterations, 1, 10);
+    ImGui::SliderFloat("Color Weight", &ui_colorWeight, 0.0f, 150.0f);
+    ImGui::SliderFloat("Normal Weight", &ui_normalWeight, 0.0f, 15.0f);
+    ImGui::SliderFloat("Position Weight", &ui_positionWeight, 0.0f, 15.0f);
 
     ImGui::Separator();
 
diff --git a/src/sceneStructs.h b/src/sceneStructs.h
index da7e558a..2b3e574a 100644
--- a/src/sceneStructs.h
+++ b/src/sceneStructs.h
@@ -79,4 +79,6 @@ struct ShadeableIntersection {
 // What information might be helpful for guiding a denoising filter?
 struct GBufferPixel {
   float t;
+  glm::vec3 pos;
+  glm::vec3 nor;
 };
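Note on the new `pos` and `nor` fields: the `WAVELET` kernel in `src/pathtrace.cu` turns the position, normal, and color differences between the center pixel and each tap into three weights of the form exp(-d^2 / sigma^2), clamps each to at most 1, and multiplies them. A minimal CPU sketch of that per-tap weight follows. It assumes a hypothetical `Vec3` in place of `glm::vec3`, and the names `Sample`, `Sigmas`, `squaredDistance`, `edgeStoppingWeight`, and `tapWeight` are illustrative only. It is meant for reasoning about parameter values, not as a replacement for the kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical stand-in for glm::vec3

// Per-pixel inputs the filter compares: color plus the G-buffer position and normal.
struct Sample { Vec3 color, pos, nor; };

// Squared sigmas, matching how the host code squares the UI values before launch.
struct Sigmas { float p_sq, n_sq, rt_sq; };

inline float squaredDistance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// exp(-d^2 / sigma^2), clamped to at most 1 as in the kernel.
inline float edgeStoppingWeight(float squared_dist, float sigma_sq) {
    assert(sigma_sq > 0.0f);
    return std::min(std::exp(-squared_dist / sigma_sq), 1.0f);
}

// Combined weight for one tap; `step` is the tap spacing of the current pass.
float tapWeight(const Sample& center, const Sample& tap, const Sigmas& s, int step) {
    assert(step >= 1);
    const float w_p = edgeStoppingWeight(squaredDistance(center.pos, tap.pos), s.p_sq);
    // The normal difference is scaled down by step^2, as the kernel does.
    const float n_dist = squaredDistance(center.nor, tap.nor) / static_cast<float>(step * step);
    const float w_n = edgeStoppingWeight(n_dist, s.n_sq);
    const float w_rt = edgeStoppingWeight(squaredDistance(center.color, tap.color), s.rt_sq);
    return w_p * w_n * w_rt;
}
```

A tap that matches the center in all three attributes gets weight 1 and is blurred with the full kernel value. A tap on a different surface (distant position or differing normal) gets a weight near 0, which is what keeps object boundaries sharp.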