diff --git a/README.md b/README.md index f044c821..409b94f5 100644 --- a/README.md +++ b/README.md @@ -3,11 +3,265 @@ CUDA Denoiser For CUDA Path Tracer **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** -* (TODO) YOUR NAME HERE -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Nick Moon + * [LinkedIn](https://www.linkedin.com/in/nick-moon1/), [personal website](https://nicholasmoon.github.io/) +* Tested on: Windows 10, AMD Ryzen 9 5900HS @ 3.0GHz 32GB, NVIDIA RTX 3060 Laptop 6GB (Personal Laptop) -### (TODO: Your README) -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. +**This project is an implementation of the Edge-Avoiding À-Trous Wavelet Transform for Fast Global +Illumination Filtering. +This denoising algorithm uses a style of gaussian blurring to smooth noisy parts of +the render, while smartly detecting edges with G-Buffer values stored during path-tracing. +This allows for segmented denoising that preserves object boundaries.** +## RESULTS + + +| Denoised 1 SPP | Denoised 100 SPP | Denoised 1000 SPP | +| ----------- | ----------- | ----------- | +| ![](img/results/render_denoised_1.PNG) | ![](img/results/render_denoised_100.PNG) | ![](img/results/render_denoised_1000.PNG) | + +| Original 1 SPP | Original 100 SPP | Original 1000 SPP | +| ----------- | ----------- | ----------- | +| ![](img/results/render_1.PNG) | ![](img/results/render_100.PNG) | ![](img/results/render_1000.PNG) | + +Adding denoising to these renders only incurred an additional constant 27ms of runtime, no +matter how many path tracing iterations were used! + + +## IMPLEMENTATION + +### Gaussian Blur and Filtering + +As a small introduction, the core of the denoising algorithm is based on filters/kernels. These +are a collection of values that describe weighting around a center pixel ```p```. 
+So, for example, if you have a 5x5 +kernel ```k``` centered on pixel ```p```, the middle element ```k[2][2]``` is multiplied +by the value at pixel ```p```, every other element is multiplied by the value of the neighboring pixel it covers, and the 25 products are summed to give the new value at ```p```. + +Below is an example of a kernel generated with the Gaussian function (from Wikipedia): + +![](img/figures/gaussiankernel.PNG) + +Applying the kernel to every pixel in an image results in a blur like the one below (produced with the Krita art application): + +| Original | Blurred | +| ----------- | ----------- | +| ![](img/results/iteration_1.PNG) | ![](img/figures/gaussian_blur.PNG) | + + + +### À-Trous Wavelet Transform + +The À-Trous Wavelet Transform described in the paper is a filter similar to the Gaussian kernel, +but optimized. Instead of using a larger kernel, whose number of taps grows quadratically with the +width of the neighborhood to be covered, the À-Trous Wavelet Transform reuses the same small kernel, +for example a 5x5 as used in this project, but performs multiple iterations of denoising +using exponentially greater offsets between the pixels sampled each time. This allows for a larger +neighborhood of pixels to be sampled without significantly increasing the amount of computation +required. An illustration of this is shown in the figure below: + +![](img/figures/kerneloffsets.PNG) + +Below is also a demonstration of the À-Trous Wavelet Transform applied to a noisy path-traced +Cornell box render, without the edge detection described in the next section: + +| Kernel Size 1 (1 iter) | Kernel Size 4 (3 iter) | Kernel Size 16 (5 iter) | Kernel Size 64 (7 iter) | +| ----------- | ----------- | ----------- | ----------- | +| ![](img/results/no_edge_detection_filter1.PNG) | ![](img/results/no_edge_detection_filter3.PNG) | ![](img/results/no_edge_detection_filter5.PNG) | ![](img/results/no_edge_detection_filter7.PNG) | + +As can be seen, this looks very similar to the pure Gaussian blur from the previous section. 
+It just blurs the entire screen, and it would be a stretch to describe it as "denoising". + +Specifically, the offset between pixels for each iteration ```i``` of the kernel is ```2^i```. + +In order to implement this blurring operation, I needed the blur filter, which was 5x5, +a filter offset array, which was also 5x5, and two buffers of vec3s that stored the color +information between blur passes and were eventually written to the OpenGL PBO to be rendered. +I needed two buffers so that I could ping-pong between them across iterations of the +denoising kernel. + +### Edge Detection + +#### G-Buffer + +The edge detection process uses the positions and normals at the intersection of the +camera rays associated with each pixel. So, in order to have this information available +in the post-process, we need to create a new geometry buffer (G-Buffer) to store the relevant +information at each pixel. + +Below you can see a visualization of the data collected in this G-Buffer for a simple scene: + +| Position Buffer | Normal Buffer | Depth Buffer | +| ----------- | ----------- | ----------- | +| ![](img/results/pos_buffer.PNG) | ![](img/results/nor_buffer.PNG) | ![](img/results/depth_buffer.PNG) | + +#### Edge Detecting with Weights + +Edge detection is performed using the source path-traced image (i.e. per-pixel color information), +per-pixel intersection world space positions, and per-pixel intersection world space normals. +At a given pixel ```p```, the squared distance between the position, normal, and +color at ```p``` and the corresponding values at one of its neighbors (the neighbor selected by the current filter index) +is calculated. Then weighting terms for these three components (color, position, and normal) +are calculated using an exponential function. The three weights are multiplied together +to get a combined weight for this pixel comparison. 
The combined weight is then +multiplied by the filter value and the color of the offset pixel to get the contribution at that offset. +The weight is also accumulated by multiplying it by the filter value and adding it to a +variable keeping track of the sum of weights. At the end, the accumulated color is divided by +the accumulated weights to yield the final pixel color at ```p```, now denoised. + +When computing the weight values for position, normal, and color, bias values are also included +that can be adjusted by the user. This allows the artist to increase and decrease the scale of +these individual components. Increasing the color bias value causes a greater amount of blur. +Increasing the normal bias causes more smoothing along object boundaries where the per-pixel +normal values change sharply. Increasing the position bias causes more smoothing along +object boundaries where one object is in front of another. Increasing the normal and position +biases, along with the color bias, will cause the edge detection to fail (at least for the +test scene), as the normal and position values then have almost no impact on the amount of blurring +going on between objects. + +## Visual Analysis + + +### Filter Size +Below is a visual comparison of different filter sizes with edge detection. The number of +samples per pixel is only 20, with a very high color weight. 
+ +| Kernel Offset = 1 | Kernel Offset = 2 | Kernel Offset = 4 | +| ----------- | ----------- | ----------- | +| ![](img/results/kernel_size_1_iter_1.PNG) | ![](img/results/kernel_size_2_iter_1.PNG) | ![](img/results/kernel_size_3_iter_1.PNG) | + +| Kernel Offset = 8 | Kernel Offset = 16 | Kernel Offset = 32 | +| ----------- | ----------- | ----------- | +| ![](img/results/kernel_size_4_iter_1.PNG) | ![](img/results/kernel_size_5_iter_1.PNG) | ![](img/results/kernel_size_6_iter_1.PNG) | + +| Kernel Offset = 64 | Kernel Offset = 128 | Kernel Offset = 256 | +| ----------- | ----------- | ----------- | +| ![](img/results/kernel_size_7_iter_1.PNG) | ![](img/results/kernel_size_8_iter_1.PNG) | ![](img/results/kernel_size_9_iter_1.PNG) | + +I would not say that the visual quality scales uniformly with filter size. A kernel offset +of about 16 is where the visible changes slow down significantly, so much so that I cannot +really make out any further change with my eyes. Additionally, changing the kernel offset from 1 to 2 does not +make a large visual impact, but offsets 4, 8, and 16 each produce large changes. This is due to a mixture of +the large magnitude of the noise in the source image, our eyes' perception of that noise and +of how it changes from iteration to iteration, and the fact that the color weight of the denoising +algorithm is cut in half for each denoising kernel call, which the authors used to help preserve +small-scale detail. + +### Different Material Types + +| 1 Iteration Not Denoised | 1 Iteration Denoised | 5000 Iterations Not Denoised | +| ----------- | ----------- | ----------- | +| ![](img/results/matcom_1iter.PNG) | ![](img/results/matcomp_denoised.PNG) | ![](img/results/matcomp_5000iter.PNG) | + +As can be seen from the comparison above, the denoising algorithm actually struggles a bit with specular +materials. 
While the edge detection handles the edges of the specular material quite well, the +reflection on the surface appears rough and more like a microfacet material. This also does not +go away completely until hundreds of iterations of path tracing have been run. Unlike the +specular surface, the diffuse surface already scatters light in all directions randomly, so +the smudging and blurring is MUCH less apparent. Denoised at just one iteration, the diffuse sphere +almost looks good enough to consider converged. + +### Different Scenes + +| | 1 Iteration | 100 Iterations | +| ----------- | ----------- | ----------- | +| Smaller Ceiling Light | ![](img/results/cornell_1iter_denoised.PNG) | ![](img/results/cornell_100iter_denoised.PNG) | +| Larger Ceiling Light | ![](img/results/iteration_1_denoised.PNG) | ![](img/results/iteration_100_denoised.PNG) | + +As can be seen above, the denoising algorithm struggles a lot more with the smaller light scene +than the larger light scene. This is because the smaller light scene will naturally sample the +light fewer times than the one with a larger light, because a ray sent in a random direction +is much more likely to hit the larger light. This of course impacts the denoising, +because at lower path-tracing iterations, this will mean more pixels are black, and also +there will be much more variance between pixels. Both of these are bad for blurring, because +blurring requires at least some minimum amount of useful information to work with, +otherwise the result looks splotchy (almost like low-iteration photon mapping). 
+ +## Performance Analysis + +### Convergence +Below are renders at different samples-per-pixel counts before and after denoising: + +| 1 Iteration | 5 Iterations | 10 Iterations | +| ----------- | ----------- | ----------- | +| Original | Original | Original | +| ![](img/results/iteration_1.PNG) | ![](img/results/iteration_5.PNG) | ![](img/results/iteration_10.PNG) | +| Denoised | Denoised | Denoised | +| ![](img/results/iteration_1_denoised.PNG) | ![](img/results/iteration_5_denoised.PNG) | ![](img/results/iteration_10_denoised.PNG) | + +| 50 Iterations | 100 Iterations | 500 Iterations | +| ----------- | ----------- | ----------- | +| Original | Original | Original | +| ![](img/results/iteration_50.PNG) | ![](img/results/iteration_100.PNG) | ![](img/results/iteration_500.PNG) | +| Denoised | Denoised | Denoised | +| ![](img/results/iteration_50_denoised.PNG) | ![](img/results/iteration_100_denoised.PNG) | ![](img/results/iteration_500_denoised.PNG) | + +| 1000 Iterations | 5000 Iterations | +| ----------- | ----------- | +| Original | Original | +| ![](img/results/iteration_1000.PNG) | ![](img/results/iteration_5000.PNG) | +| Denoised | Denoised | +| ![](img/results/iteration_1000_denoised.PNG) | ![](img/results/iteration_5000_denoised.PNG) | + + +For the denoised render, I would say that about 500 iterations is where the results become "acceptably smooth". +By that I mean that by iteration 500 not only have the background colors +smoothed out to a near-converged look, but the specular sphere also no longer looks smudged. +In comparison, I think that somewhere between 1000 and 5000 iterations is where the non-denoised render +looks "acceptably smooth". Anything before that still shows the visible path-tracing noise pattern. 
+Here are the diff images for 500 iterations with and without denoising in comparison to +the 5000 iteration result (with no denoising): + +| | Original | Denoised | +| ----------- | ----------- | ----------- | +| Render | ![](img/results/iteration_500.PNG) | ![](img/results/iteration_500_denoised.PNG) | +| Diff from 5000 iter | ![](img/results/diff_500.PNG) | ![](img/results/diff_500_denoised.PNG) | + +### Varying Filter Size + +![](img/figures/denoise_runtime_vs_pt_iter.png) + +As can be seen from the graph above, the time taken by the denoising algorithm +is the same regardless of the number of path tracing iterations; this is because the denoising +algorithm is only influenced by the resolution of the image to denoise and the size of the +convolution filter. In addition, each increase in filter size only results in a constant amount +of additional runtime, about equal to the runtime at filter size 1. This is because the number of +pixels we are sampling for each additional iteration of the denoising kernel is the same, +as a result of the À-Trous Wavelet Transform. This means that this algorithm scales very well regardless +of the number of samples taken. + +### Path Tracing vs Denoising + +![](img/figures/pathtracing_v_denoising.png) + +As can be seen from the figure above, and having shown already that the denoising algorithm +runtime is independent of the number of path tracing iterations and indeed constant at a +certain filter size, the percentage of time taken to do the denoising operation vs. the actual +path tracing decreases roughly in inverse proportion to the iteration count, and the dropoff is fast. This is because each path tracing +iteration takes about 7 seconds to run, about the same time as the denoising algorithm. So, after ```iter``` +iterations of path tracing, the fraction of the total time spent on denoising +is around ```1 / (iter + 1)```. 
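+
+The relationship above can be written out as a short derivation. This is only a sketch under the assumption already stated, that one path tracing iteration and one denoising pass cost about the same time; the symbols `n` (number of path tracing iterations), `t_pt` (time per path tracing iteration), `t_d` (time for the single denoising pass), and `f` (fraction of total time spent denoising) are notation introduced here for illustration, not values taken from the measurements:
+
+```latex
+f(n) = \frac{t_d}{n \, t_{pt} + t_d} \approx \frac{1}{n + 1}
+\qquad \text{when } t_d \approx t_{pt}
+```
+
+Under that assumption, denoising would account for roughly half of the total time after 1 iteration, about 10% after 9 iterations, and about 1% after 99 iterations, which is a fast but not exponential dropoff.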
+ +### Render Resolution + +![](img/figures/resolution.png) + +As can be seen from the figure above, the runtime of the denoising algorithm increases about +quadratically with increased resolution (where width and height are the same). This is what we +expect. Although the kernel size is constant for all of these data points, the number of pixels +the GPU needs to run the code on increases quadratically as well. + +## References + +Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering: + +Paper: https://jo.dreggn.org/home/2010_atrous.pdf + +Presentation: https://www.highperformancegraphics.org/previous/www_2010/media/RayTracing_I/HPG2010_RayTracing_I_Dammertz.pdf + +Wikipedia Gaussian Blur: https://en.wikipedia.org/wiki/Gaussian_blur + +Filter used in the paper: https://www.eso.org/sci/software/esomidas/doc/user/18NOV/volb/node317.html + +Gaussian blur from Krita Application: https://krita.org/en/ \ No newline at end of file diff --git a/img/figures/denoise_runtime_vs_pt_iter.png b/img/figures/denoise_runtime_vs_pt_iter.png new file mode 100644 index 00000000..a3536514 Binary files /dev/null and b/img/figures/denoise_runtime_vs_pt_iter.png differ diff --git a/img/figures/gaussian_blur.PNG b/img/figures/gaussian_blur.PNG new file mode 100644 index 00000000..4dc7374b Binary files /dev/null and b/img/figures/gaussian_blur.PNG differ diff --git a/img/figures/gaussiankernel.PNG b/img/figures/gaussiankernel.PNG new file mode 100644 index 00000000..ecce8bc0 Binary files /dev/null and b/img/figures/gaussiankernel.PNG differ diff --git a/img/figures/kerneloffsets.PNG b/img/figures/kerneloffsets.PNG new file mode 100644 index 00000000..aee82070 Binary files /dev/null and b/img/figures/kerneloffsets.PNG differ diff --git a/img/figures/pathtracing_v_denoising.png b/img/figures/pathtracing_v_denoising.png new file mode 100644 index 00000000..c1910102 Binary files /dev/null and b/img/figures/pathtracing_v_denoising.png differ diff --git 
a/img/figures/resolution.png b/img/figures/resolution.png new file mode 100644 index 00000000..e81cbf05 Binary files /dev/null and b/img/figures/resolution.png differ diff --git a/img/results/ceiling_light.PNG b/img/results/ceiling_light.PNG new file mode 100644 index 00000000..c47f0a9f Binary files /dev/null and b/img/results/ceiling_light.PNG differ diff --git a/img/results/cornell_100iter_denoised.PNG b/img/results/cornell_100iter_denoised.PNG new file mode 100644 index 00000000..a46acdc0 Binary files /dev/null and b/img/results/cornell_100iter_denoised.PNG differ diff --git a/img/results/cornell_1iter_denoised.PNG b/img/results/cornell_1iter_denoised.PNG new file mode 100644 index 00000000..c46eec4f Binary files /dev/null and b/img/results/cornell_1iter_denoised.PNG differ diff --git a/img/results/depth_buffer.PNG b/img/results/depth_buffer.PNG new file mode 100644 index 00000000..d5dc34fa Binary files /dev/null and b/img/results/depth_buffer.PNG differ diff --git a/img/results/diff_1000_denoised.PNG b/img/results/diff_1000_denoised.PNG new file mode 100644 index 00000000..b239367a Binary files /dev/null and b/img/results/diff_1000_denoised.PNG differ diff --git a/img/results/diff_500.PNG b/img/results/diff_500.PNG new file mode 100644 index 00000000..b57cb99a Binary files /dev/null and b/img/results/diff_500.PNG differ diff --git a/img/results/diff_500_denoised.PNG b/img/results/diff_500_denoised.PNG new file mode 100644 index 00000000..dfbf879f Binary files /dev/null and b/img/results/diff_500_denoised.PNG differ diff --git a/img/results/iteration_1.PNG b/img/results/iteration_1.PNG new file mode 100644 index 00000000..70437556 Binary files /dev/null and b/img/results/iteration_1.PNG differ diff --git a/img/results/iteration_10.PNG b/img/results/iteration_10.PNG new file mode 100644 index 00000000..fa35e631 Binary files /dev/null and b/img/results/iteration_10.PNG differ diff --git a/img/results/iteration_100.PNG b/img/results/iteration_100.PNG new file mode 
100644 index 00000000..b0184b5f Binary files /dev/null and b/img/results/iteration_100.PNG differ diff --git a/img/results/iteration_1000.PNG b/img/results/iteration_1000.PNG new file mode 100644 index 00000000..4cd2fd95 Binary files /dev/null and b/img/results/iteration_1000.PNG differ diff --git a/img/results/iteration_1000_denoised.PNG b/img/results/iteration_1000_denoised.PNG new file mode 100644 index 00000000..0dfe63ee Binary files /dev/null and b/img/results/iteration_1000_denoised.PNG differ diff --git a/img/results/iteration_100_denoised.PNG b/img/results/iteration_100_denoised.PNG new file mode 100644 index 00000000..3b63df90 Binary files /dev/null and b/img/results/iteration_100_denoised.PNG differ diff --git a/img/results/iteration_10_denoised.PNG b/img/results/iteration_10_denoised.PNG new file mode 100644 index 00000000..ff02db13 Binary files /dev/null and b/img/results/iteration_10_denoised.PNG differ diff --git a/img/results/iteration_1_denoised.PNG b/img/results/iteration_1_denoised.PNG new file mode 100644 index 00000000..168b9d3c Binary files /dev/null and b/img/results/iteration_1_denoised.PNG differ diff --git a/img/results/iteration_5.PNG b/img/results/iteration_5.PNG new file mode 100644 index 00000000..f188122c Binary files /dev/null and b/img/results/iteration_5.PNG differ diff --git a/img/results/iteration_50.PNG b/img/results/iteration_50.PNG new file mode 100644 index 00000000..ee3d5076 Binary files /dev/null and b/img/results/iteration_50.PNG differ diff --git a/img/results/iteration_500.PNG b/img/results/iteration_500.PNG new file mode 100644 index 00000000..a4fea651 Binary files /dev/null and b/img/results/iteration_500.PNG differ diff --git a/img/results/iteration_5000.PNG b/img/results/iteration_5000.PNG new file mode 100644 index 00000000..49151218 Binary files /dev/null and b/img/results/iteration_5000.PNG differ diff --git a/img/results/iteration_5000_denoised.PNG b/img/results/iteration_5000_denoised.PNG new file mode 100644 
index 00000000..0bfe99f8 Binary files /dev/null and b/img/results/iteration_5000_denoised.PNG differ diff --git a/img/results/iteration_500_denoised.PNG b/img/results/iteration_500_denoised.PNG new file mode 100644 index 00000000..1ade5705 Binary files /dev/null and b/img/results/iteration_500_denoised.PNG differ diff --git a/img/results/iteration_50_denoised.PNG b/img/results/iteration_50_denoised.PNG new file mode 100644 index 00000000..5f431f98 Binary files /dev/null and b/img/results/iteration_50_denoised.PNG differ diff --git a/img/results/iteration_5_denoised.PNG b/img/results/iteration_5_denoised.PNG new file mode 100644 index 00000000..85feefaa Binary files /dev/null and b/img/results/iteration_5_denoised.PNG differ diff --git a/img/results/kernel_size_10_iter_20.PNG b/img/results/kernel_size_10_iter_20.PNG new file mode 100644 index 00000000..105d39ee Binary files /dev/null and b/img/results/kernel_size_10_iter_20.PNG differ diff --git a/img/results/kernel_size_1_iter_1.PNG b/img/results/kernel_size_1_iter_1.PNG new file mode 100644 index 00000000..9490b1e2 Binary files /dev/null and b/img/results/kernel_size_1_iter_1.PNG differ diff --git a/img/results/kernel_size_1_iter_20.PNG b/img/results/kernel_size_1_iter_20.PNG new file mode 100644 index 00000000..414fda94 Binary files /dev/null and b/img/results/kernel_size_1_iter_20.PNG differ diff --git a/img/results/kernel_size_2_iter_1.PNG b/img/results/kernel_size_2_iter_1.PNG new file mode 100644 index 00000000..f27c5d1a Binary files /dev/null and b/img/results/kernel_size_2_iter_1.PNG differ diff --git a/img/results/kernel_size_2_iter_20.PNG b/img/results/kernel_size_2_iter_20.PNG new file mode 100644 index 00000000..d2cb21a6 Binary files /dev/null and b/img/results/kernel_size_2_iter_20.PNG differ diff --git a/img/results/kernel_size_3_iter_1.PNG b/img/results/kernel_size_3_iter_1.PNG new file mode 100644 index 00000000..c1d78b0c Binary files /dev/null and b/img/results/kernel_size_3_iter_1.PNG differ diff 
--git a/img/results/kernel_size_3_iter_20.PNG b/img/results/kernel_size_3_iter_20.PNG new file mode 100644 index 00000000..ef3cb04d Binary files /dev/null and b/img/results/kernel_size_3_iter_20.PNG differ diff --git a/img/results/kernel_size_4_iter_1.PNG b/img/results/kernel_size_4_iter_1.PNG new file mode 100644 index 00000000..3cef2719 Binary files /dev/null and b/img/results/kernel_size_4_iter_1.PNG differ diff --git a/img/results/kernel_size_4_iter_20.PNG b/img/results/kernel_size_4_iter_20.PNG new file mode 100644 index 00000000..07c80904 Binary files /dev/null and b/img/results/kernel_size_4_iter_20.PNG differ diff --git a/img/results/kernel_size_5_iter_1.PNG b/img/results/kernel_size_5_iter_1.PNG new file mode 100644 index 00000000..b510903b Binary files /dev/null and b/img/results/kernel_size_5_iter_1.PNG differ diff --git a/img/results/kernel_size_5_iter_20.PNG b/img/results/kernel_size_5_iter_20.PNG new file mode 100644 index 00000000..a76a948b Binary files /dev/null and b/img/results/kernel_size_5_iter_20.PNG differ diff --git a/img/results/kernel_size_6_iter_1.PNG b/img/results/kernel_size_6_iter_1.PNG new file mode 100644 index 00000000..0f7b29fe Binary files /dev/null and b/img/results/kernel_size_6_iter_1.PNG differ diff --git a/img/results/kernel_size_6_iter_20.PNG b/img/results/kernel_size_6_iter_20.PNG new file mode 100644 index 00000000..a47e5983 Binary files /dev/null and b/img/results/kernel_size_6_iter_20.PNG differ diff --git a/img/results/kernel_size_7_iter_1.PNG b/img/results/kernel_size_7_iter_1.PNG new file mode 100644 index 00000000..d22e8fea Binary files /dev/null and b/img/results/kernel_size_7_iter_1.PNG differ diff --git a/img/results/kernel_size_7_iter_20.PNG b/img/results/kernel_size_7_iter_20.PNG new file mode 100644 index 00000000..50d361e2 Binary files /dev/null and b/img/results/kernel_size_7_iter_20.PNG differ diff --git a/img/results/kernel_size_8_iter_1.PNG b/img/results/kernel_size_8_iter_1.PNG new file mode 100644 index 
00000000..2ae699e6 Binary files /dev/null and b/img/results/kernel_size_8_iter_1.PNG differ diff --git a/img/results/kernel_size_8_iter_20.PNG b/img/results/kernel_size_8_iter_20.PNG new file mode 100644 index 00000000..d102b278 Binary files /dev/null and b/img/results/kernel_size_8_iter_20.PNG differ diff --git a/img/results/kernel_size_9_iter_1.PNG b/img/results/kernel_size_9_iter_1.PNG new file mode 100644 index 00000000..22af5013 Binary files /dev/null and b/img/results/kernel_size_9_iter_1.PNG differ diff --git a/img/results/kernel_size_9_iter_20.PNG b/img/results/kernel_size_9_iter_20.PNG new file mode 100644 index 00000000..f82ae4a2 Binary files /dev/null and b/img/results/kernel_size_9_iter_20.PNG differ diff --git a/img/results/matcom_1iter.PNG b/img/results/matcom_1iter.PNG new file mode 100644 index 00000000..a382d1bb Binary files /dev/null and b/img/results/matcom_1iter.PNG differ diff --git a/img/results/matcomp_5000iter.PNG b/img/results/matcomp_5000iter.PNG new file mode 100644 index 00000000..677e73d6 Binary files /dev/null and b/img/results/matcomp_5000iter.PNG differ diff --git a/img/results/matcomp_denoised.PNG b/img/results/matcomp_denoised.PNG new file mode 100644 index 00000000..7989ddce Binary files /dev/null and b/img/results/matcomp_denoised.PNG differ diff --git a/img/results/no_edge_detection_filter1.PNG b/img/results/no_edge_detection_filter1.PNG new file mode 100644 index 00000000..2c1a10df Binary files /dev/null and b/img/results/no_edge_detection_filter1.PNG differ diff --git a/img/results/no_edge_detection_filter3.PNG b/img/results/no_edge_detection_filter3.PNG new file mode 100644 index 00000000..74200615 Binary files /dev/null and b/img/results/no_edge_detection_filter3.PNG differ diff --git a/img/results/no_edge_detection_filter5.PNG b/img/results/no_edge_detection_filter5.PNG new file mode 100644 index 00000000..27bddeb7 Binary files /dev/null and b/img/results/no_edge_detection_filter5.PNG differ diff --git 
a/img/results/no_edge_detection_filter7.PNG b/img/results/no_edge_detection_filter7.PNG new file mode 100644 index 00000000..c8c4d47d Binary files /dev/null and b/img/results/no_edge_detection_filter7.PNG differ diff --git a/img/results/nor_buffer.PNG b/img/results/nor_buffer.PNG new file mode 100644 index 00000000..4a5624e6 Binary files /dev/null and b/img/results/nor_buffer.PNG differ diff --git a/img/results/plain_blur_i1.PNG b/img/results/plain_blur_i1.PNG new file mode 100644 index 00000000..4019d3f2 Binary files /dev/null and b/img/results/plain_blur_i1.PNG differ diff --git a/img/results/plain_blur_i10.PNG b/img/results/plain_blur_i10.PNG new file mode 100644 index 00000000..6b8224f4 Binary files /dev/null and b/img/results/plain_blur_i10.PNG differ diff --git a/img/results/plain_blur_i2.PNG b/img/results/plain_blur_i2.PNG new file mode 100644 index 00000000..12e10b30 Binary files /dev/null and b/img/results/plain_blur_i2.PNG differ diff --git a/img/results/plain_blur_i3.PNG b/img/results/plain_blur_i3.PNG new file mode 100644 index 00000000..f8489112 Binary files /dev/null and b/img/results/plain_blur_i3.PNG differ diff --git a/img/results/plain_blur_i4.PNG b/img/results/plain_blur_i4.PNG new file mode 100644 index 00000000..2fce8302 Binary files /dev/null and b/img/results/plain_blur_i4.PNG differ diff --git a/img/results/plain_blur_i5.PNG b/img/results/plain_blur_i5.PNG new file mode 100644 index 00000000..3abbf06a Binary files /dev/null and b/img/results/plain_blur_i5.PNG differ diff --git a/img/results/plain_blur_i6.PNG b/img/results/plain_blur_i6.PNG new file mode 100644 index 00000000..634e9f96 Binary files /dev/null and b/img/results/plain_blur_i6.PNG differ diff --git a/img/results/plain_blur_i7.PNG b/img/results/plain_blur_i7.PNG new file mode 100644 index 00000000..93259cf4 Binary files /dev/null and b/img/results/plain_blur_i7.PNG differ diff --git a/img/results/plain_blur_i8.PNG b/img/results/plain_blur_i8.PNG new file mode 100644 index 
00000000..e27ccc80 Binary files /dev/null and b/img/results/plain_blur_i8.PNG differ diff --git a/img/results/plain_blur_i9.PNG b/img/results/plain_blur_i9.PNG new file mode 100644 index 00000000..5fe5c30e Binary files /dev/null and b/img/results/plain_blur_i9.PNG differ diff --git a/img/results/pos_buffer.PNG b/img/results/pos_buffer.PNG new file mode 100644 index 00000000..3f99ac62 Binary files /dev/null and b/img/results/pos_buffer.PNG differ diff --git a/img/results/render_1.PNG b/img/results/render_1.PNG new file mode 100644 index 00000000..ae2a1945 Binary files /dev/null and b/img/results/render_1.PNG differ diff --git a/img/results/render_100.PNG b/img/results/render_100.PNG new file mode 100644 index 00000000..d9df5606 Binary files /dev/null and b/img/results/render_100.PNG differ diff --git a/img/results/render_1000.PNG b/img/results/render_1000.PNG new file mode 100644 index 00000000..3e83bf69 Binary files /dev/null and b/img/results/render_1000.PNG differ diff --git a/img/results/render_denoised_1.PNG b/img/results/render_denoised_1.PNG new file mode 100644 index 00000000..3a8ffe98 Binary files /dev/null and b/img/results/render_denoised_1.PNG differ diff --git a/img/results/render_denoised_100.PNG b/img/results/render_denoised_100.PNG new file mode 100644 index 00000000..73d6782b Binary files /dev/null and b/img/results/render_denoised_100.PNG differ diff --git a/img/results/render_denoised_1000.PNG b/img/results/render_denoised_1000.PNG new file mode 100644 index 00000000..94bfec82 Binary files /dev/null and b/img/results/render_denoised_1000.PNG differ diff --git a/scenes/cornell.txt b/scenes/cornell.txt index 83ff8202..097cc68d 100644 --- a/scenes/cornell.txt +++ b/scenes/cornell.txt @@ -51,11 +51,11 @@ EMITTANCE 0 // Camera CAMERA RES 800 800 -FOVY 45 -ITERATIONS 5000 +FOVY 19.5 +ITERATIONS 100 DEPTH 8 FILE cornell -EYE 0.0 5 10.5 +EYE 0.0 5 19.0 LOOKAT 0 5 0 UP 0 1 0 diff --git a/scenes/cornell_ceiling_light.txt b/scenes/cornell_ceiling_light.txt 
index 15af5f19..bcdbe242 100644 --- a/scenes/cornell_ceiling_light.txt +++ b/scenes/cornell_ceiling_light.txt @@ -51,11 +51,11 @@ EMITTANCE 0 // Camera CAMERA RES 800 800 -FOVY 45 -ITERATIONS 10 +FOVY 19.5 +ITERATIONS 5000 DEPTH 8 FILE cornell -EYE 0.0 5 10.5 +EYE 0.0 5 19.0 LOOKAT 0 5 0 UP 0 1 0 diff --git a/scenes/custom_scene.txt b/scenes/custom_scene.txt new file mode 100644 index 00000000..cc7c74ad --- /dev/null +++ b/scenes/custom_scene.txt @@ -0,0 +1,183 @@ +// Emissive material (light) +MATERIAL 0 +RGB 1 1 1 +SPECEX 0 +SPECRGB 0 0 0 +REFL 0 +REFR 0 +REFRIOR 0 +EMITTANCE 5 + +// Diffuse white +MATERIAL 1 +RGB .85 .55 .35 +SPECEX 0 +SPECRGB 0 0 0 +REFL 0 +REFR 0 +REFRIOR 0 +EMITTANCE 0 + +// Diffuse red +MATERIAL 2 +RGB .98 .98 .98 +SPECEX 0 +SPECRGB 0 0 0 +REFL 0 +REFR 0 +REFRIOR 0 +EMITTANCE 0 + +// Diffuse green +MATERIAL 3 +RGB .98 .98 .98 +SPECEX 0 +SPECRGB 0 0 0 +REFL 0 +REFR 0 +REFRIOR 0 +EMITTANCE 0 + +// Specular white +MATERIAL 4 +RGB .98 .98 .98 +SPECEX 0 +SPECRGB .98 .98 .98 +REFL 1 +REFR 0 +REFRIOR 0 +EMITTANCE 0 + +// Specular white +MATERIAL 5 +RGB .73 .81 .98 +SPECEX 0 +SPECRGB 0 0 0 +REFL 0 +REFR 0 +REFRIOR 0 +EMITTANCE 0 + +// Camera +CAMERA +RES 1920 1080 +FOVY 19.5 +ITERATIONS 1000 +DEPTH 8 +FILE cornell +EYE 0.0 5 19.0 +LOOKAT 0 5 0 +UP 0 1 0 + + +// Ceiling light +OBJECT 0 +cube +material 0 +TRANS 6.5 7 -3 +ROTAT 35 24 73 +SCALE 2 2 2 + +// Floor +OBJECT 1 +cube +material 1 +TRANS 0 0 0 +ROTAT 0 0 0 +SCALE 20 .01 10 + +// Ceiling +OBJECT 2 +cube +material 1 +TRANS 0 10 0 +ROTAT 0 0 90 +SCALE .01 20 10 + +// Back wall +OBJECT 3 +cube +material 1 +TRANS 0 5 -5 +ROTAT 0 90 0 +SCALE .01 10 20 + +// Left wall +OBJECT 4 +cube +material 2 +TRANS -10 5 0 +ROTAT 0 0 0 +SCALE .01 10 10 + +// Right wall +OBJECT 5 +cube +material 3 +TRANS 10 5 0 +ROTAT 0 0 0 +SCALE .01 10 10 + +// Sphere +OBJECT 6 +sphere +material 4 +TRANS -3 4 1 +ROTAT 0 0 0 +SCALE 3 3 3 + +// Sphere +OBJECT 7 +sphere +material 5 +TRANS 7 2 -1 +ROTAT 0 0 0 +SCALE 4 4 4 + +// 
Sphere +OBJECT 8 +sphere +material 0 +TRANS -6 0 -3 +ROTAT 0 0 0 +SCALE 3 3 3 + +// Sphere +OBJECT 9 +cube +material 4 +TRANS -9 2 3 +ROTAT 0 0 0 +SCALE 1 4 1 + +// Ceiling light +OBJECT 10 +cube +material 0 +TRANS 3 0 2 +ROTAT 0 0 0 +SCALE 3 1 3 + +// Ceiling light +OBJECT 11 +sphere +material 0 +TRANS 0 10 0 +ROTAT 0 0 0 +SCALE 5 1 5 + +// Ceiling light +OBJECT 12 +cube +material 4 +TRANS 3 2 2 +ROTAT 45 45 0 +SCALE 2 2 2 + +// Sphere +OBJECT 13 +sphere +material 5 +TRANS -2 0 3 +ROTAT 0 0 0 +SCALE 6 6 6 \ No newline at end of file diff --git a/src/main.cpp b/src/main.cpp index 4092ae4a..f7dbb803 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -24,10 +24,14 @@ int startupIterations = 0; int lastLoopIterations = 0; bool ui_showGbuffer = false; bool ui_denoise = false; -int ui_filterSize = 80; -float ui_colorWeight = 0.45f; -float ui_normalWeight = 0.35f; -float ui_positionWeight = 0.2f; +int ui_filterIterations= 10; +int lastFilterIterations = 10; +float ui_colorWeight = 150.0f; +float lastColorWeight = 150.0f; +float ui_normalWeight = 0.061f; +float lastNormalWeight = 0.061f; +float ui_positionWeight = 3.049f; +float lastPositionWeight = 3.049f; bool ui_saveAndExit = false; static bool camchanged = true; @@ -45,6 +49,10 @@ int iteration; int width; int height; +bool denoise = true; +float time_taken_pathtrace = 0.0f; +float time_taken_denoise = 0.0f; + //------------------------------- //-------------MAIN-------------- //------------------------------- @@ -126,6 +134,23 @@ void runCuda() { camchanged = true; } + if (lastColorWeight != ui_colorWeight) { + lastColorWeight = ui_colorWeight; + denoise = true; + } + if (lastNormalWeight != ui_normalWeight) { + lastNormalWeight = ui_normalWeight; + denoise = true; + } + if (lastPositionWeight != ui_positionWeight) { + lastPositionWeight = ui_positionWeight; + denoise = true; + } + if (lastFilterIterations != ui_filterIterations) { + lastFilterIterations = ui_filterIterations; + denoise = true; + } + if (camchanged) { 
iteration = 0; Camera &cam = renderState->camera; @@ -144,6 +169,8 @@ void runCuda() { cameraPosition += cam.lookAt; cam.position = cameraPosition; camchanged = false; + denoise = true; + time_taken_pathtrace = 0.0f; } // Map OpenGL buffer object for writing from CUDA on a single GPU @@ -154,6 +181,7 @@ void runCuda() { pathtraceInit(scene); } + uchar4 *pbo_dptr = NULL; cudaGLMapBufferObject((void**)&pbo_dptr, pbo); @@ -162,12 +190,35 @@ void runCuda() { // execute the kernel int frame = 0; + PerformanceTimer perf_timer; + perf_timer.startGpuTimer(); pathtrace(frame, iteration); + perf_timer.endGpuTimer(); + time_taken_pathtrace += perf_timer.getGpuElapsedTimeForPreviousOperation(); + } if (ui_showGbuffer) { showGBuffer(pbo_dptr); - } else { + } + else if (iteration == ui_iterations) { + + PerformanceTimer perf_timer; + perf_timer.startGpuTimer(); + DenoiseParams denoise_params{ denoise, ui_positionWeight, ui_normalWeight, ui_colorWeight, ui_filterIterations }; + denoiseAndShowImage(pbo_dptr, iteration, denoise_params); + perf_timer.endGpuTimer(); + if (denoise == true) { + std::cout << time_taken_pathtrace << std::endl; + time_taken_denoise = perf_timer.getGpuElapsedTimeForPreviousOperation(); + std::cout << time_taken_denoise << std::endl; + } + + //showImage(pbo_dptr, iteration); + denoise = false; + //iteration++; + } + else { showImage(pbo_dptr, iteration); } diff --git a/src/main.h b/src/main.h index 06d311a8..dd0c9d9e 100644 --- a/src/main.h +++ b/src/main.h @@ -36,7 +36,7 @@ extern int ui_iterations; extern int startupIterations; extern bool ui_showGbuffer; extern bool ui_denoise; -extern int ui_filterSize; +extern int ui_filterIterations; extern float ui_colorWeight; extern float ui_normalWeight; extern float ui_positionWeight; diff --git a/src/pathtrace.cu b/src/pathtrace.cu index 23e5f909..6fbb18ad 100644 --- a/src/pathtrace.cu +++ b/src/pathtrace.cu @@ -16,6 +16,12 @@ #define ERRORCHECK 1 +//#define DISPLAY_POSITION +//#define DISPLAY_NORMAL +//#define 
DISPLAY_DEPTH + +#define WAVELET + #define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__) #define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__) void checkCUDAErrorFn(const char *msg, const char *file, int line) { @@ -73,17 +79,33 @@ __global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* g if (x < resolution.x && y < resolution.y) { int index = x + (y * resolution.x); - float timeToIntersect = gBuffer[index].t * 256.0; + //float timeToIntersect = gBuffer[index].t * 256.0; pbo[index].w = 0; - pbo[index].x = timeToIntersect; - pbo[index].y = timeToIntersect; - pbo[index].z = timeToIntersect; +#ifdef DISPLAY_NORMAL + pbo[index].x = ((gBuffer[index].nor.x + 1.0f) * 0.5f) * 255.0f; + pbo[index].y = ((gBuffer[index].nor.y + 1.0f) * 0.5f) * 255.0f; + pbo[index].z = ((gBuffer[index].nor.z + 1.0f) * 0.5f) * 255.0f; +#endif + +#ifdef DISPLAY_POSITION + pbo[index].x = glm::abs(gBuffer[index].pos.x) * 0.1f * 255.0f; + pbo[index].y = glm::abs(gBuffer[index].pos.y) * 0.1f * 255.0f; + pbo[index].z = glm::abs(gBuffer[index].pos.z) * 0.1f * 255.0f; +#endif + +#ifdef DISPLAY_DEPTH + pbo[index].x = glm::abs(gBuffer[index].t) * 0.1f * 255.0f; + pbo[index].y = glm::abs(gBuffer[index].t) * 0.1f * 255.0f; + pbo[index].z = glm::abs(gBuffer[index].t) * 0.1f * 255.0f; +#endif } } static Scene * hst_scene = NULL; static glm::vec3 * dev_image = NULL; +static glm::vec3* dev_image_denoised_a = NULL; +static glm::vec3* dev_image_denoised_b = NULL; static Geom * dev_geoms = NULL; static Material * dev_materials = NULL; static PathSegment * dev_paths = NULL; @@ -92,6 +114,25 @@ static GBufferPixel* dev_gBuffer = NULL; // TODO: static variables for device memory, any extra info you need, etc // ... 
+// values from +// https://www.eso.org/sci/software/esomidas/doc/user/18NOV/volb/node317.html + +const float h_kernel[25] = { 1.0f / 256.0f, 1.0f / 64.0f, 3.0f / 128.0f, 1.0f / 64.0f, 1.0f / 256.0f, + 1.0 / 64.0f, 1.0f / 16.0f, 3.0 / 32.0f, 1.0f / 16.0f, 1.0f / 64.0f, + 3.0f / 128.0f, 3.0f / 32.0f, 9.0f / 64.0f, 3.0f / 32.0f, 3.0f / 128.0f, + 1.0 / 64.0f, 1.0f / 16.0f, 3.0 / 32.0f, 1.0f / 16.0f, 1.0f / 64.0f, + 1.0f / 256.0f, 1.0f / 64.0f, 3.0f / 128.0f, 1.0f / 64.0f, 1.0f / 256.0f}; + +static float* dev_h_kernel = NULL; + +const glm::ivec2 h_kernel_offsets[25] = { glm::ivec2(-2,-2), glm::ivec2(-1,-2), glm::ivec2(0,-2), glm::ivec2(1,-2), glm::ivec2(2,-2), + glm::ivec2(-2,-1), glm::ivec2(-1,-1), glm::ivec2(0,-1), glm::ivec2(1,-1), glm::ivec2(2,-1), + glm::ivec2(-2,0), glm::ivec2(-1,0), glm::ivec2(0,0), glm::ivec2(1,0), glm::ivec2(2,0), + glm::ivec2(-2,1), glm::ivec2(-1,1), glm::ivec2(0,1), glm::ivec2(1,1), glm::ivec2(2,1), + glm::ivec2(-2,2), glm::ivec2(-1,2), glm::ivec2(0,2), glm::ivec2(1,2), glm::ivec2(2,2) }; + +static glm::ivec2* dev_h_kernel_offsets = NULL; + void pathtraceInit(Scene *scene) { hst_scene = scene; const Camera &cam = hst_scene->state.camera; @@ -100,6 +141,12 @@ void pathtraceInit(Scene *scene) { cudaMalloc(&dev_image, pixelcount * sizeof(glm::vec3)); cudaMemset(dev_image, 0, pixelcount * sizeof(glm::vec3)); + cudaMalloc(&dev_image_denoised_a, pixelcount * sizeof(glm::vec3)); + cudaMemset(dev_image_denoised_a, 0, pixelcount * sizeof(glm::vec3)); + + cudaMalloc(&dev_image_denoised_b, pixelcount * sizeof(glm::vec3)); + cudaMemset(dev_image_denoised_b, 0, pixelcount * sizeof(glm::vec3)); + cudaMalloc(&dev_paths, pixelcount * sizeof(PathSegment)); cudaMalloc(&dev_geoms, scene->geoms.size() * sizeof(Geom)); @@ -113,6 +160,12 @@ void pathtraceInit(Scene *scene) { cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel)); + cudaMalloc(&dev_h_kernel, 25 * sizeof(float)); + cudaMemcpy(dev_h_kernel, h_kernel, 25 * sizeof(float), cudaMemcpyHostToDevice); 
+ + cudaMalloc(&dev_h_kernel_offsets, 25 * sizeof(glm::ivec2)); + cudaMemcpy(dev_h_kernel_offsets, h_kernel_offsets, 25 * sizeof(glm::ivec2), cudaMemcpyHostToDevice); + // TODO: initialize any extra device memeory you need checkCUDAError("pathtraceInit"); @@ -120,11 +173,15 @@ void pathtraceInit(Scene *scene) { void pathtraceFree() { cudaFree(dev_image); // no-op if dev_image is null + cudaFree(dev_image_denoised_a); + cudaFree(dev_image_denoised_b); cudaFree(dev_paths); cudaFree(dev_geoms); cudaFree(dev_materials); cudaFree(dev_intersections); cudaFree(dev_gBuffer); + cudaFree(dev_h_kernel); + cudaFree(dev_h_kernel_offsets); // TODO: clean up any extra device memory you created checkCUDAError("pathtraceFree"); @@ -282,6 +339,8 @@ __global__ void generateGBuffer ( if (idx < num_paths) { gBuffer[idx].t = shadeableIntersections[idx].t; + gBuffer[idx].nor = shadeableIntersections[idx].surfaceNormal; + gBuffer[idx].pos = pathSegments[idx].ray.origin + pathSegments[idx].ray.direction * shadeableIntersections[idx].t; } } @@ -417,6 +476,8 @@ void showGBuffer(uchar4* pbo) { (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); + + // CHECKITOUT: process the gbuffer results and send them to OpenGL buffer for visualization gbufferToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer); } @@ -431,3 +492,153 @@ const Camera &cam = hst_scene->state.camera; // Send results to OpenGL buffer for rendering sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image); } + +#ifdef WAVELET + +// implements the a trous (ratatouille) filter +__global__ void denoiseRatatouille(glm::ivec2 resolution, int iter, float* h_kernel, glm::ivec2* h_kernel_offsets, int iter_offset, float sigma_p_sq, float sigma_n_sq, float sigma_rt_sq, GBufferPixel* gBuffer, glm::vec3* image, glm::vec3* image_denoised) { + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < resolution.x && y < resolution.y) { + 
glm::vec3 color_accum = glm::vec3(0, 0, 0); + int x_image; + int y_image; + int kernel_index; + glm::ivec2 kernel_offset; + float kernel_val; + + glm::vec3 input_color = image[x + (y * resolution.x)]; + GBufferPixel input_gBuffer = gBuffer[x + (y * resolution.x)]; + glm::vec3 buffer_diff; + float squared_distance; + float w_rt; + float w_n; + float w_p; + float total_weight; + float w_accum = 0.0f; + + + for (int i = 0; i < 25; ++i) { + kernel_offset = h_kernel_offsets[i] * iter_offset; + kernel_val = h_kernel[i]; + + x_image = glm::clamp(x + kernel_offset.x, 0, resolution.x - 1); + y_image = glm::clamp(y + kernel_offset.y, 0, resolution.y - 1); + + buffer_diff = input_gBuffer.pos - gBuffer[x_image + (y_image * resolution.x)].pos; + squared_distance = glm::dot(buffer_diff, buffer_diff); + w_p = glm::min(glm::exp(-(squared_distance) / sigma_p_sq), 1.0f); + + buffer_diff = input_gBuffer.nor - gBuffer[x_image + (y_image * resolution.x)].nor; + squared_distance = glm::max(glm::dot(buffer_diff, buffer_diff) / ((float)(iter_offset * iter_offset)), 0.0f); + w_n = glm::min(glm::exp(-(squared_distance) / sigma_n_sq), 1.0f); + + buffer_diff = input_color - image[x_image + (y_image * resolution.x)]; + squared_distance = glm::dot(buffer_diff, buffer_diff); + w_rt = glm::min(glm::exp(-(squared_distance) / sigma_rt_sq), 1.0f); + + + + total_weight = w_p * w_n * w_rt; + + color_accum += total_weight * kernel_val * image[x_image + (y_image * resolution.x)]; + w_accum += total_weight * kernel_val; + } + + image_denoised[x + (y * resolution.x)] = color_accum / w_accum; + } +} + + +void denoiseAndShowImage(uchar4* pbo, int iter, DenoiseParams denoise_params) { + if (denoise_params.denoise) { + const Camera& cam = hst_scene->state.camera; + const dim3 blockSize2d(8, 8); + const dim3 blocksPerGrid2d( + (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, + (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); + + + + cudaMemcpy(dev_image_denoised_a, dev_image, cam.resolution.x * 
cam.resolution.y * sizeof(glm::vec3), cudaMemcpyDeviceToDevice); + + int iter_offset = 1; + + for (int i = 0; i < denoise_params.denoise_iterations; ++i) { + denoiseRatatouille<<<blocksPerGrid2d, blockSize2d>>>(cam.resolution, iter, dev_h_kernel, dev_h_kernel_offsets, iter_offset, denoise_params.sigma_p * denoise_params.sigma_p, denoise_params.sigma_n * denoise_params.sigma_n, denoise_params.sigma_rt * denoise_params.sigma_rt, dev_gBuffer, dev_image_denoised_a, dev_image_denoised_b); + if (i < denoise_params.denoise_iterations - 1) { + glm::vec3* temp = dev_image_denoised_b; + dev_image_denoised_b = dev_image_denoised_a; + dev_image_denoised_a = temp; + } + iter_offset *= 2; + denoise_params.sigma_rt /= (float)iter_offset; + } + + + // Send results to OpenGL buffer for rendering + sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image_denoised_b); + + } + +} + +#else +// implements the a trous (ratatouille) filter +__global__ void denoiseRatatouille(glm::ivec2 resolution, int iter, float* h_kernel, glm::ivec2* h_kernel_offsets, int iter_offset, GBufferPixel* gBuffer, glm::vec3* image, glm::vec3* image_denoised) { + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < resolution.x && y < resolution.y) { + glm::vec3 color_accum = glm::vec3(0, 0, 0); + int x_image; + int y_image; + int kernel_index; + glm::ivec2 kernel_offset; +#pragma unroll + for (int i = 0; i < 25; ++i) { + kernel_offset = h_kernel_offsets[i] * iter_offset; + + x_image = glm::clamp(x + kernel_offset.x, 0, resolution.x - 1); + y_image = glm::clamp(y + kernel_offset.y, 0, resolution.y - 1); + color_accum += h_kernel[i] * image[x_image + (y_image * resolution.x)]; + } + + image_denoised[x + (y * resolution.x)] = color_accum; + } +} + +void denoiseAndShowImage(uchar4* pbo, int iter, DenoiseParams denoise_params) { + if (denoise_params.denoise) { + const Camera& cam = hst_scene->state.camera; + const dim3 blockSize2d(8, 8); + const dim3 blocksPerGrid2d( + 
(cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, + (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); + + PerformanceTimer perf_timer; + perf_timer.startGpuTimer(); + cudaMemcpy(dev_image_denoised_a, dev_image, cam.resolution.x * cam.resolution.y * sizeof(glm::vec3), cudaMemcpyDeviceToDevice); + + int iter_offset = 1; + for (int i = 0; i < denoise_params.denoise_iterations; ++i) { + denoiseRatatouille<<<blocksPerGrid2d, blockSize2d>>>(cam.resolution, iter, dev_h_kernel, dev_h_kernel_offsets, iter_offset, dev_gBuffer, dev_image_denoised_a, dev_image_denoised_b); + if (i < denoise_params.denoise_iterations - 1) { + glm::vec3* temp = dev_image_denoised_b; + dev_image_denoised_b = dev_image_denoised_a; + dev_image_denoised_a = temp; + } + iter_offset *= 2; + + } + + + // Send results to OpenGL buffer for rendering + sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image_denoised_b); + perf_timer.endGpuTimer(); + std::cout << perf_timer.getGpuElapsedTimeForPreviousOperation() << std::endl; + } + +} +#endif diff --git a/src/pathtrace.h b/src/pathtrace.h index 9e12f440..5ca547d3 100644 --- a/src/pathtrace.h +++ b/src/pathtrace.h @@ -2,9 +2,112 @@ #include #include "scene.h" +#include + +struct DenoiseParams { + bool denoise; + float sigma_p; + float sigma_n; + float sigma_rt; + int denoise_iterations; +}; void pathtraceInit(Scene *scene); void pathtraceFree(); void pathtrace(int frame, int iteration); void showGBuffer(uchar4 *pbo); void showImage(uchar4 *pbo, int iter); +void denoiseAndShowImage(uchar4* pbo, int iter, DenoiseParams denoise_params); + +/** + * This class is used for timing the performance + * Uncopyable and unmovable + * + * Adapted from WindyDarian(https://github.com/WindyDarian) + */ + +class PerformanceTimer +{ +public: + PerformanceTimer() + { + cudaEventCreate(&event_start); + cudaEventCreate(&event_end); + } + + ~PerformanceTimer() + { + cudaEventDestroy(event_start); + cudaEventDestroy(event_end); + } + + void startCpuTimer() + { + if (cpu_timer_started) { 
throw std::runtime_error("CPU timer already started"); } + cpu_timer_started = true; + + time_start_cpu = std::chrono::high_resolution_clock::now(); + } + + void endCpuTimer() + { + time_end_cpu = std::chrono::high_resolution_clock::now(); + + if (!cpu_timer_started) { throw std::runtime_error("CPU timer not started"); } + + std::chrono::duration<double, std::milli> duro = time_end_cpu - time_start_cpu; + prev_elapsed_time_cpu_milliseconds = + static_cast<decltype(prev_elapsed_time_cpu_milliseconds)>(duro.count()); + + cpu_timer_started = false; + } + + void startGpuTimer() + { + if (gpu_timer_started) { throw std::runtime_error("GPU timer already started"); } + gpu_timer_started = true; + + cudaEventRecord(event_start); + } + + void endGpuTimer() + { + cudaEventRecord(event_end); + cudaEventSynchronize(event_end); + + if (!gpu_timer_started) { throw std::runtime_error("GPU timer not started"); } + + cudaEventElapsedTime(&prev_elapsed_time_gpu_milliseconds, event_start, event_end); + gpu_timer_started = false; + } + + float getCpuElapsedTimeForPreviousOperation() //noexcept //(damn I need VS 2015 + { + return prev_elapsed_time_cpu_milliseconds; + } + + float getGpuElapsedTimeForPreviousOperation() //noexcept + { + return prev_elapsed_time_gpu_milliseconds; + } + + // remove copy and move functions + PerformanceTimer(const PerformanceTimer&) = delete; + PerformanceTimer(PerformanceTimer&&) = delete; + PerformanceTimer& operator=(const PerformanceTimer&) = delete; + PerformanceTimer& operator=(PerformanceTimer&&) = delete; + +private: + cudaEvent_t event_start = nullptr; + cudaEvent_t event_end = nullptr; + + using time_point_t = std::chrono::high_resolution_clock::time_point; + time_point_t time_start_cpu; + time_point_t time_end_cpu; + + bool cpu_timer_started = false; + bool gpu_timer_started = false; + + float prev_elapsed_time_cpu_milliseconds = 0.f; + float prev_elapsed_time_gpu_milliseconds = 0.f; +}; diff --git a/src/preview.cpp b/src/preview.cpp index 3ca27180..713c0041 100644 --- a/src/preview.cpp +++ b/src/preview.cpp @@ 
-214,10 +214,10 @@ void drawGui(int windowWidth, int windowHeight) { ImGui::Checkbox("Denoise", &ui_denoise); - ImGui::SliderInt("Filter Size", &ui_filterSize, 0, 100); - ImGui::SliderFloat("Color Weight", &ui_colorWeight, 0.0f, 10.0f); - ImGui::SliderFloat("Normal Weight", &ui_normalWeight, 0.0f, 10.0f); - ImGui::SliderFloat("Position Weight", &ui_positionWeight, 0.0f, 10.0f); + ImGui::SliderInt("Filter Iterations", &ui_filterIterations, 1, 10); + ImGui::SliderFloat("Color Weight", &ui_colorWeight, 0.0f, 150.0f); + ImGui::SliderFloat("Normal Weight", &ui_normalWeight, 0.0f, 15.0f); + ImGui::SliderFloat("Position Weight", &ui_positionWeight, 0.0f, 15.0f); ImGui::Separator(); diff --git a/src/sceneStructs.h b/src/sceneStructs.h index da7e558a..2b3e574a 100644 --- a/src/sceneStructs.h +++ b/src/sceneStructs.h @@ -79,4 +79,6 @@ struct ShadeableIntersection { // What information might be helpful for guiding a denoising filter? struct GBufferPixel { float t; + glm::vec3 pos; + glm::vec3 nor; };
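
Note on the weighting in `denoiseRatatouille`: the host-side C++ below is a minimal sketch for experimenting with the filter math outside CUDA, and it is not code from this patch. `Vec3`, `squaredDistance`, `edgeStoppingWeight`, `makeB3SplineKernel` and `atrousStride` are hypothetical names introduced only for illustration. The sketch assumes the same weight form the kernel applies to position, normal and color, `min(exp(-d^2 / sigma^2), 1)`, and it shows that the 25 values hard-coded in `h_kernel` equal the outer product of the 1D taps (1/16, 1/4, 3/8, 1/4, 1/16).

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Hypothetical stand-in for glm::vec3 so this sketch builds without GLM.
struct Vec3 { float x, y, z; };

// Squared Euclidean distance between two vectors.
inline float squaredDistance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Edge-stopping weight of the form min(exp(-|a - b|^2 / sigma^2), 1).
// Identical inputs give 1; very different inputs give a weight near 0.
// Assumes sigmaSq > 0.
inline float edgeStoppingWeight(const Vec3& a, const Vec3& b, float sigmaSq) {
    return std::min(std::exp(-squaredDistance(a, b) / sigmaSq), 1.0f);
}

// 5x5 filter built as the outer product of the 1D taps
// (1/16, 1/4, 3/8, 1/4, 1/16), stored row-major like h_kernel.
inline std::array<float, 25> makeB3SplineKernel() {
    const float taps[5] = {1.0f / 16.0f, 1.0f / 4.0f, 3.0f / 8.0f,
                           1.0f / 4.0f, 1.0f / 16.0f};
    std::array<float, 25> kernel{};
    for (int row = 0; row < 5; ++row) {
        for (int col = 0; col < 5; ++col) {
            kernel[row * 5 + col] = taps[row] * taps[col];
        }
    }
    return kernel;
}

// Pixel stride for a zero-based filter pass: 1, 2, 4, 8, ...
inline int atrousStride(int pass) {
    return 1 << pass;
}
```

In the patch, each tap contributes `w_p * w_n * w_rt * h_kernel[i]` and the accumulated color is divided by the accumulated weight. The patch also divides the squared normal difference by `iter_offset * iter_offset` and shrinks `sigma_rt` after every pass; both are left out of this sketch to keep it short.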