50 changes: 45 additions & 5 deletions README.md
@@ -3,11 +3,51 @@ CUDA Denoiser For CUDA Path Tracer

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Ryan Tong
* [LinkedIn](https://www.linkedin.com/in/ryanctong/)
* Tested on: Windows 10, i7-8750H @ 2.20GHz 16GB, GeForce GTX 1060 6144MB (Personal Laptop)

### (TODO: Your README)
![Denoiser](img/title.png)

*DO NOT* leave the README to the last minute! It is a crucial part of the
project, and we will not be able to grade you without a good README.
### Project Description
This project uses an edge-avoiding à-trous wavelet filter to denoise a path-traced image at an early iteration, reducing the number of iterations needed to produce an acceptably smooth result. The implementation follows Dammertz et al., "Edge-Avoiding À-Trous Wavelet Transform for Fast Global Illumination Filtering": https://jo.dreggn.org/home/2010_atrous.pdf
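The core idea of the à-trous ("with holes") transform can be illustrated with a minimal 1D CPU sketch (a hypothetical helper, not the CUDA kernel in this PR, and without the edge-stopping weights the real denoiser adds):

```cpp
#include <vector>

// Minimal 1D sketch of a single a-trous pass: the 5-tap B3-spline kernel is
// applied with its taps spaced `stepWidth` samples apart ("holes"), so
// successive passes with doubled step widths cover a large footprint while
// each pass reads only 5 taps per sample.
std::vector<float> atrousPass1D(const std::vector<float>& in, int stepWidth) {
    const float kernel[5] = {1.f/16.f, 1.f/4.f, 3.f/8.f, 1.f/4.f, 1.f/16.f};
    std::vector<float> out(in.size(), 0.f);
    for (int x = 0; x < (int)in.size(); ++x) {
        float sum = 0.f, weightSum = 0.f;
        for (int k = -2; k <= 2; ++k) {
            int nx = x + k * stepWidth;
            if (nx < 0 || nx >= (int)in.size()) continue; // skip out-of-bounds taps
            sum       += kernel[k + 2] * in[nx];
            weightSum += kernel[k + 2];
        }
        out[x] = sum / weightSum; // renormalize where taps fell off the edge
    }
    return out;
}
```

The 2D version in `pathtrace.cu` uses the same 5x5 separable B-spline kernel and offsets, but additionally scales each tap by color, normal, and position edge-stopping weights.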

### Performance
To generate an “acceptably smooth” image like the one shown above, the denoiser takes about 13 ms. This measurement was taken at 800x800 resolution with a filter size of 40.

Denoising drastically reduces the number of path-tracing iterations needed: the denoised image uses only 10 iterations, whereas producing a comparably smooth image without denoising takes roughly 500 iterations.

### Denoised vs Not
![Denoised](img/title.png)
![Reference](img/reference.png)

### Performance Analysis
Two parameters that affect the runtime of the denoiser are resolution and filter size. Resolution determines the total number of operations performed: the higher the resolution, the more threads must be launched, since one denoising thread is launched per pixel. Increasing the filter size also increases runtime because it increases the number of à-trous passes: we start with a 5x5 kernel and double the step width each pass until the effective footprint reaches the desired filter size. These effects are reflected in the data below:

### Resolution Performance Impact
![Denoiser](img/resolution.png)

### Filter Size Performance Impact
![Denoiser](img/filter.png)

### Filter Size Visual Analysis
Here are some images comparing the effects of filter size. Filter sizes that are too small are ineffective, while sizes that are too large make the image overly blurry. This makes sense: small filters do not incorporate enough neighboring pixels to smooth the noise, while large filters average over too many neighbors and wash out detail.
### Different Filter Size Visualization
![5x5](img/five.png)
![15x15](img/fifteen.png)
![45x45](img/fourtyfive.png)
![80x80](img/eighty.png)
![100x100](img/hundred.png)

### Material Type
The material type also affects the effectiveness of this technique. Diffuse materials work best, while specular materials fare worse because their reflections are blurred along with the noise.
### Different Material Visualization
![diffuse](img/diffuse.png)
![specular](img/title.png)

### Scene Type
The amount of light also affects the effectiveness of this technique. Brighter scenes work better because the lighting is more uniform and less complex. As you can see in the darker Cornell box scene, dark splotches remain where there is a large transition from light to dark that the filter is unable to smooth.
### Different Lighting Visualization
![dark](img/dark.png)
![light](img/title.png)

Binary file added img/dark.png
Binary file added img/diffuse.png
Binary file added img/eighty.png
Binary file added img/fifteen.png
Binary file added img/filter.png
Binary file added img/five.png
Binary file added img/fourtyfive.png
Binary file added img/hundred.png
Binary file added img/reference.png
Binary file added img/resolution.png
Binary file added img/title.png
2 changes: 1 addition & 1 deletion scenes/cornell.txt
@@ -52,7 +52,7 @@ EMITTANCE 0
CAMERA
RES 800 800
FOVY 45
ITERATIONS 5000
ITERATIONS 10
DEPTH 8
FILE cornell
EYE 0.0 5 10.5
6 changes: 5 additions & 1 deletion src/main.cpp
@@ -167,7 +167,11 @@ void runCuda() {

if (ui_showGbuffer) {
showGBuffer(pbo_dptr);
} else {
}
else if (ui_denoise) {
showDenoise(pbo_dptr, iteration, ui_filterSize, ui_colorWeight, ui_normalWeight, ui_positionWeight);
}
else {
showImage(pbo_dptr, iteration);
}

212 changes: 171 additions & 41 deletions src/pathtrace.cu
@@ -67,18 +67,23 @@ __global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution,
}
}

// TODO Modify this so that we can viz different parts of the gbuffer
__global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
int x = (blockIdx.x * blockDim.x) + threadIdx.x;
int y = (blockIdx.y * blockDim.y) + threadIdx.y;

if (x < resolution.x && y < resolution.y) {
int index = x + (y * resolution.x);
float timeToIntersect = gBuffer[index].t * 256.0;

pbo[index].w = 0;
pbo[index].x = timeToIntersect;
pbo[index].y = timeToIntersect;
pbo[index].z = timeToIntersect;
//float timeToIntersect = gBuffer[index].t * 256.0;
//glm::vec3 viz = (gBuffer[index].normal + glm::vec3(1.0)) / glm::vec3(2.0) * glm::vec3(255.0);
if (gBuffer[index].t > 0) {
float position_range = 25.f;
glm::vec3 viz = (glm::clamp(gBuffer[index].position, glm::vec3(-position_range), glm::vec3(position_range)) + position_range) / (position_range * 2.f) * 255.f;
pbo[index].w = 0;
pbo[index].x = viz.r;
pbo[index].y = viz.g;
pbo[index].z = viz.b;
}
}
}

@@ -92,6 +97,29 @@ static GBufferPixel* dev_gBuffer = NULL;
// TODO: static variables for device memory, any extra info you need, etc
// ...

static float* dev_filter = NULL;
static glm::vec2* dev_offsets = NULL;
// Kernel/Filter from https://www.eso.org/sci/software/esomidas/doc/user/18NOV/volb/node317.html
const float filter[25] = { 1.0 / 256.0, 1.0 / 64.0, 3.0 / 128.0, 1.0 / 64.0, 1.0 / 256.0,
1.0 / 64.0, 1.0 / 16.0, 3.0 / 32.0, 1.0 / 16.0, 1.0 / 64.0,
3.0 / 128.0, 3.0 / 32.0, 9.0 / 64.0, 3.0 / 32.0, 3.0 / 128.0,
1.0 / 64.0, 1.0 / 16.0, 3.0 / 32.0, 1.0 / 16.0, 1.0 / 64.0,
1.0 / 256.0, 1.0 / 64.0, 3.0 / 128.0, 1.0 / 64.0, 1.0 / 256.0,
};
// Offsets (x, y)
const glm::vec2 offsets[25] = { glm::vec2(-2, -2), glm::vec2(-1, -2), glm::vec2(0, -2), glm::vec2(1, -2), glm::vec2(2, -2),
glm::vec2(-2, -1), glm::vec2(-1, -1), glm::vec2(0, -1), glm::vec2(1, -1), glm::vec2(2, -1),
glm::vec2(-2, 0), glm::vec2(-1, 0), glm::vec2(0, 0), glm::vec2(1, 0), glm::vec2(2, 0),
glm::vec2(-2, 1), glm::vec2(-1, 1), glm::vec2(0, 1), glm::vec2(1, 1), glm::vec2(2, 1),
glm::vec2(-2, 2), glm::vec2(-1, 2), glm::vec2(0, 2), glm::vec2(1, 2), glm::vec2(2, 2),
};
// Temp denoise output buffer for ping ponging
static glm::vec3* dev_denoise_in = NULL;
static glm::vec3* dev_denoise_out = NULL;
// Stuff for timing
static cudaEvent_t startTime = NULL;
static cudaEvent_t endTime = NULL;

void pathtraceInit(Scene *scene) {
hst_scene = scene;
const Camera &cam = hst_scene->state.camera;
@@ -114,7 +142,17 @@ void pathtraceInit(Scene *scene) {
cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel));

// TODO: initialize any extra device memeory you need
cudaMalloc(&dev_filter, 25 * sizeof(float));
cudaMemcpy(dev_filter, &filter, 25 * sizeof(float), cudaMemcpyHostToDevice);

cudaMalloc(&dev_offsets, 25 * sizeof(glm::vec2));
cudaMemcpy(dev_offsets, &offsets, 25 * sizeof(glm::vec2), cudaMemcpyHostToDevice);

cudaMalloc(&dev_denoise_in, pixelcount * sizeof(glm::vec3));
cudaMalloc(&dev_denoise_out, pixelcount * sizeof(glm::vec3));

cudaEventCreate(&startTime);
cudaEventCreate(&endTime);
checkCUDAError("pathtraceInit");
}

@@ -126,7 +164,17 @@ void pathtraceFree() {
cudaFree(dev_intersections);
cudaFree(dev_gBuffer);
// TODO: clean up any extra device memory you created
cudaFree(dev_filter);
cudaFree(dev_offsets);
cudaFree(dev_denoise_in);
cudaFree(dev_denoise_out);

if (startTime != NULL) {
cudaEventDestroy(startTime);
}
if (endTime != NULL) {
cudaEventDestroy(endTime);
}
checkCUDAError("pathtraceFree");
}

@@ -148,7 +196,7 @@ __global__ void generateRayFromCamera(Camera cam, int iter, int traceDepth, Path
PathSegment & segment = pathSegments[index];

segment.ray.origin = cam.position;
segment.color = glm::vec3(1.0f, 1.0f, 1.0f);
segment.color = glm::vec3(1.0f, 1.0f, 1.0f);

segment.ray.direction = glm::normalize(cam.view
- cam.right * cam.pixelLength.x * ((float)x - (float)cam.resolution.x * 0.5f)
@@ -273,15 +321,19 @@ __global__ void shadeSimpleMaterials (
}
}


// Store t, surface normal, and intersection position in the G-buffer
__global__ void generateGBuffer (
int num_paths,
ShadeableIntersection* shadeableIntersections,
PathSegment* pathSegments,
PathSegment* pathSegments,
GBufferPixel* gBuffer) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < num_paths)
{
gBuffer[idx].t = shadeableIntersections[idx].t;
gBuffer[idx].normal = shadeableIntersections[idx].surfaceNormal;
gBuffer[idx].position = shadeableIntersections[idx].t * pathSegments[idx].ray.direction + pathSegments[idx].ray.origin;
}
}

@@ -356,46 +408,46 @@ void pathtrace(int frame, int iter) {
// --- PathSegment Tracing Stage ---
// Shoot ray into scene, bounce between objects, push shading chunks

// Empty gbuffer
cudaMemset(dev_gBuffer, 0, pixelcount * sizeof(GBufferPixel));
// Empty gbuffer
cudaMemset(dev_gBuffer, 0, pixelcount * sizeof(GBufferPixel));

// clean shading chunks
cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));

bool iterationComplete = false;
bool iterationComplete = false;
while (!iterationComplete) {

// tracing
dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
depth
, num_paths
, dev_paths
, dev_geoms
, hst_scene->geoms.size()
, dev_intersections
);
checkCUDAError("trace one bounce");
cudaDeviceSynchronize();

if (depth == 0) {
generateGBuffer<<<numblocksPathSegmentTracing, blockSize1d>>>(num_paths, dev_intersections, dev_paths, dev_gBuffer);
}

depth++;

shadeSimpleMaterials<<<numblocksPathSegmentTracing, blockSize1d>>> (
iter,
num_paths,
dev_intersections,
dev_paths,
dev_materials
);
iterationComplete = depth == traceDepth;
// tracing
dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
depth
, num_paths
, dev_paths
, dev_geoms
, hst_scene->geoms.size()
, dev_intersections
);
checkCUDAError("trace one bounce");
cudaDeviceSynchronize();

if (depth == 0) {
generateGBuffer<<<numblocksPathSegmentTracing, blockSize1d>>>(num_paths, dev_intersections, dev_paths, dev_gBuffer);
}

depth++;

shadeSimpleMaterials<<<numblocksPathSegmentTracing, blockSize1d>>> (
iter,
num_paths,
dev_intersections,
dev_paths,
dev_materials
);
iterationComplete = depth == traceDepth;
}

// Assemble this iteration and apply it to the image
dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
// Assemble this iteration and apply it to the image
dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
finalGather<<<numBlocksPixels, blockSize1d>>>(num_paths, dev_image, dev_paths);

///////////////////////////////////////////////////////////////////////////
Expand All @@ -422,7 +474,7 @@ void showGBuffer(uchar4* pbo) {
}

void showImage(uchar4* pbo, int iter) {
const Camera &cam = hst_scene->state.camera;
const Camera &cam = hst_scene->state.camera;
const dim3 blockSize2d(8, 8);
const dim3 blocksPerGrid2d(
(cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
@@ -431,3 +483,81 @@ const Camera &cam = hst_scene->state.camera;
// Send results to OpenGL buffer for rendering
sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image);
}

__global__ void denoise(glm::vec3* dev_imageIn, glm::vec3* dev_imageOut, const int stepWidth, const glm::vec2 resolution,
const glm::vec2* dev_offsets, const float* dev_filter, const float colorSigma, const float normalSigma,
const float positionSigma, const GBufferPixel* dev_gBuffer) {
int x = (blockIdx.x * blockDim.x) + threadIdx.x;
int y = (blockIdx.y * blockDim.y) + threadIdx.y;

if (x < resolution.x && y < resolution.y) {
int originalIndex = x + (y * resolution.x);

//Center point values (current pixel)
glm::vec3 originalColor = dev_imageIn[originalIndex];
glm::vec3 originalNorm = dev_gBuffer[originalIndex].normal;
glm::vec3 originalPos = dev_gBuffer[originalIndex].position;

glm::vec3 sum = glm::vec3(0.0);
float cumW = 0.0;

for (int i = 0; i < 25; ++i) { // Get neighbors
glm::vec2 neighbor_offset = dev_offsets[i] * glm::vec2(stepWidth);
int neighborX = x + neighbor_offset.x;
int neighborY = y + neighbor_offset.y;
if (neighborX >= 0 && neighborX < resolution.x && neighborY >= 0 && neighborY < resolution.y) { // check bounds of image
int neighborIndex = neighborX + (neighborY * resolution.x);

glm::vec3 color = dev_imageIn[neighborIndex];
float colorWeight = min(exp(-(glm::length2(originalColor - color)) / colorSigma), 1.f);

glm::vec3 norm = dev_gBuffer[neighborIndex].normal;
float normWeight = min(exp(-(max(glm::length2(originalNorm - norm) / (stepWidth * stepWidth), 0.f) / normalSigma)), 1.f);

glm::vec3 pos = dev_gBuffer[neighborIndex].position;
float posWeight = min(exp(-(glm::length2(originalPos - pos) / positionSigma)), 1.f); // compare positions, not position vs. normal

float weight = colorWeight * normWeight * posWeight;
sum += color * weight * dev_filter[i];
cumW += weight * dev_filter[i];
//blurred_pix += dev_filter[i] * dev_imageIn[neighbor_index];
}
}
dev_imageOut[originalIndex] = sum / cumW;
}
}

void showDenoise(uchar4* pbo, int iter, const int filterSize, const float colorSigma, const float normalSigma, const float positionSigma) {
const Camera& cam = hst_scene->state.camera;
const dim3 blockSize2d(8, 8);
const dim3 blocksPerGrid2d(
(cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
(cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
const int pixelcount = cam.resolution.x * cam.resolution.y;

// Copy image to a scratch buffer so denoising doesn't modify the original image
cudaMemcpy(dev_denoise_in, dev_image, pixelcount * sizeof(glm::vec3), cudaMemcpyDeviceToDevice);
const float squaredColor = pow(colorSigma, 2);
const float squaredNormal = pow(normalSigma * .1, 2);
const float squaredPos = pow(positionSigma, 2);
int i = 0;
float time;
cudaEventRecord(startTime);
while (4 * (1 << i) + 1 < filterSize) { // Multiple iterations of denoising
int stepWidth = 1 << i;
denoise << <blocksPerGrid2d, blockSize2d >> > (dev_denoise_in, dev_denoise_out, stepWidth,
cam.resolution, dev_offsets, dev_filter,
squaredColor, squaredNormal, squaredPos, dev_gBuffer);
cudaDeviceSynchronize();
//Ping pong buffers
glm::vec3* temp = dev_denoise_in;
dev_denoise_in = dev_denoise_out;
dev_denoise_out = temp;
++i;
}
cudaEventRecord(endTime);
cudaEventSynchronize(endTime);
cudaEventElapsedTime(&time, startTime, endTime);
std::cout << "Time denoise: " << time << std::endl;
sendImageToPBO << <blocksPerGrid2d, blockSize2d >> > (pbo, cam.resolution, iter, dev_denoise_in);
}
1 change: 1 addition & 0 deletions src/pathtrace.h
@@ -8,3 +8,4 @@ void pathtraceFree();
void pathtrace(int frame, int iteration);
void showGBuffer(uchar4 *pbo);
void showImage(uchar4 *pbo, int iter);
void showDenoise(uchar4* pbo, int iter, const int filterSize, const float colorSigma, const float normalSigma, const float positionSigma);
3 changes: 3 additions & 0 deletions src/sceneStructs.h
@@ -77,6 +77,9 @@ struct ShadeableIntersection {

// CHECKITOUT - a simple struct for storing scene geometry information per-pixel.
// What information might be helpful for guiding a denoising filter?
// Need to store normal, position of intersection
struct GBufferPixel {
float t;
glm::vec3 normal;
glm::vec3 position;
};