
What happens in one frame of rendering


CPU

  • Renderers: Unity goes through the scene’s renderer components and adds them to a list.
  • Culling: Unity does a culling pass over the renderers. Each mesh has a bounding box, which is converted into a world-space AABB per renderer, and Unity checks those against the camera’s frustum. Any renderer whose bounds aren’t visible is skipped and removed from the list. Unity also compares the camera’s layer mask against the renderer’s game object layer and skips those the camera doesn’t render. (A minimal culling sketch follows this list.)
  • Gather Asset Data: Unity gets a list of all meshes, textures, and shaders that the remaining renderer components are referencing. Note that I’m not mentioning materials here; those are really just a way to reference textures, shaders, and a list of extra variables.
  • Upload Asset Data: Unity checks if this is the first time any of them have been seen, and if so it uploads them to the GPU. In the case of the textures, it uploads the texture data and gets a handle to each texture. It then does the same for the shaders and the mesh data. The mesh data is actually multiple chunks of data. For a cube, that’s a struct array of data for each of the 24 vertices (position, UVs, color, etc.), and an index array of 36 ints that index those 24 vertices, 3 for each triangle, with some triangles sharing vertices. (See the cube layout sketch after this list.) For a lot of this data, once it’s uploaded to the GPU, it’s deleted from CPU memory.
  • Sorting: Next Unity sorts the renderer components. There are several parts to this: render queue, materials, lighting, distance, etc. We’ll ignore most of those and focus on distance (“depth”) sorting and transparent vs. opaque sorting, and say all materials are using one of those two queues. Opaque objects get put into one pile and sorted front to back; that is, objects closer to the camera (based on a sphere that encapsulates the AABB) get sorted to the top of the list. Transparent objects get put into the other pile and sorted back to front; further-away objects are sorted to the top of that list. (A sorting sketch follows this list.)
  • Setup Rendering: Here’s where the real meat begins. Unity has already uploaded the asset data to the GPU and has handles to it rather than direct references, so to draw something it just needs to say “hey, we want to use this asset now”. But before that we also need to set up some other data that isn’t going to change between objects, mainly stuff like the camera’s world-to-view transform and projection matrices. Those get shared by everything, so they’re set up once and sent to the GPU. This also tells the GPU to set up the render target / frame buffer, or clear it from the previous frame.
  • Render Object: After the shared data is set up, we tell the GPU the per-object data for the first object on the list: its object-to-world transform matrix, any float / vector / color parameters the material has, what textures it needs and on which inputs, and what mesh asset to use. It also tells the GPU which shader to use and sets up the render state: things like the blend mode, front or back face culling, ZTest, ZWrite, etc. Then it says “draw that”. Any values that don’t change from the previously rendered object are left alone and don’t need to be sent again. (The draw-loop sketch after this list shows how this step and the previous one fit together.)
  • Opaques: Repeat step 7 until we’re out of stuff in the opaque list.
  • Skybox: At this point Unity renders the skybox mesh with a special transform matrix so that it is effectively infinitely far away. Really this is no different than step 7, it just isn’t a “renderer” component.
  • Transparencies: Repeat step 7 again, but for all of the transparent list.
  • Tell the GPU we’re done, and display the results.
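
To make the culling step above concrete, here’s a minimal sketch of the idea: test each renderer’s world-space AABB against the camera’s six frustum planes and its layer mask. This isn’t Unity’s actual implementation; the plane representation and field names are just illustrative.

```python
# Sketch of frustum culling: reject renderers whose world-space AABB is fully
# outside any of the camera's six frustum planes, or whose layer the camera
# doesn't render. Data layout is illustrative, not Unity's.

def aabb_outside_plane(center, extents, plane):
    # plane = (nx, ny, nz, d), with the normal pointing into the frustum.
    nx, ny, nz, d = plane
    dist = nx * center[0] + ny * center[1] + nz * center[2] + d
    # Projected "radius" of the box onto the plane normal.
    radius = abs(nx) * extents[0] + abs(ny) * extents[1] + abs(nz) * extents[2]
    return dist < -radius  # box is entirely on the outside of this plane

def cull(renderers, frustum_planes, camera_layer_mask):
    visible = []
    for r in renderers:
        # Layer-mask check: skip objects on layers the camera doesn't draw.
        if not (camera_layer_mask & (1 << r["layer"])):
            continue
        # Frustum check: skip if the AABB is fully outside any plane.
        if any(aabb_outside_plane(r["center"], r["extents"], p) for p in frustum_planes):
            continue
        visible.append(r)
    return visible
```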
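
The vertex and index counts in the upload step can also be made concrete with a small sketch: a cube built as 24 vertices (4 per face, so each face can carry its own UVs and normal) plus a 36-entry index array. The layout is illustrative, not Unity’s internal format.

```python
# Sketch of the cube mesh data described above: 24 vertices and 36 indices
# (6 faces * 2 triangles * 3 indices), with triangles sharing vertices
# within a face.

faces = {
    "+x": [( 1,-1,-1), ( 1, 1,-1), ( 1, 1, 1), ( 1,-1, 1)],
    "-x": [(-1,-1, 1), (-1, 1, 1), (-1, 1,-1), (-1,-1,-1)],
    "+y": [(-1, 1,-1), (-1, 1, 1), ( 1, 1, 1), ( 1, 1,-1)],
    "-y": [(-1,-1, 1), (-1,-1,-1), ( 1,-1,-1), ( 1,-1, 1)],
    "+z": [( 1,-1, 1), ( 1, 1, 1), (-1, 1, 1), (-1,-1, 1)],
    "-z": [(-1,-1,-1), (-1, 1,-1), ( 1, 1,-1), ( 1,-1,-1)],
}

vertices = []   # per-vertex struct: position + UV (normal, color, etc. would go here too)
indices = []    # 3 ints per triangle, 2 triangles per face
uvs = [(0, 0), (0, 1), (1, 1), (1, 0)]

for corners in faces.values():
    base = len(vertices)
    for corner, uv in zip(corners, uvs):
        vertices.append({"position": corner, "uv": uv})
    indices += [base, base + 1, base + 2,   # first triangle of the face
                base, base + 2, base + 3]   # second triangle shares two vertices

assert len(vertices) == 24 and len(indices) == 36
```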
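
A rough sketch of the sorting step, using distance from the camera to each bounds center and a boolean transparent flag as stand-ins for Unity’s actual queues and sort keys:

```python
# Sketch of depth sorting: opaques front-to-back, transparents back-to-front,
# both keyed on distance from the camera to the bounds center.

def camera_distance(renderer, camera_position):
    cx, cy, cz = renderer["center"]
    px, py, pz = camera_position
    return ((cx - px) ** 2 + (cy - py) ** 2 + (cz - pz) ** 2) ** 0.5

def sort_renderers(renderers, camera_position):
    opaques = [r for r in renderers if not r["transparent"]]
    transparents = [r for r in renderers if r["transparent"]]
    # Opaque: nearest first, so the depth buffer rejects hidden pixels early.
    opaques.sort(key=lambda r: camera_distance(r, camera_position))
    # Transparent: furthest first, so blending back-to-front looks right.
    transparents.sort(key=lambda r: camera_distance(r, camera_position), reverse=True)
    return opaques, transparents
```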
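
Finally, a sketch of how “Setup Rendering” and “Render Object” fit together. The gpu object and its methods (set_global, set_render_state, draw_indexed, and so on) are hypothetical stand-ins for graphics-API calls, not a real API; the point is the shape of the loop: shared per-camera state once, then per-object state and a draw call for each entry in the sorted lists.

```python
# Pseudocode sketch of one camera's draw submission. All gpu.* calls are
# hypothetical placeholders, not a real graphics API.

def render_frame(camera, opaques, skybox, transparents, gpu):
    # Shared, per-camera data: set once for the whole frame.
    gpu.set_render_target(camera.target, clear=True)
    gpu.set_global("view_matrix", camera.world_to_view)
    gpu.set_global("projection_matrix", camera.projection)

    def draw(obj):
        # Per-object data: only values that changed since the last draw
        # actually need to be re-sent.
        gpu.set_shader(obj.shader_handle)
        gpu.set_render_state(blend=obj.blend_mode, cull=obj.cull_mode,
                             ztest=obj.ztest, zwrite=obj.zwrite)
        gpu.set_matrix("object_to_world", obj.transform)
        gpu.set_material_params(obj.material_params)   # floats, vectors, colors
        gpu.set_textures(obj.texture_handles)
        gpu.draw_indexed(obj.mesh_handle)

    for obj in opaques:       # front to back
        draw(obj)
    draw(skybox)              # "infinitely far away" transform
    for obj in transparents:  # back to front
        draw(obj)
    gpu.present()
```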

GPU

  • Assets: During step 4 of the CPU side, the GPU took the data uploaded to it and put it into its memory. That’s it. Technically the handles were returned by the CPU-side graphics API, which got them from the CPU-side graphics driver, which told the GPU “here’s some data, put it at this memory address”. The GPU just took the data and responded with “ah-yup!”
  • Mutable Data: During CPU step 6, basically the same thing as GPU step 1 happens. “Ah-yup”. Same with CPU step 7, with some additional switches being flipped and instructions being handed out. Get some memory space ready to render into.
  • Draw Call: And then it begins. We draw one cube.
    • Vertex Shader: The GPU has multiple cores, somewhere between several tens and many thousands. In most modern GPUs these are general purpose, at least in terms of the kind of code GPUs can run. So for something like a cube with only 24 vertices, all of these will get processed in parallel. Each vertex position is transformed from “local” space (the data that’s stored in the GPU’s vertex data struct array) to homogeneous clip space (see the clip-space sketch after this list). It also optionally modifies or passes on any extra data, like the UVs and color, if the shader has code for that, as well as taking and processing any other material properties it might need for this stage.
    • Rasterization: Next the GPU goes through the index array and calculates the screen coverage for each triangle based on the vertex positions. This is done in the order they exist in the array, not in parallel. If a triangle doesn’t cover any pixels on screen, it is skipped. If ZTest is enabled, it’ll test against the depth buffer and skip pixels that aren’t visible. If ZWrite is on, it’ll update the depth buffer with the new depth. Then it’ll start drawing the pixels for that triangle. For each pixel it needs to calculate the barycentric coordinates and interpolate the vertex data to pass on to that pixel’s fragment shader. (A simplified rasterizer sketch follows this list.)
    • Fragment Shader: Pixels on screen are processed in batches. At the smallest, this is in groups of 2x2 pixels calculated in parallel, but it may also be as part of larger batches of 8x8 or 16x8 pixels or other size groups. These all run in parallel, per triangle. If a single pixel is visible in one of these larger groups, the other cores may be completely idle, or calculating values for pixels outside the triangle’s coverage or otherwise hidden. The fragment shader takes the interpolated data for its position, as well as any of the material properties and texture data, and puts it together to form a single color value. Textures are sampled with specialized hardware that handles fast decoding of the various texture formats and filtering, and spits out a color value for the shader to use.
    • ROP: The values output by the fragment shader are blended into the frame buffer. For opaques, this is usually just a replacement of the current color value. For transparent objects this may be alpha blending or additive blending. The GPU has specialized hardware that handles this. (A blend sketch follows this list.)
  • Repeat all parts of step 3 for every draw call, in the order the CPU sent them.
  • Send the final image in the frame buffer to the monitor.
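
For the vertex shader step, here’s what the position math looks like as a plain sketch: each local-space position goes through the object-to-world, world-to-view, and projection matrices to land in homogeneous clip space. Real shaders run this on the GPU with matrices supplied by the engine.

```python
# Sketch of the vertex stage's position transform: local -> world -> view -> clip.

def mat_mul_vec(m, v):
    # m is a 4x4 matrix as a list of rows; v is a 4-component vector.
    return [sum(m[row][k] * v[k] for k in range(4)) for row in range(4)]

def vertex_shader(position_local, object_to_world, world_to_view, projection):
    p = position_local + [1.0]                  # (x, y, z, 1) homogeneous point
    p = mat_mul_vec(object_to_world, p)         # local -> world
    p = mat_mul_vec(world_to_view, p)           # world -> view
    p = mat_mul_vec(projection, p)              # view  -> clip
    return p                                    # clip-space position; extra data
                                                # (UVs, color) would be passed on too
```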
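
A simplified sketch of the rasterization step: walk the pixels a triangle might touch, compute barycentric coordinates, reject pixels outside the triangle or behind what’s already in the depth buffer, and interpolate vertex data for the fragment shader. It ignores perspective correction, 2x2 quads, and the fixed-function hardware that makes this fast.

```python
# Sketch of per-triangle rasterization with barycentric interpolation,
# ZTest, and ZWrite. Smaller depth means closer to the camera here.

def edge(a, b, p):
    # Signed area term used for barycentric coordinates.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize_triangle(v0, v1, v2, depth_buffer, shade, zwrite=True):
    # v* = {"screen": (x, y), "depth": z, "data": {interpolated attributes}}
    a, b, c = v0["screen"], v1["screen"], v2["screen"]
    area = edge(a, b, c)
    if area == 0:
        return  # degenerate triangle covers no pixels
    xs = [p[0] for p in (a, b, c)]
    ys = [p[1] for p in (a, b, c)]
    for y in range(int(min(ys)), int(max(ys)) + 1):
        for x in range(int(min(xs)), int(max(xs)) + 1):
            w0 = edge(b, c, (x, y)) / area
            w1 = edge(c, a, (x, y)) / area
            w2 = edge(a, b, (x, y)) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # pixel is outside the triangle
            depth = w0 * v0["depth"] + w1 * v1["depth"] + w2 * v2["depth"]
            if depth >= depth_buffer.get((x, y), float("inf")):
                continue  # ZTest failed: something closer is already there
            if zwrite:
                depth_buffer[(x, y)] = depth
            # Interpolate per-vertex data and hand it to the "fragment shader".
            data = {k: w0 * v0["data"][k] + w1 * v1["data"][k] + w2 * v2["data"][k]
                    for k in v0["data"]}
            shade(x, y, data)
```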
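
And a sketch of the blend modes mentioned in the ROP step, assuming RGBA values in the 0..1 range; real hardware exposes more blend factors and operations than these three cases.

```python
# Sketch of the blend stage: combine the fragment shader's output with the
# color already in the frame buffer.

def blend(src, dst, mode):
    sr, sg, sb, sa = src   # fragment shader output
    dr, dg, db, da = dst   # current frame buffer value
    if mode == "opaque":   # replace the destination
        return src
    if mode == "alpha":    # src.rgb * src.a + dst.rgb * (1 - src.a)
        return (sr * sa + dr * (1 - sa),
                sg * sa + dg * (1 - sa),
                sb * sa + db * (1 - sa),
                sa + da * (1 - sa))
    if mode == "additive": # src.rgb + dst.rgb, clamped
        return (min(sr + dr, 1.0), min(sg + dg, 1.0),
                min(sb + db, 1.0), min(sa + da, 1.0))
    raise ValueError(mode)
```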

Then the next frame starts and we do it all over. Note that all that texture, shader, and mesh data just sits there on the GPU and gets reused over and over. I’m skipping a bunch of things in there too, but that’s the gist.

Source: https://forum.unity.com/threads/trying-to-understand-the-rendering-steps.640438/#post-4291516