GPU Framebuffer Memory: Understanding Tiling

Modern graphics hardware requires a high amount of memory bandwidth as part of rendering operations. External memory bandwidth is costly in terms of space and power requirements, especially for mobile rendering. This article discusses tile-based rendering, the approach used by most mobile graphics hardware - and, increasingly, by desktop hardware.

Please note: This article contains a number of animations. Click on the button below an image to activate its animation.

The traditional interface presented by a graphics API is that of submitting triangles in order, with the concept that the GPU renders each triangle in turn. That is, rasterization happens as shown below:

These images, like others below, show the color framebuffer on the left and the corresponding depth buffer on the right. Hardware which processes triangles immediately as they are submitted, as shown here, is known as an Immediate-Mode Renderer ("IMR"). Historically, desktop and console GPUs have behaved in roughly this way.

A naïve implementation of an immediate-mode renderer might use a large amount of memory bandwidth. In an IMR, the graphics pipeline proceeds top-to-bottom for each primitive, accessing memory on a per-primitive basis. The next diagram shows that a large amount of memory is transferred during rasterization even with a simple cache for the framebuffer pixels and depth values. IMRs cause memory to be accessed in an unpredictable order, determined by the way triangles are submitted.

In this diagram, four "cache lines" of consecutive image memory are shown above the image as it is rendered. Above each cache line is a miniature rectangle showing where the pixels corresponding to the cache line fall in the framebuffer: red for "dirty" cache lines that have been written to, green for "clean" cache lines that still match memory, and brighter colors for cache lines that have been accessed more recently. Framebuffer pixels corresponding to "dirty" cache lines are shown in magenta (framebuffer) and white (depth buffer).

The first step towards reducing memory bandwidth is to treat each cache line as covering a two-dimensional rectangular area (a "tile") in memory. Triangles that are near to each other in space are often submitted near each other in time (in this example, each "spike" of the object is drawn before moving on to the next), so better grouping of the cache area results in more cache hits. A similar technique is often used in texture storage, since the reading of texture values similarly shows spatial locality of reference.

This time the four cache lines (spaced out horizontally) each cover a square area in the framebuffer and depth buffer. The cache lines hold the same number of pixels as for the linear cache in the previous example. The cached framebuffer area is now shown above the corresponding cached depth buffer area. In a real-world situation, the framebuffer would likely be larger relative to the cached tiles.

With square cache areas that are the same size as a linear cache, more rendering happens within the cache, and transfers to memory are less frequent - we've reduced external memory bandwidth! This example is simplified - actual hardware may use more complex mappings between pixels and memory in order to further improve locality of reference.

One problem with the technique we've shown so far is that a large triangle might thrash the cache if drawn in simple top-to-bottom order, since each horizontal line on the screen might cover more tiles than can fit in the cache. We can solve this problem by changing the order in which the pixels within a triangle are rasterized: we can draw all the pixels that the triangle covers within one tile before moving on to the next tile. Since in our simple example the cache can cover the entire framebuffer width, this approach doesn't reduce the number of memory transfers that are performed. However, we can see the difference - rendering for a triangle completes in one cached tile before moving to the next.

Note: This animation takes longer than the last one because the image is updated after each tile has processed a triangle; the previous animation updated only after each triangle was completely rendered and during transfers between tiles and memory.

In real hardware, performance would be the same - and, if the previous approach caused the cache to thrash, the performance of this version would be better.
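The bandwidth benefit of mapping cache lines to square tiles can be sketched with a toy model. Everything below is an illustrative assumption rather than a figure from the animations: a 16×16 framebuffer, 16-pixel cache lines, 4×4 tiles, a 4-line LRU cache, and the helper names `linear_index`, `tiled_index`, and `count_line_fetches` are all invented for this sketch.

```python
from collections import OrderedDict

FB_W = 16                  # assumed framebuffer width, in pixels
TILE_W, TILE_H = 4, 4      # assumed tile size: 16 pixels per cache line
LINE_PIXELS = TILE_W * TILE_H

def linear_index(x, y):
    # Row-major layout: a cache line covers 16 consecutive pixels of one row.
    return y * FB_W + x

def tiled_index(x, y):
    # Tiled layout: a cache line covers one 4x4 square of pixels.
    tiles_per_row = FB_W // TILE_W
    tile = (y // TILE_H) * tiles_per_row + (x // TILE_W)
    within = (y % TILE_H) * TILE_W + (x % TILE_W)
    return tile * LINE_PIXELS + within

def count_line_fetches(pixels, index_of, cache_lines=4):
    # Count cache-line fetches under a small LRU cache, standing in for
    # transfers between the on-chip cache and external memory.
    cache = OrderedDict()
    fetches = 0
    for x, y in pixels:
        line = index_of(x, y) // LINE_PIXELS
        if line in cache:
            cache.move_to_end(line)          # hit: mark as recently used
        else:
            fetches += 1                     # miss: fetch from memory
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)    # evict least recently used
    return fetches

# Drawing a tall, thin 4x16 strip in scanline order touches a new cache
# line on every row with the linear layout, but only one cache line per
# 4x4 tile with the tiled layout.
strip = [(x, y) for y in range(16) for x in range(4)]
print(count_line_fetches(strip, linear_index))  # 16
print(count_line_fetches(strip, tiled_index))   # 4
```

Both layouts store exactly the same pixels; only the address mapping changes, which is why the tiled version wins whenever nearby pixels in both dimensions are touched close together in time.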
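As one concrete example of the "more complex mappings" real hardware may use, texture and framebuffer data is often swizzled into Morton (Z-order) layout, which interleaves the bits of the x and y coordinates so that pixels near each other in either dimension land at nearby addresses. A minimal sketch; the function name and the 8-bit coordinate width are assumptions for illustration:

```python
def morton_index(x, y, bits=8):
    # Interleave coordinate bits: ... y1 x1 y0 x0.
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)       # x bits at even positions
        index |= ((y >> i) & 1) << (2 * i + 1)   # y bits at odd positions
    return index

# The first 2x2 quad of pixels occupies four consecutive addresses:
print([morton_index(x, y) for y in range(2) for x in range(2)])  # [0, 1, 2, 3]
```

Unlike a fixed tile size, this recursive pattern keeps locality at every scale: each 2×2 quad, 4×4 block, 8×8 block, and so on is contiguous in memory.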
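Changing the rasterization order so that every pixel a triangle covers in one tile is drawn before moving to the next tile can be sketched as an extra pair of tile loops wrapped around an ordinary edge-function coverage test. This is a simplified illustration, not how any particular GPU implements it; `rasterize_by_tile`, the 4×4 tile size, and the winding convention (all edge functions non-negative inside) are assumptions:

```python
def edge(ax, ay, bx, by, px, py):
    # Signed area test: the sign tells which side of edge a->b point p is on.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize_by_tile(tri, tile_w=4, tile_h=4):
    """Yield a triangle's covered pixels tile by tile: every covered pixel
    in one tile is produced before the next tile is started."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    min_x, max_x = min(x0, x1, x2), max(x0, x1, x2)
    min_y, max_y = min(y0, y1, y2), max(y0, y1, y2)
    for ty in range(min_y // tile_h, max_y // tile_h + 1):    # tiles, row by row
        for tx in range(min_x // tile_w, max_x // tile_w + 1):
            for py in range(ty * tile_h, ty * tile_h + tile_h):  # pixels in tile
                for px in range(tx * tile_w, tx * tile_w + tile_w):
                    if (edge(x0, y0, x1, y1, px, py) >= 0 and
                        edge(x1, y1, x2, y2, px, py) >= 0 and
                        edge(x2, y2, x0, y0, px, py) >= 0):
                        yield px, py

# A right triangle spanning several 4x4 tiles: all 16 pixels of the first
# tile come out before any pixel of the next tile.
pixels = list(rasterize_by_tile([(0, 0), (12, 0), (0, 12)]))
print(pixels[:4])  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```

With this ordering, the working set at any moment is a single tile's framebuffer and depth data, so a tile-sized cache line stays resident until the triangle is finished with that tile.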