Evangelizing the Rays of Light


Video game graphics really seem to be taking off, don't they? Not too long ago, we were amazed by the transition from single-colored blips to, well... multi-colored blips. I still remember, as many probably do, back when the Commodores and the Ataris ruled the day. And then how much further things had come when the NES came forth -- we had real characters, who actually felt like characters. And further still into the 16-bit era, where the graphics were finally good enough that you could say... hey, Sonic the Hedgehog actually had attitude and personality... While that jump wasn't as big from a pure technical standpoint as, say, the step up from N64 to Gamecube, it felt like an enormous difference.

Following through to today, where everything is 3-d, rendered on monstrously huge chips that carry a dozen or more independent SIMD pipelines and achieve obscene parallelism... it's a race of sheer brute force. Anyone who's seen the Unreal Engine 3.0 demos should have been pretty well floored the first time around. On the whole, UE3 technology isn't that complicated at the individual level. What's amazing is the sheer scale and extent to which it does damn near everything at the same time, so smoothly. It's one thing to be able to do normal mapping -- well, everybody does normal mapping now -- it's another thing entirely to do it on everything. Doom 3 first scared us spitless with its normal mapping demos, but the technique has reached a point where it's worth considering for anyone implementing a game engine. That's why companies like, say, Monolith have dropped their long-standing Lithtech engine in favor of a brand new one for F.E.A.R. Similarly, there's now a whole host of modeller plugins available to anybody (to purchase, that is) for the sake of normal map creation. All of a sudden, our first glimpse at full-scale normal mapping throughout an entire game world is no longer the only, or even the very best, example.

Which is not to say that Doom 3 is particularly bad or anything. It's just that the sheer complexity of its worlds and the extent to which things like normal maps, gloss maps, shadow volumes, etc. are all used limits the artwork as well. Ideally, we'd like to be completely free of all limits, wouldn't we? Well, that'll never happen. We do have to consider the limits of our hardware, and how much content we can really pour into a game. As it is right now, we're at a point where 256 MB of video memory is becoming more commonplace, and PCI-Express buses give us twice the bandwidth of AGP at lower latency. Has all that made a difference? Not really. Fact of the matter is that games have so far been made with AGP 8x and 128 MB of video memory in mind. Even though you could say that 256 is commonplace now... it's really only commonplace on the store shelves. It's really not that high a percentage of people who have actually bought these cards. The enthusiasts and die-hard nuts who feel the need to upgrade every 4 months are about the only ones. Well, there are also the professional workstation cards, but that's a whole other matter. Quite simply, these cards are still a high added expense for little to no gain at this point in time. That, of course, will change. Tim Sweeney, for instance, is already talking about a point where a single scene or level in a game will have more than 2 GB of content to it. A concept like that is pretty crazy, but if every texture is 2048x2048, and every sound is 24-bit 192 kHz, and every character is in the 7000-poly range, and a level contains 30 or 40 characters... well, that's not impossible. Wasteful to an extent, but not impossible.
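Just to sanity-check that with some back-of-the-envelope arithmetic -- the per-asset sizes below are my own illustrative assumptions, not anything Sweeney has actually published -- here's a quick C++ sketch:

```cpp
// Rough content-size arithmetic for the "2 GB per level" scenario.
// All counts below are my own illustrative assumptions, not Sweeney's figures.
#include <cstdio>

int main()
{
    const double MB = 1024.0 * 1024.0;

    // One uncompressed 2048x2048 RGBA texture, plus ~1/3 extra for mipmaps.
    double texture = 2048.0 * 2048.0 * 4.0 * (4.0 / 3.0) / MB;    // ~21.3 MB

    // Assume each character carries three such maps (diffuse, normal, gloss).
    double character_textures = 3.0 * texture;                     // ~64 MB

    // A 7000-poly mesh is tiny by comparison: roughly 4000 verts * 32 bytes.
    double character_mesh = 4000.0 * 32.0 / MB;                    // ~0.12 MB

    // One minute of 24-bit, 192 kHz stereo audio.
    double audio_per_min = 192000.0 * 3.0 * 2.0 * 60.0 / MB;       // ~66 MB

    double level = 35.0 * (character_textures + character_mesh)    // 30-40 characters
                 + 10.0 * audio_per_min                            // ten minutes of audio
                 + 512.0;                                          // environment textures, geometry, etc.

    printf("Estimated level content: %.0f MB\n", level);
    return 0;
}
```

Texture compression and shared assets would pull that number down, but the order of magnitude is clearly within reach.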

Well, it's pretty safe to say that 3 years from now, a single character will have more content to it than an entire game level had 3 years ago -- what with all the normal/bump maps, displacement maps, gloss/specular level maps, reflection/refraction maps, and most likely, at some point, spherical harmonics maps. Moreover, we play at high resolutions these days. I remember the original Quake, which I wouldn't dare play at over 320x240 on my old 486. Eventually that 486DX became a DX2 and I put in a VLB video card, so I could handle it at 640x480. At the time, that seemed damn near miraculous. Nowadays, we scoff at anything short of 1024x768. This in turn means that the resolution of our content has to increase likewise. The exception is, of course, console gaming, where we hardly have any choice... but that too will change when HDTV becomes more common. Still, I think that's at least 2 console generations away. HDTV is too expensive, and for the most part, it only exists in unwieldy, enormous units. We can't yet pick up a little 19-incher at Walmart that happens to support HDTV, which means it's pretty well limited to homes where people have settled down and don't intend to move any time soon. In any case, where we once had 250-polygon characters, we now have 2500-poly characters. Where we once limited ourselves to 128x128 or 256x256 textures, we now tend to use 1024x1024... and 2048 is just over the horizon. This is a lot of content.

So how do the graphics chips deal with this growth? The world of GPUs is pretty much about guys named nVidia and ATI and 3dLabs and Matrox, etc. comparing themselves in the john. It's about throwing more silicon at the problem and making everything bigger. That's basically it. The technology we have now all started with stuff SGI was working on ages ago. Aside from further programmability at two specific points in the pipeline, it really hasn't changed much. Sure, some interesting little optimizations and tricks have been played here and there, and of course, compatibility with more than one API has been kept in mind. These are simply the sorts of changes that let the GPU manufacturers play with marketing terms like "extreme pipelines". On the whole, though, it's been a matter of size. The other guys have 8 pipelines? We have 12! They've got 4 textures per pass? We've got 8! They run at 300 MHz? We run at 450! They've got 90 million transistors? We've got 120 million! And it just goes on and on. Well, that's how you increase your theoretical throughput and sell your chips. Big numbers make people think it's a better product.

Will this really go on forever like so? Well, the GPU manufacturers would like it to. But it really can't. No matter how hard you try, you can't count on being able to make bigger and bigger chips at smaller and smaller process sizes indefinitely. Eventually, your dies get too big and your wafers start looking too small. And that's not even considering the fact that fabrication is not exactly a walk in the park -- which is why most GPU manufacturers don't have their own fabs. Currently, CPU manufacturers are transitioning to 90-nm transistor geometries. Yields, power consumption, leakage, and clock scaling are all still somewhat questionable at this point, but we'll see. In turn, this means that GPUs are not going to 90-nm processes anytime soon. So the end result is that these 200-million-odd transistors have to take up that much more space. Certainly, if any CPU (for home consumer use) were anywhere near as big as these GPUs are now, it would be ridiculously expensive and suffer pretty poor yields. Sure, you have the Athlon 64 FX, which is pretty big and carries about 100 million transistors. Fact is that about half of that is cache, and cache is a good deal denser than logic, so that half of the transistors takes up well under half of the space on the die. In the land of GPUs, we don't have the big caches. It's almost all logic transistors, which is exactly the kind of silicon that eats area. Fortunately, the logic in a GPU is much simpler. It's not loaded with all nature of ISA translation, pipeline forwarding, out-of-order execution, register renaming, cache flushing, branch prediction, speculative prefetching, etc... The units are pretty ordinary -- there just happen to be a lot of them. In theory, it's easier to make a GPU than a CPU. Or at least, easier to design, anyway. Problem is that when you have something that big, yields are low. And even if the yields are pretty high percentage-wise, they're pretty low in absolute production volume for a given number of wafers. And when you don't have your own fabs, your fab partner who's doing all this work for you is getting a cut. And then you have to deal with the fact that your product has to be superseded by the next generation pretty darn quickly (nVidia and their darn "Moore's Law Squared" thing). This translates into high costs passed on to the consumer.

So what does lie in the future? It's probably a lost cause, but I'm personally a raytracing nut. And raytracing hardware is a wondrous thing, but can it actually happen? Well, I'm pretty certain that someone will at least try to put such a product on the market. How competitive it is remains to be seen. The biggest problems in the way of raytracing hardware are finance and market acceptance. If even half of the millions of dollars poured by a single nVidia or ATI into making polygon rasterization hardware bigger and bigger and well... bigger... went into research on raytracing hardware, it would certainly be a great deal farther along than it is now. Right now, small groups have samples and simulations of actual units that are all really small and perform reasonably well, but really only have enough power to do first-order raycasting or maybe one level of recursion. This leads to a lot of the artifacts that make regular Whitted raytracing look fake -- sharp-edged shadows, point lights, specular highlights in abundance, flawless reflective surfaces... Then of course, you have the fact that not only is there not much money going into it... but there would potentially be a lot of money going into crushing it. Unfortunately, the likes of nVidia and ATI have so much vested interest in polygon rasterization hardware that not only do they intend never to move away from it, but they openly denounce raytracing hardware. This is especially true of nVidia. David Kirk seems to have a knack for getting emotional when arguing against raytracing (e.g. the infamous "pencils" argument at the SIGGRAPH discussion panel). Oftentimes, watching him go up against a raytracing supporter is like watching a Christian fundamentalist go up against an actual scientist. An engineer at ATI's research facility in Marlborough once told me that ATI pretty much intends to stick with standard rasterization hardware 'probably for the next 20 years or so'. Now, what surprises me about this is not so much that they want to stick with rasterizer hardware, but that a company in the business of graphics accelerator chips is actually making decisions for 20 years down the road. Who in their right mind makes 20-year plans in this business? I'm sure there are plenty of good explanations for all of this. nVidia's David Kirk is certainly an intelligent and knowledgeable fellow, and keeping his job in mind, he does have to support the business practices of the company he works for. It's just that he gets emotional and irrational if you so much as mention the word "raytracing" to his face. It's certainly sound business on nVidia/ATI/Matrox/whoever's part to keep going with their existing technology. I just don't think that changes the importance of the switch. ATI's Larry Seiler believes that there will someday be something new that subsumes both raytracing and rasterization... Maybe... but until it's really clear what that may be, it's still kind of a loose thread to hang onto.
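As an aside, for anyone who hasn't written one: that "fake" Whitted look falls straight out of the algorithm's structure. Below is a minimal sketch of Whitted-style shading -- the scene-intersection helpers are stubs of my own invention, not any real API -- showing the single shadow ray to a point light (hence razor-sharp shadows), the single perfect mirror bounce (hence flawless reflections), and the hard recursion cap that the prototype hardware is stuck at:

```cpp
// A minimal sketch of classic Whitted-style shading. Scene/Hit/intersect()
// are hypothetical placeholders (stubbed so the program runs); refraction is
// omitted for brevity. The "fake" look comes from the structure itself.
#include <cstdio>

struct Vec3 { double x, y, z; };
static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin, dir; };
struct Hit { bool found; Vec3 point, normal; double reflectivity; };

// Placeholder: a real tracer would walk a spatial structure over the scene.
static Hit intersect(const Ray&) { return {false, {}, {}, 0.0}; }
static Ray shadowRayTo(Vec3 from, Vec3 light) { return {from, light - from}; }
static Ray mirrorRay(const Ray& in, const Hit& h)
{
    return {h.point, in.dir - h.normal * (2.0 * dot(in.dir, h.normal))};
}

static Vec3 trace(const Ray& ray, int depth)
{
    const Vec3 sky = {0.2, 0.3, 0.5};
    const Vec3 pointLight = {10, 10, 10};

    Hit hit = intersect(ray);
    if (!hit.found || depth <= 0)
        return sky;

    // One shadow ray toward a single point light: a point is either lit or
    // not, so shadows have razor-sharp edges.
    Vec3 color = {0, 0, 0};
    if (!intersect(shadowRayTo(hit.point, pointLight)).found)
        color = {1, 1, 1};  // direct lighting (diffuse/specular terms omitted)

    // One perfect mirror bounce, recursion capped at 'depth'.
    if (hit.reflectivity > 0.0)
        color = color + trace(mirrorRay(ray, hit), depth - 1) * hit.reflectivity;

    return color;
}

int main()
{
    Vec3 c = trace({{0, 0, 0}, {0, 0, 1}}, 1);  // depth 1: about what the prototypes manage
    printf("%f %f %f\n", c.x, c.y, c.z);
    return 0;
}
```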

The other big problem in acceptance is the fact that the public at large is pretty much clueless. As far as they're concerned, a video card just plain works. In general, the argument in favor of rasterization hardware is simply that the chips are loaded with computational power. Indeed, we can theoretically achieve hundreds of gigaflops of throughput as opposed to a few with a normal CPU. And there is a good deal of research into exercising that computing power for general-purpose computing. In many ways, this sort of thing makes me sick. We're not even at the point where we're stressing the GPU's capabilities for graphics yet! Are we so far out of ideas in rendering that we need to look elsewhere? While in general I'm in favor of having any amount of computing power made available in a developer-friendly way... the GPU is not and will never be a general-purpose computing device. It will always be specifically made for stream computing. It will always be limited in its programmability. Even with PCI-Express, you can't expect that reading your results back will be a smooth process. More importantly, while we're gaining more extensive flow control within the shaders themselves, do we have flow control anywhere else in the pipeline? Think back to the version 1.x pixel shaders: we had only 8 arithmetic instructions (although 1.4 let you run two phases and pull out 16). There's no question that we were very limited. At that point, you really couldn't think about general-purpose computing too much, because the shaders themselves limited what you could do graphically. At least then, we were at a stage where, graphics-wise, there was more we could think of to do than we were actually capable of. Now we have a lot of the power that could prove most useful for rendering. Sure, there's still a lot we can't do... but is that it? Are we at a point where we have more power at our disposal than our imagination can manage? That bodes poorly for the future of realtime graphics... or even the present, for that matter.

This is why I'm so much in favor of raytracing over rasterization. It's an entirely different way of approaching the rendering problem. When you get right down to it, everything we try to do with a rasterization-based engine is just an attempt to get closer to what a raytracer of some kind could already do. Per-pixel lighting... spherical harmonics... specular lighting... reflection maps and refraction maps... anisotropic lighting... it's all stuff that raytracers do. In fact, since I mentioned spherical harmonics -- how do you think people actually perform the sampling that produces those low-frequency lighting estimates? With a stochastic raytracer, of course -- whatever algorithm you use for the actual lighting pass, you still have to get an unbiased sampling of the scene. All of these things come out much more naturally when you're using a raytracer.
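To make that concrete, the SH preprocess really does boil down to firing rays in uniformly distributed random directions and projecting whatever radiance comes back onto the SH basis. This is a generic sketch of my own -- incomingRadiance() is a stand-in for "trace a ray into the scene and see what it hits":

```cpp
// Sketch: estimating the first four spherical harmonic lighting coefficients
// by stochastic sampling. incomingRadiance() is a placeholder for a real ray
// query; here it's just a made-up sky function so the code runs.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>

struct Dir { double x, y, z; };

// Placeholder for a real visibility/radiance query (i.e., tracing a ray).
static double incomingRadiance(const Dir& d)
{
    return d.z > 0.0 ? d.z : 0.1;   // bright "sky" above, dim bounce light below
}

int main()
{
    const double PI = 3.14159265358979323846;
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    const int N = 100000;
    double coeff[4] = {0, 0, 0, 0};

    for (int i = 0; i < N; ++i) {
        // Uniformly sample a direction on the sphere.
        double z = 1.0 - 2.0 * uni(rng);
        double r = std::sqrt(std::max(0.0, 1.0 - z * z));
        double phi = 2.0 * PI * uni(rng);
        Dir d = {r * std::cos(phi), r * std::sin(phi), z};

        double L = incomingRadiance(d);

        // Real SH basis functions for bands 0 and 1.
        coeff[0] += L * 0.282095;
        coeff[1] += L * 0.488603 * d.y;
        coeff[2] += L * 0.488603 * d.z;
        coeff[3] += L * 0.488603 * d.x;
    }

    // Monte Carlo estimate: scale by the sphere's measure (4*pi) over N samples.
    for (int i = 0; i < 4; ++i)
        coeff[i] *= 4.0 * PI / N;

    printf("SH coefficients: %f %f %f %f\n", coeff[0], coeff[1], coeff[2], coeff[3]);
    return 0;
}
```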

So what's different about it? Well, first of all, unlike a rasterizer, a raytracer puts pixels in the outer loop of your renderer and geometry in the inner loop. This means that per-pixel effects are effortless; the renderer itself starts at the pixel level. Second, everything has its final place in world space. There is no transformation into view space (or rather, there doesn't need to be; you can still do it if you feel like it), and no perspective projection transform. The camera is simply handled implicitly through pixel-to-ray transformations. This reduces the number of transformations that actually need to be performed on objects, and it means the camera is a much more flexible thing because it's not tied to a particular transformation. Linear cameras are the only possible model with rasterization hardware, and the projection happens per vertex, not per pixel. With a raytracer, every pixel experiences perspective projection, and moreover, we can have non-linear (e.g. spherical) cameras... or time-dependent cameras... or cameras with an actual lens simulation (for depth of field). Third, resolution is the single biggest determinant of performance. Since the number of pixels is a direct measure of the number of primary rays (essentially the number of pixels times the antialiasing level), performance scales linearly with the number of pixels. On the other hand, the number of actual objects/elements in the scene affects it much less, especially since you get flawless hidden surface removal and occlusion with a raytracer, and there are several approaches for culling elements from ray intersection tests when they don't need to be considered. Compare that with a rasterizer, where we need a lot more fillrate than the resolution alone would suggest, mostly because of all the horrid overdraw we endure, so that resolution (up to a certain point) doesn't make as much of a difference as scene complexity. Fourth, we actually sample the world. We don't just get information from texture maps and such... we can get information by shooting a ray out from some point and finding out what it hits. This means that things like reflection maps and refraction maps are meaningless, because we can have actual reflection and refraction. Fifth, we don't work with individual polygons or meshes. We work with whole scenes. This is actually a bit of a difficult hurdle for hardware, because we need some way to represent an entire scene and make sure that whole scene is accessible to the hardware. It's especially difficult in a game world, since game worlds are so big and are only going to get bigger years down the road. Sixth, implicitly defined surfaces are preferred, so that a single object can be defined by as few elements as possible. Instead of having some polygon mesh for a sphere, we simply represent a sphere as a center and a radius. Polygons in general are slow with a raytracer because it takes so many of them to represent a single object. In fact, higher-order subdivision surfaces are actually faster, because the control mesh for a subdivision model can be represented with far fewer polygons than the in-game mesh we might use in a rasterizer-based engine, and the subdivision scheme itself results in a surface that can be evaluated implicitly (e.g. a Catmull-Clark subdivision mesh converges, away from extraordinary vertices, to the bicubic B-spline surface defined by its control mesh). Similarly, a terrain which would be very slow to render as a mesh can instead be represented (to the hardware, anyway) as a single large flat polygon with a displacement map.
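A tiny sketch of the first and sixth points together may help. The camera exists only as a pixel-to-ray function (swap that one function out and you have a fisheye, a lens model, or a time-dependent camera), and a sphere is nothing but a center and a radius rather than a pile of triangles. The names here are my own, not any particular API's:

```cpp
// Sketch: pixels in the outer loop, and an implicit sphere (center + radius)
// instead of a triangle mesh. The camera exists only as the pixel->ray
// mapping in primaryRay(); replacing that one function changes the camera
// model without touching the rest of the renderer.
#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin, dir; };
struct Sphere { Vec3 center; double radius; };   // the entire "mesh"

// Pinhole camera: every pixel gets its own perspective projection implicitly.
static Ray primaryRay(int px, int py, int width, int height)
{
    double x = (px + 0.5) / width * 2.0 - 1.0;
    double y = 1.0 - (py + 0.5) / height * 2.0;
    double len = std::sqrt(x * x + y * y + 1.0);
    return {{0, 0, 0}, {x / len, y / len, 1.0 / len}};
}

// Implicit intersection: solve |o + t*d - c|^2 = r^2 for the nearest t > 0.
static bool hitSphere(const Ray& ray, const Sphere& s, double& t)
{
    Vec3 oc = ray.origin - s.center;
    double b = dot(oc, ray.dir);
    double c = dot(oc, oc) - s.radius * s.radius;
    double disc = b * b - c;
    if (disc < 0.0) return false;
    t = -b - std::sqrt(disc);
    if (t < 0.0) t = -b + std::sqrt(disc);
    return t > 0.0;
}

int main()
{
    const int W = 32, H = 16;
    Sphere s = {{0, 0, 5}, 1.5};

    for (int py = 0; py < H; ++py) {            // pixels form the outer loop...
        for (int px = 0; px < W; ++px) {
            double t;
            putchar(hitSphere(primaryRay(px, py, W, H), s, t) ? '#' : '.');
        }                                        // ...geometry sits in the inner test
        putchar('\n');
    }
    return 0;
}
```

Run it and you get a little ASCII silhouette of the sphere -- the point being that the geometry never went anywhere near a vertex pipeline or a projection matrix.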

The image people usually have in mind when talking about raytracing is one full of shiny objects, reflective spheres, checkerboard ground planes, and hard-edged, pixel-perfect shadows. And yet there is so much more to raytracing. Its real power lies in its Monte Carlo variations: distribution raytracing, Monte Carlo path tracing, photon mapping, bidirectional path tracing, Metropolis light transport, etc.* It's not a simple matter of a single ray per pixel bounced around through the world. It's a matter of actually sending out sample rays in several directions in order to get information about the entire world relative to every point (every point that we can see). Ray tracing itself is the very basis for several full-scale global illumination rendering schemes that can model every possible form of light transport. The real difficulty here lies in sheer power. One of the things about a raytracing pipeline is that it's very hard to make one that is completely synchronous, simply because you can't guarantee that every ray will take the same amount of time to return sample data; some rays will lead to an area with just a few objects and some will lead to an area with several objects (which we may not necessarily be able to cull reliably). Also, from a hardware perspective, if the scene storage is cached in a locally accessible buffer, some rays will lead to cache misses (i.e. the part of the scene we need is not in the cache), which incurs a further delay. At the very least, though, you can guarantee one thing -- no ray will ever affect any other ray. EVER. While a single sample point can spawn multiple sample rays, all the rays are completely independent of one another. What this means is that ray tracing is nearly multi-threadable ad infinitum. Granted, the fact that your pipeline is asynchronous means that momentary speedups and slowdowns in one pipeline affect the issue rate of your rays, which in turn means you run into diminishing returns if you try to multithread too much. In any case, as I said before, one of the big barriers to raytracing hardware is financial. All the current samples and simulations so far are single-pipeline models -- simply because the groups behind them don't have the time or money to build anything bigger. If, however, they had the kind of money it takes to build a 200-million transistor chip, or to build a chip with several duplicate pipelines, we could end up with some pretty powerful parts. As a rule, though, we can't expect a huge number of pipelines in a raytracing accelerator chip. Presumably, if we can have 16 rasterizer pipes, 16 raytracing pipes are probably possible with the same financial backing. At that level, we could expect nearly 16 times the performance of a single-pipe version most of the time.
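That "no ray ever affects another ray" property is exactly what makes it trivial to carve the frame up across however many pipes or threads you happen to have. A quick sketch using std::thread -- tracePixel() here is just a dummy stand-in for the real per-sample work:

```cpp
// Sketch: because rays never depend on one another, the frame can be carved
// into horizontal bands and handed to as many threads (or hardware pipes) as
// are available, with no synchronization until the image is assembled.
// tracePixel() is a placeholder for the real per-sample work.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

static const int W = 640, H = 480;

static float tracePixel(int x, int y)
{
    // Stand-in for primary-ray generation + scene traversal + shading.
    return float(x ^ y) / float(W);
}

static void renderBand(int y0, int y1, std::vector<float>& image)
{
    for (int y = y0; y < y1; ++y)
        for (int x = 0; x < W; ++x)
            image[y * W + x] = tracePixel(x, y);   // writes never overlap between bands
}

int main()
{
    const unsigned threads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<float> image(W * H);
    std::vector<std::thread> pool;

    for (unsigned i = 0; i < threads; ++i) {
        int y0 = H * i / threads;
        int y1 = H * (i + 1) / threads;
        pool.emplace_back(renderBand, y0, y1, std::ref(image));
    }
    for (auto& t : pool)
        t.join();

    printf("Rendered %dx%d with %u threads\n", W, H, threads);
    return 0;
}
```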

One thing to note, however, is that the performance of a raytracer would be measured in rays per second. And that's not quite the same thing as polygon fillrate, because more rays don't necessarily mean more framerate or more objects rendered in a scene. It means more information can be sampled in the same amount of time. And the rate of ray sampling is also a much less certain figure than something like peak fillrate, because of the asynchronous nature of raytracing. The difference brought about by more raytracing power is that we can increase our recursion depths, or increase the number of secondary sample rays in a stochastic raytracing scheme. So the ultimate point here is that more computational power doesn't just give us higher render speeds... it can give us higher render quality as well. Make no mistake, though: to have something like MCPT running in realtime in a game at speeds competitive with rasterizer hardware, we need a LOT of computational power. It's not unreasonable to ask for something like 15 billion rays per second if we're thinking 1024x768 with 4x antialiasing and shooting for 70 fps or so.
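For what it's worth, that 15-billion figure isn't pulled entirely out of thin air. Here's the arithmetic, with my own rough guess of around 64 secondary rays per primary sample for a Monte Carlo scheme:

```cpp
// Back-of-the-envelope ray budget for 1024x768, 4x antialiasing, ~70 fps.
// The secondary-ray count per sample is my own rough assumption for a Monte
// Carlo scheme (shadow, reflection, and indirect samples over a few bounces).
#include <cstdio>

int main()
{
    const double pixels          = 1024.0 * 768.0;   // 786,432
    const double aa_samples      = 4.0;              // primary rays per pixel
    const double fps             = 70.0;
    const double rays_per_sample = 64.0;             // secondary rays per primary (assumed)

    double primary = pixels * aa_samples * fps;      // ~220 million rays/s
    double total   = primary * rays_per_sample;      // ~14 billion rays/s

    printf("Primary rays/s: %.2e\n", primary);
    printf("Total rays/s:   %.2e\n", total);
    return 0;
}
```

Call it 14 or 15 billion rays per second, give or take the secondary-ray budget.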

Probably the best place to actually introduce raytracing accelerator hardware is in a game console. Why? The biggest reason is that consoles can be platforms unto themselves. A PC card would be nice, but you have to worry about market acceptance from the buyers, and even if it ultimately happens, it will not happen quickly. A console is a fixed hardware definition, designed around whatever configuration best achieves certain specific goals. Console design doesn't really mandate a need to worry about legacy garbage, and people are free to make any sort of device they want. While we hear all the buzz about the PS2's backward compatibility, it's really not an important thing in the console world. It's not backward compatibility that made the PS2 popular; it's the PS2 that made backward compatibility popular. It's the games that sell a console, and if you can make a console that has really terrific hardware and really terrific developer tools that can lead to some great games, that's all that matters. You're free to throw in powerful raytracing hardware if you really think it will make a difference. The consumer doesn't care about the hardware so long as there are great games for the platform. And while the big GPU manufacturers are not going to accept a shift to raytracing chips, I happen to be a little more optimistic about software developers. Those who actually work with graphics and have a good understanding of the principles behind both polygon rasterization and raytracing are likely to accept raytracing hardware... so long as it works and works well. And of course, the biggest concern on that end is really to supply the developers with good tools so they can actually make use of the hardware. Considering the OpenRT standard, that may not be too much of a problem. The Saarcor and Freon 2/7 projects are showing some promise as it is, but they still need a lot of money to actually make it to market. RenderDrives and ART VPS cards are great and all, but they're not made for realtime usage -- they're made to accelerate high-end non-realtime renders.

While you may gain acceptance by putting raytracing hardware into a game console, you'll be losing one very important thing -- money. Consoles have to be sold cheap, and more often than not, they're sold well below cost. Nobody makes money on the console itself so much as on the games sold and all the developer kits and software sales. This is more true now than ever, as people want to put enormous chips with excessive bandwidth and loads of fast memory into a little box that really can't sell if it costs more than $300. Sheer power has become the name of the game rather than clever design. Is there anyone who can really afford to put all that time and money into developing an excessively powerful raytracing accelerator board and bear the financial burden of putting it in a game console? Sure there are -- and they're the exact same people who would sooner die than do it. It's still a great deal cheaper to make a PC board, because you don't have to consider it an integral part of a standalone platform that has to sell for peanuts. You just worry about selling your product at whatever price you need to sell it at. Also, with an add-in board, you can at least produce a lesser product than you're capable of, just to serve as a proof of concept.

Getting the big GPU manufacturers to accept raytracing as the appropriate path to take is like convincing big oil companies to put money into developing a hydrogen infrastructure. Except for one thing -- the middleman really matters a lot here. If developers don't make use of the hardware effectively and really stress its abilities, then there will be no impetus for consumers to buy new hardware. One of the end results of having all this power at our disposal is that we end up making more content, which adds to our development time. Once upon a time, a single artist could create a character in about an hour. Now it's not at all uncommon for a team of 3 or 4 artists to spend 2 months on one character. Something like normal mapping makes it worse, because characters are effectively built twice -- once as a high-detail sculpt and once as the low-poly in-game mesh the detail gets baked onto. All of this really adds up in time and cost. As hardware gets more powerful, we will reach a point where what is achievable is far greater than what we actually try to achieve, simply because the costs are too great to consider anything further. What does that mean for the GPU manufacturers? Well, it means they have 2 more product generations on the drawing board that won't really sell, because they don't serve any new purpose. Will raytracing hardware really help there? Mmm... a little. The main thing is not so much that we end up saving time... it's that a lot of the things we'd like to do with a rasterizer, but that proved not worth the effort, come more naturally with a raytracer. We would at least save some time from the fact that things like normal maps and spherical harmonic maps and reflection maps all become essentially unnecessary. Similarly, we gain a lot more flexibility in character and level design because we get so much for "free". If major developers push for a serious change or a move away from rasterizers, GPU manufacturers will actually listen. The thing is, even as much as some of us might want that change to happen right now, the fact of the matter is that the power of rasterizer GPUs right now is immense, and hardly anything to complain about, since we're not even pushing their potential. And as long as nobody is able to put big dollars into raytracing hardware R&D, we can't hope for much on that end. End result: even though raytracing is the best solution for the future, current GPUs are the best we could ask for right now. The move towards raytracing at some point is a must, but rarely do people know, accept, or like what's good for them. We have to face the trouble of getting the children to eat their vegetables... That could take all day, and unfortunately, we're not at a point where we can afford to ground them.


* Just as a side note, my favorite of all the raytracing variations would have to be Monte Carlo Path Tracing (MCPT). It's incredibly simple, it scales well to large scenes, it still models every type of light transport, and it keeps the number of secondary rays low.
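To show just how simple: the core of MCPT is one short loop. At every bounce you pick a single new direction according to the surface's BRDF, attenuate the path's throughput, and let Russian roulette decide when to stop -- which is exactly why the secondary ray count stays low. A structural sketch of my own, with scene traversal stubbed out:

```cpp
// Sketch of the Monte Carlo path tracing core loop. One secondary ray per
// bounce (that's what keeps the ray count low), Russian roulette termination,
// cosine-weighted sampling for a diffuse surface. intersectScene() is a stub
// standing in for real scene traversal, so as written every path escapes to
// the sky -- the point is the structure, not the scene.
#include <cmath>
#include <cstdio>
#include <random>

struct Vec3 { double x, y, z; };
static Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
static Vec3 mul(Vec3 a, Vec3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }

struct Ray { Vec3 origin, dir; };
struct Hit { bool found; Vec3 point, normal, albedo; };

static std::mt19937 rng(1234);
static std::uniform_real_distribution<double> uni(0.0, 1.0);

// Placeholder for real scene traversal (kd-tree, BVH, whatever the hardware uses).
static Hit intersectScene(const Ray&) { return {false, {}, {}, {}}; }

// Cosine-weighted direction for a diffuse bounce. The local frame around the
// normal is skipped for brevity -- assume n = +Z here.
static Vec3 sampleHemisphere(const Vec3& n)
{
    (void)n;
    double u1 = uni(rng), u2 = uni(rng);
    double r = std::sqrt(u1), phi = 2.0 * 3.14159265358979 * u2;
    return {r * std::cos(phi), r * std::sin(phi), std::sqrt(1.0 - u1)};
}

static Vec3 radiance(Ray ray)
{
    Vec3 throughput = {1, 1, 1};
    for (int bounce = 0; ; ++bounce) {
        Hit hit = intersectScene(ray);
        if (!hit.found)
            return mul(throughput, {0.7, 0.8, 1.0});     // escaped: sky radiance

        throughput = mul(throughput, hit.albedo);

        // Russian roulette: probabilistically kill the path instead of
        // recursing forever, while keeping the estimator unbiased.
        double survive = 0.8;
        if (bounce > 3) {
            if (uni(rng) > survive) return {0, 0, 0};
            throughput = throughput * (1.0 / survive);
        }

        // ONE new ray per bounce -- this is the whole secondary-ray budget.
        ray = {hit.point, sampleHemisphere(hit.normal)};
    }
}

int main()
{
    Vec3 c = radiance({{0, 0, 0}, {0, 0, 1}});
    printf("%f %f %f\n", c.x, c.y, c.z);
    return 0;
}
```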