Fundamentals of a 3-d engine

by C.P.I. / ASMiNC




First off, we must be clear on what this page is all about.  I am not going to tell you how to make some end-all of 3-d engines here.  If you want an engine that casts shadows, uses S-buffers, and Frustum Clipping, with Blinn Shaded polys that run using a 3dfx API, go elsewhere.  This is all about the fundamentals.  Yes, I will get into shading...  Yes, I will get into optimizations...  Yes, I will get into little nit-picky things.  But as far as things like S-buffers, BSP trees...  A 3-d engine can run without them -- not to say that they're useless...  They are very useful, but not a necessity in the truest sense of the word.  For that matter, there is the fact that you can't really use a boatload of HSR algos all in one engine.  If you use S-buffers, then throwing Z-buffering into the engine would be pretty pointless.  This is something that's really up to you, the coder.  Simply put, if you can't get the basics, don't go one better.  Here, you're getting the basics -- the MUST-haves.  I will still bring up HSR's and Clipping -- but which one you use is not my decision to make.

Granted, a 3-d engine can run without shading, but it would be pretty darn worthless.  I don't think anyone will be able to make a 3-d engine of any consequence without shading.  Optimizations are always necessary, because speed is of the utmost importance.  If you can make a moderately fast engine without speedup techniques like clipping, think of what will happen when you DO implement them.  The whole point of a 3-d engine is to wow people with the kinds of things you do on a computer.  True, people are more interested in what is done with the engine than in the engine itself, but you're working towards that.  People notice things like shading, but they do NOT notice S-Buffers.  People notice shadows, too, but they're not a basic.  You don't want to delve into shadows until you can do everything else.  Collision detection likewise...  It's noticeable, but useless unless you plan to do something more with your engine than simply show it off.

Second, it is of infinite importance that you are fairly well-versed in mathematics.  If you aren't so good at trig, you don't know the first thing about matrices, and you've never heard the word "interpolate" before -- then go away.  You will never have even the tiniest shred of hope of ever writing a 3-d engine if you don't buckle down on your math.  There is no such thing as "enough math" in this world.  Trig and algebra should go without saying.  Some amount of Linear Algebra is required.  I would definitely advise some amount of calculus, too.  Differential Equations won't hurt either, but they're not vital in the code -- maybe when working something out on paper.

It is also vital that you don't simply know the stuff.  You must be able to read deeper into the material.  Have some insight. If you just happen to know that multiplying matrices is done by yada, yada, yada...  Will you really know why we multiply all our transform matrices together?  Probably not.  It can even help in other ways.  If you've read my Ellipsoid collision DOC, you'd know that it was simply based on a parametric equation for an ellipsoid in Spherical coords.  Or the perspective tmapper, which was based on a little look into the Fundamental Theorem of Calculus.  The fact is that even the most piddley of concepts can have some significance.

If you're hoping for any suggestions for your engine... well, all I can say is to work with things that you know will work well together.  My own engine uses spherical coords for a lot of things -- rotations for one.  I use a 100% perspective tmapper that I developed myself.  It may not be the fastest in the world, but it is beautifully symbiotic with the S-buffers.  Which, by the way, I use for all clipping.  I also include Z-Sorting because it allows for some extra efficiency in the S-buffer insertion.  Some people would prefer to do Frustum clipping.  That's fine, but just remember what works out nicely.  You MUST be able to render convex polys.  Of course, that can also mean you can design your objects with more complex polygons and save on the number to render.  And of course, Backface culling.  It's also not a very difficult move to go from Frustum Clipping to Portal rendering.  Z-buffering could be thrown in there, but then what would be the point of Frustum Clipping in the first place?  These are all things you'll have to consider.  There are do-it-all engines out there, but note that there are option sets with those.  No engine in the world does Z-buffering and S-buffering at the same time or at least not on the same object at the same time.  It's a waste!

Still, some things can work together that wouldn't seem to.  S-buffers and Portals can work together.  Why waste time w/ both, you wonder, when S-buffers alone or Portals alone can help eliminate overdraw?  Think of the following...  S-buffering will give you NO overdraw and pixel-perfect output (which can include 2.5D clipping); everything is laid out in a nice raster scheme that easily gets rid of flicker, and you only draw the pixels that need to be drawn.  BUT all the polys are processed into the S-buffer lists and stored before actually drawing to the screen.  The portalized portion of the engine can really help here because it cuts down on the number of polys that get processed.  Of course, it also depends on your worlds.  My engine happens to be an outdoor engine, which wouldn't benefit much from portalization because any given portal would have to be utterly huge -- a very large viewing area isn't going to cut down on polys to draw.  You can also throw in some BSP or Z-Sorting...  If polys are pre-sorted, the S-Buffer insertion gets faster because all your insertions will clip rather than having to check first -- but of course, this depends on the exactness of the sorting.

If you want to code for 3-d accelerators -- IMHO -- you should have paper cuts inflicted over every inch of your body and be doused in lemon juice.  There is no standard for 3-d accelerators.  There is no uniform support set for 3-d accelerators.  They do not work well with half the things that we would like to do within a 3-d engine.  They do not even DO some things that we would like to do yet.  Not everybody has a 3-d accelerator.  Not everybody has the same 3d accelerator that you do.  No two accelerators run at the same speed (even those with the same chipset).  It was the same way with SVGA cards not too many years ago.  People frowned on SVGA-demos because there were only so many machines that could run them and there was no compatibility between cards.  No one is going to go out and buy a 3dfx Voodoo2 card just because YOUR stupid program demands it.  When the hardware and API developers have even the slightest bit of sense, then we'll talk.  SVGA cards were saved when VBE2 came out because it unified all the cards to one standard and everything could be done at appreciable speed.  If there were some sort of universal minimum set of things to support, coupled with a universal programming interface, that would be all it takes.  The manufacturers could do whatever else at whatever speed in whatever way.  When everybody starts doing that, then, I suppose, we have the possibility of something like a UniV3DE or something...  I know, I know... keep dreaming.  But just think about this -- the only other salvation is that 100% of 3d cards become compatible with each other, and UniV3DE is several trillion trillion trillion times more likely.  I would win the lottery 650,000 times in a row before that ever happens.

Yeah, I know that there exist things like OpenGL, Glide, Direct3d, etc.  But they still require card compatibility. They still are the slowest code in the universe.  There exists zero compatibility between one and another.  Not one of those standards will work with every card out there.  You, the coder, have given up every ounce of control you had over the program.  All you have written is the geometry engine and even that must be in a format suitable for the API.  It's a hopeless cause and that's all there is to it.

It looks more as if 3-d accelerators will be like soundcards.  Pretty soon, everybody will have them.  But there are still dozens of standards and we'll have to support them all and put a little choice menu up front...

3-d acceleration API?           It's right up there with the old days of...

1) OpenGL                       Soundcard?
2) 3dfx/Voodoo                  1) SoundBlaster/Pro/16
3) X3D                          2) GUS (must have 512k+!)
4) Glaze3d                      3) AWE 32/64
5) Direct3d                     4) Interwave
6) No video...                  5) No sound

Please select -

Anyway, enough of that...  DO NOT CODE for 3d hardware.  At least not until one unifying standard comes forth.  With Microsoft's clout, it looks as if that standard will ultimately be Direct3d.  The only problem with that is that it's not possible to write code that is slower than Direct3d.  You could put in as many waits and cache misses as you like -- Direct3d will still be several googolplex times inferior.

Many programming professors will tell you that there is no real rule about coding.  They're absolutely right.  The way you must program depends entirely on the application.  When you're writing spreadsheets and word processors, the most important thing is design.  People don't care about the speed of the spreadsheet if it does the job well.  Another advantage of good design is updatability.  That mindset is the problem when big companies start working in 3d graphics.  In graphics and sound, SPEED IS EVERYTHING -- that is not a negotiable statement.  Yes, there is such a concern as weighing speed against quality, but that is mostly a concern with the drawing primitives, not the computational geometry.  If you can do everything else fast, then you can probably spare some more time for real quality rendering.  Also, many speed-up techniques involve precalculation.  If you can be more precise with the precalc, imagine what kind of results you can get.

You may notice that there seems to be very little in the way of graphics on this page...  This is quite simply because I've put the whole doc on one single HTML file.  That makes for a VERY big file.  So why didn't I break it up into several files and save you the loading time?  The reason is that a "How - To" of 3d engines is something that will require serious pondering.  Some serious thought and consideration.  Something like that is probably not going to be read online.  So I've given you the whole file in one shot...  One file for you to download to disk and read at your leisure.


Sections...

- The bare necessities...
- Call of the Primitives...
- Data... "That is correct."
- What to do? What to do?
- Well ... now what?
- Final words


The Bare Necessities

     First and foremost, you need a graphics library of some sort.  I usually advise starting off in regular old Mode13h or in some 256 color mode.  If you want to start in a VESA 256 color mode, I say 'great', because it will make the transition to higher modes even easier.  The only reason I suggest 256 color modes at all is that they are simply easier to code for.  True, certain things will be very easy in hicolor modes, but I still say start small and work up.  Remember that we're only trying to make a start here.  You will need some sort of triangle/convex n-gon filler, which I'll get into later.
    There is also the matter of rotation and translation...  A 3-d engine is going to be pretty darn pointless if you're not going to do anything.  Let's start with translation because it's simple...

        dx = translation along X-axis.  dy = trans. along Y-axis.    dz = etc...

        X' = X + dx.  Y' = Y + dy.  Z' = Z + dz.

That's that...
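
In code, that's about as simple as it gets.  A minimal sketch in C (the function name and the pass-by-pointer style are just for illustration):

    /* Translate a point by (dx, dy, dz). */
    void Translate(float *X, float *Y, float *Z, float dx, float dy, float dz)
    {
        *X += dx;
        *Y += dy;
        *Z += dz;
    }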

As for rotation -- it can get a bit more interesting, because there are so many different ways to express a rotation.  Ultimately, though, it just gets down to a rotation matrix...

X-Rotation                      Y-Rotation                      Z-Rotation

    |  1     0      0   |       |  cos    0    sin  |       |  cos  -sin    0  |
    |  0    cos   -sin  |       |   0     1     0   |       |  sin   cos    0  |
    |  0    sin    cos  |       | -sin    0    cos  |       |   0     0     1  |

When you multiply these matrices together, you get the overall transformation in one matrix.  That means 9 muls per point, even though there's a fair amount more work up front to build the matrix.  One thing that some people have against using these 3 matrices and developing a final matrix is the redundant storage and extra memory it takes up.  These people use quaternions or Euler angles or similar things to represent their rotations and develop the final matrix from there.  I personally use a vector in spherical coordinates and derive rotated basis vectors from that.  The only shortcoming of this system is that I cannot do 3 axes of rotation (directly).
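
Here's a rough sketch in C of building those matrices, concatenating them, and applying the result to a point -- the 9 muls mentioned above.  The names and the column-vector convention are my own choices for illustration, not the only way to do it:

    #include <math.h>

    typedef struct { float m[3][3]; } Mat3;

    /* [C] = [A] * [B] -- row-by-column multiply. */
    Mat3 Mat3Mul(Mat3 A, Mat3 B)
    {
        Mat3 C;
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                C.m[r][c] = A.m[r][0]*B.m[0][c] + A.m[r][1]*B.m[1][c]
                          + A.m[r][2]*B.m[2][c];
        return C;
    }

    /* The three basic rotation matrices shown above. */
    Mat3 RotX(float a) { Mat3 M = {{{1,0,0},{0,cosf(a),-sinf(a)},{0,sinf(a),cosf(a)}}}; return M; }
    Mat3 RotY(float a) { Mat3 M = {{{cosf(a),0,sinf(a)},{0,1,0},{-sinf(a),0,cosf(a)}}}; return M; }
    Mat3 RotZ(float a) { Mat3 M = {{{cosf(a),-sinf(a),0},{sinf(a),cosf(a),0},{0,0,1}}}; return M; }

    /* Once the combined matrix is built, each point costs 9 muls. */
    void RotatePoint(const Mat3 *M, float x, float y, float z,
                     float *ox, float *oy, float *oz)
    {
        *ox = M->m[0][0]*x + M->m[0][1]*y + M->m[0][2]*z;
        *oy = M->m[1][0]*x + M->m[1][1]*y + M->m[1][2]*z;
        *oz = M->m[2][0]*x + M->m[2][1]*y + M->m[2][2]*z;
    }

    /* Once per object per frame, e.g.:
       Mat3 R = Mat3Mul(Mat3Mul(RotZ(az), RotY(ay)), RotX(ax));            */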

Projection of a 3-d point onto the 2-d screen is a pretty basic implementation involving similar triangles.

          X' = (X*d) / (Z+d)
          Y' = (Y*d) / (Z+d)

Of course, a simple optimization can be seen -- Why divide by the same number twice?

          k = d / (Z + d)
          X' = X * k
          Y' = Y * k
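
A quick sketch of that projection in C.  The screen-centre offsets and the flipped Y are my own assumptions about the target screen mode (on screen, Y usually grows downward), and the caller is assumed to have clipped away anything with Z + d near zero:

    /* Project a 3-d point to the screen; 'd' is the viewing distance.
       cx/cy are the screen centre; screen Y grows downward. */
    void Project(float X, float Y, float Z, float d,
                 float cx, float cy, float *sx, float *sy)
    {
        float k = d / (Z + d);   /* one divide instead of two */
        *sx = cx + X * k;
        *sy = cy - Y * k;
    }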

Nowadays, we simply put it into a big matrix, so it can be concatenated with all the others -- That saves us a whole lot of time because it allows us to do every transform in one single step.  Note that you do need 4x4 matrices for this, unlike the 3x3 example shown above w/ rotations, AND you will need the homogeneous form for the points -- that is, [X Y Z 1].  This is important because when the transform is done, the fourth component will no longer be 1 and we must rectify that.  It will become some [X' Y' Z' n], so we divide by n and get [X'' Y'' Z'' 1].  And the X'' and Y'' are the projected X & Y coords.  I won't get into how this matrix is derived because this is not a projections tutorial, but here it is...

   [ cos( FOV/2 )     0                      0                      0      ]
   [ 0            cos( FOV/2 )               0                      0      ]
   [ 0                0           sin( FOV/2 )/(1 - zn/zf)    sin( FOV/2 ) ]
   [ 0                0                    -a*zn                    0      ]

                a = sin( FOV/2 )/(1 - zn/zf)
		FOV = Field Of View angle.
			This is the Total angle from left to right.
		zn = Near Z-Clip value.
		zf = Far Z-Clip value.

I've gotten this question many times -- "Why do you need a 4x4 matrix and 4-d points?" The answer is quite simply that matrix transforms on their own are linear (affine) transformations, but perspective projection onto the screen is actually a non-linear transformation! The only way to make it possible with matrices is to have another quantity that is independent of X,Y,Z. So we put our objects in a 4-d space. But the problem is that the objects are actually 3-d... So what we want is simply to have the objects in a 3-d subspace of that 4-d world.

This 3-d subspace is [X Y Z 1].  That is all points where the fourth component is 1...  This is objectspace.  This matrix will result in some [X' Y' Z' n].  We cannot render that!!  So we divide by n -- this is the non-linear part of the transform.  The result in X and Y will be our 2d points...
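
As a rough sketch, the whole "transform by the 4x4 matrix, then divide by the fourth component" step looks like this in C.  I'm using the row-vector convention (point times matrix), which is what matches the projection matrix above; a real engine would also guard against n being near zero:

    typedef struct { float m[4][4]; } Mat4;

    /* Transform [X Y Z 1] by the 4x4 matrix, then divide by the fourth
       component -- the non-linear step. */
    void TransformAndProject(const Mat4 *M, float X, float Y, float Z,
                             float *ox, float *oy, float *oz)
    {
        float x = X*M->m[0][0] + Y*M->m[1][0] + Z*M->m[2][0] + M->m[3][0];
        float y = X*M->m[0][1] + Y*M->m[1][1] + Z*M->m[2][1] + M->m[3][1];
        float z = X*M->m[0][2] + Y*M->m[1][2] + Z*M->m[2][2] + M->m[3][2];
        float n = X*M->m[0][3] + Y*M->m[1][3] + Z*M->m[2][3] + M->m[3][3];

        *ox = x / n;
        *oy = y / n;
        *oz = z / n;
    }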


Call of the Primitives

The fundamental drawing primitive of all 3d engines is the polygon.  Whatever shall we do without it?  First, let's start off with the simplest of polygons -- The flat poly.  All my examples are with triangles.

Task #1 -- Sort the vertices in a suitable order.  You've got the variables X1, X2, X3, Y1, Y2, Y3 passed in.  Say you've also got in the procedure the variables XTop, YTop, XMid, YMid, XBot, YBot.

XTop, XMid, XBot = X1, X2, X3 respectively.   Likewise for YTop, YMid, and YBot.  (Note that on the screen, increasing Y means going down)
If YTop > YMid, then switch (XTop, YTop) w/ (XMid, YMid)
If YMid > YBot, then switch those points.
... and so on.

Task #2 -- Compute the slope of X w/ Y along every edge.  That is -- (Xe-Xs)/(Ye-Ys), and you'll have to do this for every edge:  YTop-->YMid,  YMid-->YBot,  YTop-->YBot.  Store the 3 slopes in 3 variables.

Now all that's left is to go from YTop to YBot and increment the X's by the slopes at each iteration.  You have two X variables that both start at XTop, but each has its respective X-slope added to it.  The loop is also split into two parts -- from YTop to YMid and from YMid to YBot -- because a new edge starts at the Mid vertex.  You also have to determine which slope will correspond to which side: which slope is less, the one from Top-->Mid or the one from Top-->Bot?  The lesser slope corresponds to the left side.  The only other precaution is to prevent divides by zero.
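
Here's a rough sketch of such a flat filler in C.  DrawRow() is assumed to exist elsewhere (it fills one horizontal run of pixels in whatever mode you're using), and instead of pre-deciding which slope is the left one, this sketch just compares the two running X values on every scanline -- slightly lazier, same result:

    /* A rough sketch of the flat filler.  DrawRow() fills one horizontal run. */
    extern void DrawRow(int xLeft, int xRight, int y, int color);

    static void Swap(float *a, float *b) { float t = *a; *a = *b; *b = t; }

    void FlatTri(float X1, float Y1, float X2, float Y2,
                 float X3, float Y3, int color)
    {
        float XTop = X1, YTop = Y1, XMid = X2, YMid = Y2, XBot = X3, YBot = Y3;

        /* Task #1 -- sort by Y (Y grows downward on screen). */
        if (YTop > YMid) { Swap(&XTop, &XMid); Swap(&YTop, &YMid); }
        if (YMid > YBot) { Swap(&XMid, &XBot); Swap(&YMid, &YBot); }
        if (YTop > YMid) { Swap(&XTop, &XMid); Swap(&YTop, &YMid); }

        /* Task #2 -- X slope along each edge, guarding against divide by zero. */
        float sTopMid = (YMid > YTop) ? (XMid - XTop) / (YMid - YTop) : 0;
        float sMidBot = (YBot > YMid) ? (XBot - XMid) / (YBot - YMid) : 0;
        float sTopBot = (YBot > YTop) ? (XBot - XTop) / (YBot - YTop) : 0;

        float xa = XTop, xb = XTop;   /* the two running X values */
        int   y;

        /* Upper half: YTop --> YMid. */
        for (y = (int)YTop; y < (int)YMid; y++) {
            if (xa < xb) DrawRow((int)xa, (int)xb, y, color);
            else         DrawRow((int)xb, (int)xa, y, color);
            xa += sTopMid;  xb += sTopBot;
        }

        /* Lower half: YMid --> YBot -- a new edge starts at the Mid vertex. */
        xa = XMid;
        for (y = (int)YMid; y < (int)YBot; y++) {
            if (xa < xb) DrawRow((int)xa, (int)xb, y, color);
            else         DrawRow((int)xb, (int)xa, y, color);
            xa += sMidBot;  xb += sTopBot;
        }
    }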

Now let's move up...  The Gouraud Polygon.  This is not very different at all.  Instead of having just X and Y for every point, you have X, Y, and Color for every vertex.  Likewise, you will have to divide color by difference in Y to get color slopes.  There is also the matter of slopes of color along the scanlines.  Fortunately, this is a constant throughout the poly.  We use the widest scanline for accuracy's sake.  If you take the X-Slope and Color-Slope down the edge from Y-Top to YBot...  Multiply by (Ymid-YTop) and then add the starting values (XTop and CTop respectively), you get the values you need.  The other end of this scanline happens to be XMid, YMid, CMid.  Convenient, eh?  This is where you'll calculate the scanline slope.
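
A sketch of that widest-scanline calculation, with made-up names and in floats for clarity (a real filler would keep all of this in fixed point).  It assumes the triangle actually has height and that the scanline through the Mid vertex has non-zero width:

    /* Constant per-pixel color step for the whole Gouraud poly, measured at
       the widest scanline (the one through the Mid vertex). */
    float GouraudScanSlope(float XTop, float YTop, float CTop,
                           float XMid, float YMid, float CMid,
                           float XBot, float YBot, float CBot)
    {
        float t     = (YMid - YTop) / (YBot - YTop);
        float xLong = XTop + (XBot - XTop) * t;   /* X on the long edge at YMid */
        float cLong = CTop + (CBot - CTop) * t;   /* C on the long edge at YMid */

        return (cLong - CMid) / (xLong - XMid);   /* dColor/dX for every scanline */
    }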

Then there's the next step -- Affine Tmapping.  Roughly the same thing again.  Now you're interpolating two extra values -- u and v texture coords.  Same process you used for color, but now they are pixel coords in an image somewhere in memory.

A polygon of some sort is really the only drawing primitive you need, though it can be broken down.  Of course, if you plan to do S-buffering, the Poly routine does no drawing and instead you have a scanline renderer in the end.  Even then, you are calling a poly routine which happens to include a call to an insertion routine.  Anyway, that's another matter because there are not a great many rasterization methods like that.


Data ... "That is correct."

Well, you've simply got to have somewhere to store all this information.  So now we get into something that's very crucial --  The DATA.  Let us first think about how much data is to be passed into a routine.  Say -- a Gouraud trifiller.  We've got the X, Y and Color for three points.  That makes 9 parameters.  Think of your precious stack -- Push, Pop, Push, Pop, Push, Pop....  Ya' know what your stack will give you for that?  A HORSE'S HEAD in your BED! That's WHAT! So instead, we'll use whole structures (or records if you're a Pascal person).  The same thing for all data types.  This also makes for some logical connections that make it easier to program in the end.  So our 3d point would probably be ...

         int X, Y, Z;   Which makes a 3d poly --- int X1, Y1, Z1;
                                                  int X2, Y2, Z2;
                                                  int X3, Y3, Z3;

On the other hand, some people may elect to do Frustum Clipping or BSP rendering with poly splitting, in which case polygons could have an arbitrary number of vertices...

          Thus, a 3d poly structure looks like :
                unsigned char     NumVert
                Point3dStructure  VERTS[20]
	  Likewise, a 2d poly structure would be :
		unsigned char     NumVert
		Point2dStructure  VERTS[20]

Yes, it only allows a maximum of 20 vertices, but that's plenty...
Adding to that...  It is probably better that you define a triangle as an array 
of 3 Point Structures.

Note, of course, that these are mere suggestions.  Well, anyway, that takes care of the basic polygon.  But we still need more info.  What about normal vectors?  We still need Polygon normals for flat shading/backface culling.  We also need vertex normals for more advanced shading techniques.  The Pentium and AlphaPC lines of CPUs have given us great speed in floating point, so a single precision float is quite ideal for the normals -- which you ARE going to normalize, so accuracy is important.  If you're a Pascal coder (and I'm partial to TMT myself -- I just wish it would optimize the code better), you should know that the type "REAL" is actually a double precision float.  Speed Loss!  Your only salvation is with DWORD variables that are treated as single precision floats -- that does inherently imply that all floating point operations must be done in ASM.  Anyway...

	The Vector Type looks something like :
		float I,J,K
	To supplement our 3d Polygon Structure :
		unsigned char     NumVert
                Point3dStructure  VERTS[20]
		VectorType	  FaceNorm
		VectorType	  VNorm[20]

So what else could we possibly use here?  Oh, how about Texture coordinates?  For that matter, what about Gouraud color shades (For the 2d Poly only).  Well 16.16 fixed point is plenty of accuracy for Texture coords.  Seeing as how MOST people only do 256x256 textures at the very largest, you could even get away with 9.23 fixed point.  This is very nice considering you have even MORE accuracy than single precision floats and you get integer speed.  (Floats are only 23 bits of mantissa and you will have 23 bits of fraction!)  The reason I say 9.23 instead of the logical 8.24 is because you will also be adding deltas to these texture values.  These deltas can be positive or negative and since the whole number portion can take up a full 8 bits, you run the risk of corrupting the sign bit with numerical bits.  It's best to have one little open bit.

Likewise, Gouraud shades maximize at 8 bits.  With gouraud, though, accuracy is not as important.  This is quite simply because you won't be doing very special things w/ gouraud fillers.  Just linear interp.  Tmappers are spanning an image onto the poly and you may be doing all sorts of things like mipmapping or perspective correction and in the end, subpixel and subtexel accuracy becomes more important.   9.7 fixed point is quite enough for Gouraud.  Also, I wouldn't concern myself too much with accuracy if I'm just trying to write an engine to start.
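
For reference, here's what those fixed point formats look like in code -- just a sketch, and it assumes 32-bit longs (and a 64-bit intermediate for the multiply):

    typedef long fix16;   /* 16.16 fixed point */
    typedef long fix23;   /*  9.23 fixed point */

    #define TO_FIX16(x)   ((fix16)((x) * 65536.0))       /* 2^16 */
    #define TO_FIX23(x)   ((fix23)((x) * 8388608.0))     /* 2^23 */
    #define FIX16_INT(x)  ((x) >> 16)
    #define FIX23_INT(x)  ((x) >> 23)

    /* 16.16 * 16.16, done in 64 bits so the middle bits survive. */
    #define FIX16_MUL(a,b) ((fix16)(((long long)(a) * (b)) >> 16))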

	So the texture coords are simply
		long u,v    or   int u,v    
	and our Gouraud shades are
		int r,g,b   or   int color(For an indexed colormode)

	Again, let's supplement our polygon structures with these...
		unsigned char     NumVert
                Point3dStructure  VERTS[20]
		VectorType	  FaceNorm
		VectorType	  VNorm[20]
		TextCoord         Tmapi[20]
	And the 2d poly will be...
		unsigned char     NumVert
                Point2dStructure  VERTS[20]
		TextCoord	  Tmapi[20]
		GouraudTp	  Shade[20]

That seems like it should be enough, but it isn't really.  There's one more important factor that can really make our engine run faster.  Think of when you find your vertex normals.  You are searching through polys for those that share the vertex that you are calculating for.  When you do the transforms for one particular object, why would you bother to transform those shared vertices more than once?  They're the same point!  They will transform to the same point, as long as it's the same object!  You can save a lot of time if you transform each shared vertex only once and have the other polygons simply refer to that result...

	Say the Objects are defined as an Array of the PolyStructure...
Your link structure will contain which polygon and which vertex.

		int Poly, Vert

	And in the PolyStructure, we add the line...

		LinkStruc         Shares[20]

	Which will basically say that VERTS[n] is the same as Shares[n],
which is in some other poly...
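
Putting all of the above together, the C version of these structures might look something like this.  The names, the 20-vertex limit, and the extra 'd' and 'Culled' fields (which the culling discussion below will want) are suggestions, not gospel:

    #define MAX_VERT 20

    typedef struct { int   X, Y, Z;    } Point3d;
    typedef struct { int   X, Y;       } Point2d;
    typedef struct { float I, J, K;    } Vector;
    typedef struct { long  u, v;       } TextCoord;   /* fixed point tex coords */
    typedef struct { int   Poly, Vert; } LinkStruc;   /* shared-vertex link     */

    typedef struct {
        unsigned char NumVert;
        Point3d       VERTS[MAX_VERT];
        Vector        FaceNorm;
        Vector        VNorm[MAX_VERT];
        TextCoord     Tmapi[MAX_VERT];
        LinkStruc     Shares[MAX_VERT];
        float         d;          /* precalculated plane 'd' (see culling below) */
        int           Culled;     /* backface flag                               */
    } Poly3d;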

Ok, THAT should be quite enough for most engines. It can get more complicated still, but that is not really a universal concern.  Well, the only problem is that all we've defined so far is our objects!  Another big thing we need is the matrix...  At least that is simple.  If you really want to define translation, rotation, and projection all into one matrix, then your only choice is to use 4x4 transforms.  A simple undertaking....  That matrix is little more than...

                     float     Mat[4][4]

And of course, you will need variables that are somehow connected to all of this.  Where else are you going to temporarily store the rotation angles and the translation constants and the FOV angle, etc., etc.?  Again, there is still more data that is of importance, but it's dependent on your engine.  If you want to know about the data types that regard Portal engines, look up a portal DOC.  If you want Z-buffering, look up DOC's on Z or 1/Z buffering.

What I've mentioned so far will tide you over for a basic engine.  You will not have the fastest engine in the world with just this.  The fact is, you still need some other speedup techniques and extensions to the engine.  What we have so far will be the basic data that is ultimately used in a final engine.  Note also that when I mentioned speeding up the routine calls using structures, this also means that we have to be able to modify the values in these structures.  In C, the only way to do this is by passing in a pointer to those structures.  In Pascal, of course, you can pass it in as a variable, but it will unquestionably be faster to pass in a pointer.  Why?  Quite simple.  A pointer is simply 1 NUMBER!  That means that all that data in those poly structures is accessible to our subroutines by way of ONE push and ONE pop!  That "mean ol' nasty stack", to paraphrase Paul Nettle, will become far more amiable.

Yeah, there's gotta be more even in the poly structure, but that again depends on the engine.  If you are going to do backface culling, you should probably add a little boolean marker that says whether this poly is culled.


What to do? What to do?

Okay, you've got your data...  Now what do you do with it?  Okay, first off, let's talk about everything that happens in object space.  The stuff that the user never gets to see...  The stuff that's all happening in 3d.  A very important part... And probably the thing you should do first [in runtime, that is] is Backface culling -- assuming that you actually plan on doing that.  I can't think of any reason not to do culling unless you're doing a lot of open objects or objects w/ holes in them.  It can always supplement other portions of an engine quite nicely by decreasing the number of polys to be processed...  Here's my favorite method for backface culling.

	There is the friendly plane equation of
		Ax + By + Cz = d
where (A,B,C) is the Normal vector and (x,y,z) is some point in the poly.
Usually, (x,y,z) is the centroid of the poly -- average of the vertices.
	Precalculate all the "d" values for the polys [ here's something
else to add to that poly Type structure ]

	Now, we take that eye point (0,0,n) and do the inverse of the
transform for that poly (or object).  The inverse of a rotation matrix
is simply its transpose.  However, you also have to take translation into
account.  Do NOT put translation into the matrix or the transpose-as-inverse
trick will not work.  Your only salvation is to handle translation separately:
subtract the translation from the eye point, then apply the transposed rotation.
	Note that you have NOT transformed the object, but only moved the
eye point relative to that transformation.  Now plug the eye point into
that plane equation.
	That is, for the poly you are testing, put its Normal vector in for
(A,B,C) and the transformed eye point in for (x,y,z).  If the d value that
you get out is less than the one you've precalculated for that poly, then
the poly is not visible.  You needn't draw it or even transform it.

I don't think it gets any better than that.  It has the speed of the dot product method (give or take a few clock ticks), but it is also less error prone because it takes translation into account.  Everything is done in object space.  Other methods include simply checking if the z component of the normal is less than 0, but that requires that all the normal vectors be transformed.  You can back-transform the eye vector, but that doesn't take translation into account (location doesn't matter to a vector).  Checking for clockwise points seems faster than the others, but it's actually the worst:  1) the points have to be arranged accordingly;  2) all the points have to be transformed and even PROJECTED!!  There is also the option of back-transforming the eye point and then generating a vector to the centroid of the poly.  We then do a dot product, and if that is less than zero, the poly isn't visible.  This works just as well as the above method, but we've added a few subtractions into the code.  Not a big slowdown, but ah, well...  On the other hand, we do save a compare instruction.
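
Here is a rough sketch of that plane-equation test in C, reusing the Mat3, Vector, and Poly3d types sketched earlier.  It assumes points are normally transformed as column vectors (rotate, then translate), so the inverse is "subtract the translation, then multiply by the transpose":

    /* Back-transform the eye point into object space: subtract the object's
       translation, then apply the transposed (= inverse) rotation. */
    void EyeToObjectSpace(const Mat3 *R, float tx, float ty, float tz,
                          float ex, float ey, float ez,
                          float *ox, float *oy, float *oz)
    {
        float x = ex - tx, y = ey - ty, z = ez - tz;
        *ox = R->m[0][0]*x + R->m[1][0]*y + R->m[2][0]*z;  /* transposed indices */
        *oy = R->m[0][1]*x + R->m[1][1]*y + R->m[2][1]*z;
        *oz = R->m[0][2]*x + R->m[1][2]*y + R->m[2][2]*z;
    }

    /* Plug the object-space eye point into the plane equation.  Returns
       non-zero if the poly faces away and can be skipped entirely. */
    int BackfaceCulled(const Poly3d *p, float ex, float ey, float ez)
    {
        float d = p->FaceNorm.I*ex + p->FaceNorm.J*ey + p->FaceNorm.K*ez;
        return d < p->d;          /* p->d was precalculated from the centroid */
    }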

Anyway, let's move on...  Now that we've determined which polys to transform and which not, let's talk about how to transform.  Look back up at the matrices mentioned earlier, if you need to.  You just multiply those matrices together and get the points you need.  Note that they each have their respective angles.  So how do we multiply matrices together? Simple -- It's a lot like the dot product, really, except that it's a whole series of them.

	If you look at each row in a matrix, the numbers in the row form what
is called a row vector.  Likewise, in the columns, they are called column vectors.
A dot product in matrix format is really set up as a multiply between a row vector
and a column vector.
                      [ e ]
                      [ f ]
	[ a b c d ] * [ g ]  = (a,b,c,d) . (e,f,g,h) = (a*e) + (b*f) + (c*g) + (d*h)
                      [ h ]
See how each element muls with its corresponding element in the other vector?
This is the idea...  Now let's move to matrices...  If you multiply [A] * [B]...
Let's take the first row vector in A and multiply it by the first column
vector in B.  This gives us the element in the first row and first column of the
resulting matrix.  If we do [A].row1 * [B].column2, we get the element in row1,
column2 in the result.  And it goes on like that...
	Remember that matrix multiplications DO NOT COMMUTE!  That means that
[A] * [B] IS NOT EQUAL to [B] * [A] in general.  (One exception is when A is the
inverse of B.)
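
In code, the 4x4 version is short -- a sketch, reusing the Mat4 type from the projection example:

    /* [C] = [A] * [B] -- each element is a row vector dotted with a column
       vector.  Remember: Mat4Mul(A,B) is NOT the same as Mat4Mul(B,A). */
    Mat4 Mat4Mul(const Mat4 *A, const Mat4 *B)
    {
        Mat4 C;
        for (int r = 0; r < 4; r++)
            for (int c = 0; c < 4; c++)
                C.m[r][c] = A->m[r][0]*B->m[0][c] + A->m[r][1]*B->m[1][c]
                          + A->m[r][2]*B->m[2][c] + A->m[r][3]*B->m[3][c];
        return C;
    }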

I know you must be thinking, "That sounds awfully slow."  It IS awfully slow -- granted, though, you are only doing this once per object per frame and even then it is pretty darn slow.  This is pretty much why many people aim for other methods of setting up the transform matrix.  My method uses a spherical basis and requires 4 muls to get the first two axes of rotation into a matrix.  When you finally get around to performing the transforms, you have to multiply the final matrix by the points, which is pretty fast...  It's just 9 muls really, although, it gets to be more when you start to get into projection...  I usually leave projection separate because it's only 4 muls. (excluding setup of the matrix), but some elect not to, because it can leave them with all processes complete in one matrix multiplication.

If you have very few objects, but they all have high poly counts, then multiplying the matrices together is actually pretty efficient.  With a lot of objects, though, high or low poly count, it starts to get a little hairy.  This is just common sense, since the more objects you have, the more times you'll have to set up a matrix.  If the poly count is low, then one matrix setup doesn't account for a whole lot.  Each object is going to have its own matrix for the most part.  It gets much more interesting, though.

	Let's say you've set up the object transform matrix [O].  This will
be the basic independent movement for that particular object.  But some other
factors affect the motion of the objects, too.  What about the "camera"?
	You have a moving point that represents your view of object space.
This for some reason is referred to as the camera.  The camera has its own
transform matrix (again like the object matrix).  The fundamental difference
is that the camera matrix, [C], is universal.  [O] applies to a specific object.
There is a different [O] matrix for each object.  [C] applies to all objects.
So how do you apply both?  Simple -- multiply again...
	[T] = [O] * [C]
And yet, it goes on...  There is also one little option you can go for called
Hierarchical Transforms.  The way this works is that each object has one or more
parents and/or child objects.  The Transform matrices are "inherited" from
parent to child.  What's the point?  Take, for example, a human arm...
	When the bicep/tricep rotate about the shoulder, the rest of the arm
rotates w/ it.  When you rotate the forearm about the elbow, the hand goes with it...
They rotate about the elbow, too.  So the way you would have this set up is that
the bicep/tricep is the object w/ no parent (assuming you're doing JUST an arm),
but the forearm is its child.  The forearm's child is the hand.  Then the hand->
fingers, and so on.
	The transform matrix from the parent gets multiplied by the matrix for the
object.  Camera should not be included in the parent matrix.  Only in the result.
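
A sketch of how that inheritance might be coded, with my own (hypothetical) object structure; call it once per frame with the identity matrix as the root's parent:

    /* Each object's world matrix is its own matrix times its parent's world
       matrix; the camera is applied only at the very end, never inherited. */
    typedef struct Object {
        Mat4           Local;      /* this object's own rotation/translation  */
        Mat4           World;      /* Local concatenated with all the parents */
        struct Object *Child;      /* first child                             */
        struct Object *Sibling;    /* next child of the same parent           */
    } Object;

    void BuildWorldMatrices(Object *obj, const Mat4 *parentWorld)
    {
        for (; obj != NULL; obj = obj->Sibling) {
            obj->World = Mat4Mul(&obj->Local, parentWorld);  /* inherit parent */
            BuildWorldMatrices(obj->Child, &obj->World);     /* pass it down   */
        }
    }

    /* When rendering an object:  [T] = Mat4Mul(&obj->World, &CameraMatrix);  */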

Well, that should be enough about matrices.  Let's get on to vectors...   A vector differs from our normal concept of numbers (or scalars) in that it has both magnitude and direction.  The necessity of vectors in life comes from quantities where direction is important, like forces and velocity.  (Note : SPEED is a scalar whereas velocity is a vector).  A famous little insight puzzle is "If I have a 4-ounce object on a table and I push on it with a force of 1 pound, will the object move?"  Obviously, the answer can't be determined...  What if I'm pushing straight down?

Vectors are actually a surprisingly recent addition to mathematics.  Despite this, vector math is not that difficult.  Calculus is almost 300 years old, but it's actually harder than most vector math, which is probably less than 200 years old.  In fact, a lot of calculus proves to be easier w/ vectors than with regular single-variable systems.  BTW, when I say easier, I mean EASIER, not quicker!  We owe the world of vector math to Hamilton, mostly.  Gibbs also gave us some new stuff regarding matrices and vectors (funny, but Gibbs is more remembered for the energy equations you run across in Chemistry and Thermodynamics).  To those of you who wonder, quaternions came BEFORE vectors [Hamilton].  Vector math comes largely from the old quaternion computations.  First off, you have the two types of products -- dot (or scalar) and cross (or vector).

	Dot product --- (a,b,c) . (A,B,C)
		= aA + bB + cC
	Cross product --- (a,b,c) x (A,B,C)
		  |  i  j  k  |
		= |  a  b  c  |  =  i( bC - Bc ) + j( cA - Ca ) + k( aB - Ab )
		  |  A  B  C  |
	As you should know i, j, and k are the unit axis vectors.  So the cross product
	results in a new vector.  The new vector is perpendicular to both of the
	two original vectors.
   This also happens to be another way of expressing a vector...
	in lieu of (X,Y,Z), you can also say iX + jY + kZ.  In the code, of course,
	you will simply use (X,Y,Z)...

The magnitude of a vector is denoted |v|...  Looking at either product
	a . b = |a|*|b|cos T 		  T is the angle between the vectors.
      | a x b | = |a|*|b|sin T

You compute the magnitude by squaring all the components... adding... and taking
the square root.  That is, a vector dotted w/ itself is the magnitude squared.

	a . a = |a|*|a|cos T;  The angle T between a vector and itself is always
0, and cos(0) = 1.  so-ooo,   a . a = |a|*|a|

These are probably the two most important operations you'll ever have to do.  We always have to find normal vectors to our polygons (for flat shading and/or culling).  How do we do it?  Well, cross product, obviously.  A normal vector is, by definition, perpendicular to the surface... and the cross product results in a vector that is perpendicular to the two original vectors.  The problem is that we have no vectors...  Ah, but what about the edges of the polygon?  If you simply take one vertex and subtract its components from two other adjacent vertices, you get two vectors that stand for the edges.  These two will be used for the cross product.

Normal vectors are always to be normalized.  That is, we must make sure that the length is equal to one.  How is it done?  Simply find the magnitude of the normal and divide all the components by that value.  The result is a unit vector.  Unit vectors are really the type that is most useful to us because they tell us only about direction, which is all we care about.  There's also a precision bonus: a single precision float has 23 bits of mantissa, and if the numbers stay small (around unit length), those 23 bits go almost entirely to the fraction -- we get more accuracy!  Whereas with large numbers, a good chunk of those bits is spent on the whole number part.

Well, what use are all these normal vectors?  Quite simple...  These normal vectors tell us everything about the direction the polygon is facing.  From that we can derive the values for shading and backface culling.  A predefined light vector for instance can be dotted with a normal vector.  The result is a quantity that is proportional to our shading value.  If the light vector is also a unit vector, the result will always be 1 or less.  1 represents maximum intensity.  0 or less means no intensity.  As for culling, look back...
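
Here's a sketch of the whole chain -- edges, cross product, normalization, and the dot product with a unit light vector -- reusing the Point3d and Vector types from earlier.  It assumes the three vertices aren't collinear (otherwise the length comes out zero):

    #include <math.h>

    /* Face normal from two edge vectors (cross product), normalized. */
    Vector FaceNormal(Point3d a, Point3d b, Point3d c)
    {
        float e1x = b.X - a.X, e1y = b.Y - a.Y, e1z = b.Z - a.Z;  /* edge a->b */
        float e2x = c.X - a.X, e2y = c.Y - a.Y, e2z = c.Z - a.Z;  /* edge a->c */

        Vector n;
        n.I = e1y*e2z - e1z*e2y;                 /* cross product */
        n.J = e1z*e2x - e1x*e2z;
        n.K = e1x*e2y - e1y*e2x;

        float len = sqrtf(n.I*n.I + n.J*n.J + n.K*n.K);
        n.I /= len;  n.J /= len;  n.K /= len;    /* normalize */
        return n;
    }

    /* Flat-shade intensity: dot with a unit light vector, clamp at zero. */
    float FlatShade(Vector n, Vector light)
    {
        float d = n.I*light.I + n.J*light.J + n.K*light.K;
        return (d > 0) ? d : 0;
    }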

Another type of normal vector we use is the vertex normal.  The previous paragraphs talked about polygon normals.  There is a small question of how do you determine the normal to a point?  ALL vectors are normal to a point!!  However, since these vertices belong to an object, we can think of each vertex as a point on a surface.  Now to find the vertex normal, we average the normals of all the polys that share that vertex.  Again, normalization must be taken care of.

The vertex normal is only useful in more advanced shading techniques (Gouraud, Phong, Blinn, etc.)  Again, it uses the dot product in the same way we did for flat shading, only it gives us intensity at a vertex.  And of course, we do a dot product at each vertex.  This is Gouraud.  For Phong, we use an environment map and the values of the vertex normals give us coordinates in that envmap.  Blinn is basically just Phong with a better highlight calculation system.

It's also possible to do Omni-directional lightsourcing.  I use a method that involves Gouraud polys because the calculations on each vertex are much faster than a Phong-based method.  The only thing about it is that I really have to draw a lot more polys to get a good result.  You'd have to read my DOC on the subject at Jado.org.

Okay -- we've decided which polys to actually do anything w/, then transformed them to our heart's delight.  Decided all the values and stuff for the rendering.  Now we just have to draw... or do we?


Well... now what?

This is really the hard part.  ONE - because some real serious decisions regarding the nature of your engine have to be made here.  TWO - You can get into some really advanced topics.  THREE - You have to somehow fit all of this in with what you've already written.  This can get pretty hard in some cases.  If you've already written trifillers and you're adamant about doing frustum clipping, then you've got your work cut out.  If you've written some poly-fillers that work really nicely w/ your screen mode, and you want to do S-buffering... Hoo boy!!  Well, anyway...  I am not here to pressure you into anything.  In fact, what I'm going to bring up here is really entirely summaries of various additions to put in.  For detailed descriptions, you really need to look elsewhere.

Well, let's first start with something easy -- Z-buffering.  Okay... This is a rendering scheme that involves the Z-value of each pixel.  You simply have a "copy" of the screen which contains only the linearly interpolated Z-values of each pixel.  Z-values of course represent distance from the viewer, so whenever a new pixel happens to be at the same (X,Y) as one that's already in the buffer, compare.  The pixel that's closer stays in the buffer.  SLOW AS HELL.  EXTRA MEMORY NEEDED.  Ugh.  Simple, though -- which is why all the hardware out there does Z-buffering.  Most people use Z-Buffering only for hardware compatibility or for special effects.
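
For what it's worth, the per-pixel test itself is tiny.  A sketch for a 320x200 byte-per-pixel mode, where smaller Z means closer (some engines store 1/Z instead):

    #define XRES 320
    #define YRES 200

    static float         zbuf[XRES * YRES];    /* cleared to a huge value each frame */
    static unsigned char screen[XRES * YRES];

    void PutPixelZ(int x, int y, float z, unsigned char color)
    {
        int i = y * XRES + x;
        if (z < zbuf[i]) {        /* closer than what's already there? */
            zbuf[i]   = z;
            screen[i] = color;
        }
    }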

A faster, but not 100% effective way to simulate Z-buffers is with Z-Sorting.  It basically involves taking some Z-value in a polygon and sorting the list of polys on that basis.  It won't do the spacecutting like you get w/ Z-buffering, but if there is no ambiguity in the order of polys, you won't run into errors.  Of course, two polys could end up having the same priority which can lead to some problems.

An extension to Z-sorting is BSP-Tree rendering.  A BSP-Tree is actually a logical tree that is based around a single polygon.  The idea is that things can be either in front or behind the poly.  We test this using the polygon's normal vector.  In front means it will be a leaf to one side and behind means it will be a leaf to the other side.  A BSP system can be really simple or a real headache.  You can have static worlds with no potential splits and make it simple.  You just use the BSP order to pre-calculate the sorting.  Draw from front to back.  If the root poly is culled, then draw from back to front.  It gets a little hairy when you have to start splitting polygons and if the world is dynamic, then you have to keep re-generating parts of the tree.  A multiple tree route is also possible.  One can sort objects into the tree and for each object have its own polygon tree.

Whatever else Z-Sorting and BSP rendering do, they certainly do not determine what is viewable and what is not.  Frustum clipping is one method that does part of that job.  One can put 2-d clipping into one's poly routine, but speed issues will arise.  The poly-drawing is your fundamental primitive.  This is the lowest-level code.  Your fastest stuff.  And you want to put clipping into it?  This is why we do Frustum or 3-d clipping.  The most common implementation of the View Frustum is one where the FOV angle is 90 degrees.  When you do this, the side clipping planes for X and Y are simply at (+/-)Z.  In any case, the ray equation is the big issue:  (Origin + t*direction).  Origin is the starting point, t is some parameter, and direction is the vector along the edge.  You simply check whether a vertex is within valid bounds; if not, you must clip.  Simply find the value of t for which the edge hits the plane.  This is the new clipped point.  Note that you must first add 256 (or whatever, depending on eye location) to Z before doing this.  You can also add near and far clipping to this.
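
As a sketch, here's one edge being clipped against the left plane (X = -Z) of a 90-degree frustum.  The Vtx type is local to the example, and it assumes the edge really does cross the plane and that any Z offset mentioned above has already been applied:

    typedef struct { float X, Y, Z; } Vtx;

    /* Clip the edge a-->b against the left plane of a 90-degree frustum. */
    Vtx ClipToLeftPlane(Vtx a, Vtx b)
    {
        float dx = b.X - a.X, dy = b.Y - a.Y, dz = b.Z - a.Z;
        float t  = -(a.X + a.Z) / (dx + dz);   /* solve X(t) = -Z(t) for t */

        Vtx out;
        out.X = a.X + t * dx;   /* Origin + t*direction */
        out.Y = a.Y + t * dy;
        out.Z = a.Z + t * dz;
        return out;
    }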

There still remains one fundamental problem that robs you of speed.  OVERDRAW.  What this is, is the tendency to draw again and again to the same pixel locations.  Not necessarily the same stuff, but in one frame several pixels will overlap.  Think of all the polys you have in your world, and all the various X,Y,Z positions.  If some poly covers up another, but both are in the FOV and not clipped, why would we draw both?  We only SEE one!  What about a poly that partially covers up another?  We need to draw both, but only a certain percentage of the covered-up one needs to be drawn.  We need to cut down on drawing to those same spots over and over again.  It literally wastes time for nothing.

A popular way is through portalization.  A Portal engine makes use of Openings and Doorways in a world to define clipping regions or clipping Frusta.  The reason for its popularity is really because of the surge of first-person indoor engines like Quake.  The original Quake had some weirdly shaped doorways and some pillar-structures that really made portalization an impossibility.  Quake II, however, IS portalized and you can see a definite difference in the world design in order to accommodate that.  Again, you will be doing a lot of the stuff from frustum clipping, but now the view frusta are a bit more arbitrary.  However, you will only be doing it once -- it's precalc unless you want to have moving doors.  After you've clipped to some portal, you simply mark which polys are visible through this portal, so they will be the only ones to draw.  If the Invisible Portal Polygon is culled then you do not draw those polys.  Note that for each portal, you have two virtual polygons.  They will be facing opposite directions.  If the portal is not visible from our view, then you don't even consider it.  You draw polys as normal.  Just remember that it need not be just doors and windows that use portals;  the end of a tunnel can also be a portal.

A not-so-popular way is to use S-Buffers.  The reason for their lack of popularity is that they're actually very difficult.  Microsoft, for instance, will never implement S-Buffering in Direct3d because it would require some actual work.  The other problem with S-Buffers is that they don't waste enough memory.  Big companies always have a rule of programming -- that is that maintainability and ease of understanding of code is more important than anything -- even your family.  Real programming tells you there IS NO RULE!  It depends on the project.  3d Graphics definitely entails a need for incredible speed.  With S-Buffering, you will have to remember some stuff from the days of your Data Structures classes.  Remember Linked Lists?  Well, if you don't, then refresh your memory.

With S-Buffers you will take an array of size corresponding to your Y-resolution.  e.g. : 320x200 has Y-res of 200.  So the S-Buffer array will have size 200.  In this array of size 200, will be stored pointers to the start of a linked list.  This linked list contains data regarding spans or scanlines in polygons.  These segments will contain Starting X and Ending X as well as Z-values so you can do Z-buffer type effects.  You insert these segments into the list based on position.  In the end you should have a list of scanlines going from top to bottom and left to right.  You draw only the pixels that need to be drawn.  There is not one shred of overdraw.  And there is usually no flicker.  Your polygon renderers will no longer draw to the screen... Instead, they do a few preliminary calculations to generate span segment data and pass it into the S-Buffer insertion routine.  It gets a little more complicated still...  You need to write your own segment manager to get some real speed as well as non-fragmented memory.  Paul Nettle's S-Buffer FAQ has a good idea for this...
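
A sketch of the data layout only -- the span structure and the per-scanline list heads.  The insertion/clipping logic is the hard part and is deliberately left out here (see Paul Nettle's S-Buffer FAQ for that):

    typedef struct Span {
        int          xStart, xEnd;    /* left/right ends of the span          */
        float        zStart, zEnd;    /* Z at the ends, for depth comparisons */
        int          polyId;          /* which poly this span came from       */
        struct Span *next;            /* next span to the right on this line  */
    } Span;

    static Span *sbuffer[200];        /* one list head per scanline (Y res)   */

    /* Each frame: clear the lists, run every poly through its span generator,
       insert each span into the right list (clipping it against what is
       already there), and only then walk the lists and draw. */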


Final Words

Well, so far I've talked only about basic objects.  This should be quite enough for starters.  The grand majority of what you'll be rendering is this type of object, no matter what else you put into the engine.  I don't see anything advanced being a major issue unless you plan to be doing a 3-d VR funhouse or something.

In that spirit, I would like to say that there are many more things you can get into.  There are mirror-type objects and polygons that call for a certain portion of the world to be rendered from a different angle depending on the normal to that polygon.  There are lens objects that depend on both normal vectors and thicknesses.  Most people do only normals, though.

In all this talk, I've only brought up techniques.  There's more to it than just rasterizers, texturing, rotation matrices, and the like.  Whatever else it is, this is art.  You could have the best tmapper in the world and it still doesn't hold up to that guy who has a lousy tmapper if he's got better textures to map.  To the end user, who probably doesn't care anything about cache misses and instruction pairing, looks signify everything.  That guy watching your 3d engine in action will think -- "If it looks better, the guy MUST be a more talented coder."  But you have to wonder, though...  Is it worth doing this or that?

For instance, you can set up your sine tables to have 1440 elements or 256 elements.  If you go for 1440, the rotation will be VERY smooth -- sub-pixel accuracy, usually.  On the other hand, 256 elements will make everything faster and easier, because you don't have to manage angle values to keep them in range -- with a byte-sized index the wraparound happens by itself.  AND you save memory.  If you're working in lo-res, then even 256 elements can produce very smooth results.
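
For example, a 256-entry table in 16.16 fixed point -- a sketch showing the free wraparound you get by masking the angle to 8 bits:

    #include <math.h>

    static long SinTab[256];              /* 16.16 fixed point */

    void InitSinTab(void)
    {
        for (int i = 0; i < 256; i++)
            SinTab[i] = (long)(sin(i * 2.0 * 3.14159265358979 / 256.0) * 65536.0);
    }

    #define SIN(a)  SinTab[(a) & 0xFF]            /* the mask IS the wraparound   */
    #define COS(a)  SinTab[((a) + 64) & 0xFF]     /* cos(x) = sin(x + 90 degrees) */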

Sometimes, there's middle ground.  Do I go for 8-bit indexed color, or truecolor?  Hmmm... Less memory in 8-bit, but I have to limit myself to a finite list of colors and it looks just plain ugly!  Truecolor looks good, but drawing is slower and I have to use up more memory, especially w/ virtual pages and such.  Plus, your gouraud fillers will have to interpolate 3 values instead of 1, and so on.  And there is a compatibility problem w/ truecolor: some systems have only 24-bit and others have only 32-bit.  Well, there is still 15/16-bit hicolor.  Looks infinitely superior to 8-bit.  Medium mem usage.  Still need 3 interpolants in the GFillers, but we don't have to shoot for as much accuracy.  And ALL SVGA cards have 15 or 16 bit modes.

Well, okay -- you've got the geometry and the drawing of polygons -- so what's left?  Well, lots of things.  We shoot for realism in the 3d engines as much as for speed.  We try to get the environment to look good as well as look real.  What else is there besides giving someone a 3-d world to roam about in?  Well, what about fog?  Fire?  Physics in general?

When you're just developing an engine, none of this matters unless you want it to, but when you're developing an engine for some ultimate purpose, it's a big deal.  Also, you have to think about what it's all for...  Are you doing some sort of flight simulator? -- then fog is probably a good idea.  Fire may not be that big a deal.  Physics isn't that big a deal because you'll only be dealing with one particular branch of it.  Are you doing an indoor Quake-thing?-- Well, fog is useless (fog indoors?!??!).  Fire is a big thing.  Physics may also be a big deal if you're gonna do the whole carnage bit.