What a 3d Engine Should Be Like
June 6, 2000
Charles Bloom (cbloom@cbloom.com)

(new : an example pipeline at the end)

0. Philosophy
    A. design for the future; any feature you add typically won't ship in an engine for at least a year.
    B. do a few things, do them very well
    C. have a modularized pipe so that when you add one feature, it doesn't break another (eg. d3d is NOT like this!) (for example, run all your polygons through the same pipe, so that they can all receive shadows, all get jiggled by an earthquake effect, etc.)
    D. make it so that artists can do things you never expected
    E. have a smart back-end (necessary for fast performance), and don't duplicate that behaviour in the front end.
    F. separate collision and rendering (eg. use OBBTree and BSP's for collision, progressive meshes for rendering)
    G. minimize memory touches (that's the slowest part of machines these days) with procedural geometry & textures, and just a good pipeline. (eg. use lots of linear arrays you walk straight through)

1. Push enormous numbers of polygons
    A. about 100k polys/frame on good hardware
    B. be scalable down to 10k polys/frame for old hardware
    C. use LOD so that those 100k polys are in the foreground, with characters of 10k polys each
    D. multi-layered texturing, dot3 bump mapping, etc. (it's easy to add or remove texture effects if you are under-pushing or over-pushing the fill rate)
    E. never show a polygonal silhouette!
    F. no per-polygon algorithms (such as traditional BSP or span-line renderers) can be used
    G. don't worry about lightmaps and per-pixel lighting, just use lots of triangles!
    H. most importantly : keep the card running in parallel with the CPU, never serialize! this means : no FVF changes, no VB or Texture Locks (use Loads and DiscardContents), etc.

2. Manage large worlds
    A. use hierarchical culling so that only visible things need to be touched (keeping your whole world in a "loose octree" is one solution)
    B. no part of the pipe should be O(N) in the total number of world objects
    C. use spatial data structures to find the nearest lights, etc.
    D. use portals, beamtrees, or something for gross occlusion so that objects which are not visible don't hurt your framerate (occlusion should be per-object)
    E. page in distant objects with LOD so that you can have seamless infinite worlds.

3. Cool stuff an engine might do (but see 0.B !)
    X. smooth skinned characters & procedural vertex animations, IK
    X. implicit surfaces for gel/blobs (using marching cubes per frame)
    X. use implicit surfaces to define a potential energy field that particles float on (this is the "marker" method of finding isosurfaces). Great for gaseous stuff.
    X. cloth physics for character hair and clothes (with curved surfaces for rendering)
    X. non-photo-real rendering : cartoon, pen and ink, watercolor, etc.
    X. impostors (ala talisman, etc.)
    X. volumetric effects with masses of sprites (cheap on modern hardware, easily scalable to smaller particle densities)
    X. shadow mapping (see my article) (also, movie projectors, stained glass projections, etc.)
    X. animating perlin textures for awesome cloudy skies & hypertexture (in conjunction with marching cubes)

---------------------------------------------------------------------------------

A Simple Example.

Keep your whole world in a "Loose Octree" (see Thatcher Ulrich's description). Basically this is an octree with conservative oversized nodes so that no object need ever be in more than one node. All objects are compressed vertex information buffers (see my Pipeline document) with Triangle Lists. Duplicates of objects simply have different XForms and different animation times (hence different bone xforms), and refer to the same, single packed VB and TriList. Each object has only one texture, which is opaque and S3TC compressed.

Each frame, you descend the octree using hierarchical clip-flagged culling against the view frustum.
All objects that are in a visible octree node are added to a temporary object list. You find the 8 (or fewer) most important lights for the frame and set them into D3D. You set the view matrix into D3D.

The object list is sorted by object type. For each object type, you set up the D3D state information, eg. the texture and the render states. You have a single D3D Vertex Buffer of type XYZ|NORMAL|UV2|RGBA ; you decompress the object's packed VB into this D3DVB (using NOOVERWRITE if there is left-over space from other objects, or DISCARDCONTENTS if there is no space left and you must re-start the VB indexing at zero). For each object of that type, you simply set the object xform into the D3D world xform and fire the TriList with the object VB.

When an object moves, it has a pointer to the one octree cell it is in; it simply removes itself from that cell and sends itself down the tree again (or, if you prefer, walks to one of the 6 neighbor cells).

This structure is bare-ass simple, but it's blazing fast, requires minimal CPU work, uses very little memory, is excellent for hardware parallelism, and is easily extensible to support character skinning, particle effects, curved surfaces, subdivision, etc.
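
The loose octree and the clip-flagged descent can be sketched roughly as below. This is a minimal illustration, not the engine's actual code: the names (LooseOctree, Node, Plane, cullNode) are made up for the example, the looseness factor of 2 is an assumption, and the frustum is just six inward-facing planes. An object sinks to the deepest cell whose half-size still covers its radius, so it always lives in exactly one node. During culling, a plane that a node lies fully inside of is dropped from the mask, so children never test it again.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };            // a point p is inside when dot(n,p) + d >= 0
struct Object { Vec3 center; float radius; int id; };

struct Node {
    Vec3 center{};                            // center of the tight (non-loose) cell
    float half = 0.0f;                        // half-size of the tight cell
    std::vector<const Object*> objects;
    std::array<std::unique_ptr<Node>, 8> child;
};

class LooseOctree {
public:
    LooseOctree(Vec3 c, float half, int maxDepth) : maxDepth_(maxDepth) {
        root_.center = c;
        root_.half = half;
    }

    // Assumes the object's center lies inside the root cell.
    void insert(const Object& o) {
        assert(o.radius <= root_.half);       // otherwise even the loose root can't hold it
        Node* n = &root_;
        // Descend while the object still fits a child's loose bounds (radius <= child half-size).
        for (int depth = 0; depth < maxDepth_ && o.radius <= n->half * 0.5f; ++depth) {
            const float q = n->half * 0.5f;
            const int i = (o.center.x >= n->center.x ? 1 : 0)
                        | (o.center.y >= n->center.y ? 2 : 0)
                        | (o.center.z >= n->center.z ? 4 : 0);
            if (!n->child[i]) {
                n->child[i] = std::make_unique<Node>();
                n->child[i]->center = { n->center.x + ((i & 1) ? q : -q),
                                        n->center.y + ((i & 2) ? q : -q),
                                        n->center.z + ((i & 4) ? q : -q) };
                n->child[i]->half = q;
            }
            n = n->child[i].get();
        }
        n->objects.push_back(&o);
    }

    // Appends every object in a visible node to 'out' (no per-object test).
    void cull(const std::array<Plane, 6>& frustum, std::vector<const Object*>& out) const {
        cullNode(root_, frustum, 0x3Fu, out);
    }

private:
    static void cullNode(const Node& n, const std::array<Plane, 6>& f,
                         unsigned mask, std::vector<const Object*>& out) {
        const float e = 2.0f * n.half;        // loose half-extent (looseness factor 2)
        for (unsigned i = 0; i < 6 && mask != 0; ++i) {
            if (!(mask & (1u << i))) continue;            // parent was fully inside this plane
            const Plane& p = f[i];
            const float dist = p.n.x * n.center.x + p.n.y * n.center.y + p.n.z * n.center.z + p.d;
            const float r = e * (std::fabs(p.n.x) + std::fabs(p.n.y) + std::fabs(p.n.z));
            if (dist < -r) return;                        // fully outside: cull the whole subtree
            if (dist >= r) mask &= ~(1u << i);            // fully inside: children skip this plane
        }
        out.insert(out.end(), n.objects.begin(), n.objects.end());
        for (const auto& c : n.child)
            if (c) cullNode(*c, f, mask, out);
    }

    Node root_;
    int maxDepth_;
};
```

Moving an object would amount to erasing its pointer from the node it sits in and calling insert again; that part is left out to keep the sketch short.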
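
The sort-by-type step is what keeps state changes down: state is set once per run of equal types, not once per object. A small sketch of that loop follows, under the assumption that a "type" is just an integer key and that the actual D3D work is hidden behind two caller-supplied callbacks. VisibleObject, drawSorted, setState and draw are all hypothetical names for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct VisibleObject {
    int type;       // key for shared state: texture, render states, packed VB
    int instance;   // which duplicate this is (its own xform, animation time)
};

// Sorts the temporary object list by type, then calls setState once per type
// and draw once per object. Returns how many times state was set up.
template <class SetState, class Draw>
int drawSorted(std::vector<VisibleObject>& list, SetState setState, Draw draw) {
    std::stable_sort(list.begin(), list.end(),
                     [](const VisibleObject& a, const VisibleObject& b) {
                         return a.type < b.type;
                     });
    int stateSetups = 0;
    for (std::size_t i = 0; i < list.size(); ++i) {
        if (i == 0 || list[i].type != list[i - 1].type) {
            setState(list[i].type);     // texture + render states for this type
            ++stateSetups;
        }
        draw(list[i]);                  // set world xform, fire the TriList
    }
    assert(stateSetups <= static_cast<int>(list.size()));
    return stateSetups;
}
```

With five visible objects of three types, this sets state three times instead of five; the saving grows with the number of duplicates in view.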
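
The NOOVERWRITE / DISCARDCONTENTS choice for the single shared vertex buffer boils down to a cursor that walks forward and wraps. The sketch below models only that bookkeeping and makes no D3D calls: LockMode, VBRange and DynamicVBCursor are invented for the example, and the real lock flags would be passed wherever the returned mode is consumed.

```cpp
#include <cassert>
#include <cstddef>

enum class LockMode { NoOverwrite, Discard };   // stand-ins for the two lock flags

struct VBRange {
    std::size_t firstVertex;   // where to decompress the object's vertices
    LockMode mode;             // which flag to lock the buffer with
};

class DynamicVBCursor {
public:
    explicit DynamicVBCursor(std::size_t capacity) : capacity_(capacity) {}

    // Reserves 'count' vertices. While there is left-over space, append after the
    // previous object (the card may still be reading the earlier vertices, so they
    // must not be touched). When space runs out, restart at zero and ask for a
    // fresh buffer so the CPU never waits on the card.
    VBRange reserve(std::size_t count) {
        assert(count <= capacity_);
        if (next_ + count > capacity_) {
            next_ = count;
            return {0, LockMode::Discard};
        }
        VBRange r{next_, LockMode::NoOverwrite};
        next_ += count;
        return r;
    }

private:
    std::size_t capacity_;
    std::size_t next_ = 0;
};
```

For a 1000-vertex buffer, reserving 600, 300 and then 300 vertices appends the first two and wraps on the third, which comes back at vertex 0 with the discard mode.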