Review : This article's kinda crappy; just read the ROAM papers, you'll be fine.
-------------------------------------------------------------------------
The Genesis Terrain Renderer Technology
White Paper by Charles Bloom (cbloom@wildtangent.com / cbloom@cbloom.com)
December 4, 1999
(see the addendum at the end for miscellaneous changes since this was written)
-------------------------------------------------------------------------
INTRODUCTION
The Terrain Renderer in the Genesis3D/Wild Tangent Engine was
the first cutting edge terrain renderer to be integrated with
a solid geometry engine. I'm going to describe the rendering
pathway and some of the capabilities of the terrain engine.
A summary of the technologies :
1. Real-time tesselation by view-dependent error. Tesselation
to a minimum screen space error, or to a maximum number of
subdivision steps.
2. Execution time is O(N) in the number of visible tris
3. View-independent simplification to lower memory use
4. Phong lighting in the texture with shadows
5. Polygon-accurate collision
6. Synthetic heightmaps and textures
Recommended reading : Hough Hoppe's papers, the ROAM paper, papers
of Lindstrom et. al.
-------------------------------------------------------------------------
RENDERING
Terrain is a built from a heightmap. This means that it can
only describe elevations off of a plane (eg. no overhangs).
The terrain object also includes an fBm (fractal Brownian motion)
noise algorithm for generating synthetic terrain. When the
heightmap is applied, a quadtree is created containing error
information for real-time tesselation. Tesselation is done by
a visual error metric, so the size of the heightmap has no effect
what so ever on the number of polygons rendered, or the resulting
framerate. That means that you can use arbitrarily large heightmaps,
and the only penalty is in the startup time (and memory usage).
The Terrain also does sub-pixel smoothing of the heightmaps; since the
heightmap is applied in 8-bit format, it has much lower resolution that
the floating point coordinates. A smoothing convolution filter is
applied which makes sure that the less than 1/256 'th of the range is
smooth.
Heightmaps are converted into quadtrees. This starts at the leaves
which are quads with the four corners on heightmap points (note that
this means that for a balanced quadtree, you need a heightmap which is
a power of two plus one (ex: 257)). Then parent quads are the average
of their four children. Whenever the four children of a new quad lie
within a small tolerance of the plane of the new quad, they are simply
deleted. This initial view-independent simplification typically
cuts out 20 - 70 % of the quads (depending on the level of detail in
the heightmap). Notez : you should in general let the Terrain engine
do this simplification, rather than shrinking the inital heightmap itself,
because the terrain engine can do it better than photoshop.
Real-time tesselation is done whenever the camera moves or rotates.
Tesselation of a quadtree simply means choosing a subdivision level
of the tree; imagine starting with the root quad (a giant quad over
the whole terrain) and recursively selectively creating children of
each quad. Tesselation is done with a small priority queue. First, the
queue is cleared. Then the root quad is inserted at the top. After
that, the topmost quad in the queue is removed. Then the view-dependent
error for each child is computed, the children are marked as subdivided,
and added to the queue in the priority determined by their error. This
process is O(N) in the number of subdivided quads (never touches quads
lower than the tesselated ones). Note that quads with very low errors
are added to the back of the queue and never touched (there is no need
to remove unused quads). Subdivision stops when the desired error is
reached (that is, all remaining quads provide less improvement than
desired), or a maximum number of subdivision steps have occured (which
sets a minimum framerate). In practice, having both is an excellent
combination.
The view dependent error is the most important part of the process.
It measure actual screen-space distortion. Let me first describe the
variables that are cached in the quad for fast true-error computation.
Each quad contains a normal (which is an average of the triangle normals
at the four corners) and a "cone of normals". The cone of normals is
the smallest cone which contains the quad normal at its center and contains
all of its child quad normals and corner point normals. The cone of normals
is actually stored as simply the sine squared of the angle of the cone. Each quad also
contains an axially aligned bounding box; the XY extents of this box
exactly match the box corner coordinates (this is an advantage of the quadtree),
and the Z extents contain all the children. Finally, each quad contains an
absolute measure of the heightmaps "non-planarity" over that quad. That is,
in the XY extents of the quad, all the heightmap points are projected onto
the plane of the quad, and the sum of the distances of all these points are
added. Note that no quads with a nonplanarity of zero can ever have children,
because the children would have been cut in the view independent simplification.
Now, with all this data we can compute a screen-space error. First, if the
camera is inside the bounding box of a quad, it is subdivided. Then the bounding
box of the quad is clipped against the view frustum (bounding box - plane tests
are very fast). The frustum clipping is percolated heirarchically down the tree;
eg. the clip flags are initially set to all planes, and then whenever a quad is
totally out of one plane or totally inside it, we know all its children will be
also. Next, the dotproduct of the camera direction and the quad normal is taken.
If this dot product is greater than zero, and the square of it is greate than the sine
squared which represents the cone of normals, then the quad is set as backfaced,
and not subdivided. If the cone of normals is not used, then parent quads may be
backfaced even when their children are front-facing. Now that we have decided a
quad has the potential for subdivision, we compute the error from two components:
the screen-space nonplanarity, and the silhouette bias. The screen space nonplanarity
is very simple : it's just the quad's nonplanarity divided by the distance from the
camera to the quad squared. This is proportional to the screen space area time the average
nonplanarity in screen space. The silhoette bias is a measure of the quad's normal
being perpendicular to the camera; when you see a quad head on, its error is invisible;
furthermore, a quad which is normal to the camera may be a true silhoette, which we
must tesselate. The silhoette bias is measured by taking the sine squared of the angle between
the quad and the camera direction (this is just the magnitude of the cross product, squared),
adding the sine squared of the cone of normals, and adding a constant term. The final
error is just the product of the screen space nonplanarity and the silhoette bias.
Once we have tesselated, we may simply render. We already have the heirarchical camera
frustum clip flags in all the quads, so we may heirarchically render. We start at the
root and recursively descent to the leaves. Any quad which is marked as "all out" is
simply not descended, and it's children are never touched. Any quad which is marked as
"all in", and all its children, are simply rendered directly. Quads which are marked
as partially in must be clipped, but only by the planes that they cross. The result is
that even in many-thousand polygon scenes, only about a hundred plane clips are required!
Polygons are also backfaced in screen space (by clockwise/counterclockwise detection)
immediately before rendering.
The only detail about rendering we haven't addressed is T-joint fixing. T-Joints happen
when a quad is subdivided, but its neighbor isn't. To fix them, we simply keep a linked
list of "next in x" and "next in y" points in each point. This is simply a one way list,
since points are never removed, and the list only has to be touched for subdidivded quads,
so maintaining it is again O(N) in the subdivided quads. The final rendering is then done
by stepping around the points on the linked list to gather all the neighbor points, and
rendering those triangles.
Note that Binary Triangle Trees offer some advantage over the quadtree method. For one
thing, you can render leaves immediately since they're triangles. For another, each node
or leaf has a well-defined plane and normal, unlike quads which may be two triangles.
Finally, because each leaf is an actually rendered triangle, there is an exact correspondence
between node errors and actual visual error, while with the quadtree the triangulation
actually adds polygons which will typically give you significantly more quality than
estimated from the quads.
-------------------------------------------------------------------------
OTHER TECHNOLOGIES
Texturing is done by applying a power-of-two dimension grid of textures to
the terrain. The terrain is always tesselated to at least 3 levels, which
gaurantees that as much as a 16x16 grid of textures can be applied. Using
a grid of textures is better than one big texture, because distance textures
will mip down to smaller versions.
Static lighting can be done on the vertices or cast into the textures. Vertex
lighting is fast, but makes the tesselation visible through popping (actually,
this can eliminated with lighting-dependent tesselation, but that pathway is
not promising). Texture lighting is a slow one-time operation, and can be saved.
Accurate shadowing casting is done, including shadows cast by other objects.
If the texture is higher resolution than the terrain, Phong interpolation is
done on the terrain normals. An ambient light parameter is also taken to help
simulate realistic day-time lighting.
The Terrain provides collision functions for seamless integration into the engine
environment. Collision is done versus rays and bounding boxes extruded along rays.
Collision works by sending the thick rays down the quadtree. At each level, the
thick ray is checked against the bounding boxes of all the children. Any child which
collides with the ray gets the ray sent to it. When the thick ray gets to a leaf, the
ray is clipped into the extbox of the leaf, and then collision is done with the two
triangles that make up the leaf. Note that collision
is done against the full detail terrain, after view independent simplification, so
that T-Joints may exist (and are handled, but may cause small anomalies). If a
collision is found, and the exact collision point was not requested, then the stack
of quads is instantly cleared and we return, for very fast lighting collision.
As previously noted, fBm (aka Perlin) noise is used to generate synthetic heightmaps.
This provides cumulous-cloud like terrain. The fBm noise is also used to synthesize
textures. Several source textures are taken as input, and used as the primary
texture in different height ranges. The textures are blended together between the
height ranges, and the blending is perturbed by the noise.
-------------------------------------------------------------------------
ADDENDUM
May 17, 2000
Since this was written, the terrain renderer has been revised in various ways
to be better suited to modern hardware. This means pushing more triangles using
less CPU time (to make sure the 3d card is being heavily utilized).
The first step is making use of d3d vertex buffers and triangle lists. To do this,
instead of rendering polygons directly, the verts of a polygons are added to a vertex
buffer (if they aren't in it already), and the polygon is added to the indexed
triangle list. This allows for sharing of verts across quads in the transform pipe.
All triangles are still clipped by the frustum with heirarchical clip flags, because
that is so very cheap, so they are added to the triangle list post-clip, and d3d
does no clipping. Hardware Transform cards then transform the vertex buffer and
triangle list for me.
Clipping is also done with the "guard band" on modern cards. That means that if
a triangle goes out of the screen rect, but stays in the larger virtual screen (aka the
guard band) it need not be clipped. To faccilitate this, two frusta are used : a guard
band frustum and a screen frustum. Triangles are all-out-rejected with the screen
frustum, they are all-in-included with the guard band frustum, and if neither of those
pass, they are clipped to the screen frustum.
Refinement priorities (the sorted queue) is now done with a heap data structure. See
Sedgewick's "Algorithms", for example. The heap is a static-sized array; it only needs as
many elements as the maximum number of allowed refinements, which is a user-specified
variable.
Silhouette refinement has been removed. This allows us to stop storing normals and
"Gauss maps" in the quad nodes, which significantly reduces memory use, making the terrain
engine much faster. It also lets us avoid computing the gauss map at startup time, which
is quite expensive.
-------------------------------------------------------------------------