The previous two articles on Scape were about the details of several procedural landscape generating algorithms. Now it’s time to explain how these were actually used as part of a brush-based editing pipeline. Please see the next article for a summary of the project, together with all links to the project’s research, articles, source code and binaries.
In a nutshell, Scape’s editing pipeline continuously executes the following steps:
- The user input is captured, processed and queued into brush operations
  - The projected user input data is interpreted as points on a cubic spline
  - Depending on the settings, the stroke spline is sampled at some regular interval
  - The stroke samples are added to a queue as brush instance operations
- The brush instance operations are processed
  - Each queued instance operation is partitioned into one or more ‘paged’ brush operations
  - Any pending batches of paged operations are processed over time by a scheduler
  - Processed pages invalidate actual heightfield geometry tiles
  - When all page operations are finished for some stroke, the old terrain version is efficiently delta-compressed, and the undo stack is updated
- The terrain renderer updates the geometry
  - Any invalidated tiles are scheduled for (partial) regeneration over time, using the new heightfield pages as input
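This article covers a number of the many details of the first two top-level steps: turning user input into brush operations, and processing those operations. The third step, the renderer’s geometry update, is not covered at all, as it is quite similar to the terrain LOD tile updates already covered in the first article.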
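To make the stroke sampling step more concrete, here is a minimal C++ sketch of sampling a stroke at a regular spatial interval. It is not Scape’s actual code: the choice of a uniform Catmull-Rom spline, the names `Vec2`, `catmullRom1D` and `sampleStroke`, and the `stepsPerSegment` parameter are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// One coordinate of a uniform Catmull-Rom segment between b and c
// (an assumed spline type; the article only says "cubic spline").
static float catmullRom1D(float a, float b, float c, float d, float t) {
    float t2 = t * t, t3 = t2 * t;
    return 0.5f * (2.0f * b + (c - a) * t
                   + (2.0f * a - 5.0f * b + 4.0f * c - d) * t2
                   + (3.0f * b - a - 3.0f * c + d) * t3);
}

// Walks the spline in small steps and emits one brush instance position
// every `spacing` units of travelled distance (the 'pencil' density mode).
std::vector<Vec2> sampleStroke(const std::vector<Vec2>& pts, float spacing,
                               int stepsPerSegment = 32) {
    std::vector<Vec2> out;
    if (pts.empty() || spacing <= 0.0f || stepsPerSegment < 1) return out;
    out.push_back(pts.front());
    Vec2 prev = pts.front();
    float untilNext = spacing;  // distance left before the next instance
    const std::size_t n = pts.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        // Repeat the end points so the first and last segments have four controls.
        const Vec2 p0 = pts[i == 0 ? 0 : i - 1], p1 = pts[i], p2 = pts[i + 1];
        const Vec2 p3 = pts[i + 2 < n ? i + 2 : n - 1];
        for (int s = 1; s <= stepsPerSegment; ++s) {
            const float t = float(s) / float(stepsPerSegment);
            const Vec2 cur = { catmullRom1D(p0.x, p1.x, p2.x, p3.x, t),
                               catmullRom1D(p0.y, p1.y, p2.y, p3.y, t) };
            float dx = cur.x - prev.x, dy = cur.y - prev.y;
            float len = std::sqrt(dx * dx + dy * dy);
            while (len >= untilNext) {  // one or more instances fall on this piece
                const float f = untilNext / len;
                prev = { prev.x + dx * f, prev.y + dy * f };
                out.push_back(prev);
                dx = cur.x - prev.x; dy = cur.y - prev.y;
                len -= untilNext;
                untilNext = spacing;
            }
            untilNext -= len;
            prev = cur;
        }
    }
    return out;
}
```

A per-second (‘airbrush’) density mode could reuse the same loop by measuring elapsed time between input points instead of travelled distance.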
In Scape, the terrain is internally partitioned into regular square sections called pages. These pages are not to be confused with the geometry tiles mentioned in the first Scape article. A page is a LOD-independent piece of heightfield data stored as a 2D buffer in main memory and/or a one-channel texture in video memory. A tile, on the other hand, is basically a small LOD-dependent vertex buffer with position and normal information. Partitioning heightfields into pages instead of simply having one large heightfield requires a more complex framework, but has a few important benefits. For example, the total terrain can be bigger than the maximum allowed texture dimensions, the terrain dimensions can be modified more easily and quickly, and the smaller individual pages can be copied, cached, swapped, shared, compressed and undone more efficiently.
Brush instances
To apply a brush stroke in Scape, the brush’s effect is calculated for numerous discrete circular brush ‘instances’ along the stroke’s path. Collectively, these separate instances approximate the application of a continuous stroke. Alternatively, a more accurate analytic piecewise integration of stroke segments could have been implemented, which would have reduced the amount of ‘overdraw’ caused by the many overlapping instances. But that would also have required implementing and maintaining more complex math, and might have been less flexible, especially in the case of non-linear brush effects. Consequently, I preferred and implemented the simpler and more robust method of using many instances instead.
As seen in the image on the right for a circular stroke, the brush instances need to be close enough together to make the effect look continuous. But applying instances at a high density also requires more processing power. Furthermore, higher densities can also cause more noticeable rounding error issues: To apply more instances without amplifying the total effect, the effect per instance should obviously decrease. But as more smaller effects need to be accumulated per height sample for higher instance densities, the cumulative rounding error could become quite objectionable. As an ‘ideal’ instance density doesn’t seem to exist, Scape leaves it to the user to define, tweak and store this value as one of the user’s many per-brush presets. It also supports two different density modes: per second or per spatial unit. The effect of the former can be compared to how you’d use an airbrush, while the latter behaves more like drawing with a pencil.
Scape lets the user define an inner and outer brush radius for the active brush. The brush instance operations use these settings to calculate the bounds on the affected area and to calculate an influence weight per sample within this affected area. The samples within the inner radius receive the full 100% of the brush’s effect. The area outside this receives less and less effect, hitting 0% at the outer radius.
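The influence weight described above can be sketched in a few lines of C++. The text only states that the weight falls from 100% at the inner radius to 0% at the outer radius, so the smoothstep shape used here is an assumption, as is the function name `brushWeight`.

```cpp
#include <cassert>
#include <cmath>

// Influence weight of one brush instance for a sample at distance `dist`
// from the instance centre. The smoothstep falloff between the two radii
// is an assumed shape; any monotonic falloff from 1 to 0 would fit the text.
float brushWeight(float dist, float innerRadius, float outerRadius) {
    if (dist <= innerRadius) return 1.0f;  // full effect inside the inner radius
    if (dist >= outerRadius) return 0.0f;  // no effect at or beyond the outer radius
    // t runs from 1 at the inner radius down to 0 at the outer radius.
    const float t = (outerRadius - dist) / (outerRadius - innerRadius);
    return t * t * (3.0f - 2.0f * t);
}
```

For example, with an inner radius of 2 and an outer radius of 5, a sample at distance 3.5 sits halfway between the radii and receives a weight of 0.5.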
The weights are used to scale the output of a (procedural) brush-specific algorithm. This can basically be any algorithm that is able to calculate (or look up) a height value given any 2D horizontal input position plus any additional algorithm-specific inputs it might need. Examples include the algorithms covered in the last two articles, but it can also be a smoothing filter or an erosion simulation iteration.
The weight-scaled algorithm output is typically added to or subtracted from the old height value. Note that because the weight of each instance will be exactly 0% at and outside the outer radius, there’s no need to try to update the area outside this radius. Yet, as circles are not ideal to work with, Scape updates all samples within the smallest axis-aligned rectangle, or quad, bounding this circle instead. The advantage of this is that it makes it almost trivial to calculate the per-page per-instance affected areas as well as execute any 2D traversal over a page’s affected samples during an update.
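The following C++ sketch shows why the bounding rectangle makes the per-page bookkeeping almost trivial: clipping it against a regular page grid takes only a few integer operations. The types `Rect` and `PageArea`, the function `affectedPageAreas` and the half-open range convention are illustrative assumptions, not Scape’s actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Rect { int x0, y0, x1, y1; };                // half-open: [x0, x1) x [y0, y1)
struct PageArea { int pageX, pageY; Rect local; };  // area in page-local sample coordinates

// Bounds a circular brush instance by an axis-aligned rectangle of samples,
// clamps it to the terrain and splits it into one sub-rectangle per page.
std::vector<PageArea> affectedPageAreas(float cx, float cy, float outerRadius,
                                        int pageSize, int terrainW, int terrainH) {
    Rect r;
    r.x0 = std::max(0, (int)std::ceil(cx - outerRadius));
    r.y0 = std::max(0, (int)std::ceil(cy - outerRadius));
    r.x1 = std::min(terrainW, (int)std::floor(cx + outerRadius) + 1);
    r.y1 = std::min(terrainH, (int)std::floor(cy + outerRadius) + 1);

    std::vector<PageArea> out;
    if (r.x0 >= r.x1 || r.y0 >= r.y1) return out;   // entirely off the terrain

    for (int py = r.y0 / pageSize; py <= (r.y1 - 1) / pageSize; ++py) {
        for (int px = r.x0 / pageSize; px <= (r.x1 - 1) / pageSize; ++px) {
            const int ox = px * pageSize, oy = py * pageSize;  // page origin
            const Rect local = { std::max(r.x0, ox) - ox,
                                 std::max(r.y0, oy) - oy,
                                 std::min(r.x1, ox + pageSize) - ox,
                                 std::min(r.y1, oy + pageSize) - oy };
            out.push_back({ px, py, local });
        }
    }
    return out;
}
```

Each returned `PageArea` can then be traversed with two plain nested loops over `local`, which is the ‘almost trivial’ 2D traversal mentioned above.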
Editing pages
One property of any brush algorithm in particular has a large effect on the type of code design required to implement it as part of an editing pipeline: Can a new value for some specific position/pixel in the heightfield be calculated solely based on that position’s old value, or does it require data from neighboring positions as well? If it can, each page can be updated independently of all other pages. Otherwise, the underlying algorithm is likely to require support for reading from multiple pages to be able to update even a single page.
For algorithms that do require reading height information from a fixed-size local neighbourhood to update a single height sample, a different set (and number) of input pages might need to be accessed to calculate the effect at different locations within one affected page. A naive implementation could determine this per updated sample. But for performance reasons, it’s better to detect the different cases of input sets for a target page outside any inner loops, and to process each case in its own inner loop. Scape’s CPU-based smoothing and erosion-simulation brushes use this strategy, as does its experimental GPU-based smoothing brush.
Conceptually, CPU-based and GPU-based brush implementations are quite similar. However, the GPU-based brushes require a more involved pipeline implementation to translate the above into render operations. This is explained next.
GPU brush pipeline
To use the GPU to process brush operations on pages, the affected pages in main memory are individually managed and shadowed as 2D textures in video memory. And it’s these texture versions of the pages that are updated on the GPU using brush-specific render operations executed by Ogre’s DirectX or OpenGL backend.
To start updating a page texture, each rectangular area that needs to be edited as a result of some brush instance is prepared as a rendering quad. These quads are rendered using a 1 : 1 ‘screen-space’ top-down projection, covering the areas that need updating. When I implemented Scape, this approach was really the only widely supported technique to write GPGPU applications, and it allowed me to easily drive the hardware through Ogre’s DX9 backend that was already used for rendering the terrain. Nowadays, CUDA, OpenCL and ComputeShader-enabled hardware is commonplace, and using one of these APIs might be considered to be a better alternative.
The quads are rendered with a brush-specific material, applying the brush-specific algorithm in a pixel shader to eventually overwrite (a part of) the texture version of a page. Preferably, the pixel shader’s output would be written directly back into the texture. But as that’s not guaranteed to be supported by OpenGL/DirectX, it’s rendered to one of the available (off-screen) render targets instead. When a page needs to undergo changes from multiple instances requiring multiple passes, the previous render target can directly be used as input to render the next render target, effectively ping-ponging between target buffers for as long as more passes are required. Only after the last pass has finished is the result copied from the latest target back into the page’s texture, completing the page’s update.
Note that the specific areas in an affected page that aren’t directly affected by some brush instance must still be copied from the previous page/target into the next. Furthermore, when multiple instances affect a page, and the brush instance operations are applied by ping-ponging between render targets, the areas that got affected in the previous render call but won’t be directly affected by the next operation’s quad need to be copied as well. Copying these areas is accomplished by simply rendering additional quads that use a trivial and generic input-copying material.
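The ping-pong scheme can be mimicked on the CPU to show the buffer bookkeeping. This is only a conceptual C++ sketch: `Buffer`, `Pass` and `applyPasses` are hypothetical names, plain vectors stand in for the page texture and the render targets, and every pass writes every sample, which corresponds to rendering the brush quads plus the copy quads described above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Buffer = std::vector<float>;
// A pass computes the new value of sample i from the previous buffer only,
// just as a pixel shader can only read its input texture.
using Pass = std::function<float(const Buffer& src, std::size_t i)>;

void applyPasses(Buffer& pageTexture, const std::vector<Pass>& passes) {
    if (passes.empty()) return;
    Buffer targetA(pageTexture.size()), targetB(pageTexture.size());
    const Buffer* src = &pageTexture;  // the first pass reads the page itself
    Buffer* dst = &targetA;
    for (const Pass& pass : passes) {
        for (std::size_t i = 0; i < dst->size(); ++i)
            (*dst)[i] = pass(*src, i);
        src = dst;                                      // output becomes next input
        dst = (dst == &targetA) ? &targetB : &targetA;  // swap targets
    }
    pageTexture = *src;  // single copy back after the last pass
}
```

Because a pass never reads from the buffer it writes to, a pass that looks at neighbouring samples still sees a consistent previous state.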
Each brush-specific render quad is typically made just large enough to cover a brush instance. But to support algorithms that need to be able to look up height values from neighbouring positions, the quad typically needs to be split up further into sub-area quads, analogous to having separate conditionless inner loops in a CPU implementation for the different border cases, as mentioned before. Each of these sub-quads requires a different set of material inputs and possibly slightly different shaders. Having the option to use pre-processor flags and conditionals in the Cg shaders proved to be a great aid in managing the slightly different versions of the pixel shaders.
Once a page texture is fully updated, it’s mapped/locked as main memory and accessed to update one or more ‘dirty’ tile vertex buffers, as described in the first article. Also, just before releasing any buffers containing the old versions of the affected pages in main memory, the differences between the old and new page buffers are collected and RLE-compressed. This compressed difference buffer is kept around for any future undo/redo operations.
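A compressed difference buffer of this kind might look like the C++ sketch below. The exact encoding Scape uses is not described here, so the (run length, delta) pair format, the wrap-around 16-bit subtraction and the names `compressDelta` and `restoreOldPage` are assumptions; both pages are assumed to have the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Page = std::vector<std::uint16_t>;
using Run = std::pair<std::uint32_t, std::uint16_t>;  // (run length, delta value)

// Stores newPage - oldPage per sample (modulo 2^16) as run-length encoded pairs.
// Untouched regions have a delta of 0 and collapse into a few long runs.
std::vector<Run> compressDelta(const Page& oldPage, const Page& newPage) {
    std::vector<Run> runs;
    for (std::size_t i = 0; i < newPage.size(); ++i) {
        const std::uint16_t d = static_cast<std::uint16_t>(newPage[i] - oldPage[i]);
        if (!runs.empty() && runs.back().second == d) ++runs.back().first;
        else runs.push_back({ 1u, d });
    }
    return runs;
}

// Undo: rebuilds the old page from the new page and the stored runs.
Page restoreOldPage(const Page& newPage, const std::vector<Run>& runs) {
    Page oldPage(newPage.size());
    std::size_t i = 0;
    for (const Run& r : runs)
        for (std::uint32_t k = 0; k < r.first && i < newPage.size(); ++k, ++i)
            oldPage[i] = static_cast<std::uint16_t>(newPage[i] - r.second);
    return oldPage;
}
```

Redo works the same way in the opposite direction, by adding the deltas to the old page instead of subtracting them from the new one.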
This is the basic GPU pipeline in a nutshell, which will be faster than a CPU-based counterpart, but still suffers from a lot of overhead due to all the reads, writes and copies from/to different intermediate buffers. But luckily, for a large class of brushes, this overhead can greatly be reduced by batching brush instances together.
GPU batching
For complex procedural brushes, the overhead of copying page data from/to textures, render targets and main memory is only a fraction of the total time required to process a single brush instance. But for simple brushes like smoothly pushing or pulling terrain, this overhead can be a considerable portion of the total time required to edit the terrain. In these cases, it helps to be able to process a number of brush instances as a single batch (i.e. single render call) by extending the brush’s pixel shader to allow multiple brush instances to be specified and iterated over from within a for-loop before outputting a new height value.
Batching multiple brush instance operations together into one render call only works for brush algorithms that are able to calculate each new height sample completely independent from any neighboring height samples, as a pixel shader cannot read back intermediate values from neighbouring pixels that are being processed as well. Hence, batching can’t be used for typical implementations of smoothing and erosion simulation brushes. But it can still be used for procedural push/pull and noise-based algorithms when used with care.
In a large and dense brush stroke, many individual instances may cover many individual pages. Furthermore, some of these instances might cover multiple neighboring pages. But when the number of instances affecting a page is larger than can be calculated within one render pass, different pages could end up batching the same brush instance into different (numbers of) batches. In theory, this doesn’t change anything. However, in practice, differences in rounding can become an issue. Scape represents the heightfield height samples as 16-bit integer values by default, while height updates in the pixel shaders are calculated as floats. Consequently, the cumulative rounding differences between summing floats directly (within a pass) and as 16-bit integers (between passes) can become noticeable as discontinuities between neighboring pages. After experimentation, the maximum number of brush instances per batch was limited to 16. This resulted in a minimal amount of rounding error issues, while still delivering a considerable speedup.
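The effect of batch boundaries on rounding is easy to reproduce in a small C++ experiment. The function `applyInBatches` is hypothetical, and rounding to the nearest integer between passes is an assumption made for the demonstration; the principle is the same for any rounding rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Applies per-instance height deltas in batches. Within a batch the sum is
// kept as a float; between batches the height is stored as a 16-bit integer.
std::uint16_t applyInBatches(std::uint16_t height, const std::vector<float>& deltas,
                             std::size_t batchSize) {
    if (batchSize == 0) return height;
    for (std::size_t start = 0; start < deltas.size(); start += batchSize) {
        const std::size_t end = std::min(start + batchSize, deltas.size());
        float h = static_cast<float>(height);          // unpack at the start of a pass
        for (std::size_t i = start; i < end; ++i) h += deltas[i];
        h = std::min(std::max(h, 0.0f), 65535.0f);     // stay in the 16-bit range
        height = static_cast<std::uint16_t>(std::lround(h));  // assumed rounding rule
    }
    return height;
}
```

Starting from a height of 1000 and applying sixteen instances that each add 0.4 units, one batch of 16 yields 1006, four batches of 4 yield 1008, and sixteen batches of 1 leave the height at 1000. Two neighbouring pages that split the same instances differently would therefore end up with visibly different heights along their shared border.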
Scape uses R5G6B5 textures to internally store the 16-bit height samples for maximum compatibility, as hardware support for 16-bit and 32-bit types wasn’t that common at the time I wrote Scape. When using the packed R5G6B5 format, the pixel shader needs to manually unpack this format when reading from textures and pack to it when writing to the render targets, respectively. Note that any non-linear packing technique, including this one, causes bilinear filtering to break down, but this is irrelevant for most brush algorithms, as they typically only require ‘nearest’ lookups anyway. Nowadays, it might make more sense to stick to 32-bit floats both in and between passes, as bandwidth & memory constraints and compatibility might be less of an issue than they were in 2008. Unpacking/packing the R5G6B5 format to/from an exact 16-bit integer can be somewhat tricky due to, again, hardware rounding issues. After some trial and error, I settled on the following code to reliably pack and unpack this format.
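```cg
// Unpack an R5G6B5 colour (channels in [0, 1]) into a height value in [0, 1)
float unpackFromR5G6B5(float3 colour)
{
    const float3 scales = { 8.0f, 1.0f / 8.0f, 1.0f / 256.0f };
    const float3 maxs = { 31.0f, 63.0f, 31.0f };
    // round() snaps each channel back to its exact integer level
    return dot(round(colour * maxs), scales) / 256.0f;
}

// Pack a height value in [0, 1) into an R5G6B5 colour
float3 packToR5G6B5(float value)
{
    const float3 scales = { 1.0f, 32.0f, 2048.0f };
    const float3 ranges = { 32.0f, 64.0f, 32.0f };
    const float3 maxs = { 31.0f, 63.0f, 31.0f };
    // R holds the top 5 bits, G the middle 6 bits and B the bottom 5 bits
    return floor(frac(value * scales) * ranges) / maxs;
}
```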
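To check the arithmetic of the two routines above outside a shader, they can be ported to plain C++ and run over all 65536 possible height values. The `Float3` struct and the `fracf` helper are stand-ins for Cg’s `float3` and `frac`, and the port does not simulate how a GPU quantizes the channels when writing to an actual R5G6B5 target, so it only verifies the math, not the hardware behaviour.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float r, g, b; };  // stand-in for Cg's float3

// Packs a height in [0, 1) into channel values k/31, k/63 and k/31.
Float3 packToR5G6B5(float value) {
    auto fracf = [](float v) { return v - std::floor(v); };
    return { std::floor(fracf(value) * 32.0f) / 31.0f,            // top 5 bits
             std::floor(fracf(value * 32.0f) * 64.0f) / 63.0f,    // middle 6 bits
             std::floor(fracf(value * 2048.0f) * 32.0f) / 31.0f };// bottom 5 bits
}

// Recovers the height from the three channel values.
float unpackFromR5G6B5(Float3 c) {
    const float r = std::round(c.r * 31.0f);
    const float g = std::round(c.g * 63.0f);
    const float b = std::round(c.b * 31.0f);
    return (r * 8.0f + g / 8.0f + b / 256.0f) / 256.0f;
}
```

Since every height of the form i / 65536 is exactly representable as a float, the round trip `unpackFromR5G6B5(packToR5G6B5(v))` can be compared to `v` for exact equality.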
All the batchable brush types in Scape conceptually work by calculating some brush-specific effect at some sample position, scaling this based on the distance between the sample and the current instance’s center, and adding that to the sample’s previous height. The linear nature of this type of brush operation allows for another useful optimization: The calculated scale for each sample’s currently available set of affecting brush instances may be added together first, independently from the brush-specific effect. This combined weight is then used to scale the batch’s brush-specific effect as a whole. Consequently, the output from the more complex part of procedural brushes (i.e. the actual brush-specific procedural code) needs to be calculated only once per batch of instances instead of per instance in a batch. This optimization resulted in yet another noticeable speed up compared to the most naive implementation, especially for the more complex brush types.
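The weight-summing optimization can be illustrated with a small C++ comparison. Everything here is a stand-in: `brushEffect` is a hypothetical placeholder for an expensive procedural function, the linear falloff in `instanceWeight` is an assumed shape, and the call counter exists only to make the saving visible.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Instance { float cx, cy, inner, outer; };

static int effectCalls = 0;  // counts evaluations of the 'expensive' part

// Hypothetical stand-in for a costly procedural function of the position.
float brushEffect(float x, float y) {
    ++effectCalls;
    return std::sin(0.37f * x) * std::cos(0.21f * y);
}

// Distance-based weight of one instance (linear falloff, an assumed shape).
float instanceWeight(const Instance& b, float x, float y) {
    const float d = std::sqrt((x - b.cx) * (x - b.cx) + (y - b.cy) * (y - b.cy));
    if (d <= b.inner) return 1.0f;
    if (d >= b.outer) return 0.0f;
    return (b.outer - d) / (b.outer - b.inner);
}

// Naive: evaluate the effect once per instance in the batch.
float applyNaive(float h, float x, float y, const std::vector<Instance>& batch) {
    for (const Instance& b : batch) h += instanceWeight(b, x, y) * brushEffect(x, y);
    return h;
}

// Optimized: sum the weights first, then evaluate the effect once per batch.
float applySummedWeights(float h, float x, float y, const std::vector<Instance>& batch) {
    float weightSum = 0.0f;
    for (const Instance& b : batch) weightSum += instanceWeight(b, x, y);
    return h + weightSum * brushEffect(x, y);
}
```

Both functions return the same height up to float rounding, but the second evaluates the procedural function once instead of once per instance. This only holds as long as the effect depends on the sample position and not on the individual instance.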
Directional noise
The brushes so far have simply used the brush instances as a way to limit and scale the effect of a brush. However, more complex dependencies can be quite useful as well. For example, the local direction of the stroke could be used to further affect the outcome. Scape’s directional noise brush does exactly that, adding noise anisotropically, stretching or compressing the noise at a fixed angle relative to the stroke’s local direction.
The nice thing about this is that it gives the user more freedom and control in how to use procedural algorithms to shape the terrain. For example, when the brush is set to elongate noise features along the stroke, it becomes a tool to create gullies and river beds carved out by erosion by stroking downhill. When set to elongate in the direction perpendicular to the stroke, it becomes a tool to add mountainous structures by stroking along a mountain ridge. Both uses are demonstrated near the end of the Scape demo clip. In Cg code, directional noise could be implemented as follows:
```cg
float directionalTurbulence(float2 p,
                            float seed, int octaves, float lacunarity,
                            float distortSeed, int distortOctaves, float distortLacunarity,
                            float distortScale,
                            float2 mainDirection, float mainDirectionExtraScale)
{
    // Get a 2D distortion vector for the given local position by calculating
    // the output of a procedural function twice, using two different seed numbers
    float2 distortion;
    distortion.x = DISTORT_TURBULENCE_FUNC(p, distortSeed, distortOctaves, distortLacunarity);
    distortion.y = DISTORT_TURBULENCE_FUNC(p, distortSeed + 0.123f, distortOctaves, distortLacunarity);

    // Project the distortion on the unit-length mainDirection axis
    float2 distortionAlong = mainDirection * dot(mainDirection, distortion);

    // Adjust the amount of distortion in the mainDirection
    distortion += mainDirectionExtraScale * distortionAlong;

    // Scale the total distortion, both along and perpendicular to mainDirection
    // (assumed placement: distortScale is described as controlling the total amount)
    distortion *= distortScale;

    // Do the 'normal' procedural lookup at the distorted position
    return TURBULENCE_FUNC(p + distortion, seed, octaves, lacunarity);
}
```
Besides the position p, a distortScale, a mainDirection and a mainDirectionExtraScale, this function receives two sets of procedural parameters, each consisting of a seed, a number of octaves and a lacunarity (i.e. octave step size). The first set is exclusively used as input to the TURBULENCE_FUNC, which could be any version of the turbulence functions discussed in the previous articles on Scape. Similarly, the second set is used as input to the DISTORT_TURBULENCE_FUNC, which could be any turbulence function that is roughly symmetrical around 0 (and so returns both positive and negative numbers between, for example, -1 and 1). The mainDirection may change per brush instance and is the unit-length 2D axis that is set to be at a user-specified angle to the local brush direction. mainDirectionExtraScale is used to either stretch or compress along the mainDirection. For example, a value of -0.8 will cause the input distortion along the mainDirection to scale by 1 + -0.8 = 0.2, thus locally stretching features by a factor of 1/0.2 = 5 relative to the direction perpendicular to mainDirection. And lastly, distortScale controls the total amount of input distortion to the TURBULENCE_FUNC, both along and perpendicular to the mainDirection.
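The scaling along mainDirection can be verified numerically with a small C++ port of just the projection step. `Vec2`, `dot2` and `adjustDistortion` are illustrative names, and the turbulence lookups are left out so that only the vector math is tested.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

float dot2(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

// Adds extraScale times the component of `distortion` that lies along the
// unit-length `mainDirection`. That component ends up scaled by
// (1 + extraScale); the perpendicular component is left unchanged.
Vec2 adjustDistortion(Vec2 distortion, Vec2 mainDirection, float extraScale) {
    const float along = dot2(mainDirection, distortion);
    return { distortion.x + extraScale * along * mainDirection.x,
             distortion.y + extraScale * along * mainDirection.y };
}
```

With mainDirection = (0.6, 0.8) and a distortion made of 2 units along that axis plus 3 units perpendicular to it, an extra scale of -0.8 leaves 0.4 units along the axis and the full 3 units perpendicular to it, matching the 1 + -0.8 = 0.2 factor from the example above.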
This article has explained how Scape uses procedural algorithms to locally edit and render the terrain at (near) real time rates. In the (to be written) next and final post on Scape, the prototype code and binary will be released.
Comments (2)
Hi Giliam, this looks awesome, it´s really one of the bests programs i´ve seen to model terrains so far, here´s some suggestions i have:
Textures:
1-texture export
2-define from angle/to angle whitin each texture is applyed
Terrain:
1-a leveling tool with smoothness.
2- creating roads with a spline and setting points to set different heights and a leveling tool for the spline too.
that would kind of make it perfect!
Hope to see a release and more work, really awesome, keep it up! :D
June 30, 2012
Thanks Mauricio! Indeed, the texture export and road tool would be a valuable addition (but are unlikely to be added by me personally, as I’m currently mostly working on other projects). The texture from/to angle, however, is already in there and is controlled by the per-layer properties ‘slope (deg)’, ‘slope fill steeper’, ‘slope spread (deg)’ under the material settings. Also, depending on what you mean by ‘smoothness’, the leveling tool is already there: it’s the 7th tool from the left. Set ‘Strength’ under its properties to control the amount of effect (or use a pressure-sensitive Wacom) and press and hold ‘F’ and drag to change the brush falloff. Good luck!
May 31, 2012