Creating a procedurally generated universe, one algorithm at a time.

Jupiter Jazz

Posted on Jun 9, 2016 graphics programming directx proceduralcontent


One of the main areas where I was never satisfied with Junkships procedural planetary generation was the generation of convincing looking gas giants and cloud systems on earthlike planets. I got some initially passable results by tweaking simplex noise, and adding some approximations of vortices as described here, but it never looked that great due to the fact that cloud formation is a highly physical process involving wind, temperature, pressure, and a bunch of other phenomena rooted in pysical simulation.

It was always my plan to revisit this part of the system and to add some amount of fluid simulation when generating cloud layers. With that said, my goal was not to accurately model the actual physics, but to build a lightweight simulation that would give a good final result, and run fast.

After reading up on the current state of the art I came across a great article here which proposed a really simple method that looked like it should tick all the boxes I was looking for. While this proved to be broadly true, there were however a number of isssues I had to overcome in order to get a satisfactory solution.

The TL;DR of the technique is that you generate a heightmap using a noise function (in my case simplex noise), calculate a vector which is the gradient of the noise field at each point, then rotate that vector 90 degrees around an axis defined by the normal. This set of rotated normals is your ‘flow map’. You then take your input texture and your flow map and move every texel in the input by the vector in the matching texel of the flow map - if you do this repeatedly the input texture looks as if it is a fluid flowing according to the hills and valleys in the original height map.

There is at least one other implementation of this technique out there (Heres some good slides by the developer of Space Nerds In Space) but I haven’t seen any that use the GPU to do the work. The advantage of using the GPU is that computing the flow calculations for a texture can be done dozens of times per second as opposed to minutes per texture iteration with CPU based implementations. This means that its possible on a mid range PC to generate gas giants or cloud layers within a second or two which is in keeping with my goal of having all asset generation occur near instantaneously at runtime.

There were a few problems with the technique as presented in the original paper, the main one being that it is designed to work in 2 dimensions, whereas I needed to generate the flow map on the surface of a sphere. While ‘rotating’ a vector by 90 degrees in two dimensions is pretty trivial, its not so obvious what this means in 3 dimensions. For those whose linear algebra is a little rusty the solution is:

  • calculate the normal at the position on the sphere
  • calculate the tangent at the position on the sphere (you could also use the binormal/bitangent instead)
  • calculate the normal from the heightmap at the same position on the sphere
  • calculate the cross product of the tangent and heightmap normals

or in HLSL

float3 flowVector = cross(input.tangent,heightmapNormal);

This generates a flowmap with a smooth contour of vectors that can be stored in the rgb components of a texture as shown below.


The second problem with the technique is that it is effectively like putting the input texture into a blender, if you run enough iterations, all the original input colors will become evenly distributed and blended together. Because of this, its important to ensure that there is a limited number of flow iterations in order to allow for some mixing - but not too much. The problem with this is that the animation of the flow process looks really cool and it would be a pity to have to use a static version of a texture which is supposed to be showing cloud systems which should change appearance over time.

My solution to this was to split things into 2 stages - the first is the iterative blending process where each output texture is then recombined with the flow map. Once a number of iterations have been run. Once this has been done we have an input texture that can be used as the input to the second stage.

The second stage still involves the use of the flow map to warp the input texture, but importantly the output is not used as the input to the next frame (to prevent excessive mixing over time). Instead, the same input texture is used on each frame, and the flow map is rotated by a fixed amount. This gives the appearance of some dynamic fluidity when rendering, but ensures that the appearance of the texture remains consistent over long periods of time.

Gas giants: Old technique vs New technique


Clouds: Old technique vs New technique


As you can see the new technique gives vastly superior results to what I was doing before, ands its also proven to be much easier to tweak the generation parameters to provide a more predictable range of good outputs.

In other news I am also planning on fully open sourcing the Junkship solar system generation codebase soon. There are a couple of reasons for this, the main one being that I no longer have any desires to build this out into a fully fledged game or otherwise commercial product, I’d much prefer to refocus it into what it is, which is a cool demo for procedural graphics techniques. Because of this there is no real need or advantage to keeping the code private anymore, I’d much prefer to share and see other game developers take advantage of and build upon the the stuff I’ve worked on. Theres a fair chunk of work that still needs to be done to tidy things up and finish off some of the generation code, particularly around lighting and color palette generation - but I’m hoping to get the code up on Github in the near future (Take that with a grain of salt though, my time estimates tend to run on something approximating Valve Time)

My god its full of stars

Posted on Feb 2, 2014 graphics programming directx proceduralcontent

It has been a long time since I last worked on generating procedural star fields, but I was never particularly happy with the result that I came up with in my first attempt so when some time presented itself I decided to revisit the problem to see if I could come up with a better solution. It turns out I could, below is an example of what I came up with.


My original method (shown below) was largely based on combining successive layers of Perlin noise, and while this worked ok for the actual stars themselves it usually looked pretty terrible when it came to rendering nebulas and the gas and dust surrounding dense clusters of stars.


The reason for this is that star and galaxy formation is an inherently physical process, shaped by the gravitational interactions of billions of stars and giant clouds of gas and dust. The Perlin noise approach failed to take any of these physical factors into account and as a result it was extremely difficult to tweak it to get the relationship between star and gas density correct, and to get the overall shape and structure of nebulas looking convincing.

Because of this it seemed obvious that in order to get this right I was going to have to implement some sort of physics based particle system to at least approximate the physics involved. With this in mind I realized that given the number of particles that would have to be simulated (probably millions) that this was going to be impractical to do on the CPU and was going to be a job for the GPU if I wanted to keep the generation time to a minimum.

I ended up writing a Compute shader in which a large number of particles are simultaneously attracted to a handful of attractor particles which move randomly about 3d space. These attractor points are repositioned each frame on the CPU (as there’s only 2 – 3 of them ) and fed into the shader as a buffer. The shader then calculates the acceleration acceleration caused by each particle due the force of each attractor ( proportional to the square of its distance between the attractor and particle ) and updates the particles position.

The effect of this is a very loose approximation to the force of gravity, but where all the particles can move independently of each other, making it extremely parallelizable. Having multiple attractors is essential in such a simple simulation as having a single attractor results in all particles converging toward a single point, whereas having particles being under the influences of multiple concurrent forces creates interesting and unpredictable patterns. Below is an early rendering showing around 1 million particles being simulated and rendered as point sprites.


Here’s the source code for the compute shader

cbuffer Constants  : register(cb0)
    uint GroupDimX;
    uint GroupDimY;
    uint MaxParticles;

cbuffer Physics : register(cb1) 
    float FrameTime;
    uint AttractorCount;

struct Particle {
    float3 CurrentPosition;
    float3 OldPosition;
    float3 Velocity;
    float Scale;

struct Attractor {
    float3 Position;
    float3 Destination;
    float Velocity;
    float Strength;
    float MinAttractorDistance;

RWStructuredBuffer<Particle> srcParticleBuffer : register(u0);
StructuredBuffer<Attractor> attractorsBuffer : register(t0);

[numthreads(1024, 1, 1)]
void main( uint3 dispatchId : SV_DispatchThreadID )
    uint id = dispatchId.x + ( GroupDimX * 1024 * dispatchId.y ) 
        + ( GroupDimX * GroupDimY * 1024 * dispatchId.z );

    //Every thread renders a particle.
    //If there are more threads than particles then stop here.
    if(id < MaxParticles){
        Particle p = srcParticleBuffer[id];
        float3 a = float3(0.0f,0.0f,0.0f);

        for (uint i=0;i<AttractorCount;++i) {
            float3 diff = attractorsBuffer[i].Position - p.CurrentPosition;
            float distance = length( diff );

            if ( distance < 
                    attractorsBuffer[i].MinAttractorDistance ) {
                // make sure particles don't appear inside an 
                // attractors min distance. If a particle
                // gets inside the min distance, we'll push it 
                // to the opposite side of the min sphere
                // This reduces large numbers of particles
                // converging in a point around an attractor

                float3 push = diff + 
                    normalize( diff ) 
                    * attractorsBuffer[i].MinAttractorDistance;
                p.OldPosition += push;
                p.CurrentPosition += push;

            a += ( diff * attractorsBuffer[i].Strength ) 
                / (distance * distance);
        float3 tempPos = 2.0*p.CurrentPosition 
            - p.OldPosition + a*FrameTime*FrameTime;

        p.OldPosition = p.CurrentPosition;
        p.CurrentPosition = tempPos;
        p.Velocity = p.CurrentPosition - p.OldPosition;

        srcParticleBuffer[id] = p;

The particle simulation when suitably tweaked came up with all sorts of interesting looking nebula type clouds, and worked well for rendering fields of stars. However as you can see in the screenshot above, even with a lot of blur applied to the particles, when rendering a cloud of point sprites they had kind of a lumpy appearance that gave away the fact that the cloud was actually composed of a finite number of particles. I tried all sorts of variations of increased blur, low pass filters, different particle sizes and shapes but the lumpiness was always present to some degree

It took me quite a while to figure out a solution for how to smooth out the particle field and I eventually figured it out after remembering an article I read about the rendering technique used to draw the skyboxes in Homeworld 2 ( In that game the backgrounds were all vertex shaded using actual geometry which gave the backgrounds a very smooth almost painted look.

This got me thinking, could I use vertex shading to try and smooth out the rendered cloud? I started by normalizing all the particle positions so they sat somewhere on a bounding sphere with a radius of 1 unit. Then for each vertex on that sphere I summed up the number of particles within a certain distance and recorded that as the particle density at that vertex.


For example, in this image we can see that the vertex at the center of the red circle has a density of 1 as there is only one particle within the circles radius, whereas the blue vertex has a density of 4 as there are 4 nearby particles

The pixel shader can then render color based on this density, which the graphics card hardware will interpolate for us between the respective vertices for a smooth continuous output. Throwing all these ideas together gave me the following vertex shader (the pixel shader is trivial, it just renders a color whose opacity is based on the vertex.Intensity value and uses the normalized vertex.Velocity vector to lerp between two possible color values )

cbuffer ParamsBuffer: register(cb0)
    float4x4 WorldViewProjection;
    float MaxDistance;
    int MinThreshold;
    int ParticleCount;

StructuredBuffer<Particle> particleBuffer : register(u0);

Nebula_VSOutput main(VSInput input)
    Nebula_VSOutput output;

    float4 worldPosition = mul(float4(input.Position,1.0f), 
    output.LocalPosition = input.Position;
    output.Position = worldPosition.xyww;
    output.Velocity = float3( 0.0f, 0.0f, 0.0f );

    int intensity = 0;
    // find how many particles are within a threshold distance
    // of this vertex
    for (int i = 0; i < ParticleCount; i++)
        Particle p = particleBuffer[i];
        float distance = length( normalize( p.CurrentPosition ) 
                - input.Position );
        intensity += distance <= MaxDistance ? 1 : 0;
        output.Velocity += p.Velocity;

    output.Velocity = normalize( output.Velocity );
    output.Intensity = saturate( ( intensity - MinThreshold ) 
            / (float)ParticleCount );

    return output;

Turns out it worked exactly as I had hoped! the rendered clouds were now perfectly smooth provided that the sphere used to render the skybox had a sufficient polygon count. It also had the ancillary benefit of requiring far less particles for an equivalently detailed cloud which helped reduce the simulation time. Below are a couple of different procedural star field renderings, each field is rendered after around 60 physics iterations and contains around 2 million particles for the point sprite stars, and around fifty thousand particles for the nebula clouds. A full skybox is rendered in 300 layers with one layer per frame ( so as to not adversely affect the frame rate of other rendering ) which ends up taking around 5 seconds.



How to properly distribute DirectX in an installer

Posted on Nov 6, 2012 programming directx

I see threads all the time where people complain about game installers that have DirectX install wizards that pop up mid install, or about the size of the end user redistributable bloating their game installer. I understand why this is a problem, there’s lots of advice and documentation telling you what NOT to do when distributing DirectX (Don’t just include the dlls in your install folder!), but not much in the way of what you SHOULD do. So after seeing another iteration of this discussion on this recent Reddit thread, I decided I’d write this post so that those who are unsure of how to redistribute DirectX in their own games installer will be able to get it right (BTW - the installer in the linked thread is doing it wrong despite the thread title).

Step 1 – Create a minimal installation package

Most Windows installs already have DirectX installed by default, Windows XP SP2+ includes DirectX 9.0c, Windows Vista includes DirectX 9&10 (and SP2 includes DirectX 11), and Windows 7 includes 9,10&11 right out of the box. So really all you have to do to ensure that all prerequisites are installed correctly is to ensure that any dll’s you use that are part of the DirectX SDK and NOT part of the core DirectX install are installed.

The first step in doing this is to go to the DirectX SDK install directory (usually something like “C:\Program Files (x86)\Microsoft DirectX SDK (June 2010)”) and go into the Redist folder. Create a new folder somewhere else on disk and copy the following files from the Redist folder into it

  • DXSetup.exe
  • DSETUP.dll
  • dsetup32.dll

You should now have a folder that looks something like the following


Step 2 – Determine what D3D libraries you are using

Now you need to find what D3D libraries your solution is actually using. If you’re not sure (and your using c++) In Visual Studio, go to project properties and you probably have these listed under the ‘Additional Dependencies’ field in the Linker->Input window. Take note of any .lib files starting with d or x (e.g. xinput.lib or d3dx9.lib).

Now go back to the DirectX SDK Redist folder and for each lib you identified, grab the most recent copy of the corresponding .cab file. For example, if you are using d3dx9.lib, then (assuming you are using the June 2010 SDK) you would grab (also assuming your game is compiled as a 32 bit application – if not, go with the x64 versions of the same .cab). Similarly if you were using xinput.lib, you would grab Copy all the selected .cab files across to your install package folder we created in the previous step.

You should now have a folder that looks something like the following


Step 3 – Bundle these files in your installer and execute DXSETUP silently

The files you identified above are all that are required to install the DirectX prerequisites on any users machine. Note how its only a fraction of the size of the full end user redistributable. Now make sure to include all those files in your installer, then at install time, extract them to a temp folder and run DXSETUP.exe with the /silent argument like so

dxsetup.exe /silent

It is critical that you use the /silent argument as this prevents the DirectX installer UI popping up during your own install. Don’t worry if the user already has the prerequisites, the dxsetup program is smart enough to not install anything that is already installed.

Here’s an example from Junkship’s nsis installer script which does the DirectX install

; install directx11 update
Section "Install Directx 11 June 2010 update"

    SetOutPath $TEMP\Junkship
    File dxredist\*.cab
    File dxredist\*.dll
    File dxredist\DXSETUP.exe

    ExecWait '"$TEMP\Junkship\dxsetup.exe" /silent'
    SetOutPath $TEMP
    RMDIR /r $TEMP\Junkship\dxredist

Step 4 – Profit!

That’s it! (though if you are using DirectX 11 and want to support windows Vista users, you may need to do some additional prerequisite checking as detailed in this article)

NOTE: Most of the information included from this post was sourced from this MSDN article if you want to know more

Porting to Directx 11 - A Code Odyssey

Posted on Nov 4, 2012

Well its finally done. Junkship and the MGDF framework has been fully ported from DirectX 9 to DirectX 11, and while it wasn’t something that I can say I particularly enjoyed at times – I can see in hindsight that the pain endured was worthwhile. Along the way I learnt a whole lot and recorded a number of the issues I encountered, which I will outline here for the benefit of any other wary travellers who are considering going down the same path.

First things first, DirectX 11 is VERY different to DirectX 9. I skipped the whole DirectX 10 thing, so I can’t say for certain but it looks that for the most part, porting from DX10 to DX11 is relatively trivial as most of the API’s are the same (just with all the 10’s changed to 11’s). Unfortunately this is not the case with DX9 to DX11 (and so too with DX9 to DX10) – Almost every API is different, and there isn’t always a clear upgrade path for certain DX9 features. In addition, DX11 is almost universally more verbose than DX9, but in exchange you get much more flexibility. DX9 is very restrictive in certain areas, mostly due to its support for the old fixed function pipeline, and in DX11 these restrictions are gone in favor of having to write more boilerplate.

Domain shaders and Tessellation are awesome

Ok, technically this has nothing to do with porting, but its cool so I’m going to include it. I implemented displacement mapping (bump mapping is just faking it… displacement mapping is the real deal) for the asteroid rendering in Junkship using the new tessellation features in DX11. To see the difference displacement mapping makes to the asteroid geometry compare the screenshot on the left (enabled) with the right (disabled) (though it does bring less capable GPU’s to their knees)


Sane initialization and feature support

One of the main problems with DX9 is that there was no clear minimum feature set required by hardware, this meant you had to laboriously check for support for pretty much every specific feature of the API you wanted to use. With DX10, this is no longer an issue as there is a clearly defined minimum feature set. In DX11 the ability to target a number of different feature levels has been added, so If you have a DX9 game and you are worried about using DX11 because you will leave users with old machines out in the cold – Don’t worry, you can use the DX11 API but specify that you want to support DX9 hardware (The table of feature levels and the minimum requirements are listed here)

In addition, the code for initializing DirectX and managing transitions from windowed to fullscreen mode has been improved. You no longer have to worry about resizing the window rectangle explicitly (Though you will still have to deal with resizing the back buffer)

No more lost devices (Great for GPU generated procedural resources)

In DX11 you no longer have to worry about the device getting lost. This means that you no longer have to regenerate resources from the D3DPOOL_DEFAULT (In fact the notion of putting resources in a DEFAULT or MANAGED pool no longer exists) every time the window is resized, or the user presses alt-tab. This is particularly important for Junkship as I generate large amounts of textures procedurally on the GPU, so in DX9 these resources had to be D3DPOOL_DEFAULT which meant that to prevent having to regenerate the resources every time the device became lost I had to copy each generated resource into a D3DPOOL_MANAGED resource. The problem is is that copying back to the managed pool in DX9 is very slow at the best of times, and ridiculously slow at the worst as GPU’s usually batch instructions and run independently of whatever the CPU is up to, so trying to copy a resource from the GPU before it has finished generating causes a pipeline stall which kills performance. The tricky thing is that you can’t be sure on the CPU side when the GPU has finished, so you have to make a best guess and wait and hope that you don’t stall the pipeline when you try to copy. Not having to do this pointless copying DOUBLED the speed of Junkship’s procedural texture generation.

Multithreading friendly

DX9 by default isn’t thread safe, and enabling multithreading support wasn’t recommended as the performance suffered considerably, luckily DX11 is considerably more multithreading friendly. Instead of executing instructions using the d3d device like in DX9, in DX11 you can construct any number of device contexts which can batch up commands on different threads and submit them to the GPU separately (the commands are still serialized on the GPU side, but this is largely transparent to the API user). Another nice feature is that you can run D3D on a thread other than the main window thread (nice if you want to have a separate render thread from the input poll loop or simulation thread) which wasn’t possible in DX9.

D3D Debug mode is useful and not a global setting

In DX9, there’s a switch in the DirectX control panel to enable D3D debug mode, which unfortunately applies to every D3D application on your system. Unfortunately I would regularly forget to switch it back to retail mode and would wonder why games I was playing would perform poorly or have strange graphical issues. In fact during one period in which I was addicted to Galactic civilizations 2 the in game minimap started appearing blank. Amazingly I put up with this for months until one day I noticed that D3D was running in debug mode. Returning it to retail mode fixed the issue </facepalm>. DX11 has a much more sane approach of letting you define whether debug mode is enabled for specific applications. Also from my experience the debug messages in DX11 seem to be much easier to follow than their more cryptic DX9 counterparts.

D3DX is dead, so is the D3DX effects framework – Don’t use them

The venerable D3DX library has been officially deprecated in Windows 8, and so while they are still available for use with DX11, I wouldn’t recommend it. There are decent alternatives for most of the functionality provided in D3DX in the DirectXTex and DirectXTK libraries, and all the math related functionality in D3DX now exist in the built in xnamath (if you’re using the DirectX SDK) or directX math (Which is just a renamed version of xnamath if you’re using the Win8 SDK) libraries. In addition the effects framework is no longer a core part of D3DX and is instead supplied as a separate source code library, however once again I wouldn’t recommend using it. There are three main reasons, firstly is that D3DX style .fx files are now deprecated by the HLSL compiler, secondly the D3DX effects framework makes use of shader reflection and run time compilation which is verboten if you want to write a metro style game, and thirdly – its just not that good. With a little work you should be able to write an effects system that is more tailored to your games needs and is also more flexible. I wrote my own effects management system and now I get all sorts of nice benefits like altering shaders while the program is running and seeing the rendering change accordingly – I might write some more about this in future if I get time.

Compile your shaders offline

If you weren’t already, compile your shaders offline using the FXC compiler supplied in the SDK (the flags etc. for making it work can be found here). Going forward runtime shader compilation will not be supported for Windows 8 Metro applications and should only ever be done for debug builds on your dev machine. I prefer to precompile my shaders even when developing as it lets me know instantly if my shaders are valid without having to boot up the game. To do this I wrote a small program that runs as a post build step which resolves all shader last modified times + the last modified times of any #imported shader fragments invokes FXC on them (I found this easier than configuring custom build tasks for the hlsl files in visual studio as it provides a centralized place to setup what flags etc. I want to compile with and to apply any custom processing required as a part of my effects framework). This is something that I might post up to GitHub in the future once I tidy it up.

No point sprites – Use instancing or geometry shaders

In Junkship I used the DX9 point sprites feature to render the background starfield, in DX11 the point sprites feature has been removed so to get the equivalent functionality you have two choices – Object instancing or geometry shaders. To get a point sprite effect using object instancing involves creating two vertex buffers – the first being a quad, the other being a matrix transform defining how to orient that quad for every sprite you wish to display. To render the sprites, you would then use a pixel shader which draws a texture over the quad, and using the ID3D11DeviceContext::DrawIndexedInstanced method will ensure that the sprite will be drawn for every orientation specified in the second vertex buffer. The other option (the one I chose) was to use a Geometry shader. Geometry shaders are a relatively new type of shader (introduced in DX10) which allow the GPU to generate new vertices in addition to those supplied via the usual vertex buffers. To render billboard style point sprites (sprites that always face the camera) you will need to supply all the positions of the point sprites in a vertex buffer, then in a geometry shader create 4 vertices about this position that exist on a plane that the position->camera vector is normal to (the HLSL code for this is shown below) Once this is done you should be able to use the same pixel shader as used by the instanced object technique.

[maxvertexcount(4)] // point sprite quad, so 4 vertices

void main(point Starfield_VSOutput input[1],inout TriangleStream<Starfield_GSOutput> stream)
    float3 right;
    float3 up;

    if (abs(input[0].LocalPosition.y) < 0.5)
        right = cross(UP,input[0].LocalPosition);
        up = cross(input[0].LocalPosition,right);
        //calculate the positions using absolute right instead
        //of absolute UP when the position is near the north/south
        //poles to avoid errors in the cross product with near
        //parallel vectors
        up = cross(input[0].LocalPosition,RIGHT);
        right = cross(input[0].LocalPosition,up * -1);

    up = normalize(up);
    right = normalize(right);

    float intensity =  saturate(1.0 - length(input[0].LocalPosition));
    float scale =  MinScale + (intensity * (MaxScale - MinScale));
    float3 v1 = input[0].LocalPosition    + (right * scale) - (up * scale);
    float3 v2 = input[0].LocalPosition  + (right * scale) + (up * scale);
    float3 v3 = input[0].LocalPosition    - (right * scale) - (up * scale);
    float3 v4 = input[0].LocalPosition    - (right * scale) + (up * scale);

    Starfield_GSOutput output;
    output.UV = float2(1.0,1.0);
    output.Intensity = intensity;
    output.LocalPosition = input[0].LocalPosition;
    output.Position = mul(float4(v1,1.0),WorldViewProjection);

    output.UV = float2(1.0,0.0);
    output.Intensity = intensity;
    output.LocalPosition = input[0].LocalPosition;
    output.Position = mul(float4(v2,1.0),WorldViewProjection);

    output.UV = float2(0.0,1.0);
    output.Intensity = intensity;
    output.LocalPosition = input[0].LocalPosition;
    output.Position = mul(float4(v3,1.0),WorldViewProjection);

    output.UV = float2(0.0,0.0);
    output.Intensity = intensity;
    output.LocalPosition = input[0].LocalPosition;    
    output.Position = mul(float4(v4,1.0),WorldViewProjection);

Direct2D/DirectWrite doesn’t work with DX11

some reason Microsoft decided not to add support in Windows 7 for Directwrite/Direct2d access to DX11 surfaces (apparently this is remedied in Windows 8), which means that to render text one either has to rely on sprite fonts or (warning WTF approaching) create a DX10.1 device that uses DirectWrite + Direct2D to render text to a shared backbuffer with DX11. Instead of dealing with this mess I used the FW1FontWrapper library, which has met my needs thus far.

Ditch DirectInput (If you haven’t already)

I have been dragging my heels on this for a number of years as I know the use of DirectInput is deprecated and not recommended, so I finally bit the bullet and switched to using RawInput. Despite the (horrendously bad) documentation on how to use it, it actually proved to be considerably simpler than the DirectInput code that it replaced. One thing to note with RawInput is that if you want responsive input you have to ensure that your main window thread processes messages quickly and doesn’t experience any delays. For this reason I moved all rendering into a separate thread (the game simulation was already) as I didn’t want the input latency to be tied to the rendering frame rate. So now the window thread is now only handling input and windows API messages, and rendering and game simulation are run on two separate threads. (see here for how I set up the sim/rendering/input threads in the MGDF framework that Junkship runs on)

Visual studio 2012 Graphical debugging is great

I also upgraded to Visual Studio 2012, which has a great new feature for debugging DirectX applications. Its pretty much the old PIX tool that we all know and love that has been given a serious facelift (PIX was great functionally, but the UI was truly horrible) and integrated into Visual Studio. For me this is a killer feature over VS2010 (Once you get over the ALL CAPS MENUS which everyone seems to be enraged about – Its kind of weird but doesn’t really annoy me to be honest)


So now Junkship is fully ported (and leaner and faster than ever) I can get back to doing the fun stuff again Smile

Timing is everything

Posted on Jul 3, 2012 graphics programming


According to Donald Knuth

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil

Now this seems like sage wisdom, and its wisdom I agree with, however in practice I’ve found that if you begin a project without thinking at all about the performance considerations of your design decisions, you’ll come to regret it later. After you get annoyed at how slow things are, you’ll decide to do some optimizin’ and you get a couple of low hanging fruit out of the way but then quickly find that there are no more hot-spots left, but the program still performs horribly because of numerous small architectural problems. So how does one reconcile getting the important things done and not getting bogged down by excessive optimization, but also ensure that your program isn’t carrying the weight of accumulated poor decisions?

Now I’m not going to claim any silver bullets here – but for me the key is to build in good instrumentation and profiling information early on, so that the performance impact of each new feature is immediately apparent. This ensures that you can control bloat as it appears, but also means that you can plow on ahead with new features if the profiling shows no performance red flags. Now profiling CPU time is a relatively trivial thing, you can use the QueryPerformanceFrequency (warning this blog post is going to be windows and DirectX centric) function to get the frequency of the high resolution timer, then take timing samples using QueryPerformanceCounter, and finally divide the sample differences by the frequency and voila! accurate profiling information!

Unfortunately though in this day and age, most games are GPU - rather than CPU - bound and the previous timing method will not give you any useful information as to what’s eating your GPU’s cycles. The reason for this is that the GPU runs in parallel to the CPU, (usually a frame or two behind the CPU) so most DirectX API calls are largely non-blocking on the CPU side of things; they just queue up commands which the GPU will execute later. This means that if you sample the CPU time before and after Direct3D API calls, all you are measuring is the direct CPU cost of the call, not the time that the GPU will use executing those commands at some point in the future.

Luckily DirectX does have a means of measuring the time taken on the GPU side of things, but due to the asynchronous nature of the GPU, it can be a bit tricky to work with. In DirectX 9 (Dx10 & 11 are largely the same) it’s a 3 step process, firstly we create some Query objects, we then issue the queries either side of the API calls we want to profile, then at some point in the future we ask for the result of those queries when they become available. The last part is the trickiest, as we don’t want to stall the CPU waiting for the GPU to give us the result of our queries, we want to buffer the responses so that we’ll only try to get the results after we’re pretty sure that they are ready (i.e. a few frames in the future). This does mean that the profiling information for the GPU is going to lag behind by a few frames, but in practice this isn’t a big issue. The important bits of code are as follows

//determine if the queries we need for timing are supported

if (tsHr || tsdHr || tsfHr)
    //oh no, the timing queries we wan't aren't supported BAIL OUT!


//The disjoint query is used to notify whether the frequency

//changed during the sampling interval. If this is the case 

//Then we'll have to throw out our measurements for that interval

IDirect3DQuery9* disjointQuery;
device->CreateQuery(D3DQUERYTYPE_TIMESTAMPDISJOINT, &disjointQuery);

//This query will get us the tick frequency of the GPU, we will use

//this to convert our timing samples into seconds.

IDirect3DQuery9* frequencyQuery;
device->CreateQuery(D3DQUERYTYPE_TIMESTAMPFREQ, &frequencyQuery);

//these two queries will record the beginning and end times of our

//sampling interval

IDirect3DQuery9* t1Query;

device->CreateQuery(D3DQUERYTYPE_TIMESTAMP, &t1Query);

IDirect3DQuery9* t2Query;
device->CreateQuery(D3DQUERYTYPE_TIMESTAMP, &t2Query);


//before we start rendering





//a few frames later, lets try and get the result of the query

BOOL disjoint;
if (disjointQuery->GetData(&disjoint,sizeof(BOOL),0) == S_OK)
    //if the timing interval was not disjoint then the measurements

    //are valid

    if (!disjoint)
        UINT64 frequency;
        if (frequencyQuery->GetData(&frequency,sizeof(UINT64),0) == S_OK)
            UINT64 timeStampBegin;
            if (t1->GetData(&timeStampBegin,sizeof(UINT64),0) != S_OK)
                return;//not ready yet


            UINT64 timeStampEnd;
            if (t2->GetData(&timeStampEnd,sizeof(UINT64),0) != S_OK)
                return;//not ready yet


            UINT64 diff = timeStampEnd - timeStampBegin;
            //The final timing value in seconds.

            double value = ((double)diff/frequency);

Another important point to note is that these query objects MUST be released whenever the D3D device is lost/reset. I haven’t included code for doing this or ensuring that the commands are buffered correctly, however if you want a complete working example check out the Timer class in the MGDF framework that powers junkship.