oZone3D.Net Tutorials - Mandelbrot Fractal - GPGPU Programming

The Mandelbrot Set: Colors of Infinity

By Nathan Reed - www.reedbeta.com
and
J�r�me Guinot - www.oZone3D.Net

Initial draft: January 25, 2006

[ Index ]

4 - Mandelbrot on the GPU

The classical algorithm for a Mandelbrot renderer has been known for decades, and most graphics enthusiasts have at one time or another written one. Today, however, is the era of the Graphics Processing Unit. Instead of rendering fractals on the CPU, we can exploit the greater power and parallelism of a GPU to make fractal rendering much faster. Let's see how we can adapt the above algorithm to the GPU.

4.1 - A First Attempt

Fortunately for us, the algorithm for rendering the Mandelbrot set is on a per-pixel basis. Each pixel is processed the same way and independently of all the others. This means we can simply render a quad that covers the whole screen, and write a fragment shader to perform the iteration process:

uniform vec4 insideColor;
uniform sampler1D outsideColorTable;
uniform float maxIterations;
void main ()
{
	vec2 c = gl_TexCoord[0].xy;
	vec2 z = c;
	gl_FragColor = insideColor;
	for (float i = 0; i < maxIterations; i += 1.0)
	{
		z = vec2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c;
		if (dot(z, z) > 4.0)
		{
			gl_FragColor = texture1D(outsideColorTable, 
			i / maxIterations);
			break;
		}
	}
}

This is precisely the same algorithm as was presented above; it has been rewritten in GLSL (OpenGL Shading Language).

It assumes that you have set up the color table as a 1D texture map in outsideColorTable and that the texture coordinates passed to the shader correspond to complex numbers. Then the shader calculates the sequence using complex multiplication and checks each value to see if it escapes from the radius-2 circle.

Unfortunately, this shader will be very slow, and you will not be able to render the fractal in real-time, unless maxIterations is set very low. Moreover, the looping and branching inside the shader means that it will not even run on graphics hardware that does not support SM 3.0. At the time of this writing, you need a GeForce 6600 or better GPU to even execute the above shader, and it would still be very slow even on a GeForce 7800, currently the most powerful card in the consumer market.

4.2 - Stream Processing

Instead of performing the Mandelbrot iteration in a single complicated fragment shader, a better approach is to use a multipass algorithm. To implement this, we can use the stream processing model of general purpose GPU computation. One shader generates a set of data, which is then used as input to another shader; the output of the second shader is the input to a third, and so forth. Since shaders cannot operate "in place," i.e. they cannot write to the same framebuffer from which they are reading, we must use a minimum of two buffers. We then ping-pong between these buffers, first using one as the input and the other as the output, then switching their roles for the next shader pass.

To render the Mandelbrot set in a multipass algorithm, we will use floating-point framebuffers to store the value of z_n at each pixel. Three shaders in total are used; the first sets up the initial values, by simply storing the texture coordinates (which represent the values of c, which equals z₁) to the red and green components of each pixel:

void main ()
{
	vec2 c = gl_TexCoord[0].xy;
	gl_FragColor = vec4(c, 0, 0);
}

The second shader will be executed over and over again. This shader actually performs the iteration, calculating the next value of z_n at each pixel. In this shader, we get the previous value z_n-1 by sampling a texture, which is the output of the previous pass; the value of c is provided by texture coordinates as before.

uniform sampler2D input;
uniform float curIteration;
void main ()
{
	// Lookup value from last iteration
	vec4 inputValue = texture2D(input, gl_TexCoord[0].xy);
	vec2 z = inputValue.xy;
	vec2 c = gl_TexCoord[0].xy;
	
	// Only process if still within radius-2 boundary
	if (dot(z, z) > 4.0)
		// Leave pixel unchanged (but copy 
		//through to destination buffer)
		gl_FragColor = inputValue;
	else
	{
		gl_FragColor.xy = vec2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c;
		gl_FragColor.z = curIteration;
		gl_FragColor.w = 0.0;
	}
}

As you can see, we are also storing n (the number of the current iteration, passed in as a uniform variable), in the blue component of the pixel. This is of use in the third shader, which is used for finally displaying the fractal. Its input is the floating-point buffer containing the final z_n values, but its output is an ordinary color buffer.

uniform sampler2D input;
uniform vec4 insideColor;
uniform sampler1D outsideColorTable;
uniform float maxIterations;
void main ()
{
	// Lookup value from last iteration
	vec4 inputValue = texture2D(input, gl_TexCoord[0].xy);
	vec2 z = inputValue.xy;
	
	// If Z has escaped radius-2 boundary, shade by outer color
	if (dot(z, z) > 4.0)
		gl_FragColor = texture1D(outsideColorTable, 
		inputValue.z / maxIterations);
	else
		gl_FragColor = insideColor;
}

This three-shader multipass algorithm can render the same image as our original shader, but will run on any hardware that supports floating-point buffers. In the ATI world, this includes the Radeon 9500 and up, and in nVIDIA hardware the GeForce 5200 and up—much more reasonable expectations. (Note that although two of the shaders use if statements, on pre-SM 3.0 cards this will produce a CMP instruction rather than an actual branch.)

The shaders presented above can be made to work together in a variety of different ways. The most obvious is to use the iterating shader a few times on off-screen surfaces and then display the result. However, a cool effect can be created by running the viewing shader after each iteration. The fractal animates, appearing to grow inward as more and more detail is revealed. Another possibility is to map the fractal onto a 3D surface, which can be done as easily as ordinary texture mapping, simply using the viewing shader to draw the triangles of the surface.

Two problems still remain unsolved in our improved GPU Mandelbrot renderer, though. First, there is no antialiasing on the fractal, since it is essentially point-sampled. This makes the generated images look rather ugly, unless they are rendered at a high resolution and then downsampled. The second problem is precision. Currently, ATI cards use a 24-bit floating-point format and nVIDIA cards a 32-bit format for the registers in the shading units, and the floating-point buffers also store only 32-bit floats. This means that one cannot zoom very far into the fractal before precision breaks down; one can get to about 10^-7 with 32-bit floats and only 10^-5 with 24-bit floats. This is currently an unavoidable limitation of GPU hardware; CPU-based Mandelbrot renderers use 64-bit floats or even provide their own implementations of arbitrary-precision floating-point arithmetic.

[ Index ]

�Next Page