I've done a fair amount of learning since I last posted. OpenCL stopped being just a specification and now has a concrete implementation.
I upgraded my MacBook to Snow Leopard and got access to a fairly solid implementation. I say 'fairly' because it exhibits a few interesting quirks... I can pretty reliably get it to throw memory access errors even when I am doing nothing exotic. More about this if I can nail down why it is happening.
I've recently installed ATI Stream SDK 2.0 beta 2 to my 'Windows Beast' only to discover that beta 3 is the only one so far that works with the toolkit that I am using.
At present I am using
OpenCL4Java to provide the Java bindings to OpenCL they are developing fast and have the advantage of using JNA so there needs to be no native installs past the OpenCL drivers.
It has proved both easier and harder than I expected - I hadn't realised how much C I had forgotten, but then again the syntactical similarities with Java made it easier for me to get to grips with.
I walked through the various examples and tinkered with them until I had at least a basic grip. For my first program I decided to use a nearly perfect algorithm for scaling on multiple threads, the Mandelbrot set. Pseudo code for the algorithm is widely disseminated but to be honest the OpenCL implementation is clear enough. I'll present my take here and with it I will highlight some of the interesting features of OpenCL.
__kernel mandelbrot(
const float deltaReal,
const float deltaImaginary,
const float realMin,
const float imaginaryMin,
const unsigned int maxIter,
const unsigned int magicNumber,
const unsigned int hRes,
__global int* outputi
)
{
int xId = get_global_id(0);
int yId = get_global_id(1);
float realPos = realMin + (xId * deltaReal);
float imaginaryPos = imaginaryMin + (yId * deltaImaginary);
float real = realPos;
float imaginary = imaginaryPos;
float realSquared = real * real;
float imaginarySquared = imaginary * imaginary;
int iter = 0;
while ( (iter < maxIter) && ((realSquared + imaginarySquared) < magicNumber) )
{
imaginary = (2 * (real * imaginary)) + imaginaryPos;
real = realSquared - imaginarySquared + realPos;
realSquared = real * real;
imaginarySquared = imaginary * imaginary;
iter++;
}
if(iter >= maxIter){
iter = 0;
}
outputi[(yId * hRes) + xId] = iter;
}
First thing I'll highlight is the fact that I've only non-pixel specific parameters - all pixel specific variables are calculated in the OpenCL program. At present I'm only using floats - the implementations have yet to support doubles.
The coordinates of the pixel / work group are retrieved using the OpenCL specific function : get_global_id(int dimension). Currently Open CL supports 1,2 and 3 dimensional coordinate spaces.
The OpenCL language is mainly derived from C99 and should be pretty easy for C/C++/Java developers to read. The main difference is exclusions - a lot of the default libraries and some control of flow instructions. There are a few additions such as the work group / unit functions and the '__kernel' keyword.
The OpenCL4Java source code is pretty simple:
package bbbob.gparallel.mandelbrot;
import com.nativelibs4java.opencl.*;
import static com.nativelibs4java.opencl.OpenCL4Java.*;
import com.nativelibs4java.util.NIOUtils;
import javax.imageio.ImageWriter;
import javax.imageio.ImageIO;
import javax.imageio.stream.ImageOutputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.IOException;
import java.io.File;
import java.nio.IntBuffer;
import java.awt.image.BufferedImage;
public class Mandelbrot {
//boundary of view on mandelbrot set
public static void main(String[] args) throws IOException, CLBuildException {
//Setup variables for parameters
//Boundaries
float realMin = -2.25f; //-0.19920f; // -2.25
float realMax = 0.75f; //-0.12954f; // 0.75
float imaginaryMin = -1.5f; //1.01480f; // -1.5
float imaginaryMax = 1.5f; //1.06707f; // 1.5
//Resolution
int realResolution = 640; // TODO validate against device capabilities
int imaginaryResolution = 640;
//The maximum iterations to perform before returning and assigning 0 to a pixel (infinity)
int maxIter = 64;
//TODO describe what this number means...
int magicNumber = 4;
//Derive the distance in imaginary / real coordinates between adjacent pixels of the image.
float deltaReal = (realMax - realMin) / (realResolution-1);
float deltaImaginary = (imaginaryMax - imaginaryMin) / (imaginaryResolution-1);
//Setup output buffer
int size = realResolution * imaginaryResolution;
IntBuffer results = NIOUtils.directInts(size);
//TODO use an image object directly.
//CL.clCreateImage2D(context.get(), 0, OpenCLLibrary);
//TODO set up a Float4 array in order to be able to provide a colour map.
//This depends on whether we will be able to pass in a Float4 array as an argument in the future.
//Read the source file.
String src = readFully(
new InputStreamReader(Mandelbrot.class.getResourceAsStream("opencl/mandelbrot.cl")),
new StringBuilder()
).toString();
buildAndExecuteKernel(realMin, imaginaryMin, realResolution, imaginaryResolution, maxIter, magicNumber,
deltaReal, deltaImaginary, results, src);
outputResults(realResolution, imaginaryResolution, results);
}
private static void outputResults(int realResolution, int imaginaryResolution,
IntBuffer results) {
int[] outputResults = new int[realResolution * imaginaryResolution];
results.get(outputResults);
BufferedImage image = new BufferedImage(realResolution, imaginaryResolution, BufferedImage.TYPE_INT_RGB);
for(int y = 0; y < imaginaryResolution; y++){
int rowPos = y * imaginaryResolution;
for(int x = 0; x < realResolution; x++){
image.setRGB(x,y,outputResults[rowPos + x] * 32 );
}
}
try {
ImageWriter writer = ImageIO.getImageWritersByFormatName("gif").next();
ImageOutputStream stream = ImageIO.createImageOutputStream(new File("test.gif"));
writer.setOutput(stream);
writer.write(image);
} catch (IOException e) {
e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
}
// for (int i = 0; i < outputResults.length; i++) {
// if((i % realResolution) == 0 ){
// System.out.print("\n");
// }
// int outputResult = outputResults[i];
// if(outputResult == 0){
// System.out.print("0");
// }else{
// System.out.print("" + outputResult % 10);
// }
// }
}
private static void buildAndExecuteKernel(float realMin, float imaginaryMin, int realResolution,
int imaginaryResolution, int maxIter, int magicNumber, float deltaReal,
float deltaImaginary, IntBuffer results, String src) throws CLBuildException {
//TODO build some intelligence into the mechanism for pulling out platforms and devices.
CLPlatform[] platforms = listPlatforms();
CLDevice[] devices = platforms[0].listGPUDevices(false);
//Create a context and program using the devices discovered.
CLContext context = platforms[0].createContext(devices);
CLProgram program = context.createProgram(src).build();
//Create a kernel instance from the mandelbrot kernel, passing in parameters.
CLKernel kernel = program.createKernel(
"mandelbrot",
deltaReal,
deltaImaginary,
realMin,
imaginaryMin,
maxIter,
magicNumber,
realResolution,
context.createIntBuffer(CLMem.Usage.Output, results, false)
);
//Enqueue and complete work using a 2D range of work groups corrsponding to individual pizels in the set.
//The work groups are 1x1 in size and their range is defined by the desired resolution. This corresponds
//to one device thread per pixel.
CLQueue queue = context.createDefaultQueue();
kernel.enqueueNDRange(queue, new int[]{realResolution, imaginaryResolution}, new int[]{1,1});
queue.finish();
}
public static CharSequence readFully(Reader reader, StringBuilder builder) throws IOException {
char[] buffer = new char[8192];
for (int readLen = reader.read(buffer); readLen >= 0; readLen = reader.read(buffer)){
builder.append(buffer, 0, readLen);
}
return builder;
}
}
I'm finding it pretty easy to work with OpenCL - now I need to identify some more complex problems to solve.