October 26, 2009

Octopussies...


Nope - I'm not referring to multiples of a James Bond Film - I'm referring instead to a family tradition.

My mother has for as long as I can remember given an 'Octopussy' to the young children of family and friends. An Octopussy is a friendly octopus crocheted out of wool:

To start the body:
  • Make a chain of 6 or 7 stitches and then form a circle using whatever colour wool you have to hand..
  • Using double crochet make a flat circle about 2 or 3 inches in diameter.
  • Continue adding rows to the circle without increasing the number of stitches and this will form a cup shape.
  • Change to white to continue in order to make a space for the eyes to be embroidered.
  • Continue until you have a nice wide band for the eyes.
Now make 8 thick crochet legs as long as you like:
  • Start trimming the edge with double crochet in a contrasting colour going through the middle of the chain.
  • Continue down to one end and crochet in the long strands.
  • Turn and do a second row of double crochet and turn again to give a 3 row 'foot'.
  • Continue until you reach the end that you started with.
  • Using the loose strands from each of the feet pull them through the flat circle at the base of the body and tie them off to attach the legs.
Finish the body;
  • Embroider the eyes.
  • Change colour from the white to whatever colour you like.
  • Continue with crochet decreasing with each row.
  • When you are close to the end stuff the body with a non-toxic filling (my mother uses a form of foam chips).
  • finish the top of the body and pull final thread through to lose it in the body.
  • Make a long chain with two or three strands of wool.
  • Pull the chain halfway through the top of the Octopussy placing a knot so that it cannot be shifted.
  • The chain can then be tied onto a convenient point so that the Octopussy cannot be thrown out of a pram or cot.
The Octopussy contains only wool, thread and stuffing so there are no small hard parts that can be pulled off and cause a hazard. It is very robust to chewing and is fully washable.

It also has the advantage of leaving the child with a friendly view of Octopi!

October 13, 2009

GPGPU Mandelbrot with OpenCL and Java

I've done a fair amount of learning since I last posted. OpenCL stopped being just a specification and now has a concrete implementation.

I upgraded my MacBook to Snow Leopard and got access to a fairly solid implementation. I say 'fairly' because it exhibits a few interesting quirks... I can pretty reliably get it to throw memory access errors even when I am doing nothing exotic. More about this if I can nail down why it is happening.

I've recently installed ATI Stream SDK 2.0 beta 2 to my 'Windows Beast' only to discover that beta 3 is the only one so far that works with the toolkit that I am using.

At present I am using OpenCL4Java to provide the Java bindings to OpenCL they are developing fast and have the advantage of using JNA so there needs to be no native installs past the OpenCL drivers.

It has proved both easier and harder than I expected - I hadn't realised how much C I had forgotten, but then again the syntactical similarities with Java made it easier for me to get to grips with.

I walked through the various examples and tinkered with them until I had at least a basic grip. For my first program I decided to use a nearly perfect algorithm for scaling on multiple threads, the Mandelbrot set. Pseudo code for the algorithm is widely disseminated but to be honest the OpenCL implementation is clear enough. I'll present my take here and with it I will highlight some of the interesting features of OpenCL.
__kernel mandelbrot(
const float deltaReal,
const float deltaImaginary,
const float realMin,
const float imaginaryMin,
const unsigned int maxIter,
const unsigned int magicNumber,
const unsigned int hRes,
__global int* outputi
)
{
int xId = get_global_id(0);
int yId = get_global_id(1);

float realPos = realMin + (xId * deltaReal);
float imaginaryPos = imaginaryMin + (yId * deltaImaginary);
float real = realPos;
float imaginary = imaginaryPos;
float realSquared = real * real;
float imaginarySquared = imaginary * imaginary;

int iter = 0;
while ( (iter < maxIter) && ((realSquared + imaginarySquared) < magicNumber) )
{
imaginary = (2 * (real * imaginary)) + imaginaryPos;
real = realSquared - imaginarySquared + realPos;
realSquared = real * real;
imaginarySquared = imaginary * imaginary;
iter++;
}
if(iter >= maxIter){
iter = 0;
}
outputi[(yId * hRes) + xId] = iter;
}
First thing I'll highlight is the fact that I've only non-pixel specific parameters - all pixel specific variables are calculated in the OpenCL program. At present I'm only using floats - the implementations have yet to support doubles.

The coordinates of the pixel / work group are retrieved using the OpenCL specific function : get_global_id(int dimension). Currently Open CL supports 1,2 and 3 dimensional coordinate spaces.

The OpenCL language is mainly derived from C99 and should be pretty easy for C/C++/Java developers to read. The main difference is exclusions - a lot of the default libraries and some control of flow instructions. There are a few additions such as the work group / unit functions and the '__kernel' keyword.

The OpenCL4Java source code is pretty simple:
package bbbob.gparallel.mandelbrot;
import com.nativelibs4java.opencl.*;
import static com.nativelibs4java.opencl.OpenCL4Java.*;
import com.nativelibs4java.util.NIOUtils;

import javax.imageio.ImageWriter;
import javax.imageio.ImageIO;
import javax.imageio.stream.ImageOutputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.IOException;
import java.io.File;
import java.nio.IntBuffer;
import java.awt.image.BufferedImage;

public class Mandelbrot {
//boundary of view on mandelbrot set

public static void main(String[] args) throws IOException, CLBuildException {

//Setup variables for parameters

//Boundaries
float realMin = -2.25f; //-0.19920f; // -2.25
float realMax = 0.75f; //-0.12954f; // 0.75
float imaginaryMin = -1.5f; //1.01480f; // -1.5
float imaginaryMax = 1.5f; //1.06707f; // 1.5

//Resolution
int realResolution = 640; // TODO validate against device capabilities
int imaginaryResolution = 640;

//The maximum iterations to perform before returning and assigning 0 to a pixel (infinity)
int maxIter = 64;

//TODO describe what this number means...
int magicNumber = 4;

//Derive the distance in imaginary / real coordinates between adjacent pixels of the image.
float deltaReal = (realMax - realMin) / (realResolution-1);
float deltaImaginary = (imaginaryMax - imaginaryMin) / (imaginaryResolution-1);

//Setup output buffer
int size = realResolution * imaginaryResolution;
IntBuffer results = NIOUtils.directInts(size);

//TODO use an image object directly.
//CL.clCreateImage2D(context.get(), 0, OpenCLLibrary);
//TODO set up a Float4 array in order to be able to provide a colour map.
//This depends on whether we will be able to pass in a Float4 array as an argument in the future.

//Read the source file.
String src = readFully(
new InputStreamReader(Mandelbrot.class.getResourceAsStream("opencl/mandelbrot.cl")),
new StringBuilder()
).toString();

buildAndExecuteKernel(realMin, imaginaryMin, realResolution, imaginaryResolution, maxIter, magicNumber,
deltaReal, deltaImaginary, results, src);


outputResults(realResolution, imaginaryResolution, results);
}

private static void outputResults(int realResolution, int imaginaryResolution,
IntBuffer results) {
int[] outputResults = new int[realResolution * imaginaryResolution];

results.get(outputResults);
BufferedImage image = new BufferedImage(realResolution, imaginaryResolution, BufferedImage.TYPE_INT_RGB);
for(int y = 0; y < imaginaryResolution; y++){
int rowPos = y * imaginaryResolution;
for(int x = 0; x < realResolution; x++){
image.setRGB(x,y,outputResults[rowPos + x] * 32 );
}
}

try {
ImageWriter writer = ImageIO.getImageWritersByFormatName("gif").next();
ImageOutputStream stream = ImageIO.createImageOutputStream(new File("test.gif"));
writer.setOutput(stream);
writer.write(image);
} catch (IOException e) {
e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
}


// for (int i = 0; i < outputResults.length; i++) {
// if((i % realResolution) == 0 ){
// System.out.print("\n");
// }
// int outputResult = outputResults[i];
// if(outputResult == 0){
// System.out.print("0");
// }else{
// System.out.print("" + outputResult % 10);
// }
// }
}

private static void buildAndExecuteKernel(float realMin, float imaginaryMin, int realResolution,
int imaginaryResolution, int maxIter, int magicNumber, float deltaReal,
float deltaImaginary, IntBuffer results, String src) throws CLBuildException {
//TODO build some intelligence into the mechanism for pulling out platforms and devices.
CLPlatform[] platforms = listPlatforms();
CLDevice[] devices = platforms[0].listGPUDevices(false);

//Create a context and program using the devices discovered.
CLContext context = platforms[0].createContext(devices);
CLProgram program = context.createProgram(src).build();

//Create a kernel instance from the mandelbrot kernel, passing in parameters.
CLKernel kernel = program.createKernel(
"mandelbrot",
deltaReal,
deltaImaginary,
realMin,
imaginaryMin,
maxIter,
magicNumber,
realResolution,
context.createIntBuffer(CLMem.Usage.Output, results, false)
);

//Enqueue and complete work using a 2D range of work groups corrsponding to individual pizels in the set.
//The work groups are 1x1 in size and their range is defined by the desired resolution. This corresponds
//to one device thread per pixel.
CLQueue queue = context.createDefaultQueue();
kernel.enqueueNDRange(queue, new int[]{realResolution, imaginaryResolution}, new int[]{1,1});
queue.finish();
}

public static CharSequence readFully(Reader reader, StringBuilder builder) throws IOException {

char[] buffer = new char[8192];
for (int readLen = reader.read(buffer); readLen >= 0; readLen = reader.read(buffer)){
builder.append(buffer, 0, readLen);
}
return builder;
}

}

I'm finding it pretty easy to work with OpenCL - now I need to identify some more complex problems to solve.