September 08, 2011

On the Insanity of Expecting Nothing to Happen...

On my current project we're building an asynchronous, message-based system with various outputs, in particular a file for consumption by a mainframe.

On a number of occasions tests have been written that assert that something has not happened.

When you're testing synchronous processes this is a perfectly reasonable assertion to make: when the process returns, all the work will have been done.

For asynchronous processes? When the call returns you have no direct way of knowing whether the work has completed.

For many people the most obvious way of checking that something has not happened in an asynchronous process is the counterpart of checking that something has happened: you put in a timeout.

The problem with this is the likelihood of each outcome. When expecting something to happen asynchronously you only have to wait until it happens, which means you only ever wait for the full timeout when something has gone wrong. When expecting something not to happen, you wait for the full timeout almost every time. Every assertion that nothing will happen therefore adds a fixed delay to your test phase.

When you use a timeout to wait for something not to happen, you also risk setting it too short and invalidating the test: the thing you asserted would not happen simply occurs after the test completes. The usual response is to increase the timeouts further to keep the tests valid, which makes the test phase slower still.
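
To make that cost concrete, here is a minimal sketch of the timeout approach; the somethingHappened check is a stand-in for whatever probe the real test uses (for us, reading the output file), not our actual test code.

import java.util.concurrent.Callable;

// A negative assertion built on a timeout: a *passing* test always burns the
// full timeout, because the only early exit is on failure.
public final class NegativeAssertions {

    public static void assertNothingHappensWithin(long timeoutMillis,
                                                  Callable<Boolean> somethingHappened)
            throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (somethingHappened.call()) {
                throw new AssertionError("Expected nothing to happen, but it did");
            }
            Thread.sleep(100); // poll until the deadline
        }
        // Reaching here only proves nothing happened before the deadline,
        // not that it never will.
    }
}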

There is a related problem with testing asynchronous processes where a test only checks an intervening step in the process before moving on to the next test. This leaves you open to side effects from processes started by earlier tests causing hard-to-diagnose failures in later ones. It is particularly likely with tests that assert something doesn't happen, because they have a greater chance of finishing before the process under test has completed.

How do we avoid these problems and still be able to make assertions that something hasn't happened?

The 'Sentinel' pattern is a good place to start. 

Send a second request after the first, one that is guaranteed to produce a result once the first has been processed. If you see the result of the second request and nothing from the first, you can state that the first request produced nothing. Or you can ask the asynchronous process to do two things in one request and then assert that only the effects of the additional work are visible.
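
Here is a rough sketch of the first variant. The sendAsyncRequest and outputFileContains helpers and the record constants are stand-ins for whatever the real system provides, and it assumes the pipeline processes requests in order:

@Test
public void firstRequestProducesNoOutput() throws Exception {
    sendAsyncRequest(FIRST_REQUEST);     // the request we expect to produce nothing
    sendAsyncRequest(SENTINEL_REQUEST);  // always produces a record once processed

    // Wait only as long as the processing actually takes, not for a fixed timeout.
    while (!outputFileContains(SENTINEL_RECORD)) {
        Thread.sleep(100);
    }

    // The sentinel has been processed, so the first request must have been too.
    assertFalse(outputFileContains(FIRST_REQUEST_RECORD));
}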

Alternatively you can look at other ways of knowing that the process under test has completed: check the logging for a statement that is only produced once the process has finished, or introduce an event or message that is published when the process completes regardless of whether any other output is produced.
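
A sketch of the completion-event variant, using a java.util.concurrent.CountDownLatch; the subscribeToCompletionEvents hook and the other helpers are invented for illustration:

@Test
public void completedProcessProducedNoOutput() throws Exception {
    final CountDownLatch completed = new CountDownLatch(1);
    subscribeToCompletionEvents(new Runnable() {  // hypothetical hook onto the completion message
        public void run() {
            completed.countDown();
        }
    });

    sendAsyncRequest(FIRST_REQUEST);

    // The timeout here is only a safety net for a broken system; a healthy run
    // returns as soon as the completion event arrives.
    assertTrue("process never signalled completion", completed.await(30, TimeUnit.SECONDS));
    assertFalse(outputFileContains(FIRST_REQUEST_RECORD));
}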

It is always a bad idea to expect 'nothing' to happen asynchronously: always expect something.

April 04, 2011

Debugging Neo4J Spatial - Update 3 on Neo4J Spatial and British Isles OSM Data import.

I have been having a long discussion with Craig Taverner about an issue that I think I have identified in the Neo4J Spatial codebase, with particular reference to the OSM code. Unfortunately I have completely failed to communicate the issue as I perceive it; Craig has variously believed that the issue lies in my code or in my misreading of the Neo4J Spatial unit test code.

The actual issue is buried deep in private and protected methods within the Neo4J Spatial codebase, in particular in the interaction between the spatial search classes and the OSM layer and index. I have yet to think of a way to write a simple and direct unit test that will illustrate the problem, so instead I am going to write this blog entry to explain what I found and how I found it.

I actually found the issue when I was attempting to debug why some of my unit test code was failing to return the results that I expected. It turned out that my problem was that I had swapped latitude and longitude on the point that I passed in. While debugging, I spotted another issue in the way that the spatial search classes (in particular the org.neo4j.gis.spatial.AbstractSearch and org.neo4j.gis.spatial.query.AbstractSearchIntersection classes) interact with the OSM layer.
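
For anyone else who trips over the same thing: JTS has no notion of latitude or longitude, only x and y, and as far as I can tell the OSM layer follows the usual GIS convention of x = longitude, y = latitude, so the point needs to be built the other way round from how you would say it:

GeometryFactory factory = new GeometryFactory();
// 51.808721 N, 0.689735 W -> x = longitude, y = latitude
Point point = factory.createPoint(new Coordinate(-0.689735, 51.808721));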

The actual issue is that the AbstractSearch class implements its own getEnvelope(Node geomNode) method that takes a node as a parameter and uses the related layer's GeometryEncoder to parse the node and create the envelope. This method is used in AbstractSearchIntersection's needsToVisit(Node geomNode) method to determine whether to visit the node or any of its children.

When I execute the following code, which I believe is correct:

SearchContain searchContain = new SearchContain(point);
osmLayer.getIndex().executeSearch(searchContain);

The OSMLayer's GeometryEncoder is then used by AbstractSearch to decode each node's envelope, and this produces an envelope with its bounding attributes mixed up.

For many searches this is not a huge issue and the search will appear to work. In the case of my Buckinghamshire (Bucks) search, the decoded Bucks envelope no longer subtended fractions of a degree but instead some 50-odd degrees of latitude and longitude, which happened to include Bucks' true envelope.

As far as I can make out, the correct results are still returned because within the OSM layer the nodes' coordinates and bounding boxes are handled differently, so the spatial search classes merely visit unnecessary nodes. That does, however, suggest a performance cost from the unnecessary visits, and I also think there may be situations where index nodes that should be visited aren't.

At a fundamental level I think there is a nasty mismatch between the way that OSMGeometryEncoder handles envelopes in the Envelope decodeEnvelope(PropertyContainer container) method and the way that the RTreeIndex handles envelopes in its private Envelope bboxToEnvelope(double[] bbox) method.

My gut feeling is that all envelopes should be handled by the relevant GeometryEncoder rather than by a private method on an index; however, the OSMGeometryEncoder just plain seems to get it wrong.

March 31, 2011

Update 2 on Neo4J Spatial and British Isles OSM Data import.

Well, even 6GB of heap space isn't enough to fully import and re-index the British Isles Open Street Map data set, so there will have to be some work on memory management somewhere. The first thing I need to do is profile the import and work out where the memory is going (I'm assuming it isn't a memory leak per se); basically I want to determine whether the problem is a misconfiguration on my part or whether the memory usage of the OSMImporter class itself needs attention.

I'm also thinking that for reverse geo and local searches I really don't need the full OSM data set, so a customised version of the OSM importer would be a good idea. I really don't need ways or the node data associated with them. I suspect a multi-pass approach would work well: a first pass through the OSM data set determines which nodes are needed for the features that I want to import, a second pass imports those nodes and their associated features, followed by a final re-indexing.
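
Here's a rough sketch of what that first pass could look like, using plain StAX over the raw .osm XML. This is my own idea rather than anything the current OSMImporter does, and the filter on which features to keep is left as a placeholder:

import java.io.FileInputStream;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// First-pass sketch: stream through the raw .osm XML and record the IDs of the
// nodes referenced by the features we intend to keep. A second pass would then
// import only those nodes plus the features themselves, followed by re-indexing.
public class OsmFirstPassSketch {

    public static Set<Long> collectReferencedNodeIds(String osmFile) throws Exception {
        Set<Long> wanted = new HashSet<Long>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new FileInputStream(osmFile));
        boolean insideWantedFeature = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String element = reader.getLocalName();
                    if ("way".equals(element) || "relation".equals(element)) {
                        insideWantedFeature = isFeatureIWant(reader);
                    } else if (insideWantedFeature && "nd".equals(element)) {
                        wanted.add(Long.valueOf(reader.getAttributeValue(null, "ref")));
                    } else if (insideWantedFeature && "member".equals(element)
                            && "node".equals(reader.getAttributeValue(null, "type"))) {
                        wanted.add(Long.valueOf(reader.getAttributeValue(null, "ref")));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && ("way".equals(reader.getLocalName()) || "relation".equals(reader.getLocalName()))) {
                    insideWantedFeature = false;
                }
            }
        } finally {
            reader.close();
        }
        return wanted;
    }

    // Placeholder for the real filter; OSM tags are child <tag> elements, so a
    // real implementation would have to buffer the feature before deciding.
    private static boolean isFeatureIWant(XMLStreamReader reader) {
        return true;
    }
}

For the full British Isles the ID set itself would be sizeable, so it might need something more compact than a HashSet, but the shape of the two passes stays the same.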

I've been in discussions with one of the people behind Neo4J Spatial, Craig Taverner, on the Neo4J User List. I found an issue with the bounding box used in the spatial search index visitor pattern. It's been quite illuminating to dig into this. I'd been thinking for a while that a number of the techniques that I encountered working with 3D graphics and scene graphs would be relevant and it definitely seems to be the case. I'm beginning to think that OpenGL or OpenCL backed indices could prove very useful in high throughput micro-batched situations where the bus latency can be hidden.

March 24, 2011

Update 1 on Neo4J Spatial and British Isles OSM Data import.

Successes and failures...

I managed to shepherd the import all the way through to completion with 4096 MB of memory but unfortunately fell short when it came to the re-indexing. As I commented on my previous post, I upped the memory to 6144 MB and restarted; it has again imported successfully and has been re-indexing overnight without falling over. I'm waiting to see how this goes.

During this import I did a little more characterisation of the system utilisation. As reported before, the node import phase is single-threaded and does not saturate my disk IO. More interesting is the way import, which is heavily reliant on the nodes already imported: it saturated neither the IO nor a single core of my CPU (again it appeared largely single-threaded). I'm not sure what the bottleneck is; it could be disk seek time / latency, as the ways are composed of nodes that have already been imported. I'm guessing there would be a need to look at how the nodes are stored and whether any useful caching could be done to accelerate the way import.

The 'relation' import phase was too short for me to characterise.

From a brief glance this morning before I left, the indexing phase does appear to be multi-threaded, using 2 cores quite consistently (4 cores are available).

Aside from the importing, I've started playing with the API using the Buckinghamshire data set that I imported on my laptop. So far I'm not too impressed by the performance. I used code from some of the Neo4J Spatial unit tests to do a way (highway) search from a given point: I chose a coordinate in the middle of a lake (51.808721, -0.689735) and gradually expanded the search through 10m, 100m, 1000m and 10000m. Quite slow, but at least not an exponential increase in time - 14.4s, 85.8s, 88.7s, 86.1s.

It is interesting to note that the OSM parts of the Neo4J Spatial codebase do not yet seem to have any convenience methods for area and boundary searches, only for way searches.

Here is my current (very rough) code:
package com.presynt.neo4j;

import static org.junit.Assert.*;

import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.GeometryFactory;
import com.vividsolutions.jts.geom.Point;
import org.geotools.referencing.CRS;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Ignore;
import org.junit.Test;
import org.neo4j.gis.spatial.Layer;
import org.neo4j.gis.spatial.SpatialDatabaseService;
import org.neo4j.gis.spatial.SpatialTopologyUtils;
import org.neo4j.gis.spatial.osm.OSMImporter;
import org.neo4j.gis.spatial.osm.OSMLayer;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.kernel.EmbeddedGraphDatabase;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

import javax.xml.stream.XMLStreamException;
import java.io.IOException;


public class TestNeo4JSpatial {

    private static GraphDatabaseService graphDB;
    private static SpatialDatabaseService spatialDB;

    // One-off import and re-index; ignored so it doesn't run on every build.
    @Test
    @Ignore
    public void createSpatialDB() throws XMLStreamException, IOException {
        OSMImporter importer = new OSMImporter("OSM-BUCKS");
        BatchInserterImpl inserter = new BatchInserterImpl("/testDB");
        importer.importFile(inserter, "/OSMData/buckinghamshire.osm"); // "british_isles.osm"; "C:\\british_isles.osm"
        inserter.getGraphDbService().shutdown();
        graphDB = new EmbeddedGraphDatabase("/testDB");
        importer.reIndex(graphDB, 100000);
        graphDB.shutdown();
    }

    @Test
    public void retrieveLayer() {
        final Layer layer = spatialDB.getLayer("OSM-BUCKS");
        assertNotNull(layer);
        assertEquals(OSMLayer.class, layer.getClass());
    }

    @Test
    public void useLayer() {
        final OSMLayer osmLayer = (OSMLayer) spatialDB.getLayer("OSM-BUCKS");
        final GeometryFactory factory = osmLayer.getGeometryFactory();
        System.out.println("Unit of measure: "
                + CRS.getEllipsoid(osmLayer.getCoordinateReferenceSystem()).getAxisUnit().toString());

        final Point point = factory.createPoint(new Coordinate(51.808721, -0.689735));

//        final Layer boundaryLayer = osmLayer.addSimpleDynamicLayer("boundary", "administrative");
//        SearchContain searchContain = new SearchContain(point);
//        boundaryLayer.getIndex().executeSearch(searchContain);
//        for (SpatialDatabaseRecord record : searchContain.getResults()) {
//            System.out.println("Container:" + record);
//        }

        final Layer highwayLayer = osmLayer.addSimpleDynamicLayer("highway", null);

        // Run each search distance twice so any effect of caching shows up in the second run.
        for (double distance : new double[] {10.0, 100.0, 1000.0, 10000.0}) {
            for (int run = 0; run < 2; run++) {
                long startTime = System.currentTimeMillis();
                SpatialTopologyUtils.findClosestEdges(point, highwayLayer, distance);
                System.out.println((int) distance + "m took:"
                        + ((System.currentTimeMillis() - startTime) / 1000d) + "s");
            }
        }

//        for (SpatialTopologyUtils.PointResult result : SpatialTopologyUtils.findClosestEdges(point, highwayLayer, 10000.0)) {
//            System.out.println(result);
//        }
    }

    @BeforeClass
    public static void initialiseDatabase() {
        graphDB = new EmbeddedGraphDatabase("target/test-classes/spatialTestDB");
        spatialDB = new SpatialDatabaseService(graphDB);
    }

    @AfterClass
    public static void shutdownDatabase() {
        graphDB.shutdown();
        spatialDB = null;
        graphDB = null;
    }
}

March 21, 2011

Neo4J Spatial and Importing the British Isles Open Street Map Data

I know, I know. Overly ambitious and all that.

While we've been working on Presynt we've been having fun and games with Geo data, both for Local Search and Reverse Geo lookups. We've also got a number of ambitious new Geo features planned for future versions that need some form of Geo store.

So with that in mind I started playing with Neo4J Spatial and the Open Street Map data set.

It's been an interesting experience with a few gotchas, as Neo4J Spatial is still not fully ready for prime time, so I thought I would share my findings with those who are interested.

First off I thought I would try to import the full global data set from Open Street Map. Not necessarily a mistake, but a huge undertaking: we are talking about over 200GB of map data in raw OSM format and billions of data points and relationships.

The first gotcha: if you wish to import the full OpenStreetMap data set, I'd recommend starting with at least Neo4J version 1.3 Milestone 4, as that was the first version to hugely expand the number of nodes supported, from 4 billion to 128 billion.

Of course, attempting this type of import I also started running into memory problems. I ended up running with -Xmx4096m and -XX:+UseConcMarkSweepGC: the concurrent mark and sweep garbage collector proved the most resilient to the load placed on it, whereas the default parallel GC would fall over quite regularly, consuming too much CPU for too little effect.

In the end I discovered that the full global import would take too long to run and set my sights a little lower, on just the British Isles data set, which is all that Presynt needs right now.

In order to test my set-up I decided to use the Buckinghamshire data set, the smallest county data set in the UK, and this highlighted another gotcha: not all of the required dependencies are actually included in the Neo4J POMs.

In the end I used the following dependencies:
  • org.neo4j:neo4j:1.3.M04
  • org.neo4j:neo4j-spatial:0.5-SNAPSHOT
  • org.geotools:gt-referencing:2.6.5
  • org.geotools:gt-main:2.6.5
  • org.geotools:gt-cql:2.6.5
  • org.geotools:gt-epsg-hsql:2.6.5
  • com.vividsolutions:jts:1.11
They may not all be necessary but they certainly work...

Once I had resolved these issues I was able to start the full British Isles import. I was running it on a quad-core machine and found that the OSMImporter/BatchInserter combination (all default configuration, mind) was using just one CPU. Looking at the OSMImporter codebase it became obvious that it was written to stream-read the OSM XML data and synchronously write it out to Neo4J. I think there may be some room to parallelise it, as large amounts of the GPS node data are utterly independent and the XML data set is structured in such a way that it could be broken down into independent units of work. The only points where serialised behaviour is required are the parsing of the XML, the construction of the Way and Relation data, and possibly the writing to Neo4J.
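
To make the shape of that idea concrete, here's a very rough sketch (this is not how OSMImporter works today; the parseNextBatch/prepare/write methods are placeholders, and all writes are funnelled through a single thread because I wouldn't try to drive the batch inserter concurrently):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Pipeline sketch: parse on the calling thread, fan the per-node preparation
// work out to a pool, and serialise all writes through a single-threaded executor.
public class ParallelImportSketch {

    private final ExecutorService workers = Executors.newFixedThreadPool(3);
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    public void importNodes() {
        List<Map<String, Object>> rawBatch;
        while ((rawBatch = parseNextBatch()) != null) {                    // serialised: XML parsing
            final List<Map<String, Object>> batch = rawBatch;
            workers.submit(new Runnable() {
                public void run() {
                    final List<Map<String, Object>> ready = prepare(batch); // parallel: per-node work
                    writer.submit(new Runnable() {
                        public void run() {
                            write(ready);                                   // serialised: batch insertion
                        }
                    });
                }
            });
        }
        workers.shutdown();
        // writer.shutdown() would follow once the workers have drained.
    }

    private List<Map<String, Object>> parseNextBatch() { return null; }                        // placeholder
    private List<Map<String, Object>> prepare(List<Map<String, Object>> batch) { return batch; } // placeholder
    private void write(List<Map<String, Object>> batch) { }                                    // placeholder
}

Whether this gains anything obviously depends on how much of the time is really spent in the per-node preparation rather than in the parsing or the writing.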

The process seems to be largely CPU-bound: I ran it on SATA II RAID 0 SSDs, their read and write capabilities were barely touched, and the single core in use was pegged at 100% almost continuously.

The import of the British Isles dataset seems to take about 48 hours.

I used the vanilla OSMImporter and BatchInserter configurations as I am still a Neo4J neophyte (bad pun intended ;-)). I will be reading up on alternative configurations to see if I can improve the performance, and I will also look at importing changesets so that I shouldn't need to do a full import again.