April 04, 2011

Debugging Neo4J Spatial - Update 3 on Neo4J Spatial and British Isles OSM Data import.

I have been having a long discussion with Craig Taverner about an issue that I believe I have found in the Neo4J Spatial codebase, specifically in the OSM code. Unfortunately, I have completely failed to communicate the issue. Craig has variously believed that the problem lies in my code or in my misreading of the Neo4J Spatial unit test code.

The actual issue is buried deep in private and protected methods within the Neo4J Spatial codebase, in particular in the interaction between the spatial search classes and the OSM layer and index. I have yet to think of a way to write a simple and direct unit test that will illustrate the problem, so instead I am going to write this blog entry to explain what I found and how I found it.

I found the issue while debugging why some of my unit test code was failing to return the results that I expected. It turned out that I had swapped latitude and longitude on the point that I passed in. While debugging, I spotted another issue in the way that the spatial search classes (in particular the org.neo4j.gis.spatial.AbstractSearch and org.neo4j.gis.spatial.query.AbstractSearchIntersection classes) interact with the OSM layer.

The actual issue is that the AbstractSearch class implements its own getEnvelope(Node geomNode) method that takes a node as a parameter and uses the related layer's GeometryEncoder to parse the node and create the envelope. This method is used in AbstractSearchIntersection's needsToVisit(Node geomNode) method to determine whether to visit the node or any of its children.

I execute the following code, which I believe is correct:

SearchContain searchContain = new SearchContain(point);
osmLayer.getIndex().executeSearch(searchContain);

The OSMLayer's GeometryEncoder is used by AbstractSearch to decode the node's envelope and this produces an envelope with the bounding attributes mixed up.

For many searches this is not a huge issue and the search will appear to work. In the case of the Buckinghamshire (Bucks) search, the decoded Bucks envelope no longer spanned fractions of a degree but instead spanned 50-odd degrees of both latitude and longitude, a range that still included Bucks' true envelope.
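A minimal sketch of how an envelope can balloon like this. It is not the Neo4J Spatial code: the class and method names are hypothetical, the coordinates are rough illustrative values for a Bucks-sized box rather than values from the import, and the ordering mismatch (an array stored as minX, minY, maxX, maxY but read as x1, x2, y1, y2) is only an assumption about the kind of mix-up involved.

```java
// Hypothetical illustration only; none of these names come from Neo4J Spatial.
final class EnvelopeOrderSketch {

    /** Tiny stand-in for an envelope that normalises min/max on construction. */
    static final class Box {
        final double minX, maxX, minY, maxY;

        Box(double x1, double x2, double y1, double y2) {
            minX = Math.min(x1, x2);
            maxX = Math.max(x1, x2);
            minY = Math.min(y1, y2);
            maxY = Math.max(y1, y2);
        }

        double width()  { return maxX - minX; }
        double height() { return maxY - minY; }

        boolean contains(Box other) {
            return minX <= other.minX && maxX >= other.maxX
                && minY <= other.minY && maxY >= other.maxY;
        }
    }

    // Assumed storage order: {minX, minY, maxX, maxY}.
    static Box decodeAsStored(double[] bbox) {
        return new Box(bbox[0], bbox[2], bbox[1], bbox[3]);
    }

    // Assumed mismatch: the same array read as {x1, x2, y1, y2}.
    static Box decodeWithOtherOrder(double[] bbox) {
        return new Box(bbox[0], bbox[1], bbox[2], bbox[3]);
    }

    public static void main(String[] args) {
        // Rough, illustrative lon/lat box for a county-sized area.
        double[] bbox = {-1.14, 51.48, -0.47, 52.08};

        Box right = decodeAsStored(bbox);
        Box wrong = decodeWithOtherOrder(bbox);

        System.out.println("as stored:   " + right.width() + " x " + right.height());
        System.out.println("other order: " + wrong.width() + " x " + wrong.height());
        System.out.println("inflated box contains true box: " + wrong.contains(right));
    }
}
```

With these values the misread box spans more than 50 degrees in each direction and still contains the true box, which is consistent with a search that visits far too much but still appears to work.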

As far as I can make out, the correct results are returned because, within the OSM layer, the nodes' coordinates and bounding boxes are handled differently, so even if the spatial search classes visit unnecessary nodes, the correct answers still come back. This does mean, however, that there may be performance issues due to unnecessary visits. I also think that there may be situations where index nodes that should be visited aren't.
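Both concerns can be shown with a small sketch of a visit decision made against a misread bounding box. Again, this is a hypothetical illustration and not the library's needsToVisit logic: the method names, the assumed storage order (minX, minY, maxX, maxY), the assumed misreading (x1, x2, y1, y2) and all coordinates are made up for the example.

```java
// Hypothetical illustration only; none of these names come from Neo4J Spatial.
final class VisitDecisionSketch {

    // Assumed storage order: {minX, minY, maxX, maxY}.
    static boolean trueEnvelopeCovers(double[] b, double x, double y) {
        return b[0] <= x && x <= b[2] && b[1] <= y && y <= b[3];
    }

    // Same array misread as {x1, x2, y1, y2}, with min/max normalised.
    static boolean misreadEnvelopeCovers(double[] b, double x, double y) {
        double minX = Math.min(b[0], b[1]), maxX = Math.max(b[0], b[1]);
        double minY = Math.min(b[2], b[3]), maxY = Math.max(b[2], b[3]);
        return minX <= x && x <= maxX && minY <= y && y <= maxY;
    }

    public static void main(String[] args) {
        // Case 1: a county-sized box in southern England (illustrative values)
        // and a query point roughly at Paris. The true box does not cover the
        // point, but the misread one does, so the node is visited needlessly.
        double[] bucksLike = {-1.14, 51.48, -0.47, 52.08};
        System.out.println("true covers Paris:    " + trueEnvelopeCovers(bucksLike, 2.35, 48.85));
        System.out.println("misread covers Paris: " + misreadEnvelopeCovers(bucksLike, 2.35, 48.85));

        // Case 2: a box at longitude 100..101, latitude 10..11 and a query
        // point inside it. Here the misread box excludes the point, so a
        // node that should be visited would be skipped.
        double[] eastern = {100.0, 10.0, 101.0, 11.0};
        System.out.println("true covers point:    " + trueEnvelopeCovers(eastern, 100.5, 10.5));
        System.out.println("misread covers point: " + misreadEnvelopeCovers(eastern, 100.5, 10.5));
    }
}
```

Under these assumptions the misread box only happens to contain the true one when the maximum longitude is smaller than the minimum latitude, which holds for the British Isles but not everywhere.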

At a fundamental level, I think that there is a pretty bad inconsistency between the way that OSMGeometryEncoder handles envelopes in the Envelope decodeEnvelope(PropertyContainer container) method and the way that the RTreeIndex handles envelopes in the private Envelope bboxToEnvelope(double[] bbox) method.

My gut feeling is that all envelopes should be handled by the relevant GeometryEncoder rather than by a private method on an index. However, the OSMGeometryEncoder just plain seems to get it wrong.