The actual issue is buried deep in private and protected methods within the Neo4J Spatial codebase in particular in the interaction between the spatial search classes and the OSM layer and index. I have yet think of a way to write a simple and direct unit test that will illustrate the problem so instead I am going to write this blog entry to explain what I found and how I found it.
I actually found the issue when I was attempting to debug why some of my unit test code was failing to return the results that I expected. It turned out that my problem was because I had swapped latitude and longitude on the point that I passed in. While debugging I spotted another issue in the way that the spatial search classes (in particular the org.neo4j.gis.spatial.AbstractSearch and org.neo4j.gis.spatial.query.AbstractSearchIntersection classes) interact with the OSM Layer.
The actual issue is that the AbstractSearch class implements its own getEnvelope(Node geomNode) method that takes a node as a parameter and uses the related layer's GeometryEncoder to parse the node and create the envelope. This method is used in AbstractSearchIntersection's needsToVisit(Node geomNode) method to determine whether to visit the node or any of its children.
When I execute the following code which I believe is correct:
SearchContain searchContain = new SearchContain(point);
osmLayer.getIndex().executeSearch(searchContain);
The OSMLayer's GeometryEncoder is used by AbstractSearch to decode the node's envelope and this produces an envelope with the bounding attributes mixed up.
For many searches this is not a huge issue and the search will appear to work. In the case of the Buckinghamshire (Bucks) search the Bucks envelope no longer subtended fractions of a degree but instead subtended 50 odd degrees of latitude and longitude which included Bucks' true envelope.
As far as I can make out the correct results are returned because within the OSM layer the nodes coordinates and bounding boxes are handled differently so even if the spatial search classes visit unnecessary nodes, the correct answers are returned. This means however that there may be performance issues due to unnecessary visits. Also I think that there may be situations where index nodes that should be visited aren't.
At a fundamental level I think that there is some pretty bad schizophrenia between the way that OSMGeometryEncoder handles envelopes in the Envelope decodeEnvelope(PropertyContainer container) method and the way that the RTreeIndex handles envelopes in the private Envelope bboxToEnvelope(double[] bbox) method.
My gut feeling is that all envelopes should be handled by the relevant GeometryEncoder rather than by a private method on an index, However the OSMGeometryEncoder just plain seems to get it wrong.
3 comments:
Thanks for the much more detailed explanation this time. I was able to find and confirm the bug, and also see why all test cases (and most users code) still works. As you already noticed, the impact is primarily on performance since more RTree branches will be tested than need to be.
But why are all valid branches still tested? The answer is simply that for most users data, and the test cases we have, the values of latitude and longitude are quite different. When swapping one, but not both, pairs, you simply end up with a much bigger bounding box, so the search becomes more inclusive. Obviously this does not work for all cases, so the bug needs to be fixed.
The bug itself is based on that fact that the RTree has its own internal structure, while domain data (like OSM) can have a different internal structure. I had said in an email to you that it is not valid to use the domain data intelligence (in the form of the GeometryEncoder) to decode the RTree. Unfortunately I had not noticed that the RTree was doing exactly that. This seems to be an artifact of the history of the code, the RTree was developed before the code was refactored to support multiple, different, domain data models. Is is somewhat amazing that this bug has not been found earlier. We need to enhance the test cases to catch situations where RTree branches are followed, but return no actual data.
Anyway, I will try fix the bug. It requires a change to the vistor pattern interface, so we stop passing internal RTree structure to the domain model. This is a single method change, but will affect code in a number of places.
I pushed a fix to this yesterday. Today I've been doing performance benchmarking to determine the impact of the bug. I've got some test cases running 10x searches, while others show no improvement. I will investigate further, but in general I think it looks good.
Thanks for the heads up Craig - I'l try running a way search with the Buckinghamshire data set and see what happens...
Post a Comment