March 31, 2011

Update 2 on Neo4J Spatial and British Isles OSM Data import.

Even 6GB of heap space isn't enough to fully import and re-index the British Isles OpenStreetMap data set. I suspect some work on memory management will be needed somewhere. The first thing I need to do is profile the import and work out where the memory is going (I'm assuming it isn't a memory leak per se). Basically, I want to determine whether the cause is a misconfiguration on my part or whether the memory usage in the OSMImporter class needs looking at.

I'm also thinking that for reverse geocoding and local searches I don't need the full OSM data set, so a customised version of the OSM importer would be a good idea. In particular, I don't need ways or the node data associated with them. I suspect a multi-pass approach would work well: the first pass through the OSM data set determines which nodes are needed for the features that I want to import, the second pass imports the required nodes and their associated features, and a final step re-indexes.
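To make the multi-pass idea a bit more concrete, here is a rough Python sketch of the first two passes over OSM XML. It is only an illustration and not how OSMImporter works: the sample data, the `WANTED_KEYS` filter and the function names `wanted_node_ids` and `import_nodes` are all made up for the example, and "a feature I want" is assumed to mean simply a node carrying one of the wanted tag keys. The re-indexing step is left out.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for an OSM XML extract (hypothetical data, for illustration only).
SAMPLE_OSM = b"""<osm version="0.6">
  <node id="1" lat="51.5" lon="-0.12">
    <tag k="amenity" v="pub"/>
    <tag k="name" v="The Example Arms"/>
  </node>
  <node id="2" lat="51.6" lon="-0.10"/>
  <node id="3" lat="51.7" lon="-0.11"><tag k="created_by" v="editor"/></node>
  <way id="10"><nd ref="2"/><nd ref="3"/><tag k="highway" v="residential"/></way>
</osm>"""

# Assumption: the features of interest are nodes tagged with one of these keys.
WANTED_KEYS = {"amenity", "place"}

# Top-level OSM elements; each is cleared once processed to keep memory flat.
TOP_LEVEL = {"node", "way", "relation"}


def wanted_node_ids(source):
    """Pass 1: collect the ids of nodes that carry at least one wanted tag key."""
    ids = set()
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            if any(t.get("k") in WANTED_KEYS for t in elem.findall("tag")):
                ids.add(elem.get("id"))
        if elem.tag in TOP_LEVEL:
            elem.clear()  # drop children and attributes we no longer need
    return ids


def import_nodes(source, ids):
    """Pass 2: yield (id, lat, lon, tags) for the selected nodes only."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node" and elem.get("id") in ids:
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            yield elem.get("id"), float(elem.get("lat")), float(elem.get("lon")), tags
        if elem.tag in TOP_LEVEL:
            elem.clear()


# Each pass re-reads the data from the start, as it would with a file on disk.
ids = wanted_node_ids(io.BytesIO(SAMPLE_OSM))
imported = list(import_nodes(io.BytesIO(SAMPLE_OSM), ids))
print(imported)
```

The only state carried between passes is the set of node ids, which should be far smaller than the full data set. If some of the wanted features turned out to be ways after all, pass 1 would also need to collect the `nd` references of those ways.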

I've been in discussions with Craig Taverner, one of the people behind Neo4J Spatial, on the Neo4J User List. I found an issue with the bounding box used in the spatial search index visitor pattern, and it's been quite illuminating to dig into it. I'd been thinking for a while that a number of the techniques I encountered while working with 3D graphics and scene graphs would be relevant, and that definitely seems to be the case. I'm beginning to think that OpenGL- or OpenCL-backed indices could prove very useful in high-throughput, micro-batched situations where the bus latency can be hidden.
