March 24, 2011

Update 1 on Neo4J Spatial and British Isles OSM Data import.

Successes and failures...

I managed to shepherd the import all the way through to completion with 4096 MB of memory, but unfortunately it fell short when it came to the re-indexing. As I mentioned in a comment on my earlier post, I upped the memory to 6144 MB and restarted. The import has again completed successfully, and the re-indexing has been running overnight without falling over. I'm waiting to see how this goes.

During this latest import I did a little more characterisation of the system utilisation. As reported before, the node import process is single-threaded and does not saturate my disk IO. More interesting is the way import, which relies heavily on the already imported nodes: it saturated neither IO nor a single CPU core (again, it appeared largely single-threaded). I'm not sure where the bottleneck is. It could be disk seek time/latency, since ways are composed of nodes that have already been imported. I suspect it would be worth looking at how the nodes are stored and whether some useful caching could accelerate the way import.
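One form the caching idea could take is a bounded least-recently-used map from OSM node ids to the graph node ids created during the node import, so that a way referencing a recently seen node skips a slower lookup. This is a minimal sketch under that assumption, not how `OSMImporter` actually works: `NodeIdCache`, `WayImportSketch`, `resolve` and `slowLookup` are hypothetical names, and `slowLookup` merely stands in for whatever index or disk access the importer really performs. It relies on `LinkedHashMap`'s access-order mode and its `removeEldestEntry` hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Neo4J Spatial): a bounded LRU cache mapping
// OSM node ids to the ids of the graph nodes created for them.
class NodeIdCache extends LinkedHashMap<Long, Long> {
    private final int maxEntries;

    NodeIdCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed,
        // so the "eldest" entry is the least recently used one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
        // Called by put(); evicts the least recently used mapping once over capacity.
        return size() > maxEntries;
    }
}

class WayImportSketch {
    // Counts how often the (hypothetical) slow path is taken.
    static int slowLookups = 0;

    // Hypothetical stand-in for an index or disk lookup of an imported node.
    static long slowLookup(long osmNodeId) {
        slowLookups++;
        return osmNodeId * 10; // placeholder mapping, for illustration only
    }

    // Resolve an OSM node reference, consulting the cache first.
    static long resolve(NodeIdCache cache, long osmNodeId) {
        Long cached = cache.get(osmNodeId);
        if (cached != null) {
            return cached;
        }
        long graphId = slowLookup(osmNodeId);
        cache.put(osmNodeId, graphId);
        return graphId;
    }

    public static void main(String[] args) {
        NodeIdCache cache = new NodeIdCache(2);
        // Assumed locality: consecutive way references revisit recent node ids.
        long[] wayNodeRefs = {1, 2, 1, 2, 3, 1};
        for (long ref : wayNodeRefs) {
            resolve(cache, ref);
        }
        System.out.println("slow lookups: " + slowLookups);
    }
}
```

Whether this helps depends on how often nearby ways share nodes; if they rarely do, the cache only adds overhead.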

The 'relation' import phase was too short for me to characterise.

From a brief glance this morning before I left, the indexing phase does appear to be multi-threaded, quite consistently using 2 of the 4 available cores.

Aside from the import, I've started playing with the API using the Buckinghamshire data set that I imported on my laptop. So far I'm not too impressed by the performance. I used code from some of the Neo4J Spatial unit tests to do a way (highway) search from a given point: I chose a coordinate in the middle of a lake (51.808721, -0.689735) and gradually expanded the search distance through 10m, 100m, 1000m and 10000m. It was quite slow, but at least the time did not increase exponentially: 14.4s, 85.8s, 88.7s and 86.1s respectively.
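Timings like these amount to running the same call several times per distance and recording each duration. A small helper keeps that bookkeeping in one place. This sketch uses `System.nanoTime()`, which is intended for measuring elapsed time and, unlike `System.currentTimeMillis()`, is unrelated to wall-clock time. `SearchTimer`, `timeRuns` and `busyWork` are hypothetical names; `busyWork` is only a placeholder for a call such as `SpatialTopologyUtils.findClosestEdges(point, highwayLayer, distance)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical timing helper, not part of Neo4J Spatial.
class SearchTimer {
    // Runs the task `runs` times and returns each run's elapsed time in seconds.
    static List<Double> timeRuns(Runnable task, int runs) {
        List<Double> seconds = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // elapsed-time clock
            task.run();
            seconds.add((System.nanoTime() - start) / 1e9);
        }
        return seconds;
    }

    // Placeholder workload so the sketch runs on its own.
    static double busyWork(double distance) {
        double sum = 0;
        for (int i = 0; i < 1000; i++) {
            sum += Math.sqrt(i * distance);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] distances = {10.0, 100.0, 1000.0, 10000.0};
        for (double distance : distances) {
            // In the test class this lambda would wrap the findClosestEdges call.
            List<Double> times = timeRuns(() -> busyWork(distance), 2);
            System.out.println((long) distance + "m took: " + times + " s");
        }
    }
}
```

Because the timer is restarted inside `timeRuns` for every run, one measurement cannot accidentally include the previous call.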

It's interesting to note that, so far, the OSM parts of the Neo4J Spatial codebase do not seem to have any convenience methods for area and boundary searches, only for way searches.
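A boundary search, like the one attempted in the commented-out `SearchContain` code below, ultimately has to decide whether a point lies inside a polygon. With JTS on the classpath that is `Geometry.contains`; to keep this sketch self-contained it uses the even-odd (ray casting) rule on plain arrays instead. `BoundaryCheck.contains` is a hypothetical helper, not a Neo4J Spatial API, it treats longitude/latitude as planar coordinates (an approximation), and the rectangle in `main` is made-up test data, not a real boundary.

```java
// Hypothetical helper, not part of Neo4J Spatial: even-odd point-in-polygon test.
// Coordinates are (x, y), i.e. (longitude, latitude) for WGS84 data.
final class BoundaryCheck {
    private BoundaryCheck() {}

    static boolean contains(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        int n = xs.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            // Does edge j -> i cross the horizontal line y = py?
            boolean straddles = (ys[i] > py) != (ys[j] > py);
            if (straddles) {
                // x where the edge crosses y = py; ys[i] != ys[j] here, so no division by zero.
                double crossX = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
                if (px < crossX) {
                    inside = !inside; // each crossing to the right flips inside/outside
                }
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // Made-up rectangle around the lake point, as (longitude, latitude) vertices.
        double[] xs = {-0.70, -0.68, -0.68, -0.70};
        double[] ys = {51.80, 51.80, 51.82, 51.82};
        System.out.println(contains(xs, ys, -0.689735, 51.808721)); // prints true
        System.out.println(contains(xs, ys, 51.808721, -0.689735)); // prints false (transposed)
    }
}
```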

Here is my current (very rough) code:
package com.presynt.neo4j;

import static org.junit.Assert.*;

import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.GeometryFactory;
import com.vividsolutions.jts.geom.Point;
import org.geotools.referencing.CRS;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Ignore;
import org.junit.Test;
import org.neo4j.gis.spatial.Layer;
import org.neo4j.gis.spatial.SpatialDatabaseService;
import org.neo4j.gis.spatial.SpatialTopologyUtils;
import org.neo4j.gis.spatial.osm.OSMImporter;
import org.neo4j.gis.spatial.osm.OSMLayer;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.kernel.EmbeddedGraphDatabase;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

import javax.xml.stream.XMLStreamException;
import java.io.IOException;


public class TestNeo4JSpatial {

    private static GraphDatabaseService graphDB;
    private static SpatialDatabaseService spatialDB;

    // Imports the OSM file and re-indexes it; marked @Ignore so it does not run with the query tests.
    @Test
    @Ignore
    public void createSpatialDB() throws XMLStreamException, IOException {
        OSMImporter importer = new OSMImporter("OSM-BUCKS");
        BatchInserterImpl inserter = new BatchInserterImpl("/testDB");
        // Alternatives used for the full import: "british_isles.osm" or "C:\\british_isles.osm".
        importer.importFile(inserter, "/OSMData/buckinghamshire.osm");
        inserter.getGraphDbService().shutdown();
        // Local handle, so the shared graphDB opened in @BeforeClass is not overwritten.
        GraphDatabaseService importDB = new EmbeddedGraphDatabase("/testDB");
        importer.reIndex(importDB, 100000);
        importDB.shutdown();
    }

    @Test
    public void retrieveLayer() {
        final Layer layer = spatialDB.getLayer("OSM-BUCKS");
        assertNotNull(layer);
        assertEquals(OSMLayer.class, layer.getClass());
    }

    @Test
    public void useLayer() {
        final OSMLayer osmLayer = (OSMLayer) spatialDB.getLayer("OSM-BUCKS");
        final GeometryFactory factory = osmLayer.getGeometryFactory();
        // Note: this is the unit of the ellipsoid's semi-axes (metres for WGS84),
        // not necessarily the unit of the layer's coordinates.
        System.out.println("Unit of measure: " + CRS.getEllipsoid(osmLayer.getCoordinateReferenceSystem()).getAxisUnit().toString());

        // JTS Coordinate takes (x, y), i.e. (longitude, latitude).
        final Point point = factory.createPoint(new Coordinate(-0.689735, 51.808721));

        // final Layer boundaryLayer = osmLayer.addSimpleDynamicLayer("boundary", "administrative");
        // SearchContain searchContain = new SearchContain(point);
        // boundaryLayer.getIndex().executeSearch(searchContain);
        // for(SpatialDatabaseRecord record: searchContain.getResults()){
        //     System.out.println("Container:" + record);
        // }

        final Layer highwayLayer = osmLayer.addSimpleDynamicLayer("highway", null);
        // Run each search twice, resetting the timer before every call.
        for (double distance : new double[] {10.0, 100.0, 1000.0, 10000.0}) {
            for (int run = 0; run < 2; run++) {
                long startTime = System.currentTimeMillis();
                SpatialTopologyUtils.findClosestEdges(point, highwayLayer, distance);
                System.out.println((long) distance + "m took:" + ((System.currentTimeMillis() - startTime) / 1000d) + "s");
            }
        }
        // for(SpatialTopologyUtils.PointResult result : SpatialTopologyUtils.findClosestEdges(point, highwayLayer, 10000.0)){
        //     System.out.println(result);
        // }
    }

    @BeforeClass
    public static void initialiseDatabase() {
        graphDB = new EmbeddedGraphDatabase("target/test-classes/spatialTestDB");
        spatialDB = new SpatialDatabaseService(graphDB);
    }

    @AfterClass
    public static void shutdownDatabase() {
        graphDB.shutdown();
        spatialDB = null;
        graphDB = null;
    }
}
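JTS's `Coordinate` constructor takes `(x, y)`, which for WGS84 data means `(longitude, latitude)`, the reverse of the order coordinates are usually written in. A tiny factory whose parameters are named `latitude` and `longitude` makes that swap harder to get wrong. `LatLon.toXY` is a hypothetical helper, not part of JTS or Neo4J Spatial. Its range checks catch only some mistakes: the lake point read the wrong way round is still within range, so the named parameters do most of the work.

```java
// Hypothetical helper, not part of JTS or Neo4J Spatial: converts a
// latitude/longitude pair into the (x, y) order that JTS Coordinate expects.
final class LatLon {
    private LatLon() {}

    static double[] toXY(double latitude, double longitude) {
        if (latitude < -90.0 || latitude > 90.0) {
            throw new IllegalArgumentException("latitude out of range: " + latitude);
        }
        if (longitude < -180.0 || longitude > 180.0) {
            throw new IllegalArgumentException("longitude out of range: " + longitude);
        }
        // x = longitude, y = latitude
        return new double[] {longitude, latitude};
    }

    public static void main(String[] args) {
        double[] xy = toXY(51.808721, -0.689735);
        // With JTS on the classpath this would become: new Coordinate(xy[0], xy[1])
        System.out.println("x=" + xy[0] + ", y=" + xy[1]);
    }
}
```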

1 comment:

Robert Boothby said...

Craig Taverner kindly pointed out that I had the Latitude and Longitude transposed in the example above.