OSM Data Mirror on SUSE13.1

Created by: Lester Caine, Last modification: 07 Dec 2013 (23:20 GMT)

OSM tool set on SUSE13.1 provides all the tools ready configured for use with an empty database for the planet extract.

Part of the process of creating an offline system based on the OSM data is to maintain a mirror of the planet file data. Doing this for the whole planet is somewhat impractical on a small machine, so a subset of the data is a more practical proposition. Fortunately a number of sites maintain local subsets which are updated perhaps daily, and this may be more than adequate for many operations. However, once an initial download has been processed, downloading update files allows regular corrections to be added at a rate suitable for the job in hand.

While a little overkill, I'm working with the British Isles extract provided by Geofabrik, who provide one of the most comprehensive sets of data extracts. They do further subdivide the UK into smaller units, but being of Manx origins, the Isle of Man is an area that I need, along with Northern Ireland, and the additional areas do not enlarge the data significantly.

So the starting point is to download the british-isles-latest.osm.pbf file. For various reasons the remote servers have not been configured with a desktop, so all of the necessary operations are carried out via the command line ...

cd /srv
mkdir maps
cd maps
mkdir british-isles
cd british-isles
wget -v http://download.geofabrik.de/europe/british-isles-latest.osm.pbf
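
Before importing, it may be worth checking that the download arrived intact. The sketch below assumes that a checksum file with the same name plus .md5 is available alongside the extract, and that md5sum is installed; the function name verify_extract is only illustrative.

```shell
# Sketch: verify a downloaded extract against its checksum file.
# Assumes <file>.md5 sits next to <file> and uses md5sum's
# "hash  filename" format.
verify_extract() {
    file="$1"
    dir=$(dirname "$file")
    # md5sum -c resolves the file name relative to the current directory,
    # so run the check from the directory that holds the extract.
    ( cd "$dir" && md5sum -c "$(basename "$file").md5" )
}
```

Assuming british-isles-latest.osm.pbf.md5 has been fetched into the same directory, running verify_extract british-isles-latest.osm.pbf should report OK, and return a non-zero status if the file is damaged.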

This gives quite a complex area, and includes the Channel Islands as well (see GeofabrikExtractBritishIsles).

The installation of osm2pgsql from the GEO repository also creates the default user and an empty database already configured, so all that is required is to start the main import.

sudo -u postgres osm2pgsql -s -U postgres -d gis british-isles-latest.osm.pbf

The sudo is important here, as one has to pretend to be the postgres user in order to run commands in postgres. The security setup prevents other users accessing the data. (But I need to find where the data is stored ... with Firebird one has a single file which stores the database and can be located with the project).
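
On the question of where the data is stored, PostgreSQL can report its own data directory. A minimal sketch, assuming the psql client is installed and the postgres user can connect locally; the function name pg_data_dir is illustrative.

```shell
# Sketch: ask the running PostgreSQL server where it keeps its files.
# -A (unaligned) and -t (tuples only) strip the table decoration so
# only the path is printed.
pg_data_dir() {
    sudo -u postgres psql -At -c 'SHOW data_directory;'
}
```

Unlike the single Firebird file, this is a directory tree shared by every database on the server, so it cannot simply be placed alongside the project.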

In addition to the raw data, the extract page also provides a set of pre-extracted shape files of various elements, along with a single profile of the extract area, and importantly the state file for the extract which records which sequence index the data was extracted up to. This is then used to download further incremental extracts to bring the data even more up to date, and is the basis for the ongoing replication. For the current cycle this is 642976, and the database is now populated with :-

Nodes     - 64311009  (at average of 163.6k/s taking 393s)
Ways      -  7391692  (at average of 13.89k/s taking 532s)
Relations -   120461  (at average of 48.95/s taking 2461s)

It would be interesting to see how the creation rates are affected by processor and disk performance. This machine is a quad core AMD with 8GB of DDR3 memory.
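
For comparing machines, the rates can be recomputed from the counts and times that osm2pgsql reports. A small sketch using awk; the function name import_rate is illustrative, and the figures are the ones quoted above.

```shell
# Sketch: objects processed per second, from a count and a duration.
import_rate() {
    awk -v count="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", count / secs }'
}

import_rate 64311009 393    # nodes
import_rate 7391692 532     # ways
import_rate 120461 2461     # relations
```

Relations are by far the slowest stage on this machine, so that is probably the figure to watch when comparing hardware.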

Having downloaded the state.txt file into the same directory as the extract, we are ready to configure things to run further updates on the data. This takes data direct from the planet archive using osmosis. This first needs initialising.

osmosis --rrii

And this creates a configuration.txt file to go with the state.txt file. Although the file is created, I found it needed modifying to work for me. It is a simple text file, so can be edited with your preferred text editor, and the baseUrl needs changing to end with /replication/minute . I also increased the maxInterval to 14400 (4 hours) just while we bring the data closer up to date. It gets dropped back to 3600 (1 hour) when things are ticking over. The state.txt file also needs an update before running, copying the sequence number on the top line to replace the one on the third line. The following pair of commands then grabs a 4 hour slice and adds it to the database, but at present I've not found how to update the pbf file itself. It may be that I need to run a clean extract using the shape file to remove updates outside the area?
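
The two edits described above can also be scripted. This is only a sketch under stated assumptions: that configuration.txt holds baseUrl= and maxInterval = lines, that the planet server is planet.openstreetmap.org, and that state.txt carries the wanted sequence number as the last number on its first line with a sequenceNumber= line further down. Check the files by hand first; both function names are illustrative.

```shell
# Sketch only: the file layouts are assumptions, inspect your files first.

# Point configuration.txt at the minutely diffs and set the download window.
set_replication_config() {
    conf="$1"        # path to configuration.txt
    interval="$2"    # maxInterval in seconds, e.g. 14400 while catching up
    sed -i \
        -e 's|^baseUrl *=.*|baseUrl=http://planet.openstreetmap.org/replication/minute|' \
        -e "s|^maxInterval *=.*|maxInterval = ${interval}|" \
        "$conf"
}

# Copy the last number on the first line of state.txt into sequenceNumber=.
sync_state_sequence() {
    state="$1"       # path to state.txt
    seqno=$(head -n 1 "$state" | grep -o '[0-9][0-9]*' | tail -n 1)
    # Refuse to touch the file if no number was found on the first line.
    [ -n "$seqno" ] || return 1
    sed -i "s|^sequenceNumber=.*|sequenceNumber=${seqno}|" "$state"
}
```

Running set_replication_config configuration.txt 14400 followed by sync_state_sequence state.txt would cover the catch-up case; calling the first again with 3600 drops back to hourly slices. Keep a copy of the original state.txt, since the edit is made in place.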

osmosis --rri workingDirectory=. --wxc **.bi.osc.gz
osm2pgsql --append --slim **.bi.osc.gz
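
Since the same pair of commands is repeated for every slice, it can be wrapped up as one pass. A sketch, assuming the change files are named by pass number, and that the append needs the same postgres user and gis database as the initial import (the original append line does not spell these out); update_pass is an illustrative name.

```shell
# Sketch: one catch-up pass. $1 is the pass number used to name the
# change file, e.g. 7 gives 7.bi.osc.gz.
update_pass() {
    change="$1.bi.osc.gz"
    # Fetch the next slice of minutely diffs (up to maxInterval seconds).
    osmosis --rri workingDirectory=. --wxc "$change" || return 1
    # Apply the slice to the database created by the initial import.
    sudo -u postgres osm2pgsql --append --slim -U postgres -d gis "$change"
}
```

Something like for n in 1 2 3; do update_pass "$n" || break; done would then run three passes and stop at the first failure, leaving the numbered change files behind for later use.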

I've been updating the number each pass, so I maintain a set of change files that goes with the pbf file and can be used to update it directly, but currently attempts to run 'osmosis --apply-change' are failing with warnings and then errors. I will have a complete set of data to allow updates later if possible. Currently I believe I need the .pbf file for the osrm extract process.
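
For reference, the general shape of an osmosis pipeline that applies one change file to a pbf and clips the result back to the extract area is sketched below. It has not been tested against the failure described above. It assumes the area outline from the extract page is available as an osmosis polygon (.poly) file; the function name and all file names are illustrative.

```shell
# Sketch: apply one change file to the extract and clip to the area.
# Arguments: change file, input pbf, polygon file, output pbf.
update_extract() {
    osmosis \
        --read-xml-change file="$1" \
        --read-pbf file="$2" \
        --apply-change \
        --bounding-polygon file="$3" \
        --write-pbf file="$4"
}
```

The change stream is read first and the data stream second, so that --apply-change picks them up in the order it expects. Writing to a new output file rather than over the input keeps the previous pbf intact if a pass fails.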