source: subversion/applications/utils/planetdiff/readme.txt @ 2605

Last change on this file since 2605 was 2406, checked in by jonb, 12 years ago

planetdiff & planetpatch tool - Allows incremental planet.osm updates using a custom XML diff format

planetdiff
==========
Generates a file containing the differences between two planet.osm dumps.

The program supports .gz and .bz2 compressed files transparently.
It also runs an internal version of UTF8sanitizer on the input data, so
it can be used on a file downloaded from
http://planet.openstreetmap.org without any other manipulation.

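As an illustration of what "transparently" can mean here, the sketch below
picks a decompressor from the file name. It is written in Python rather
than the tool's own C, and choosing by extension is an assumption: the
readme does not say how planetdiff detects the format. The name
open_maybe_compressed is hypothetical.

```python
import bz2
import gzip


def open_maybe_compressed(path):
    """Open a file for binary reading, decompressing by file extension.

    Assumption: the extension identifies the format. This is only an
    illustration, not how planetdiff itself is known to work.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")  # plain, uncompressed file
```

The caller reads from the returned object in the same way whichever
branch was taken.
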
Note that the algorithm relies on the strict ordering of data in the
planet.osm file to operate correctly. Data produced by other OSM tools
normally does not follow these rules and cannot be processed by this
program.

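To show why strict ordering matters, here is a minimal sketch of a diff
over two streams sorted by ID, written in Python. It is one plausible
reading of the approach, not code taken from planetdiff. The function
name diff_sorted and the (id, payload) tuples are hypothetical, and only
a single object type is handled.

```python
def diff_sorted(old, new):
    """Yield ("delete", obj) and ("add", obj) for two ID-sorted streams.

    Each stream yields (id, payload) pairs with strictly increasing ids.
    A changed object is reported as a delete followed by an add.
    """
    old_it, new_it = iter(old), iter(new)
    a, b = next(old_it, None), next(new_it, None)
    while a is not None or b is not None:
        if b is None or (a is not None and a[0] < b[0]):
            yield ("delete", a)  # id exists only in the old stream
            a = next(old_it, None)
        elif a is None or b[0] < a[0]:
            yield ("add", b)  # id exists only in the new stream
            b = next(new_it, None)
        else:
            if a[1] != b[1]:  # same id, different content
                yield ("delete", a)
                yield ("add", b)
            a, b = next(old_it, None), next(new_it, None)
```

Because both inputs are sorted, each object is visited once and only one
object from each file is held at a time. With unsorted input the
comparison of IDs would report spurious adds and deletes.
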
Build requirements
------------------
The code relies on the libraries below:

 libxml2
 bzip2
 zlib

To compile this code on Fedora you need at least the following packages
installed:

 libxml2-devel
 bzip2-devel
 zlib-devel


Compiling
---------
On a Linux or other Unix-like system:

 $ make

This will produce both planetdiff and planetpatch (as below).

Data ordering rules
-------------------
The input OSM file must obey the following rules to work with the current
algorithms. The planet.osm export script used to generate the planet.osm
dumps does conform to these rules (whether by accident or design).

- The OSM file must list all nodes first, then all segments, then all ways.
- Within each object type (e.g. nodes), IDs must appear in increasing order.


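The two rules can be checked mechanically. The sketch below is a Python
illustration, not part of planetdiff. The name check_ordering is
hypothetical, "increasing" is assumed to mean strictly increasing, and
the code is not tuned for a full planet dump.

```python
import xml.etree.ElementTree as ET

# Rank of each top-level object type in the required file order.
TYPE_ORDER = {"node": 0, "segment": 1, "way": 2}


def check_ordering(source):
    """Return None if both rules hold, else a message for the first violation."""
    last_rank, last_id = 0, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        rank = TYPE_ORDER.get(elem.tag)
        if rank is None:
            continue  # child elements such as <tag>, or the root element
        obj_id = int(elem.get("id"))
        if rank < last_rank:
            return f"{elem.tag} {obj_id} appears after a later object type"
        if rank == last_rank and last_id is not None and obj_id <= last_id:
            return f"{elem.tag} {obj_id} is not greater than previous id {last_id}"
        last_rank, last_id = rank, obj_id
        elem.clear()  # drop attributes and children we no longer need
    return None
```

The source argument can be a file name or an open binary file object.
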
Diff file format
----------------
The diff format is an XML file containing OSM objects to delete and add.
A modified object appears twice: once in a delete section (its old form)
and once in an add section (its new form). Each section contains a copy of
the OSM object as it appears in the planet.osm file.

<?xml version="1.0" encoding="UTF-8"?>
<planetdiff version="0.1" generator="OpenStreetMap planetdiff" from="a.osm" to="b.osm">
  <add>
    <node id="10310557" timestamp="2006-07-10 23:17:35" lat="51.7670078090236" lon="-0.471281873153888">
      <tag k="created_by" v="JOSM"/>
    </node>
  </add>
  <add>
    <node id="13602100" timestamp="2006-08-16 00:02:13" lat="51.778541285096" lon="-0.448173637230418"/>
  </add>
  <delete>
    <node id="26983956" lat="51.77874880458334" lon="-0.450481106821043">
      <tag k="created_by" v="JOSM"/>
    </node>
  </delete>
  <add>
    <node id="26983956" lat="51.77874880458334" lon="-0.450481106821043">
      <tag k="created_by" v="JOSMXX"/>
    </node>
  </add>
...
</planetdiff>


See example-diff.xml for an example file.


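A diff file in this format can be inspected with any XML parser. The
Python sketch below counts the add and delete sections per object type
and lists objects that appear in both, i.e. modified objects. The name
summarize_diff is hypothetical, and the whole file is loaded into
memory, which is only reasonable for small diffs.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_diff(source):
    """Return (counts, modified) for a planetdiff-style XML file.

    counts maps (action, object type) to a number of objects;
    modified is the set of (object type, id) found in both actions.
    """
    counts = Counter()
    seen = {"add": set(), "delete": set()}
    for section in ET.parse(source).getroot():
        if section.tag not in seen:
            continue  # ignore anything that is not <add> or <delete>
        for obj in section:
            counts[(section.tag, obj.tag)] += 1
            seen[section.tag].add((obj.tag, obj.get("id")))
    return counts, seen["add"] & seen["delete"]
```

Run against the sample above (with the "..." line removed), this reports
three added nodes, one deleted node, and node 26983956 as modified.
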
Example usage:
--------------
This example shows how the tool can be used to extract the differences
between two planet.osm dumps. The errors below are from the UTF8sanitizer
code and can be ignored.


$ planetdiff planet-070307.osm.bz2 planet-070321.osm.bz2 > delta2.xml

Processing: node(8420k)
Processing: segment(0k)Error at line 29333138
Error at line 29334932
Error at line 29334990
Error at line 29334990
Error at line 29334994
Error at line 29334994
Error at line 29336882
Error at line 29337279
Error at line 29337338
Error at line 29337351
Processing: segment(8830k)
Processing: way(370k)Error at line 72505269
Error at line 73944573
Processing: way(380k)Error at line 74022760
Error at line 72583739
Processing: way(430k)

$ bzip2 -c delta2.xml > delta2.xml.bz2
$ ls -l
-rw-rw-r-- 1 jburgess jburgess   10732308 Apr  6 11:41 delta2.xml.bz2
-rw-rw-r-- 1 jburgess jburgess  147704026 Apr  6 03:50 delta2.xml
-rw-rw-r-- 1 jburgess jburgess  186168637 Mar  7 20:21 planet-070307.osm.bz2
-rw-rw-r-- 1 jburgess jburgess  193761852 Mar 22 19:24 planet-070321.osm.bz2

The new planet.osm file can be regenerated from the old dump and this diff
using planetpatch (see below). The compressed diff is only about 10 MB,
under 6% of the size of the new compressed dump, so it is a much smaller
download than a whole new planet.osm dump.



planetpatch
===========
Generates a new planet.osm file by applying a differences file created by
planetdiff to an existing file.


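As a rough illustration of applying such a differences file, the Python
sketch below drops deleted objects from an ID-sorted stream and merges
in the added ones. It is not taken from planetpatch. The name
apply_patch and the (action, (id, payload)) tuples are hypothetical, a
single object type is assumed, and the change list is held in memory,
which a tool working on full planet dumps would presumably avoid.

```python
import heapq


def apply_patch(old, changes):
    """Apply ("delete" | "add", (id, payload)) changes to an ID-sorted stream.

    old yields (id, payload) pairs in increasing id order; changes is a
    list, since it is scanned twice. Returns the patched list, sorted by id.
    """
    deleted = {obj[0] for action, obj in changes if action == "delete"}
    added = sorted(
        (obj for action, obj in changes if action == "add"),
        key=lambda obj: obj[0],
    )
    kept = (obj for obj in old if obj[0] not in deleted)
    # Both inputs are sorted by id, so merging keeps the output sorted.
    return list(heapq.merge(kept, added, key=lambda obj: obj[0]))
```

A modified object is handled by its delete entry removing the old copy
and its add entry supplying the new one under the same ID.
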
Example usage:
--------------
The patch file generated by the planetdiff example above is used to
regenerate the planet-070321.osm file:

$ time planetpatch planet-070307.osm.bz2 delta2.xml > regen.xml

Processing: node(8420k)
Processing: segment(8830k)
Processing: way(370k)Error at line 72505269
Processing: way(380k)Error at line 72583739
Processing: way(430k)

real    19m54.654s
user    12m59.771s
sys     3m35.929s

The output file, in this case 'regen.xml', should now be the same as an
uncompressed and UTF8sanitized version of planet-070321.osm.bz2.



Verification
------------
To verify that the regenerated file equals the new planet.osm file, we can
compare it to a previously generated UTF8sanitized version of the same file.
'cmp -l' lists every differing byte: the byte offset (in decimal) followed
by the value of that byte in each file (in octal).

$ cmp -l planet-070321a.osm regen.xml
1403627544  11  40
3457266276  11  40

The process appears to have converted a tab character (octal 11, ASCII 9)
to a space (octal 40, ASCII 32) in two places. Other than these two bytes,
the generated output is identical to the original version of the new
planet.osm. This seems close enough to be usable right now.
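
Where 'cmp' is not available, the same check can be sketched in Python.
The function names cmp_l and format_diff are hypothetical. Like
'cmp -l', the sketch reports 1-based byte offsets and octal byte values,
but it simply stops at the end of the shorter file instead of reporting
an EOF message.

```python
def cmp_l(path_a, path_b, chunk_size=1 << 20):
    """Yield (offset, byte_a, byte_b) for each differing byte, offsets 1-based."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk_size), fb.read(chunk_size)
            if not a or not b:
                break  # reached the end of at least one file
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    yield offset + i + 1, x, y
            offset += len(a)


def format_diff(offset, x, y):
    """Format one difference as decimal offset plus two octal byte values."""
    return f"{offset} {x:o} {y:o}"
```

A tab replaced by a space at byte 2 would be formatted as "2 11 40",
matching the octal values in the output above.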