source: subversion/sites/other/trapi/INSTALL @ 31161

Last change on this file since 31161 was 20486, checked in by deelkar, 10 years ago

fix old api 0.5 references

Trapi has only been tested on Debian unstable on x86 machines, but
should work on any modern Unix-derived OS.  For Debian, you will need
the packages listed in the packages file (or equivalents, if you
prefer a web server other than apache2).  The data files are
endian-independent.  Porting to Microsoft Windows would be a
challenge; there are subtle dependencies on a reasonable OS.

The code depends on pack supporting "N!" for signed 32-bit values.
This feature is apparently new in perl 5.10.
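One way to confirm this before installing is to round-trip a negative
value through pack and unpack.  The following is only a sketch and is
not part of Trapi; the function name check_pack_signed is made up for
illustration, and it assumes perl is on your PATH.

```shell
# Hypothetical helper (not shipped with Trapi): checks whether the local
# perl accepts the "N!" pack template by round-tripping -1 through it.
# Returns 0 if supported, 1 if perl rejects the template or yields a
# different value, 2 if perl is not installed.
check_pack_signed() {
    command -v perl >/dev/null 2>&1 || return 2
    result=$(perl -e 'print unpack("N!", pack("N!", -1))' 2>/dev/null) || return 1
    [ "$result" = "-1" ]
}
```

Run check_pack_signed and look at its exit status; anything other than
0 suggests the installed perl is too old for Trapi.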

The time on the system should be close to correct.  If it is off by
more than a minute it could affect the data update process.  The map
fcgi-bin program will refuse to serve data if the clock is too far in
the future, and having the clock in the past will cause Trapi to
serve obsolete data.
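The one-minute rule can be checked by comparing the local clock with a
reference you trust.  This is a minimal sketch; clock_within is a
hypothetical helper, and where the reference timestamp comes from (for
example a host known to be NTP-synchronized) is left to you.

```shell
# Hypothetical helper: succeeds when two Unix timestamps differ by no more
# than a tolerance in seconds (default 60, the one-minute figure above).
clock_within() {
    local_ts=$1
    reference_ts=$2
    tolerance=${3:-60}
    skew=$(( local_ts - reference_ts ))
    # Take the absolute value so a fast and a slow clock are treated alike.
    [ "$skew" -lt 0 ] && skew=$(( -skew ))
    [ "$skew" -le "$tolerance" ]
}
```

A call might look like clock_within "$(date +%s)" "$reference_ts",
with reference_ts obtained from the trusted source.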

Trapi does many seeks and small reads and writes on millions of small
files.  The indexes are fairly large (around 2.5 GB in 4 files as of
December 2008).  The data files should be on a filesystem set up for
many small files; they took about 13 GB and 7.5 million inodes as of
December 2008.  If using ext2 or ext3, directory indexing should be
on, and using a 1 or 2 KB allocation size will save disk space
(mke2fs -j -O dir_index -b 1024 -i 2048 works well).  Plan for room
for both growth and accumulated garbage.  Low-latency disks and lots
of memory to cache the files are recommended.
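Dividing the 13 gigabyte figure by the 7.5 million inodes gives a
rough average size per file, which is useful when choosing the
bytes-per-inode ratio.  The arithmetic below is a sketch that assumes
a gigabyte here means 10^9 bytes.

```shell
# Average bytes per inode from the December 2008 figures quoted above.
# Assumption: 13 gigabytes is taken as 13 * 10^9 bytes.
data_bytes=$(( 13 * 1000 * 1000 * 1000 ))
inodes=7500000
avg_bytes_per_inode=$(( data_bytes / inodes ))
echo "$avg_bytes_per_inode"   # prints 1733
```

The average is somewhat below the 2048 bytes per inode given to
mke2fs, so under these assumptions it may be worth watching inode
usage (df -i) as well as free space.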

The only part of trapi that tends to be CPU bound is the database
import and update.  This is single-threaded other than the file
decompression, so there is not much benefit from more than 2 cores.

The configuration of trapi is in trapi.pm.  Constants that probably
shouldn't be changed are defined in ptdb.pm.  The settings are:

   VERBOSE      How many messages to output.  1 reports only detected
                problems in the data; 25 produces many debugging
                messages.
   TRAPIDIR     Where the trapi data files are stored.
   DBDIR        Where the indexes, which are larger files, are stored.
   MAXOPEN      The number of data files to open at once, not
                including the indexes.  Bigger is generally better if
                your OS can handle it.
   KEEPOPEN     Should be a bit less than MAXOPEN.  A value closer to
                MAXOPEN means more CPU use and less file opening.
   SPLIT        The number of bytes in a node data file before it is
                split to a higher zoom (16 bytes per node).
   IGNORETAGS   A regular expression matching tags that will not be
                stored in the database.  (Use '' to store all tags.)
   GCCOUNT      The number of tiles to garbage collect per change file
                processed.  (Setting it low will cause the list of
                tiles awaiting garbage collection to build up in busy
                times.)
   OSCDELAY, WAITDELAY, WAITFAIL
                Determine how the change files are checked for.
   THRESH       Used by the tags file generator, explained below.
   TAGSVERSION  Selects which tags file to use for new tiles.
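Whether the OS can handle a given MAXOPEN depends largely on the
per-process open-file limit.  The sketch below compares a candidate
value with that limit; maxopen_fits is a hypothetical helper, and the
default headroom of 16 descriptors is an arbitrary allowance for the
index files and standard streams, not a figure from Trapi.

```shell
# Hypothetical helper: succeeds when a candidate MAXOPEN fits under a
# per-process open-file limit while leaving some descriptors spare.
# Arguments: MAXOPEN, limit (default: ulimit -n), headroom (default: 16).
maxopen_fits() {
    maxopen=$1
    limit=${2:-$(ulimit -n)}
    headroom=${3:-16}
    # "unlimited" means there is no cap to check against.
    [ "$limit" = "unlimited" ] && return 0
    [ $(( maxopen + headroom )) -le "$limit" ]
}
```

For example, maxopen_fits 1000 tests a MAXOPEN of 1000 against the
current shell's limit.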

The .pm files need to be where your perl will find them, and the
executable files where your shell will find them.

Trapi does not need special privileges.  Your web server will need
read access to the trapi database and indexes.

Initial data load:

   cleardb
   bzcat planet.bz2 | tahdbload.pl
   echo YYYYMMDD >timestamp

YYYYMMDD is the day before the planet file was generated.  It's better
to reprocess a bit of duplicated data than to miss something.  The
initial data load is disk I/O intensive and will take several days.
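The day-before value for the timestamp file can be computed rather
than worked out by hand.  This sketch assumes GNU date, as found on
Debian; day_before is a hypothetical helper name.

```shell
# Hypothetical helper: prints the day before a YYYYMMDD date.
# Assumes GNU date (as on Debian); BSD and BusyBox date use other syntax.
day_before() {
    ymd=$1
    year=${ymd%????}      # first four characters
    rest=${ymd#????}      # remaining MMDD
    month=${rest%??}
    day=${rest#??}
    date -u -d "$year-$month-$day 1 day ago" +%Y%m%d
}
```

For a planet file generated on 20090301, day_before 20090301 prints
20090228, which is the value to write to the timestamp file.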


Updating the data:

   trup.pl | trpcs.pl

This will fetch the daily, hourly, and minute change files and update
the trapi database.  If the file stopfile.txt exists, it will finish
processing the currently fetched data and then stop.  timestamp will
be updated as the change files are processed.


Garbage collection:

   trgarb.pl

Garbage collection must be done when nothing else is updating the
database.  It takes a little over a day.  Now that tiles are garbage
collected as trpcs.pl runs, this should only be needed to convert
tiles to the current version.  stopfile.txt will stop the garbage
collection.


Web access:

map is a FastCGI script.  Your web server should be configured so
that api/0.6/map requests go to it.


Tags files:

Trapi uses a variable-length encoding scheme to store common tag keys,
values, and roles.  alltags.pl will analyze your current Trapi
database and create a tags.z14x16384y0 file (as well as several other
files checkpointing the creation).  tagsproc.pl will take this file,
sort it, eliminate things less common than THRESH, and write a tags
file to standard output.  THRESH is a compromise between disk space
and memory: a higher THRESH will cause Trapi to use less memory but
more disk space.  You should run this twice if your Trapi database is
significantly different from the one used to create your current tags
file, since it will only capture values for known common tags.  Trapi
will use the version selected by TAGSVERSION for new tiles, and the
version the tile was created at for old tiles until they are
garbage-collected.  A TAGSVERSION of 0 is used for old-format tiles.

To update from tags.1 based on the current trapi database:
    cd /trapi
    alltags.pl
    tagsproc.pl >db/tags.2
    edit trapi.pm and change TAGSVERSION to 2
    alltags.pl
    tagsproc.pl >db/tags.3
    edit trapi.pm and change TAGSVERSION to 3
    restart apache
    restart trpcs.pl




Trapi will return more data than requested.  All requests are rounded
up to z14 tile boundaries, and in areas of low node density may be
rounded up to tiles as large as z11.  Some tags not used by tiles@home
are not stored by trapi.  The user and timestamp information is also
not stored.  This is fine for tiles@home, but trapi data must not be
uploaded to openstreetmap.  Ways and relations that are no longer in
the requested area may be returned.
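To see which tile a coordinate falls into, and so how far a request
will be rounded out, the tile numbers can be computed from longitude
and latitude.  The sketch below uses the standard OpenStreetMap
slippy-map tile formula; that Trapi numbers its tiles this way is an
assumption, and tile_xy is a hypothetical helper.

```shell
# Hypothetical helper: prints the x and y numbers of the tile containing
# a point, using the standard OpenStreetMap slippy-map tile formula.
# Assumption: Trapi's tiles follow this numbering.
# Arguments: zoom, longitude in degrees, latitude in degrees.
tile_xy() {
    awk -v z="$1" -v lon="$2" -v lat="$3" 'BEGIN {
        pi = atan2(0, -1)
        n = 2 ^ z                       # tiles per axis at this zoom
        phi = lat * pi / 180            # latitude in radians
        x = int((lon + 180) / 360 * n)
        y = int((1 - log(sin(phi) / cos(phi) + 1 / cos(phi)) / pi) / 2 * n)
        printf "%d %d\n", x, y
    }'
}
```

Under that assumption, a request covers every z14 tile between the
tile of its north-west corner and the tile of its south-east corner,
and whole tiles are returned.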



UPDATE NOTE:

When upgrading from a version prior to Feb 18, 2009: This version of
trapi is designed to do a gradual upgrade of tile format as tiles are
split or garbage collected.  You will need to add the new settings to
trapi.pm, and install tags.1 in your db directory if TAGSVERSION is 1.
The new version is also less tolerant of some database errors.  Due
to bugs in the previous version, you should do a complete database
reload.



When updating from a version prior to Jan 9, 2009, you'll need to put
your configuration in trapi.pm, and the relation files need to be
rebuilt (or do a complete database rebuild).  Since relation extracts
of planet files are available, the rebuild can be done as follows.
(This takes most of a day.)

   cd TRAPIDIR
   touch stopfile.txt
   wait for trpcs.pl to stop
   find z0 z1? -name relation -print | xargs rm
   bzcat relation-DATE.osm.bz2 | tahdbload.pl
   echo YYYYMMDD > timestamp
   restart trpcs.pl


IMPORTANT NOTE

As of early 2010, the old minute and hourly diffs that TRAPI uses
are no longer in service. You must follow the instructions below to
fetch and apply the new replicate diffs to your TRAPI database.
The new method uses Osmosis and the --rri option to fetch the replicate
diffs and convert them to the type expected by TRAPI.

The process is as follows: use the "go.sh" bash script to invoke
osmosis with --rri, rename the output file, and move it to the change
directory (CHANGEDIR). The "go.sh" script should be located in your
osmosis/bin/ directory. Two settings need to be changed in the go.sh
script: edit it and set CHANGEDIR and WORKDIR to something
appropriate for your system. CHANGEDIR should be an empty directory
where Osmosis will dump the .osc files, and WORKDIR is the Osmosis
workingDirectory used with the --rri task.

monitor.pl replaces trup.pl. It monitors CHANGEDIR for new files and
feeds the filenames to trpcs.pl via STDOUT. monitor.pl should be placed
in the same directory as trapi.pm, as it reads config info from there.
trapi.pm needs to define the constant CHANGEDIR, so a line like

    use constant CHANGEDIR => "/home/user/change/";

should be added to trapi.pm, with the directory changed to suit your
environment.

Note that monitor.pl requires the File::Monitor module for perl.

If you haven't yet loaded your database with your initial planet dump,
now would be a good time. See "Initial data load" above for details.
You should fetch the initial dump from a mirror; TRAPI will catch up
from there as the diffs are applied.

At this point you should install Osmosis, run the --rrii task, and find
and download the appropriate state.txt file from planet.openstreetmap.org
or a mirror. This is typically one from slightly before the dump you used
to initialize your database.

Once osmosis is ready, just run the "go.sh" script and osmosis should
start dumping .osc.gz files into your CHANGEDIR. Once it has written at
least one, you can invoke monitor.pl in your TRAPI directory with
something like ./monitor.pl | ./trpcs.pl. trpcs.pl should then start
updating your database with the files that it finds in CHANGEDIR.

If you have any questions, email the talk list or contact user Milenko.