Opened 4 years ago

Last modified 3 years ago

#4961 new enhancement

Nominatim: Tolerance in Streetnames (Doktor/Dr)

Reported by: wwwFrank
Owned by: geocoding@…
Priority: minor
Milestone:
Component: nominatim
Version:
Keywords: Street Tolerance
Cc:

Description

Example: Doktor / Dr.

Search for: "Dr.-Josef-Fieger-Straße 7, 50374 Erftstadt"

OSM: "Doktor-Josef-Fieger-Straße 7, 50374 Erftstadt"

Example: blank / no blank

Search for: "Vor dem Schültinger Tor Soest"

OSM: "Vor dem Schültingertor Soest"

Change History (3)

comment:1 Changed 4 years ago by lonvia

  • Summary changed from Nominatim: Tolerance in Streetnames (Germany) to Nominatim: Tolerance in Streetnames (Doktor/Dr)

Doktor/Dr is supposed to work, but there is a typo in tokenstringreplacements.inc. I'm not sure whether that can be fixed without a DB reimport.
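To illustrate why one typo in the replacement list breaks matching between "Dr." and "Doktor", here is a minimal Python sketch. The replacement pairs and the `normalize` helper are hypothetical. They are not the real contents of tokenstringreplacements.inc, and the exact form of the typo is an assumption. The sketch only shows the intended idea: both spellings should be rewritten to the same token before lookup.

```python
import re

# Hypothetical replacement pairs, NOT the real contents of
# tokenstringreplacements.inc. Intent: "doktor" and "dr" share one token.
CORRECT_REPLACEMENTS = [(" doktor ", " dr ")]
# Assumed form of the typo: "doktor" is mapped to "d r" instead of "dr".
TYPO_REPLACEMENTS = [(" doktor ", " d r ")]


def normalize(name, replacements):
    """Lowercase, collapse punctuation to spaces, then apply replacements."""
    text = re.sub(r"\W+", " ", name.lower()).strip()
    # Pad with spaces so each pattern only matches whole tokens.
    text = f" {text} "
    for old, new in replacements:
        text = text.replace(old, new)
    return text


query = "Dr.-Josef-Fieger-Straße"      # what the user searched for
stored = "Doktor-Josef-Fieger-Straße"  # the name in OSM

print(normalize(query, CORRECT_REPLACEMENTS) == normalize(stored, CORRECT_REPLACEMENTS))  # True
print(normalize(query, TYPO_REPLACEMENTS) == normalize(stored, TYPO_REPLACEMENTS))        # False
```

With the assumed typo, the stored name becomes " d r josef fieger straße " while the query stays " dr josef fieger straße ", so the two never meet in the index.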

For handling of blank/noblank see #4827.

comment:2 Changed 3 years ago by florian_rittmeier

I'd like to know how we can proceed with fixing this issue.

tokenstringreplacements.inc has to be patched so that new installations do not get this typo in their databases. lonvia, could you just fix this, or are there other steps to be done first?

As far as I understand the database layout and Geocode.php, we would have to update the word_token column of the word table if we want to avoid a DB reimport. We would have to look for entries containing " d r " and change them to " dr ". Furthermore, we would have to replace "d r " at the beginning of an entry with "dr ". If we also loaded the patched and recompiled module/nominatim.c into the running PostgreSQL instance, that might be all that is needed to update a running instance.
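The word_token rewrite described above can be sketched as a pure Python function over a single word_token value. This is only an illustration under assumptions: `fix_word_token` is a hypothetical helper, the sample tokens are invented, and in practice the same rewrite would have to run against the word table inside PostgreSQL. A regular expression with a lookbehind is used instead of a plain `str.replace(" d r ", " dr ")`, because the plain replacement consumes the separating space and would miss back-to-back occurrences such as " d r d r x".

```python
import re

# Matches the token "d r" when it starts the value or follows a space,
# and is itself followed by a space. This covers both cases described
# above: " d r " inside an entry and "d r " at the beginning.
_SPLIT_DR = re.compile(r"(?:^|(?<= ))d r(?= )")


def fix_word_token(word_token):
    """Rewrite every split 'd r' token in one word_token value to 'dr'."""
    return _SPLIT_DR.sub("dr", word_token)


# Hypothetical sample values; real ones would come from the word table.
samples = [
    " d r josef fieger str",
    "d r ",
    " a d r paris",        # unrelated name that gets rewritten as well
    " andreas dr mueller", # already correct, left unchanged
]

for token in samples:
    print(repr(token), "->", repr(fix_word_token(token)))
```

The third sample shows the collision risk: any entry whose tokens happen to be "d r" is rewritten, whether or not it came from "Doktor".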

I am quite new to Nominatim, so I'm not sure whether other table columns have to be updated as well. The approach also leaves an open question: other terms might lead to an entry containing " d r ". It may be a contrived example, but if something like "Auf der Rue Paris" exists, it would be saved as " a d r paris". Thus a DB reimport might be the cleanest option. The question is whether that is an option at all, or whether it is perhaps done regularly anyway.

comment:3 Changed 3 years ago by lonvia

On a general note: changes to tokenstringreplacements.inc break running instances that keep up with the latest GitHub version, and there are a few of those around. So before changing it, a plan for updating existing DBs is always needed. As for reimports (you mean on osm.org, I presume), they are not impossible, but it is rather difficult to free up the resources for one while keeping the service running. Migration is always the preferred option.

For the specific case of Dr/Doktor, there is an additional problem with the Russian abbreviation. 'Доктор' (doktor) gets abbreviated to 'Д-р' (d-r), which is then further reduced to 'd r'. So an additional normalisation from 'd-r' to 'dr' might be necessary.
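The extra normalisation suggested above can be sketched in Python. This is not Nominatim's actual transliteration code. The transliteration table is a deliberately tiny, hypothetical one covering only the letters in the example, and where the 'd-r' rule sits in the pipeline is an assumption. The key point is ordering: the 'd-r' to 'dr' rule has to run before hyphens are turned into spaces.

```python
import re

# Hypothetical, deliberately tiny transliteration table covering only the
# Cyrillic letters used below; a real pipeline uses a full mapping.
TRANSLIT = {"д": "d", "о": "o", "к": "k", "т": "t", "р": "r"}


def transliterate(text):
    return "".join(TRANSLIT.get(ch, ch) for ch in text.lower())


def normalize_naive(text):
    """Transliterate, then turn hyphens into spaces: 'Д-р' ends up as 'd r'."""
    return transliterate(text).replace("-", " ")


def normalize_with_dr_rule(text):
    """As above, but first collapse the hyphenated abbreviation 'd-r' to 'dr'."""
    latin = re.sub(r"\bd-r\b", "dr", transliterate(text))
    return latin.replace("-", " ")


print(normalize_naive("Д-р"))            # d r
print(normalize_with_dr_rule("Д-р"))     # dr
print(normalize_with_dr_rule("Доктор"))  # doktor
```

Like any substring rule, a blanket 'd-r' rule could also hit unrelated hyphenated names, so it would need the same collision check as the Latin-script case.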

There might be clashes with other languages as well. I've created a list of tokens containing 'd r' from the current planet, if you are interested in checking it out. Side note: the list is not very long, so a feasible migration strategy might be to simply reindex all places concerned, forcing the search index to be updated with the new normalizations.
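The "reindex all places concerned" strategy can be sketched as a two-step lookup: find the word entries whose tokens contain "d r", then collect every place indexed under one of them. In a real installation both steps would be queries against the word table and the search index in PostgreSQL. The dictionaries here are invented stand-ins, and `places_to_reindex` is a hypothetical helper. The resulting IDs would then go to whatever reindexing mechanism the installation uses.

```python
def places_to_reindex(word_tokens, place_words):
    """Return the IDs of places indexed under a token containing 'd r'.

    word_tokens: hypothetical mapping word_id -> word_token
    place_words: hypothetical mapping place_id -> list of word_ids
    """
    # Padding with spaces lets one substring test catch 'd r' at the start,
    # in the middle, or at the end of a token, but not inside 'wood r'.
    affected = {wid for wid, tok in word_tokens.items() if " d r " in f" {tok} "}
    return {pid for pid, wids in place_words.items() if affected.intersection(wids)}


# Invented sample data standing in for the word table and the search index.
word_tokens = {1: " d r josef fieger str", 2: " dr josef fieger str",
               3: " erftstadt", 4: " a d r paris"}
place_words = {100: [1, 3], 101: [2, 3], 102: [4], 103: [3]}

print(sorted(places_to_reindex(word_tokens, place_words)))  # [100, 102]
```

Place 102 is selected only because of the " a d r paris" collision. Over-selecting is presumably harmless for this strategy, since reindexing just recomputes a place's entries from its names, which is what makes it attractive compared with rewriting word_token values in place.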
