This repository has been archived by the owner on Jul 24, 2021. It is now read-only.

Nominatim: Tolerance in Street Names (Doktor/Dr) #4961

Open
openstreetmap-trac opened this issue Jul 23, 2021 · 3 comments

Comments

@openstreetmap-trac

Reporter: wwwFrank
[Submitted to the original trac issue database at 9.21pm, Tuesday, 20th August 2013]

Example Doktor / Dr.:

Search for: "Dr.-Josef-Fieger-Straße 7, 50374 Erftstadt"

OSM: "Doktor-Josef-Fieger-Straße 7, 50374 Erftstadt"

Example blank / no blank:

Search for: "Vor dem Schltinger Tor Soest"

OSM: "Vor dem Schltingertor Soest"

@openstreetmap-trac

Author: lonvia
[Added to the original trac issue at 9.40pm, Wednesday, 21st August 2013]

Doktor/Dr is supposed to work but there is a typo in tokenstringreplacements.inc. Not sure if that can be fixed without a DB reimport.
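Why both spellings are expected to match can be illustrated with a minimal sketch. It assumes a hypothetical `normalize` function and a tiny hypothetical replacement table; the real rules live in tokenstringreplacements.inc and are not reproduced here. The point is only that the query and the stored name reduce to the same token string once "Doktor" is mapped to "dr".

```python
import re

# Hypothetical three-entry table for illustration only; this is NOT the
# content or format of tokenstringreplacements.inc.
REPLACEMENTS = {
    "doktor": "dr",
    "straße": "str",
    "strasse": "str",
}

def normalize(name: str) -> str:
    """Lowercase, treat '-', '.' and ',' as word separators, abbreviate words."""
    text = re.sub(r"[-.,]", " ", name.lower())
    return " ".join(REPLACEMENTS.get(word, word) for word in text.split())

query = normalize("Dr.-Josef-Fieger-Straße 7")
stored = normalize("Doktor-Josef-Fieger-Straße 7")
print(query)            # dr josef fieger str 7
print(query == stored)  # True
```

With the typo in place, "doktor" would instead end up as "d r", and the two strings would no longer be equal.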

For the handling of blank vs. no blank, see #4827.

@openstreetmap-trac

Author: florian_rittmeier
[Added to the original trac issue at 9.49pm, Friday, 27th December 2013]

I'd like to know how we can proceed with fixing this issue.

tokenstringreplacements.inc has to be patched so that new installations do not get this typo in their databases. lonvia, could you just fix this, or are other steps needed first?

If I understand the database layout and Geocode.php correctly, we would have to update the word_token column of the word table to avoid a DB reimport. We would have to look for entries containing " d r " and change them to " dr ". Furthermore, we would have to replace "d r " at the beginning of an entry with "dr ". If we also loaded the patched and recompiled module/nominatim.c into the running PostgreSQL instance, that might be all that is needed to update a running instance.
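The two proposed rewrites can be sketched as a single string transformation on one word_token value. This is a minimal sketch, assuming a hypothetical helper `fix_word_token`; it implements only the two rules above (" d r " inside an entry, "d r " at its start) and does not show the database UPDATE itself.

```python
import re

# Matches "d r " when it is preceded by a space or by the start of the token,
# which covers both proposed rules. The lookbehind inspects the original
# string, so back-to-back occurrences such as " d r d r " are all rewritten.
_D_R = re.compile(r"(?<![^ ])d r ")

def fix_word_token(token: str) -> str:
    """Hypothetical helper: rewrite 'd r ' to 'dr ' wherever it starts a word."""
    return _D_R.sub("dr ", token)

print(repr(fix_word_token(" d r josef fieger str")))  # ' dr josef fieger str'
print(repr(fix_word_token("d r josef")))              # 'dr josef'
```

Note that the same rule also turns " a d r paris" into " a dr paris", which is the "Auf der Rue Paris" false positive raised in the following paragraph.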

I am quite new to Nominatim, so I'm not sure whether other table columns would have to be updated as well. The approach leaves another open question: other terms might also produce an entry containing " d r ". It may be a contrived case, but something like "Auf der Rue Paris" would be stored as " a d r paris". Thus a DB reimport might be the cleanest option. Is that an option at all, or is it perhaps done regularly?
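The ambiguity can be made concrete with a toy tokenizer. The abbreviation table below is entirely hypothetical: it simply reproduces the two effects discussed in this thread (the typo turning "doktor" into "d r", and ordinary abbreviations that happen to yield "a", "d", "r"). Once both names are reduced to tokens, a pattern on the token alone cannot tell them apart.

```python
import re

# Hypothetical table for illustration; not Nominatim's real replacement rules.
BUGGY_TABLE = {"doktor": "d r", "auf": "a", "der": "d", "rue": "r"}

def tokenize(name: str) -> str:
    """Toy tokenizer: lowercase, split on '-', '.' and spaces, abbreviate."""
    words = re.sub(r"[-.]", " ", name.lower()).split()
    return " " + " ".join(BUGGY_TABLE.get(w, w) for w in words)

doctor = tokenize("Doktor-Josef-Fieger-Straße")
french = tokenize("Auf der Rue Paris")

# Both contain " d r ", so a token-level rewrite would change both.
for token in (doctor, french):
    print(repr(token), " d r " in token)
```

Distinguishing the two cases requires going back to the original name, which is what a reimport (or a targeted reindex) does.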

@openstreetmap-trac

Author: lonvia
[Added to the original trac issue at 10.36am, Saturday, 28th December 2013]

On a general note: changes to tokenstringreplacements.inc break running instances that keep up with the latest GitHub version, and there are a few of those around. So before changing it, a plan for updating existing DBs is always needed. As for reimports (I presume you mean on osm.org), it is not impossible, but it is rather difficult to free up the resources to do one while keeping the service running. Migration is always the preferred option.

For the specific case of Dr/Doktor, there is an additional problem with the Russian abbreviation. 'доктор' (doktor) is abbreviated to 'д-р' (d-r), which is then further reduced to 'd r'. So an additional normalisation from 'd-r' to 'dr' might be necessary.
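The order of operations matters here, as a minimal sketch shows. It assumes the Cyrillic 'д-р' has already been transliterated to Latin 'd-r', and uses a hypothetical `normalize` step that collapses 'd-r' to 'dr' before a generic hyphen-to-space rule runs. Applying only the hyphen rule produces the problematic 'd r'.

```python
import re

# Assumption: transliteration has already happened, so 'д-р' arrives as 'd-r'.
# \b on both sides keeps longer words such as "ad-r" from matching.
_RU_DOCTOR = re.compile(r"\bd-r\b")

def hyphens_to_spaces(text: str) -> str:
    """Generic rule: hyphens separate words."""
    return text.replace("-", " ")

def normalize(text: str) -> str:
    """Hypothetical pipeline: collapse 'd-r' first, then split on hyphens."""
    return hyphens_to_spaces(_RU_DOCTOR.sub("dr", text))

print(hyphens_to_spaces("d-r ivanov"))  # d r ivanov
print(normalize("d-r ivanov"))          # dr ivanov
```

Running the specific rule before the generic one avoids having to undo the split afterwards, where 'd r' would again be ambiguous.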

There might be clashes with other languages as well. I've created a list of tokens containing 'd r' (https://gist.github.com/lonvia/8158065) from the current planet, if you are interested in checking it out. Side note: the list is not very long. A feasible migration strategy might be to simply reindex all affected places, forcing the search index to be updated with the new normalizations.
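The targeted-reindex idea can be sketched as a selection step: find the places whose tokens contain 'd r' as two separate words, then reindex only those. The `(place_id, token)` rows and the `places_to_reindex` helper are hypothetical; fetching the rows from the database and triggering the actual reindex are not shown.

```python
import re
from typing import Iterable

# 'd r' preceded by a space or the start, and followed by a space or the end.
_D_R_WORDS = re.compile(r"(?<![^ ])d r(?![^ ])")

def places_to_reindex(rows: Iterable[tuple[int, str]]) -> list[int]:
    """Hypothetical helper: distinct place ids whose tokens contain 'd r'."""
    seen: set[int] = set()
    result: list[int] = []
    for place_id, token in rows:
        if _D_R_WORDS.search(token) and place_id not in seen:
            seen.add(place_id)
            result.append(place_id)
    return result

# Made-up sample rows, for illustration only.
rows = [
    (1, " d r josef fieger str"),
    (1, "d r"),
    (2, " a d r paris"),
    (3, " dr josef fieger str"),
    (4, " hindr rue"),
]
print(places_to_reindex(rows))  # [1, 2]
```

Since the affected list is short, reindexing only these places lets the new normalization rules regenerate their tokens without a full reimport.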
