Opened 5 years ago

Last modified 4 years ago

#4827 new enhancement

dividing of street names (handling of composita)

Reported by: jotpe
Owned by: geocoding@…
Priority: minor
Milestone:
Component: nominatim
Version:
Keywords:
Cc:

Description

Nominatim already helps me resolve incorrectly divided street names like this:
"Main Str., 51149 Köln" to "Mainstr., 51149 Köln"
http://nominatim.openstreetmap.org/search.php?q=Main+Str.%2C+51149+Köln

German street names often have "Weg" as their last part. Is it possible to apply the same correction to "Weg"?

Examples:

  1. Real street name is "Ziegeleiweg, 51149 Köln": Should be found with "Ziegelei Weg" or "Ziegelei-Weg", if no other hits are available.
  2. Real street name is "Urbacher Weg, 51149 Köln": Should be found with "Urbacherweg", if no other hits are available.
  3. For completeness: Real street name is "Theodor-Schnitzler-Weg, 51149 Köln": Can be found with "Theodor Schnitzler Weg", but should also be findable with "Theodorschnitzlerweg", if no other hits are available.
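
The "if no other hits are available" behaviour in the three examples can be sketched as a two-step lookup. This is only an illustration of the request, not how Nominatim searches: the function names `squash` and `search` and the plain list of street names are assumptions made for the example.

```python
import re


def squash(name):
    """Comparison key: drop spaces and hyphens, ignore case."""
    return re.sub(r"[\s-]+", "", name).lower()


def search(query, street_names):
    """Return exact (case-insensitive) hits; fall back to squashed keys."""
    exact = [n for n in street_names if n.lower() == query.lower()]
    if exact:
        return exact
    # Fallback only when nothing matched exactly.
    key = squash(query)
    return [n for n in street_names if squash(n) == key]


streets = ["Ziegeleiweg", "Urbacher Weg", "Theodor-Schnitzler-Weg"]
print(search("Ziegelei-Weg", streets))
print(search("Urbacherweg", streets))
print(search("Theodorschnitzlerweg", streets))
```

Because the fallback runs only when the exact lookup is empty, a query that already matches a real name is never diluted by the looser comparison.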

Thanks

Change History (3)

comment:1 Changed 4 years ago by lonvia

  • Summary changed from dividing of street names to dividing of street names (handling of composita)

Normalization already splits off "strasse" from German street names, and the same could be done with "weg" and some other common German suffixes. That won't help with the third case, though, so a more general, language-independent handling of composita (compound words) is required here.
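
A rough sketch of suffix splitting, to show why it covers the first two cases but not the third. The suffix list and the `normalize` function are hypothetical and are not taken from Nominatim's normalization code.

```python
import re

# Hypothetical suffix list; the ticket names only "strasse" and "weg".
SUFFIXES = ("strasse", "weg")


def normalize(name):
    """Lowercase, treat hyphens as spaces, split known suffixes off words."""
    words = re.sub(r"[-\s]+", " ", name.lower()).strip().split(" ")
    out = []
    for word in words:
        for suffix in SUFFIXES:
            # Split only when something is left in front of the suffix.
            if word.endswith(suffix) and len(word) > len(suffix):
                out.extend([word[: -len(suffix)], suffix])
                break
        else:
            out.append(word)
    return " ".join(out)


# Cases 1 and 2 normalize to the same string as the real name.
print(normalize("Ziegelei-Weg") == normalize("Ziegeleiweg"))
print(normalize("Urbacherweg") == normalize("Urbacher Weg"))
# Case 3 does not: the inner word boundary is unknown.
print(normalize("Theodorschnitzlerweg") == normalize("Theodor-Schnitzler-Weg"))
```

A plain suffix check like this would also split words that merely happen to end in "weg", so a real implementation would need more care.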

See also #4572, #4961

comment:2 Changed 4 years ago by florian_rittmeier

Regarding the third case: Would it be a good solution to modify osm2pgsql/output-gazetteer.c so that it adds an additional alt_name containing the variant of the name without spaces and hyphens? So if the name tag holds a name written as several parts, the joined variant would be added as an alternative.

The question is, should this only apply to the name tag or to all name like tags (tags like int_name, nat_name, loc_name...)?
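
To make the proposal concrete, here is a small sketch of the idea in Python rather than in the C output code. The helper names `joined_variant` and `extra_alt_names` are made up for illustration, and the `keys` parameter stands in for the open question of which name-like tags to cover.

```python
import re


def joined_variant(name):
    """Name without spaces and hyphens, or None if nothing would change."""
    joined = re.sub(r"[\s-]+", "", name)
    return joined if joined != name else None


def extra_alt_names(tags, keys=("name",)):
    """Collect joined variants for the chosen name-like tags."""
    variants = []
    for key in keys:
        value = tags.get(key)
        if not value:
            continue
        variant = joined_variant(value)
        if variant and variant not in variants:
            variants.append(variant)
    return variants


tags = {"name": "Theodor-Schnitzler-Weg", "highway": "residential"}
print(extra_alt_names(tags))
print(extra_alt_names(tags, keys=("name", "int_name", "loc_name")))
```

The sketch keeps the original capitalisation and assumes that later matching is case-insensitive.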

comment:3 Changed 4 years ago by lonvia

This could even be done during indexing in SQL by simply adding an unhyphenated version to the search terms, and it would be less of a hack there.

I don't see too much of an issue with removing hyphens (1), but I'm not sure about spaces. It is simply too difficult to distinguish composite-like words (e.g. Freiberger Weg) from true multi-word names (e.g. Auf dem Berg), and joining them would introduce a lot of bad search terms. They probably wouldn't do much harm for searching itself, but we already have issues with DB indexes over the search terms growing too large, so the fewer unnecessary terms the better.

(1) Thinking a bit further, it might even be a good idea to always remove hyphens and full stops from the complete word while still adding the composita parts as partial words.
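
One possible reading of footnote (1), sketched in Python instead of the SQL indexing code. The function `search_terms` and its two return lists are assumptions for illustration; how they would map onto the real search term tables is not specified in this ticket.

```python
import re


def search_terms(name):
    """Return (full_words, partial_words) for a name.

    Hyphens and full stops are removed from each space-separated word;
    the pieces of a hyphenated word are kept as partial words.
    Spaces are left alone, so true multi-word names are not joined.
    """
    full, partial = [], []
    for token in name.lower().split():
        parts = [p for p in re.split(r"[-.]", token) if p]
        if not parts:
            continue
        full.append("".join(parts))
        if len(parts) > 1:
            partial.extend(parts)
    return full, partial


print(search_terms("Theodor-Schnitzler-Weg"))
print(search_terms("Auf dem Berg"))
```

Since only hyphenated words produce extra terms, names such as "Auf dem Berg" add nothing to the index beyond their own words.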
