Opened 9 years ago

Closed 3 years ago

Last modified 3 years ago

#2758 closed defect (fixed)

Search street is case sensitive in Russian

Reported by: Dmitriy.Ovdienko@…
Owned by: twain
Priority: major
Milestone:
Component: nominatim
Version:
Keywords: search search_result
Cc:

Description

Try searching for "гмыри" and "Гмыри". It is a street in Kyiv. The full name is "Бориса Гмыри ул."

Attachments (2)

utfasciitable.h.zip (85.9 KB) - added by Dmitriy.Ovdienko@… 4 years ago.
Updated UTF-ASCII mapping file
utfasciitable.h.2.zip (85.9 KB) - added by Dmitriy.Ovdienko@… 4 years ago.
Updated UTF-ASCII mapping file v2


Change History (13)

comment:1 Changed 9 years ago by Tom Hughes

Component: website → nominatim
Owner: changed from Tom Hughes to openstreetmap@…

comment:2 Changed 9 years ago by twain

Owner: changed from openstreetmap@… to twain
Status: new → assigned

comment:3 Changed 9 years ago by emj

Keywords: search_result added

comment:4 Changed 6 years ago by Dmitriy.Ovdienko@…

"6а, полуботка, чернигов" does not work. However "6, полуботка, чернигов" does work.

comment:5 Changed 6 years ago by Dmitriy.Ovdienko@…

A 4-year-old bug... I was sure it had been fixed. I believe this is a core component, and search should be as tolerant of user typos as possible.

comment:6 in reply to:  description Changed 5 years ago by saintam1

It seems to revolve specifically around the handling of the letter Г (the Cyrillic "G"). It looks like Nominatim does not know that "г" (Unicode 0x0433) is the lowercase of "Г" (0x0413).
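To make the expected relationship concrete, here is a minimal Python sketch of how the two code points relate in Unicode. It is not Nominatim code; the helper name `same_ignoring_case` is hypothetical and only illustrates a case-insensitive comparison.

```python
import unicodedata

UPPER_GHE = "\u0413"  # Г
LOWER_GHE = "\u0433"  # г


def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after Unicode case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


# Official Unicode names of the two characters.
print(unicodedata.name(UPPER_GHE))
print(unicodedata.name(LOWER_GHE))

# Lowercasing the capital letter yields the small letter.
print(UPPER_GHE.lower() == LOWER_GHE)

# The two spellings from the ticket description compare equal once folded.
print(same_ignoring_case("Гмыри", "гмыри"))
```

A search that normalizes both the indexed name and the query this way should treat the two spellings as the same term.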

Searching for William Gladstone St. in Sofia (way 230377106), all of these variations work correctly:

Note that the above use a mixture of upper and lower case, but they all have an uppercase "Г". If you take any of them, however, and simply replace the uppercase "Г" with a lowercase "г", they all fail:

I searched through https://github.com/twain47/Nominatim and couldn't find at a cursory glance where the casing is handled. Is there somewhere a manually defined, hardcoded list of upper/lower character mappings, that perhaps has a typo in it?

comment:7 Changed 5 years ago by saintam1

FWIW I had a look at utfasciitable.h, and it looks OK to me. If I understand correctly how it works (look up the Unicode codepoint in UTFASCIILOOKUP, and use the value there as an index in UTFASCII), all Cyrillic characters in the 0x0410-0x044F range, which includes both upper and lower case, map to lowercase ASCII transliterations. So there's no anomaly around the "Г" character here: both "г" and "Г" map to "g".
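The two-step lookup described above can be sketched roughly as follows. This is a Python illustration, not the actual C header: only the names UTFASCIILOOKUP and UTFASCII come from the comment, while the table contents, the dictionary representation, and the `transliterate` function are assumptions made up for the example.

```python
# Hypothetical miniature stand-ins for the two tables in utfasciitable.h.
# UTFASCII holds the ASCII transliterations; index 0 is unused here.
UTFASCII = ["", "g", "m", "y", "r", "i"]

# UTFASCIILOOKUP maps a Unicode codepoint to an index into UTFASCII.
# In the basic Cyrillic range, each lowercase letter is 0x20 above its
# uppercase form, so both cases are given the same index.
UTFASCIILOOKUP = {}
for index, upper in enumerate("ГМЫРИ", start=1):
    UTFASCIILOOKUP[ord(upper)] = index
    UTFASCIILOOKUP[ord(upper) + 0x20] = index


def transliterate(text: str) -> str:
    """Map each character through both tables (assumed behaviour)."""
    out = []
    for ch in text:
        index = UTFASCIILOOKUP.get(ord(ch))
        # Characters missing from the toy table are simply lowercased.
        out.append(UTFASCII[index] if index is not None else ch.lower())
    return "".join(out)


print(transliterate("Гмыри"))
print(transliterate("гмыри"))
```

With tables built like this, both spellings produce the same ASCII key, which is the behaviour the comment expects from the real mapping file.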

Changed 4 years ago by Dmitriy.Ovdienko@…

Attachment: utfasciitable.h.zip added

Updated UTF-ASCII mapping file

comment:8 Changed 4 years ago by Dmitriy.Ovdienko@…

I guess the mapping of "г" and "Г" is wrong. I've attached a corrected file.

Changed 4 years ago by Dmitriy.Ovdienko@…

Attachment: utfasciitable.h.2.zip added

Updated UTF-ASCII mapping file v2

comment:9 Changed 4 years ago by Dmitriy.Ovdienko@…

Fixed i->I transition. See attached v2 file.

comment:10 Changed 3 years ago by Sarah Hoffmann

Resolution: fixed
Status: assigned → closed

comment:11 Changed 3 years ago by Dmitriy.Ovdienko@…

The next step is to make search more typo-friendly (like Google) :)
