Opened 9 years ago

Last modified 9 years ago

#3149 accepted defect

Searching for the Japanese name of the Japanese city Fukuoka fails to return the city node from OSM

Reported by: spod
Owned by: twain
Priority: major
Milestone:
Component: nominatim
Version: 2.0
Keywords:
Cc:

Description

Searching for 福岡 either from the OSM page or on the Nominatim site does not return the city node (331385074).

It seems to return various other items which do contain the substring "福岡", but not the city node, which is strange. If it can find the substring "福岡" within these items, you would expect it to be able to find the substring "福岡" in the city node's name:ja which is "福岡市".

http://nominatim.openstreetmap.org/search?q=福岡 should return node 331385074.
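To reproduce this outside a browser (and so rule out browser encoding settings), the query can be percent-encoded as UTF-8 by hand. The following is a minimal Python sketch; `build_search_url` is a hypothetical helper written for this ticket, not part of Nominatim, and the endpoint is simply the URL quoted above.

```python
from urllib.parse import quote

# Endpoint as quoted in this ticket; build_search_url is a hypothetical helper.
BASE = "http://nominatim.openstreetmap.org/search"

def build_search_url(query: str) -> str:
    """Return the search URL with the query percent-encoded as UTF-8."""
    return f"{BASE}?q={quote(query, safe='')}"

# Each kanji becomes three percent-encoded UTF-8 bytes.
print(build_search_url("福岡"))
```

Fetching the printed URL with any HTTP client should then show whether node 331385074 is in the results, independently of the browser's default encoding.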

Searching for "福岡市", which is the exact name:ja tag, seems to work. The "市" means city, but wouldn't normally be used in a search (just like you wouldn't search for "Sheffield city", but just "Sheffield", in the UK).

I tried searching for the Japanese name of Yokohama (横浜), but that failed as well, so this may be a problem for all Japanese cities. Tokyo is an exception: its OSM name:ja tag is "東京", which is actually "incorrect" (it should really be "東京市"), so ignore Tokyo in any tests to confirm whether this is fixed!

Change History (5)

comment:1 Changed 9 years ago by spod

BTW: This problem occurred when using the Safari browser with a default encoding of "Western (ISO Latin 1)". I tried it in Firefox with English encoding and the same problem happened. Setting the default encoding to "Japanese (Shift-JIS)" in Safari made no difference.

comment:2 in reply to:  description Changed 9 years ago by twain

Owner: changed from openstreetmap@… to twain
Status: changed from new to accepted

Replying to spod:

> Searching for 福岡 either from the OSM page or on the Nominatim site does not return the city node (331385074).
>
> It seems to return various other items which do contain the substring "福岡", but not the city node, which is strange. If it can find the substring "福岡" within these items, you would expect it to be able to find the substring "福岡" in the city node's name:ja which is "福岡市".

This is a combination of two problems. First, I seem to be missing handlers for some strings, so 福岡リバレイン gets converted to 福岡 by the code that handles abbreviations.

Would you be able to give me a literal translation of 福岡リバレイン to help me understand what is happening? Is リバレイン in a different character set or something?

> Searching for "福岡市", which is the exact name:ja tag, seems to work. The "市" means city, but wouldn't normally be used in a search (just like you wouldn't search for "Sheffield city", but just "Sheffield", in the UK).

This is the second part of the problem. I agree with the above, but you will see that Sheffield isn't labelled as "Sheffield City" in OSM:

http://www.openstreetmap.org/browse/node/422162

From a data point of view the extra 市 seems wrong - it is already tagged 'place=city'.

?

--

Brian

comment:3 Changed 9 years ago by spod

Thanks for the investigation.

The literal translation of "福岡リバレイン" is "Fukuoka RiverRain", which is the name of a shopping complex.

"福岡" is kanji. "リバレイン" is katakana (a phonetic script generally used for 'foreign' words in Japanese). They are not different character sets as such, just different categories of Japanese "letters". In Unicode, the kanji block (actually the unified Chinese/Japanese/etc. ideographs) starts at U+4E00 and the katakana block starts at U+30A0, so I guess they are "separated" in a code sense (if the software is using Unicode at the point of searching).

The inclusion of the "市" (city) in the name tag is the convention in Japan: http://wiki.openstreetmap.org/wiki/Japan_tagging Not sure why they did this - I wasn't part of the discussions!
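To make the code-range point above concrete, here is a minimal Python sketch that labels each character by its Unicode block. `script_of` is a hypothetical helper for illustration only; nothing in this ticket confirms that Nominatim classifies characters this way. The ranges are the standard Unicode blocks (hiragana U+3040 to U+309F, katakana U+30A0 to U+30FF, CJK unified ideographs U+4E00 to U+9FFF).

```python
def script_of(ch: str) -> str:
    """Label a single character by Unicode block (hypothetical helper)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"

# Show the block and code point of each character in the shopping complex name.
for ch in "福岡リバレイン":
    print(ch, f"U+{ord(ch):04X}", script_of(ch))
```

For "福岡リバレイン" this labels the first two characters as kanji and the remaining five as katakana, which is the "separation" described above.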

comment:4 Changed 9 years ago by spod

Some more info I thought of: if the software treats a change of Unicode block (kanji, katakana, hiragana) as "the start of a new word", then that's not always correct in Japanese. In the RiverRain example it does indicate the start of a new and separate word, but especially with hiragana (another Japanese script, whose Unicode block starts at U+3040) it is possible to have a single word containing both kanji and hiragana.

e.g. 親富孝通り (Oyafuko-dori), a road in Fukuoka city (way 43105756). The "親富孝通" is kanji and the "り" ("ri") is hiragana. Nominatim doesn't seem to return that way when searching for either the whole Japanese name or any substring of it. Searching for the "English" name does work. I'll add that to the test cases page as well.

comment:5 Changed 9 years ago by spod

To clarify my last point:

The Japanese name 親富孝通り (Oyafuko-dori) consists of 2 "words": "親富孝" (oyafuko) and "通り" (dori, "street"). The second word is a mixture of kanji ("通", do) and hiragana ("り", ri), i.e. it's a single, unsplittable word containing kanji and hiragana, which doesn't make sense if it is parsed by splitting it at the point where it changes from kanji to hiragana.
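To illustrate the failure mode, here is a minimal Python sketch of a naive tokeniser that starts a new token whenever the Unicode block changes. Both `script_of` and `split_on_script_change` are hypothetical helpers; whether Nominatim actually splits this way is an assumption and is not confirmed anywhere in this ticket.

```python
from itertools import groupby

def script_of(ch: str) -> str:
    """Label a single character by Unicode block (hypothetical helper)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"

def split_on_script_change(text: str) -> list:
    """Naively start a new token at every change of script."""
    return ["".join(group) for _, group in groupby(text, key=script_of)]

print(split_on_script_change("福岡リバレイン"))  # split happens to fall on a word boundary
print(split_on_script_change("親富孝通り"))      # split falls inside the word 通り
```

With this rule, 福岡リバレイン is cut into 福岡 and リバレイン, which matches the real word boundary, but 親富孝通り is cut into 親富孝通 and り rather than 親富孝 and 通り, so neither resulting token is a real word.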
