Non-ASCII UTF-8 symbols in GPX trace names are converted to '_' on upload #4127
Comments
Author: TomH I think you might guess wrong... I doubt we are doing anything to the name, so most likely this is an issue with how the browser sends the name and how Rails interprets what it receives. Character encoding of form data is quite a sticky issue, as there is no clear way for the browser to indicate what encoding it has used for the data it sends. There is an extra complication when sending a file: the browser may not even know what the encoding of the filename is. On Windows it's not an issue, but on any Unix system there is generally no way to know what the encoding is for a filename.
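To illustrate the ambiguity described above, here is a minimal Ruby sketch. The helper `decode_filename` is hypothetical and is not part of the OSM codebase; it only shows that the same raw filename bytes yield different text depending on which encoding the receiver assumes.

```ruby
# Hypothetical helper (not from the OSM source): interpret raw filename
# bytes under an assumed encoding and return the result as UTF-8 text.
def decode_filename(raw_bytes, assumed_encoding)
  raw_bytes.dup.force_encoding(assumed_encoding).encode(Encoding::UTF_8)
end

# The bytes a browser might put on the wire for a Cyrillic filename.
raw = "трек.gpx".b

as_utf8   = decode_filename(raw, Encoding::UTF_8)      # the intended name
as_latin1 = decode_filename(raw, Encoding::ISO_8859_1) # mojibake

puts as_utf8
puts as_latin1.length # longer, because each Cyrillic letter became two characters
```

Nothing in the bytes themselves says which interpretation is right, which is why the server has to pick a default.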
Author: one_half_3544 Hm. Don't you think assuming UTF-8 by default would be a good idea? Running tcpdump while uploading (this trace: http://www.openstreetmap.org/user/one_half_3544/traces/1149633) shows that at least the browser (Firefox) lists utf-8 in Accept-Encoding:
And transmits the UTF-8 filename as is:
Some TCP packets later, in the same POST request, comes the trace description field (which duplicates the filename):
It arrives in UTF-8, but it is not converted to '_'. Do you know where in the source trace upload is handled? (Or at least, where is the source hosted? =)) I have more traces, so I want to resolve this problem.
Author: TomH It looks like it is something we are doing deliberately - the code is here: http://git.openstreetmap.org/rails.git/blob/HEAD:/app/controllers/trace_controller.rb#l368 I would be reluctant to take any patch which changes that, though, without a thorough understanding of why it was put in and what the implications are (especially from a security point of view) of relaxing the sanitisation.
Author: one_half_3544 Well, I hoped to invite the author of that line of code to the conversation, but since you are already here
=) In general, modern filesystems (like ext3) should deal fine with UTF-8 filenames. Wikipedia, for example, accepts UTF-8 filenames as is and, afaik, the MediaWiki engine stores them directly on the filesystem. Of course, they could have changed that or introduced their own sanity checks. I'll study that, but at first glance it seems that the regex could be changed to the UTF-8 equivalent of [[:print:]] without downgrading security.
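A rough sketch of what such a relaxed sanitisation could look like in Ruby follows. This is not the code in `trace_controller.rb`; the method name `sanitize_trace_name` is hypothetical, and the sketch assumes uploaded names should be treated as UTF-8. It relies on Ruby's `[[:print:]]` bracket expression being Unicode-aware for UTF-8 strings.

```ruby
# Hypothetical sketch, not the existing OSM implementation.
# Assumption: the uploaded filename is meant to be UTF-8.
def sanitize_trace_name(name)
  # Reinterpret the bytes as UTF-8 and replace invalid sequences with "_".
  utf8 = name.dup.force_encoding(Encoding::UTF_8).scrub("_")

  # Keep only the last path component, so separators cannot be used
  # to smuggle in directory names.
  base = utf8.split(%r{[/\\]}).last.to_s

  # Replace anything that is not a printable character (for example,
  # control characters) while leaving non-ASCII letters intact.
  base.gsub(/[^[:print:]]/, "_")
end

puts sanitize_trace_name("трек.gpx")
puts sanitize_trace_name("../../etc/passwd")
```

Whether this is sufficient from a security point of view depends on how the name is used later (filesystem paths, HTTP headers, HTML output), so it would need review against those call sites before replacing the current check.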
Author: TomH If you look more closely you'll see that I didn't actually write that line. All that happened in that commit is that I moved it as part of a refactoring of the code to merge the common parts of the two different paths for adding traces.
Author: one_half_3544 Oh, indeed
I'll try to contact him. I hope he remembers the reason for this.
Author: mmd Closing due to lack of feedback from OP.
Reporter: one_half_3544
[Submitted to the original trac issue database at 2.18am, Sunday, 4th December 2011]
I'm uploading a couple of GPX traces with Russian names. All those symbols are converted to underscores on upload, and I have to duplicate the name in the description (where UTF-8 characters seem to be OK).
It should be no problem to support UTF-8 in filenames, I guess.