Potlatch enters invalid UTF-8 into the database, resulting in the API outputting invalid XML #1936
Comments
Author: Richard Sure, but you'll have to tell me how to do UTF-8 validation. I know nothing about it (and, in traditional Anglo-centric fashion, care less ;) ).
Author: tom[at]compton.nu I believe one of the bots actually goes round fixing these things up, in fact...
Author: avarab[at]gmail.com As an example of the main API rejecting this: create_changeset.xml:
Then POST it with curl:
Author: tom[at]compton.nu The reason the XML API rejects it is just that we use libxml2 to parse the incoming XML, and that rejects it.
Author: Matt Richard: in an anglo-saxon fashion - well, more of a teutonic fashion - I have already fixed it for you. See http://trac.openstreetmap.org/browser/sites/rails_port/lib/validators.rb#L21 and call it with any string you'd like to check.
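The linked validator is Ruby, in rails_port, and isn't reproduced in this thread. A minimal sketch of the same idea in Python (the function name is hypothetical, and this assumes the check is simply "does the byte string decode as UTF-8" — the actual rails_port code may differ):

```python
def valid_utf8(data: bytes) -> bool:
    """Return True if `data` is a well-formed UTF-8 byte sequence.

    Hypothetical helper mirroring the rails_port validator's intent;
    not the actual implementation.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A proper UTF-8 encoding of non-ASCII text passes...
print(valid_utf8("ö".encode("utf-8")))  # True
# ...but a truncated multi-byte sequence does not.
print(valid_utf8(b"H\xc3"))             # False
```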
Author: avarab[at]gmail.com Replying to [comment:1 Richard]:
This is the string that Potlatch saves into the database, escaped:
The bit that makes this invalid UTF-8 is the ^C, the API would e.g. accept this:
Now, ^C here is \03, or ETX (end of text):
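Worth noting: the ETX byte is the immediate trigger for the parser, because XML 1.0 forbids most C0 control characters (everything below 0x20 except tab, LF, and CR) regardless of encoding. Python's stdlib expat parser, which is comparable in strictness to libxml2 here, demonstrates the rejection:

```python
import xml.etree.ElementTree as ET

# XML 1.0 disallows C0 control characters such as 0x03 (ETX),
# so a tag value containing one is rejected as not well-formed.
doc = '<tag k="name" v="\x03broken"/>'
try:
    ET.fromstring(doc)
    print("parsed")
except ET.ParseError as e:
    print("rejected:", e)
```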
But as this is a double-encoding issue in the first place perhaps you could do better than merely validating that the data is correct (& rejecting it), but automagically fix it too:
I.e. when you encounter \03, slurp up the trailing non-ASCII characters, presume that they're double-encoded data, and decode them. If you did this then people could happily type "Hrgrbraut" into Potlatch, which Flash on Linux would convert into "
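The repair being described is the classic mojibake round-trip: treat the double-encoded characters as Latin-1 bytes and decode those bytes as UTF-8 a second time. A hedged sketch (this assumes plain UTF-8-via-Latin-1 double-encoding; Flash's actual mangling, with its ETX marker, may need extra handling, and this is not the rails_port code):

```python
def fix_double_encoded(s: str) -> str:
    """Attempt to undo UTF-8 -> Latin-1 double-encoding (mojibake).

    If the round-trip fails, assume the string was fine and return
    it unchanged. Illustrative only.
    """
    try:
        return s.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s

# "ö" double-encoded shows up as "Ã¶"; the round-trip recovers it.
print(fix_double_encoded("Ã¶"))  # ö
# Ordinary ASCII text passes through untouched.
print(fix_double_encoded("plain"))  # plain
```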
Author: avarab[at]gmail.com Replying to [comment:7 Richard]: .. in [16525]
Author: avarab[at]gmail.com And see related ticket:2072 which deals with the general issue of Potlatch not doing server-side validation of client-supplied data. |
Reporter: avarab[at]gmail.com
[Submitted to the original trac issue database at 3.02pm, Monday, 8th June 2009]
When using Potlatch 1.0 on Ubuntu with Firefox 3.0.10 and Shockwave Flash 10.0 r22, entering (note: not copy/pasting) non-ASCII data into Potlatch results in double-encoded data being saved to the database. E.g. if I enter:
In Potlatch it will be shown as:
This is a known bug in itself. However, Potlatch should validate this data before saving it to the database. When I try to create a changeset in JOSM with this comment, the API responds with HTTP/1.1 400 Bad Request. Potlatch, however, will happily save it, resulting in this:
The API will then read that data back when serving read requests, and output invalid XML as a result:
See xmlstarlet validation output:
So both Potlatch and the API are at fault: the API should handle invalid byte sequences in the database and output valid XML anyway, but if Potlatch had done UTF-8 validation in the first place, that invalid data would never have been there.
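On the API side, one defensive option when reading possibly-corrupt rows is to decode with replacement characters and drop XML-illegal control characters before serializing, so the output stays well-formed even when the stored bytes are broken. A sketch under those assumptions (not the actual API code):

```python
import re

# Characters XML 1.0 forbids: C0 controls other than tab, LF, CR.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def sanitize_for_xml(raw: bytes) -> str:
    """Turn arbitrary database bytes into text safe to emit as XML.

    Invalid UTF-8 sequences become U+FFFD; XML-illegal control
    characters (such as the ETX byte above) are dropped.
    Illustrative sketch only.
    """
    text = raw.decode("utf-8", errors="replace")
    return _XML_ILLEGAL.sub("", text)

# An ETX byte followed by a truncated multi-byte sequence comes out
# as a well-formed, if lossy, string.
print(sanitize_for_xml(b"name \x03\xc3"))  # 'name \ufffd'
```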