[wget-notify] [bug #20422] Wget should handle IRIs.

Micah Cowan INVALID.NOREPLY at gnu.org
Mon Nov 5 12:05:17 PST 2007


Follow-up Comment #3, bug #20422 (project wget):

See RFC 3987, and also
http://www.w3.org/TR/REC-html40/appendix/notes.html#non-ascii-chars

We should probably carry enough information around with the URL (the bytes
actually used and the document's encoding) that we can try each of the
following in turn, until we find one that works:

1. Transcode from the document's encoding to UTF-8, and percent-encode
2. Directly percent-encode the actual bytes that were used

There might also be cases where we'd want to try transcoding into
ISO-8859-1.
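
As a rough illustration of the fallback order above (a sketch only, not
wget code; the function name candidate_paths and its arguments are
hypothetical), the candidates for a URL path could be generated like this,
assuming we kept the raw bytes and the document's declared encoding:

```python
from urllib.parse import quote


def candidate_paths(raw, doc_encoding):
    """Return percent-encoded candidates for a URL path, in the order to try.

    raw          -- the path bytes exactly as they appeared in the document
    doc_encoding -- the document's declared encoding (assumed to be known)
    """
    candidates = []

    # Decode once; if the bytes are not valid in the declared encoding
    # (or the encoding name is unknown), only the raw-bytes form remains.
    try:
        text = raw.decode(doc_encoding)
    except (UnicodeDecodeError, LookupError):
        text = None

    if text is not None:
        # First choice: transcode to UTF-8, then percent-encode.
        candidates.append(quote(text.encode("utf-8"), safe="/"))

    # Second choice: percent-encode the bytes that were actually used.
    candidates.append(quote(raw, safe="/"))

    if text is not None:
        # Possible extra fallback: transcode into ISO-8859-1, if representable.
        try:
            candidates.append(quote(text.encode("iso-8859-1"), safe="/"))
        except UnicodeEncodeError:
            pass

    # Drop duplicates while preserving the order of preference.
    return list(dict.fromkeys(candidates))


print(candidate_paths(b"/caf\xe9", "iso-8859-1"))
```

For a link written as "/café" in an ISO-8859-1 document, this yields the
UTF-8 form "/caf%C3%A9" first and the raw-bytes form "/caf%E9" second. The
caller would request each in turn and stop at the first that works.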

Host names, of course, should only support Punycode encoding. Perhaps if an
internationalized host name is detected, we should forgo the "actual bytes"
version, as there really can't be any expected meaning for a URI that has
non-ASCII characters in the host name, and non-UTF-8 characters in the
remainder.
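
A minimal sketch of that host-name rule, again with hypothetical names
(ascii_host, strategies) and using Python's built-in "idna" codec, which
implements the older IDNA 2003 rules, as a stand-in for whatever conversion
wget would actually use:

```python
def ascii_host(host):
    """Convert a possibly internationalized host name to its ASCII form."""
    # The "idna" codec applies Punycode label by label ("xn--" prefix).
    return host.encode("idna").decode("ascii")


def strategies(host):
    """Pick which encodings of the rest of the URL are worth trying."""
    if host.isascii():
        # Plain ASCII host: both the UTF-8 and the raw-bytes forms apply.
        return ["utf-8", "actual-bytes"]
    # Internationalized host: skip the "actual bytes" version.
    return ["utf-8"]


print(ascii_host("bücher.example"))
print(strategies("bücher.example"))
```

Here "bücher.example" becomes "xn--bcher-kva.example", and only the UTF-8
form of the remainder is attempted for it.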

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?20422>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


