[wget-notify] [bug #20422] Wget should handle IRIs.
Micah Cowan
INVALID.NOREPLY at gnu.org
Mon Nov 5 12:05:17 PST 2007
Follow-up Comment #3, bug #20422 (project wget):
See RFC 3987, and also
http://www.w3.org/TR/REC-html40/appendix/notes.html#non-ascii-chars
We should probably carry enough information along with the URL that we
can try each of the following in turn, until we find one that works:
1. Transcode from the document's encoding to UTF-8, and percent-encode.
2. Directly percent-encode the actual bytes that were used.
There might also be cases where we'd want to try transcoding into
ISO-8859-1.
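
As a rough illustration of that fallback order, here is a minimal sketch in Python (chosen for brevity; wget itself is C). The function name candidate_paths, its parameters, and the try_latin1 switch are hypothetical and not part of wget. It assumes the raw path bytes and the document's declared encoding are both available, and that the path contains no pre-existing percent-escapes.

```python
from urllib.parse import quote


def candidate_paths(raw, doc_encoding, try_latin1=False):
    """Return percent-encoded forms of a URL path, in the order to try them.

    raw          -- the path exactly as it appeared in the document (bytes)
    doc_encoding -- the document's declared character encoding
    try_latin1   -- hypothetical switch for the optional ISO-8859-1 attempt
    """
    candidates = []

    def add(candidate):
        # Skip duplicates, e.g. when the document is already UTF-8.
        if candidate not in candidates:
            candidates.append(candidate)

    try:
        text = raw.decode(doc_encoding)
    except (UnicodeDecodeError, LookupError):
        text = None  # undecodable or unknown encoding: only raw bytes remain

    # First choice: transcode to UTF-8, then percent-encode.
    if text is not None:
        add(quote(text.encode("utf-8"), safe="/"))

    # Second choice: percent-encode the actual bytes that were used.
    add(quote(raw, safe="/"))

    # Optional extra: transcode into ISO-8859-1, where that is possible.
    if try_latin1 and text is not None:
        try:
            add(quote(text.encode("iso-8859-1"), safe="/"))
        except UnicodeEncodeError:
            pass

    return candidates
```

A caller would request each candidate in turn and stop at the first one the server accepts; what counts as "works" is left open here, as it is in the comment above.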
Host names, of course, should only ever be converted with Punycode. Perhaps if an
internationalized host name is detected, we should forgo the "actual bytes"
version, as there really can't be any expected meaning for a URI that has
non-ASCII characters in the host name and non-UTF-8 bytes in the
remainder.
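
A small sketch of the host-name rule, again in Python for brevity and not a description of wget's code. The helper names and the strategy labels "utf8" and "raw-bytes" are made up for illustration. It relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules; a real implementation would more likely use a dedicated IDN library.

```python
def host_to_ascii(host):
    """Convert a possibly internationalized host name to its ASCII form."""
    # The "idna" codec Punycode-encodes each non-ASCII label ("xn--...").
    return host.encode("idna").decode("ascii")


def path_strategies(host):
    """Pick which path encodings are worth trying for a given host.

    The labels are hypothetical: "utf8" means transcode to UTF-8 and
    percent-encode, "raw-bytes" means percent-encode the original bytes.
    """
    if not host.isascii():
        # Internationalized host: skip the "actual bytes" attempt.
        return ["utf8"]
    return ["utf8", "raw-bytes"]
```

For example, host_to_ascii("bücher.example") gives "xn--bcher-kva.example", and path_strategies("bücher.example") leaves only the UTF-8 attempt.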
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?20422>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/