unicode - How does the browser encode non-ASCII characters when posting them to the server?
Is there any standard that browser HTTP posting follows? If not, can the server detect the encoding in any way?
Is there any standard that browser HTTP posting follows?
There is, now that HTML5 has codified it, but it's not straightforward.
The encoding the browser uses to encode text when a form is submitted is the same as the encoding used to view the page containing the form. So if you have included a Content-Type: ...;charset=... HTTP header or a <meta> tag, that is the encoding that will be used, unless the user deliberately changes the encoding of the page in the browser settings. Users generally won't change that setting unless the page has been served with the wrong charset and is therefore unreadable. (Even then, the setting is getting more obscure in modern browsers.)
If you don't set the encoding of the page containing the form to anything, the browser will guess; it'll probably be some non-UTF encoding associated with the user's region, but all bets are off.
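As a minimal sketch of declaring the page encoding both ways, the following Python builds a response for a form page. The helper name build_form_response, the /submit action and the comment field are hypothetical, chosen only for illustration; any web framework would have its own way of setting the header.

```python
# Hypothetical form page: the <meta> tag declares UTF-8 inside the document.
FORM_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Example form</title>
</head>
<body>
<form method="post" action="/submit">
<input type="text" name="comment">
<input type="submit">
</form>
</body>
</html>
"""


def build_form_response():
    """Return (headers, body) with UTF-8 declared in the HTTP header too."""
    body = FORM_PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body
```

Declaring the charset in both places is redundant but harmless; the point is simply that the page never leaves the browser guessing.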
If you include the attribute accept-charset="..." in the <form> element, the form is supposed to be submitted in that encoding, regardless of the encoding of the form page (whether set by the page or chosen by the user). Unfortunately, accept-charset is broken in IE: the given charset is only used when the form contains characters outside the range that can be encoded in the page's encoding. That makes the submitted encoding inconsistent, depending on the entered content.
There is a workaround if the charset you want is UTF-8 (and it should be): include a field containing a character that does not exist in any non-UTF encoding. One possible choice is the replacement character (U+FFFD):
<form accept-charset="utf-8"> <input type="hidden" name="enforce-charset" value="�"/>
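On the receiving side, one way to confirm that the enforcement field did its job is to check that it decodes to U+FFFD under strict UTF-8. This is only a sketch: the function name submitted_as_utf8 is hypothetical, and it assumes the raw application/x-www-form-urlencoded body is available as an ASCII string.

```python
from urllib.parse import parse_qsl

REPLACEMENT = "\ufffd"  # the character placed in the hidden field


def submitted_as_utf8(body: str, field: str = "enforce-charset") -> bool:
    """Return True if the body decodes as UTF-8 and the marker field survived."""
    try:
        # errors="strict" makes invalid UTF-8 byte sequences raise
        # instead of being silently replaced.
        pairs = parse_qsl(body, keep_blank_values=True,
                          encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    # If the browser used another encoding, the marker arrives as a
    # character reference such as "&#65533;" rather than as U+FFFD.
    return dict(pairs).get(field) == REPLACEMENT
```

For example, submitted_as_utf8("enforce-charset=%EF%BF%BD&name=%C3%A9") is true, while a body whose marker arrived as %26%2365533%3B is rejected.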
Finally, if the form contains characters that are outside the encoding chosen for submitting the form, those characters are sent encoded as HTML character references. This is confusing because that kind of encoding is never normally used in forms, and it's an unrecoverable mangling: given &#233; you can never tell whether the user typed é or the literal text &#233;.
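The ambiguity can be reproduced with Python's xmlcharrefreplace error handler, which substitutes decimal character references for unencodable characters in much the same way. This is an illustration rather than a model of any particular browser; ISO-8859-5 is picked here only because it has no é, and browser_like_encode is a hypothetical name.

```python
def browser_like_encode(text: str, charset: str) -> bytes:
    """Encode text, replacing unencodable characters with &#NNN; references."""
    return text.encode(charset, errors="xmlcharrefreplace")


# The user typed the single character é ...
typed_character = browser_like_encode("é", "iso-8859-5")
# ... or the user typed the six literal characters &#233;
typed_literal = browser_like_encode("&#233;", "iso-8859-5")

# Both submissions produce identical bytes, so the server cannot tell them apart.
same_bytes = typed_character == typed_literal
```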
If not, can the server detect the encoding in any way?
This should have been doable, at least for POST forms, by having browsers pass Content-Type: ...;charset= headers with form submissions. Unfortunately no actual browsers do this. Few servers would support it, and when the guys at Mozilla tried to implement it in Firefox it broke loads of other servers, so in reality it ain't ever going to happen.
There is a newer IE extension that has been included in HTML5, where you add this to the form:
<input type="hidden" name="_charset_"/>
(Both the type and the name are important.) Browsers that support this hack will submit a form parameter called _charset_ set to the encoding they are sending, e.g. UTF-8 or windows-1252. If your server knows that encoding, it can pick that up and work with it.
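A server-side sketch of picking up _charset_ might look like the following. The function name decode_form and the ALLOWED_CHARSETS whitelist are assumptions for illustration, not part of any standard API; the body is assumed to be the raw urlencoded string.

```python
from urllib.parse import parse_qsl

# Hypothetical whitelist of encodings this server is willing to decode.
ALLOWED_CHARSETS = {"utf-8", "windows-1252", "iso-8859-1"}


def decode_form(body: str, default: str = "utf-8") -> dict:
    """Decode a urlencoded form body using the charset named in _charset_."""
    # First pass: latin-1 maps every byte to a character, so it never fails,
    # and the ASCII value of _charset_ comes through unchanged.
    raw = dict(parse_qsl(body, keep_blank_values=True, encoding="latin-1"))
    charset = raw.get("_charset_", "").lower()
    if charset not in ALLOWED_CHARSETS:
        charset = default
    # Second pass: decode the percent-escapes with the announced encoding.
    pairs = parse_qsl(body, keep_blank_values=True,
                      encoding=charset, errors="replace")
    return {key: value for key, value in pairs if key != "_charset_"}
```

Parsing twice is a deliberate simplification: the first pass only exists to read the ASCII charset label before any real decoding happens.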
So generally the recipe for handling form submissions consistently is: serve your own forms in pages marked as containing UTF-8; if you care enough about the user sabotaging the encoding, include accept-charset and the enforcement hack.
If you have to accept form submissions from elsewhere and can't persuade them to include either accept-charset and the enforcement hack, or the _charset_ hack, then you are left with guesswork.
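One common form of that guesswork, sketched here under the assumption that submissions are either UTF-8 or a Western single-byte encoding, is to try strict UTF-8 first and fall back otherwise. The name guess_decode and the windows-1252 fallback are illustrative choices, not a rule.

```python
from urllib.parse import unquote_to_bytes


def guess_decode(raw: bytes, fallback: str = "windows-1252") -> str:
    """Decode as UTF-8 if the bytes are valid UTF-8, else use the fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # windows-1252 leaves a few byte values undefined, so replace
        # anything that still cannot be decoded.
        return raw.decode(fallback, errors="replace")


# Percent-decode a submitted value to raw bytes, then guess its encoding.
value = guess_decode(unquote_to_bytes("caf%E9"))
```

This works reasonably well because non-ASCII text in a single-byte encoding is rarely valid UTF-8 by accident, but it remains a heuristic and can misread short or unusual input.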