How does the browser encode non-ASCII characters when posting them to the server?


Is there a standard that browsers follow when HTTP-posting form data? If not, can the server detect the encoding in some way?

Is there a standard that browsers follow when HTTP-posting?

There is now that HTML5 has codified it, but it's not straightforward.

The encoding the browser uses to encode text when a form is submitted is the same encoding it is using to view the page containing the form. If you have included a Content-Type: ...;charset=... HTTP header or a <meta> tag, that encoding will be used, unless the user deliberately changes the encoding of the page in their browser settings.
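To make "the same encoding" concrete, here is a sketch that uses Python's urllib.parse.urlencode to produce the application/x-www-form-urlencoded body a browser would send for a field containing é, once for a page served as UTF-8 and once for a page served as windows-1252 (cp1252 is Python's codec name for it). The field name q is made up for illustration, and this only mimics the browser's encoding step.

```python
from urllib.parse import urlencode

def form_body(value: str, page_charset: str) -> str:
    """Percent-encode a hypothetical field 'q' as a form on a page in page_charset would."""
    return urlencode({"q": value}, encoding=page_charset)

print(form_body("é", "utf-8"))   # q=%C3%A9  (two bytes in UTF-8)
print(form_body("é", "cp1252"))  # q=%E9     (one byte in windows-1252)
```

The server only receives the percent-escaped bytes; nothing in the body itself says which charset produced them.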

Users won't change that setting unless the page has been served with the wrong charset and is unreadable. (Even then, the setting is getting more obscure in modern browsers.)

If you don't set the encoding of the page containing the form to anything, it'll typically be the legacy non-UTF encoding associated with the user's region, and all bets are off.
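Because the body carries unlabelled bytes, a server that guesses the wrong charset usually gets mojibake rather than an error. A small Python sketch with urllib.parse.parse_qs, decoding the UTF-8 form of é (%C3%A9) under two guesses; the field name q is hypothetical:

```python
from urllib.parse import parse_qs

body = "q=%C3%A9"  # "é" encoded as UTF-8, then percent-encoded

# Right guess: the two bytes decode back to one character.
print(parse_qs(body, encoding="utf-8"))     # {'q': ['é']}
# Wrong guess: each byte becomes its own windows-1252 character.
print(parse_qs(body, encoding="cp1252"))    # {'q': ['Ã©']}
# The reverse mistake: a lone windows-1252 byte is invalid UTF-8, and
# parse_qs's default errors="replace" turns it into U+FFFD.
print(parse_qs("q=%E9", encoding="utf-8"))  # {'q': ['\ufffd']}
```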

If you include the attribute accept-charset="..." in the <form> element, the form is supposed to be submitted in that encoding, regardless of the encoding of the form page (whether set by the page or chosen by the user). Unfortunately, accept-charset is broken in IE: the given charset is only used when the form contains characters outside the range that can be encoded in the page's encoding. This makes the submitted encoding inconsistent, depending on what the user entered.
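The IE behaviour just described can be modelled in a few lines, which shows why the same form yields different encodings for different input. This is a hypothetical model written in Python (ie_like_submit_charset is an illustrative name, not browser code), assuming a page in windows-1252 with accept-charset="utf-8":

```python
def ie_like_submit_charset(value: str, page_charset: str, accept_charset: str) -> str:
    """Hypothetical model of the IE quirk: accept-charset is honoured only
    when the page's own charset cannot encode the submitted value."""
    try:
        value.encode(page_charset)
        return page_charset    # everything fits, so accept-charset is ignored
    except UnicodeEncodeError:
        return accept_charset  # something doesn't fit, so accept-charset is used

# Same form, different input, different encoding on the wire:
print(ie_like_submit_charset("café", "cp1252", "utf-8"))  # cp1252
print(ie_like_submit_charset("日本", "cp1252", "utf-8"))  # utf-8
```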

There is a workaround if the charset you want is UTF-8 (and it should be): include a field containing a character that does not exist in any non-UTF encoding. One possible choice is the Unicode replacement character, U+FFFD:

<form accept-charset="utf-8">
    <input type="hidden" name="enforce-charset" value="&#xfffd;"/>
    <!-- ...the form's other fields... -->
</form>
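The hidden field also lets the server check the result. If the browser really sent UTF-8, the field arrives as the bytes EF BF BD and decodes to U+FFFD; if it fell back to a legacy charset, U+FFFD cannot be encoded there and would typically arrive as the character reference &#65533; instead. A Python sketch, assuming an application/x-www-form-urlencoded body and the field name enforce-charset from the snippet (submitted_as_utf8 and the field q are hypothetical):

```python
from urllib.parse import parse_qs

def submitted_as_utf8(body: str) -> bool:
    """True if the 'enforce-charset' field decodes, as strict UTF-8, to U+FFFD."""
    try:
        # errors="strict" so invalid UTF-8 raises instead of silently becoming
        # U+FFFD, which could otherwise produce a false positive.
        fields = parse_qs(body, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return fields.get("enforce-charset") == ["\ufffd"]

print(submitted_as_utf8("enforce-charset=%EF%BF%BD&q=caf%C3%A9"))     # True
print(submitted_as_utf8("enforce-charset=%26%2365533%3B&q=caf%E9"))  # False
```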

Finally, if the form contains characters outside the chosen encoding when it is submitted, those characters are sent encoded as HTML character references. This is confusing, because that kind of encoding is never otherwise used in form submissions, and it is an unrecoverable mangling: given &#233;, you can never tell whether the user typed &#233; or é.
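The ambiguity is easy to reproduce. Python's "xmlcharrefreplace" error handler writes unencodable characters as decimal references, which mimics the browser behaviour described; the sketch below assumes a page in ISO-8859-5 (a Cyrillic charset with no é) and a hypothetical field q:

```python
from urllib.parse import urlencode

def legacy_form_body(value: str, page_charset: str) -> str:
    # Unencodable characters become decimal references such as &#233;
    return urlencode({"q": value}, encoding=page_charset, errors="xmlcharrefreplace")

# The user types "é", which ISO-8859-5 cannot encode...
typed_e_acute = legacy_form_body("é", "iso-8859-5")
# ...and another user literally types the six characters "&#233;".
typed_reference = legacy_form_body("&#233;", "iso-8859-5")

print(typed_e_acute)                     # q=%26%23233%3B
print(typed_e_acute == typed_reference)  # True: the server cannot tell them apart
```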

If not, can the server detect the encoding in some way?

This should have been doable, at least for POST forms, by having browsers pass a Content-Type: ...;charset= header with form submissions. Unfortunately, no actual browsers do this. Few servers support it, and when the folks at Mozilla tried to implement it in Firefox it broke loads of other servers, so in reality it ain't ever going to happen.

There is a newer IE extension that has been included in HTML5: add this to your form:

<input type="hidden" name="_charset_"/> 

(Both the type and the name are important.) Browsers that support this hack will submit the form with a parameter called _charset_ set to the encoding they are sending in, e.g. utf-8 or windows-1252. The server then knows the encoding and can pick it up and work with it.
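One way a server might use the parameter is a two-pass decode: first read the body as latin-1 (which maps every byte to a character, so it never fails) just to pull out the ASCII-only _charset_ value, then decode every field with the declared charset. A Python sketch; decode_with_charset_field is a hypothetical name, and falling back to UTF-8 when the field is missing or unrecognised is an assumption:

```python
import codecs
from urllib.parse import parse_qs

def decode_with_charset_field(body: str, default: str = "utf-8") -> dict:
    """Decode a urlencoded body using the browser-supplied _charset_ field."""
    # Pass 1: latin-1 cannot fail, and the _charset_ value is plain ASCII.
    probe = parse_qs(body, encoding="latin-1")
    charset = probe.get("_charset_", [default])[0]
    try:
        codecs.lookup(charset)  # reject names Python does not recognise
    except LookupError:
        charset = default
    # Pass 2: decode every field with the declared charset.
    return parse_qs(body, encoding=charset)

print(decode_with_charset_field("_charset_=windows-1252&q=caf%E9"))
# {'_charset_': ['windows-1252'], 'q': ['café']}
```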

Generally, the recipe for handling form submissions consistently is: serve your own forms in pages marked as containing UTF-8, and if you care enough about users sabotaging the encoding, also include accept-charset and the enforcement hack.
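Put together, the recipe might look like this minimal WSGI application in Python: it serves the form with UTF-8 declared in both the Content-Type header and a <meta> tag, adds accept-charset and the enforce-charset field, and on POST decodes strictly as UTF-8, rejecting anything that fails the check. The page markup, the comment field and the 400 response are illustrative assumptions, not the only reasonable choices.

```python
from urllib.parse import parse_qs

# Hypothetical form page: UTF-8 declared in <meta> as well as in the HTTP header below.
FORM_PAGE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Comment</title></head>
<body>
<form method="post" accept-charset="utf-8">
  <input type="hidden" name="enforce-charset" value="&#xFFFD;">
  <textarea name="comment"></textarea>
  <button type="submit">Send</button>
</form>
</body>
</html>
"""

def app(environ, start_response):
    """Serve the form on GET; on POST, accept only submissions that arrived as UTF-8."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [FORM_PAGE.encode("utf-8")]

    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length)
    try:
        # A urlencoded body is ASCII; its percent-escapes must then be valid UTF-8.
        fields = parse_qs(raw.decode("ascii"), encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        fields = {}
    if fields.get("enforce-charset") != ["\ufffd"]:
        start_response("400 Bad Request", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Form was not submitted as UTF-8\n"]

    comment = fields.get("comment", [""])[0]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Received: {comment}\n".encode("utf-8")]
```

For a quick local try, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() can host it.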

If you have to accept form submissions from elsewhere and can't persuade the senders to include either accept-charset plus the enforcement hack, or the _charset_ hack, you'll have to resort to guesswork.
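A common form of that guesswork is to try strict UTF-8 first and fall back to one legacy charset, on the reasoning that legacy 8-bit text containing non-ASCII characters rarely happens to be valid UTF-8. This is only a heuristic: windows-1252 as the fallback is an assumption (pick the charset most likely for your users), legacy text that coincidentally forms valid UTF-8 will be misread, and it cannot undo character-reference mangling. A Python sketch with a hypothetical guess_and_decode:

```python
from urllib.parse import parse_qs

def guess_and_decode(body: str) -> tuple[str, dict]:
    """Heuristic only: try strict UTF-8, then fall back to windows-1252."""
    try:
        return "utf-8", parse_qs(body, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        # Assumed fallback; parse_qs's default errors="replace" covers the
        # few byte values windows-1252 leaves undefined.
        return "cp1252", parse_qs(body, encoding="cp1252")

print(guess_and_decode("q=caf%C3%A9"))  # ('utf-8', {'q': ['café']})
print(guess_and_decode("q=caf%E9"))     # ('cp1252', {'q': ['café']})
```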

