JavaScript - text nodeValue containing HTML entity
I'm creating a real-time HTML editor that loads after the DOM has been rendered and builds the source by looping through all the nodes. I've noticed that when I try to read the nodeValue of a text node containing an HTML entity, I get the rendered Unicode value of that entity.
How can I read the rendered text node and keep the HTML entity code? (Using vanilla JS.)
Example:
<div id="test">copyright &copy;</div>
<script>
var test = document.getElementById('test');
console.log(test.childNodes[0].nodeValue);
// expected: copyright &copy;
// actual: copyright ©
</script>
Unfortunately, you can't. The Text interface inherits from CharacterData, and both interfaces provide DOMStrings as return values, which contain Unicode characters.
Furthermore, the HTML5 parsing algorithm removes the entity entirely. This is defined in several sections of 8.2.4 Tokenization:
- 8.2.4.1 Data state: describes that an ampersand puts the parser into the "character reference in data state".
- 8.2.4.2 Character reference in data state: describes that the tokens following the ampersand should be consumed. If everything works fine, this returns Unicode character tokens, not the entity!
- 8.2.4.69 Tokenizing character references: describes how one interprets &...; (basically, do some checks and, if everything is OK, look the name up in a table).
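To make the effect of those tokenizer steps concrete, here is a rough sketch of the same replacement done by hand on a string. It is not the specification's algorithm: `decodeCharacterReferences` and `NAMED_REFERENCES` are hypothetical names, the table is a tiny illustrative subset, and the special cases the real parser handles (missing semicolons, attribute values, invalid code points) are ignored.

```javascript
// Tiny illustrative subset of the named character reference table.
// The real table in the HTML specification is far larger.
const NAMED_REFERENCES = { copy: 0xA9, amp: 0x26, lt: 0x3C, gt: 0x3E, nbsp: 0xA0 };

// Hypothetical helper: replaces &name;, &#169; and &#xA9; style references
// with the character they stand for. Unknown names are left untouched.
function decodeCharacterReferences(source) {
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return source.replace(pattern, (match, body) => {
    let codePoint;
    if (body[0] === "#") {
      // Numeric reference: hexadecimal if it starts with "#x", decimal otherwise.
      const isHex = body[1] === "x" || body[1] === "X";
      codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    } else if (Object.prototype.hasOwnProperty.call(NAMED_REFERENCES, body)) {
      // Named reference: look it up in the table.
      codePoint = NAMED_REFERENCES[body];
    } else {
      return match;
    }
    // String.fromCodePoint throws above U+10FFFF, so leave such input alone.
    if (codePoint > 0x10FFFF) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeCharacterReferences("copyright &copy;")); // "copyright ©"
```

Once the text has gone through a step like this, nothing in the resulting string records whether the © came from &copy;, &#169; or a literal character.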
So by the time the parser has finished, the entity is gone and has been replaced by Unicode symbols. This is not surprising, since you can put the symbol © right into your HTML code if you want.
However, you can still undo this transformation: you need to take a copy of the entity table and check, for every character in your document, whether it has an entry in it:
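var entityTable = {
    169: "copy" // U+00A9 COPYRIGHT SIGN; extend with the other names you need
};

function reEntity(character) {
    var index = character.charCodeAt(0), name;
    if (index < 128) // ignore ASCII symbols
        return character;
    if (entityTable[index]) {
        name = entityTable[index]; // named entity, e.g. &copy;
    } else {
        name = "#" + index; // fall back to a numeric reference, e.g. &#169;
    }
    return "&" + name + ";";
}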
This is quite a cumbersome task, but due to the parser's behaviour you have to do it yourself. (Don't forget to check whether someone has already done that.)
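As a rough sketch of how the per-character idea could be applied to a whole string such as a text node's nodeValue: `encodeEntities` and `ENTITY_NAMES` are hypothetical names, and the table is only an illustrative subset. Unlike charCodeAt, this version iterates by code point, so characters outside the Basic Multilingual Plane come out as a single numeric reference.

```javascript
// Illustrative subset; a complete table would be taken from the HTML specification.
const ENTITY_NAMES = { 169: "copy", 174: "reg", 8364: "euro" };

// Hypothetical helper: turns every non-ASCII character of `text` back into
// a named reference (if the table knows it) or a decimal numeric reference.
function encodeEntities(text) {
  let result = "";
  // for...of walks the string by code point, so surrogate pairs stay together.
  for (const character of text) {
    const codePoint = character.codePointAt(0);
    if (codePoint < 128) {
      result += character; // ASCII is passed through unchanged
    } else if (ENTITY_NAMES[codePoint]) {
      result += "&" + ENTITY_NAMES[codePoint] + ";";
    } else {
      result += "&#" + codePoint + ";";
    }
  }
  return result;
}

console.log(encodeEntities("copyright ©")); // "copyright &copy;"
```

In a browser you would call it as encodeEntities(textNode.nodeValue). Note that, like the function above, it passes ASCII characters such as & and < through unchanged, so those would need separate handling when rebuilding HTML source. It also cannot tell which spelling the author originally used, only produce one valid spelling.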