The BOM and trim in JavaScript

Source: Internet
Author: User

Today I ran into a JSON.parse failure under IE7. Some troubleshooting showed that the server-side configuration file was encoded as UTF-8 with a BOM, so the output string started with a BOM character and was not valid JSON.

IE7 does not support native JSON, so our project uses json2.js. Refusing to parse JSON that starts with a BOM character is not json2's fault, however. Other browsers work only because they ignore a BOM at the start of the response body. If you write the following, every browser throws an exception:

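<script>
// The string starts with a BOM, written as \uFEFF here so that it stays visible
var a = '\uFEFF{"a": 1}';
try {
    JSON.parse(a);
} catch (e) {
    alert(e.message);
}
</script>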

Pasting such code into an editor like the powerful CodeMirror makes the invisible BOM character easy to spot.

In today's scenario the fault lies with the interface provider, but for the sake of robust code it is reasonable to trim the string first and then pass it to JSON.parse.
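
A minimal sketch of that defensive step, assuming the response body is already available as a string. The helper name parseJsonIgnoringBom is hypothetical and not part of json2.js or any other library; it removes a single leading U+FEFF and then delegates to JSON.parse.

```javascript
// Hypothetical helper: drop one leading BOM (U+FEFF), then parse.
function parseJsonIgnoringBom(text) {
    var s = String(text);
    if (s.charCodeAt(0) === 0xFEFF) {
        s = s.slice(1);
    }
    return JSON.parse(s);
}

var body = '\uFEFF{"a": 1}';
var data = parseJsonIgnoringBom(body);
// data.a is 1, whereas JSON.parse(body) on its own throws a SyntaxError
```

Stripping only the BOM keeps the helper narrow; a full trim would work here as well, provided the trim implementation in use actually removes U+FEFF.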

String.prototype.trim is a method added in ES5, so for old browsers you still have to implement trim yourself. Let's look at how QWrap and jQuery implement trim:

// jQuery 1.7.2 (excerpts)
trimLeft = /^[\s\xA0]+/;
trimRight = /[\s\xA0]+$/;
// ...
return text == null ?
    "" :
    text.toString().replace( trimLeft, "" ).replace( trimRight, "" );

jQuery 1.7.2 strips \s and \xA0 from both ends of the string. In older versions of IE, \s is equivalent to [ \t\v\f\r\n]. The following table explains these characters:

Name    Unicode code point  String representation     Description
<SP>    U+0020              " ", "\x20", "\u0020"     Half-width space (the keyboard space bar)
<TAB>   U+0009              "\t", "\x09", "\u0009"    Tab (the keyboard Tab key)
<VT>    U+000B              "\v", "\x0B", "\u000B"    Vertical tab
<FF>    U+000C              "\f", "\x0C", "\u000C"    Form feed
<CR>    U+000D              "\r", "\x0D", "\u000D"    Carriage return
<LF>    U+000A              "\n", "\x0A", "\u000A"    Line feed
<NBSP>  U+00A0              "\xA0", "\u00A0"          No-break space

The last one, the no-break space <NBSP>, is the &nbsp; that is used so often in HTML. In HTML, consecutive whitespace characters (half-width spaces, line breaks, tabs, and so on) collapse into a single space, but &nbsp; does not collapse with adjacent whitespace characters.

As you can see, at least in older versions of IE, jQuery 1.7.2 cannot strip BOM characters from the ends of a string.
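
To observe this without an old IE at hand, the behaviour can be approximated by spelling out the character class given above for old IE's \s. This is a sketch under that assumption; the names oldTrimLeft, oldTrimRight and trimLikeOldIE are hypothetical.

```javascript
// Explicit class standing in for old IE's \s, plus \xA0 (assumption from the text above).
var oldTrimLeft = /^[ \t\v\f\r\n\xA0]+/;
var oldTrimRight = /[ \t\v\f\r\n\xA0]+$/;

function trimLikeOldIE(text) {
    return text == null ?
        "" :
        String(text).replace(oldTrimLeft, "").replace(oldTrimRight, "");
}

var input = '\uFEFF  {"a": 1}  ';
var output = trimLikeOldIE(input);
// The trailing spaces are gone, but output still starts with U+FEFF,
// because the leading pattern cannot match past the BOM.
```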

// jQuery 1.8.1 (excerpts)
rtrim = /^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,
// ...
return text == null ?
    "" :
    text.toString().replace( rtrim, "" );

Building on the previous version, jQuery 1.8.1 also adds \uFEFF. This is a whitespace character newly introduced in ES5, called the "byte order mark", which is the BOM mentioned earlier.

Name   Unicode code point  String representation
<BOM>  U+FEFF              "\uFEFF"

Before Unicode 3.2, \uFEFF meant "zero width no-break space". Unicode 3.2 added \u2060 to represent the zero width no-break space, so that \uFEFF only represents the byte order mark.

As you can see, jQuery 1.8.1 can strip the BOM. In addition, because the trim implemented by some browsers does not strip <NBSP> or <BOM>, jQuery adds a detection step: the native trim is not used merely because it exists.
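
The detection idea can be sketched roughly as follows. This is an illustration, not jQuery's actual source: the names nativeTrim, nativeTrimIsReliable and safeTrim are hypothetical. The native method is trusted only if it turns a string made of <BOM> and <NBSP> into the empty string.

```javascript
var nativeTrim = String.prototype.trim;
var rtrim = /^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g;

// true only when a native trim exists and strips both <BOM> and <NBSP>
var nativeTrimIsReliable =
    typeof nativeTrim === "function" && nativeTrim.call("\uFEFF\xA0") === "";

var safeTrim = nativeTrimIsReliable ?
    function (text) {
        return text == null ? "" : nativeTrim.call(text);
    } :
    // Fall back to the regular expression when native trim is missing or incomplete
    function (text) {
        return text == null ? "" : String(text).replace(rtrim, "");
    };
```

Either branch gives the same result for a call such as safeTrim("\uFEFF  abc\xA0"), which returns "abc".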

// QWrap 1.1.6
return s.replace(/^[\s\ufeff\xa0\u3000]+|[\ufeff\xa0\u3000\s]+$/g, "");

QWrap's trim also adds \u3000. This is the "ideographic space" used in CJK text; you can simply think of it as the Chinese full-width space we commonly run into.

Unicode code point  String representation  Description
U+3000              "　", "\u3000"          Ideographic space (the CJK full-width space)

Now let's look at which characters a browser's trim should handle. The ES5 specification describes it as follows:

Let T be a String value that is a copy of S with both leading and trailing white space removed. The definition of white space is the union of WhiteSpace and LineTerminator.

This means that WhiteSpace and LineTerminator characters at both ends of the string are removed.

The ES5 specification stipulates that WhiteSpace includes, in addition to the <SP>, <TAB>, <VT>, <FF>, <NBSP>, and <BOM> mentioned above, the other whitespace characters defined as <USP>. USP stands for the characters in Unicode's "Separator, space" category, listed below (Source: 1, 2):

Unicode code point  Description
U+0020              SPACE, <SP>
U+00A0              NO-BREAK SPACE, <NBSP>
U+1680              OGHAM SPACE MARK, the space of the Ogham script
U+180E              MONGOLIAN VOWEL SEPARATOR
U+2000              EN QUAD
U+2001              EM QUAD
U+2002              EN SPACE, as wide as an en (half an em)
U+2003              EM SPACE, as wide as an em
U+2004              THREE-PER-EM SPACE, one third of an em
U+2005              FOUR-PER-EM SPACE, one quarter of an em
U+2006              SIX-PER-EM SPACE, one sixth of an em
U+2007              FIGURE SPACE, as wide as a single digit
U+2008              PUNCTUATION SPACE, as wide as the narrow punctuation of the same font
U+2009              THIN SPACE, one sixth or one fifth of an em wide
U+200A              HAIR SPACE, narrower than a thin space
U+200B              ZERO WIDTH SPACE, <ZWSP>
U+200C              ZERO WIDTH NON-JOINER, <ZWNJ>
U+200D              ZERO WIDTH JOINER, <ZWJ>
U+202F              NARROW NO-BREAK SPACE
U+205F              MEDIUM MATHEMATICAL SPACE, used in mathematical formulas
U+2060              WORD JOINER, like U+200B but does not allow a line break; added in Unicode 3.2 to replace U+FEFF
U+3000              IDEOGRAPHIC SPACE, i.e. the full-width space
U+FEFF              BYTE ORDER MARK, <BOM>; its no-break function was deprecated in Unicode 3.2

Note: in current Unicode data U+200B, U+200C, U+200D, U+2060, and U+FEFF are format characters (category Cf), not members of "Separator, space". Of these, ES5 treats only U+FEFF as WhiteSpace, through the separate <BOM> entry.

Next, the specification's definition of LineTerminator: in addition to the <LF> (line feed) and <CR> (carriage return) described earlier, there are two more:

Name  Unicode code point  Description
<LS>  U+2028              Line separator
<PS>  U+2029              Paragraph separator

As you can see, the trim method defined by ES5 is very powerful. As for browser implementations, I tested Chrome and Firefox: most of the invisible characters above are stripped, and they are also matched by \s in regular expressions.
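
To repeat that kind of check in a particular engine, a small probe like the following could be used. It is only a sketch: the function name trimStrips and the selection of code points are assumptions made for illustration, and the results can differ between engines and Unicode versions.

```javascript
// Returns true if the engine's native trim removes the given code unit
// from both ends of a string.
function trimStrips(codeUnit) {
    var ch = String.fromCharCode(codeUnit);
    return (ch + "x" + ch).trim() === "x";
}

var probes = [0x0020, 0x00A0, 0x1680, 0x180E, 0x2028, 0x200B, 0x3000, 0xFEFF];
var report = {};
for (var i = 0; i < probes.length; i++) {
    var hex = probes[i].toString(16).toUpperCase();
    var label = "U+" + ("0000" + hex).slice(-4);
    report[label] = trimStrips(probes[i]);
}
console.log(report);
```

The report maps each code point to true or false, which makes it easy to compare engines side by side.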

The trim implementations in QWrap and jQuery only handle the common characters, which is usually sufficient. If you need a trim that is more consistent with ES5, take a look at the es5-shim project:

var ws = "\x09\x0A\x0B\x0C\x0D\x20\xA0\u1680\u180E\u2000\u2001\u2002\u2003" +
    "\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000\u2028" +
    "\u2029\uFEFF";
// Install the shim if trim is missing or fails to strip every character in ws
if (!String.prototype.trim || ws.trim()) {
    // http://blog.stevenlevithan.com/archives/faster-trim-javascript
    // http://perfectionkills.com/whitespace-deviations/
    ws = "[" + ws + "]";
    var trimBeginRegexp = new RegExp("^" + ws + ws + "*"),
        trimEndRegexp = new RegExp(ws + ws + "*$");
    String.prototype.trim = function trim() {
        if (this === void 0 || this === null) {
            throw new TypeError("can't convert " + this + " to object");
        }
        return String(this)
            .replace(trimBeginRegexp, "")
            .replace(trimEndRegexp, "");
    };
}
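
If patching String.prototype is undesirable, the same character list can back a standalone function instead. The following is a sketch and not code from es5-shim: the names WS_CLASS, TRIM_BOTH and trimES5 are hypothetical, and the run U+2000 to U+200A is written as a range for brevity.

```javascript
// Same code points as the es5-shim list, expressed as a regex character class.
var WS_CLASS = "[\\x09\\x0A\\x0B\\x0C\\x0D\\x20\\xA0\\u1680\\u180E" +
    "\\u2000-\\u200A\\u202F\\u205F\\u3000\\u2028\\u2029\\uFEFF]";
var TRIM_BOTH = new RegExp("^" + WS_CLASS + "+|" + WS_CLASS + "+$", "g");

// Hypothetical standalone trim that leaves String.prototype untouched.
function trimES5(value) {
    return String(value).replace(TRIM_BOTH, "");
}

var cleaned = trimES5("\uFEFF\u3000 abc\u2028\xA0");
// cleaned is "abc"
```

Because the class is spelled out explicitly, the result does not depend on how the engine defines \s or on whether its native trim is complete.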
