BOM and trim in JavaScript

Last Update:2014-10-05 Source: Internet

Author: User

Today I ran into a JSON.parse failure in IE7. Troubleshooting showed that the server-side configuration file was encoded as UTF-8 with a BOM, so the output string began with the BOM character and was not valid JSON.
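To make the cause concrete, here is a minimal sketch (not taken from the original project) of what such a response looks like at the byte level. The byte array and variable names are illustrative assumptions. TextDecoder is the standard Encoding API, and ignoreBOM: true is used only so that the BOM stays in the decoded string and can be inspected.

```javascript
// Illustrative bytes: the UTF-8 BOM (EF BB BF) followed by the text {"a":1}.
var bytes = new Uint8Array([0xEF, 0xBB, 0xBF, 0x7B, 0x22, 0x61, 0x22, 0x3A, 0x31, 0x7D]);

// ignoreBOM: true keeps the BOM in the decoded string instead of dropping it.
var withBom = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);

// The default decoder removes a leading BOM.
var withoutBom = new TextDecoder("utf-8").decode(bytes);

console.log(withBom.charCodeAt(0).toString(16)); // code unit of the first character
console.log(withBom.length, withoutBom.length);
```

The three BOM bytes decode to the single character U+FEFF, which is what ends up in front of the JSON text when nothing strips it.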

IE7 has no native JSON support, so our project uses json2.js. Rejecting JSON that begins with a BOM is not json2.js's fault, though; other browsers worked only because they ignore a BOM at the start of the response body. Written as follows, the code throws an exception in every browser:

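<script>
// The original string began with a literal, invisible BOM; it is written as \uFEFF here so it can be seen.
var a = '\uFEFF{"a": 1}';
try {
    JSON.parse(a);
} catch (e) {
    alert(e.message);
}
</script>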

When the string contains a literal BOM, pasting the code into an editor that renders invisible characters, such as CodeMirror, makes the BOM easy to spot.

In today's case the fault lies with the interface provider, but for the sake of robustness the client can trim the string before passing it to JSON.parse.
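A minimal sketch of that defensive step might look like the following. The helper name parseJsonSkippingBom is hypothetical, and the sketch assumes JSON.parse is available (natively or through json2.js). It removes only a single leading BOM rather than trimming all whitespace.

```javascript
// Hypothetical helper: drop one leading BOM (U+FEFF), then parse as JSON.
function parseJsonSkippingBom(text) {
    var cleaned = text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
    return JSON.parse(cleaned);
}

var raw = "\uFEFF{\"a\": 1}";

// Parsing the raw string fails because a BOM is not valid JSON whitespace.
var failed = false;
try {
    JSON.parse(raw);
} catch (e) {
    failed = true;
}

console.log(failed);
console.log(parseJsonSkippingBom(raw).a);
```

Strings without a BOM pass through unchanged, so the helper can be applied to every response.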

String.prototype.trim is a method added in ES5; older browsers still need their own implementation. Let's look at how QWrap and jQuery implement trim:

// jQuery 1.7.2
trimLeft = /^[\s\xA0]+/;
trimRight = /[\s\xA0]+$/;
...
return text == null ? "" : text.toString().replace( trimLeft, "" ).replace( trimRight, "" );

jQuery 1.7.2 strips \s and \xA0 from both ends of the string. In older versions of IE, \s is equivalent to [ \t\v\f\r\n]. The following table describes these characters:

name	Unicode encoding	string representation	description
<SP>	U+0020	" ", "\x20", "\u0020"	space (half-width), the keyboard space bar
<TAB>	U+0009	"\t", "\x09", "\u0009"	horizontal tab, the keyboard Tab key
<VT>	U+000B	"\v", "\x0B", "\u000B"	vertical tab
<FF>	U+000C	"\f", "\x0C", "\u000C"	form feed (page break)
<CR>	U+000D	"\r", "\x0D", "\u000D"	carriage return
<LF>	U+000A	"\n", "\x0A", "\u000A"	line feed
<NBSP>	U+00A0	"\xA0", "\u00A0"	no-break space, whitespace that prevents a line break

The last one, <NBSP> (no-break space), is the character behind the &nbsp; entity that is used so often in HTML. In HTML, consecutive whitespace characters (spaces, line breaks, tabs, and so on) collapse into a single space, but &nbsp; does not collapse with adjacent whitespace characters.

As you can see, at least in older versions of IE, jQuery 1.7.2 cannot strip BOM characters from the ends of a string.
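The effect can be reproduced in a modern engine with a small sketch. It assumes, as stated above, that \s in older IE is equivalent to [ \t\v\f\r\n], and spells that class out explicitly instead of using \s. The function name oldStyleTrim is hypothetical and only imitates the 1.7.2 approach.

```javascript
// Explicit stand-in for the characters old IE matched with \s, plus \xA0.
var trimLeft = /^[ \t\v\f\r\n\xA0]+/;
var trimRight = /[ \t\v\f\r\n\xA0]+$/;

// Hypothetical imitation of a 1.7.2-style trim running in old IE.
function oldStyleTrim(text) {
    return text == null ? "" : text.toString().replace(trimLeft, "").replace(trimRight, "");
}

// The trailing space and NBSP are removed, but the leading BOM stops the left-hand match.
var result = oldStyleTrim("\uFEFF  hello \xA0");
console.log(result.length, result.charCodeAt(0).toString(16));
```

Because the BOM is not in the character class, it survives, and so do the two spaces that follow it.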

// jQuery 1.8.1
rtrim = /^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,
...
// a null or undefined text yields ""; otherwise both ends are stripped with .replace( rtrim, "" )

jQuery 1.8.1 additionally strips \uFEFF. ES5 added this character to its set of whitespace characters; it is the byte order mark, the BOM mentioned earlier.

name	Unicode encoding	string representation
<BOM>	U+FEFF	"\uFEFF"

Before Unicode 3.2, \uFEFF also served as the "zero width no-break space". Unicode 3.2 added \u2060 to take over that role, leaving \uFEFF to represent only the byte order mark.

So jQuery 1.8.1 can strip the BOM. In addition, because the native trim in some browsers does not strip <NBSP> or <BOM>, jQuery adds a detection step: a native trim is not used merely because it exists.
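The detection idea can be sketched as follows. This is an illustration of the technique, not jQuery's actual source: the names nativeTrim, nativeTrimIsUsable and safeTrim are hypothetical, and only the regular expression is taken from the text above.

```javascript
var rtrim = /^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g;
var nativeTrim = String.prototype.trim;

// Trust the native trim only if it strips both BOM and NBSP.
var nativeTrimIsUsable = typeof nativeTrim === "function" &&
    nativeTrim.call("\uFEFF\xA0") === "";

// Hypothetical wrapper: use the native method when it passes the check,
// otherwise fall back to the regular expression.
function safeTrim(text) {
    if (text == null) {
        return "";
    }
    return nativeTrimIsUsable ? nativeTrim.call(text) : String(text).replace(rtrim, "");
}

console.log(JSON.stringify(safeTrim("\uFEFF a \xA0")));
```

The check runs once, so the per-call cost is a single boolean test.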

// QWrap 1.1.6
return s.replace(/^[\s\ufeff\xa0\u3000]+|[\ufeff\xa0\u3000\s]+$/g, "");

QWrap's trim also strips \u3000, the ideographic space used in CJK text. Simply put, it is the Chinese full-width space we commonly encounter.

Unicode encoding	string representation	description
U+3000	"　", "\u3000"	ideographic space, the CJK full-width space

Now let's see which characters a browser's trim should handle, according to the ES5 specification:

Let T be a String value that is a copy of S with both leading and trailing white space removed. The definition of white space is the union of WhiteSpace and LineTerminator.

In other words, WhiteSpace and LineTerminator characters are removed from both ends of the string.

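The ES5 specification defines WhiteSpace as <TAB>, <VT>, <FF>, <SP>, <NBSP> and <BOM>, all mentioned above, plus <USP>, which stands for any other character in the Unicode "Separator, space" (Zs) category. The table below lists these and some related characters (Source: 1, 2). Note that in current Unicode versions the zero-width characters U+200B, U+200C, U+200D, U+2060 and U+FEFF are classified as format characters (Cf) rather than Zs, and U+180E was reclassified from Zs to Cf in Unicode 6.3.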

Unicode encoding	description
U+0020	SPACE, <SP>
U+00A0	NO-BREAK SPACE, <NBSP>
U+1680	OGHAM SPACE MARK
U+180E	MONGOLIAN VOWEL SEPARATOR
U+2000	EN QUAD
U+2001	EM QUAD
U+2002	EN SPACE. Same width as an en (half an em)
U+2003	EM SPACE. Same width as an em
U+2004	THREE-PER-EM SPACE. One third of an em
U+2005	FOUR-PER-EM SPACE. One quarter of an em
U+2006	SIX-PER-EM SPACE. One sixth of an em
U+2007	FIGURE SPACE. Same width as a digit
U+2008	PUNCTUATION SPACE. Same width as the font's narrow punctuation
U+2009	THIN SPACE. One fifth or one sixth of an em
U+200A	HAIR SPACE. Narrower than a thin space
U+200B	ZERO WIDTH SPACE, <ZWSP>
U+200C	ZERO WIDTH NON-JOINER, <ZWNJ>
U+200D	ZERO WIDTH JOINER, <ZWJ>
U+202F	NARROW NO-BREAK SPACE
U+205F	MEDIUM MATHEMATICAL SPACE. Used in mathematical formulas
U+2060	WORD JOINER. Like U+200B, but prohibits a line break. Added in Unicode 3.2 to take over that role from U+FEFF
U+3000	IDEOGRAPHIC SPACE. The CJK full-width space
U+FEFF	BYTE ORDER MARK, <BOM>. Its use as a zero width no-break space was deprecated in Unicode 3.2

As for LineTerminator, the specification defines two more characters besides the <LF> (line feed) and <CR> (carriage return) described earlier:

name	Unicode encoding	description
<LS>	U+2028	line separator
<PS>	U+2029	paragraph separator

As you can see, the trim method defined by ES5 is very thorough. In practice, I tested Chrome and Firefox: their native trim strips most of the invisible characters listed above, and the same characters are also matched by \s in regular expressions.
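A check of this kind can be sketched as below. The list of code points is an illustrative subset chosen for this example, and the function name describe is hypothetical. Results for the zero-width characters vary between engines and Unicode versions, so the sketch only reports what the current engine does.

```javascript
// Illustrative subset of the code points discussed above.
var candidates = [0x0009, 0x000B, 0x000C, 0x0020, 0x00A0, 0x1680, 0x2028, 0x2029, 0x3000, 0xFEFF, 0x200B];

// Report whether the native trim strips the character and whether \s matches it.
function describe(codePoint) {
    var ch = String.fromCharCode(codePoint);
    return {
        code: "U+" + ("0000" + codePoint.toString(16).toUpperCase()).slice(-4),
        trimmed: ch.trim() === "",
        matchesS: /\s/.test(ch)
    };
}

var report = candidates.map(describe);

for (var i = 0; i < report.length; i++) {
    console.log(report[i].code, "trim:", report[i].trimmed, "\\s:", report[i].matchesS);
}
```

Running it in each target browser gives a quick picture of where the native trim and \s deviate from the specification.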

The trim implementations in QWrap and jQuery handle only the common characters, which is usually sufficient. If you need a trim that is more consistent with ES5, take a look at the es5-shim project:

var ws = "\x09\x0A\x0B\x0C\x0D\x20\xA0\u1680\u180E\u2000\u2001\u2002\u2003" +
    "\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000\u2028" +
    "\u2029\uFEFF";
if (!String.prototype.trim || ws.trim()) {
    // http://blog.stevenlevithan.com/archives/faster-trim-javascript
    // http://perfectionkills.com/whitespace-deviations/
    ws = "[" + ws + "]";
    var trimBeginRegexp = new RegExp("^" + ws + ws + "*"),
        trimEndRegexp = new RegExp(ws + ws + "*$");
    String.prototype.trim = function trim() {
        if (this === void 0 || this === null) {
            throw new TypeError("can't convert " + this + " to object");
        }
        return String(this)
            .replace(trimBeginRegexp, "")
            .replace(trimEndRegexp, "");
    };
}

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.