How to deal with the empty strings produced when JavaScript splits a string

Source: Internet
Author: User
Tags: regular expression, split

I. Description of the problem

When a string is split with JavaScript's split method, the result sometimes contains empty strings "", especially when a regular expression is used as the separator.

II. Related question

Why does a JavaScript regular expression produce empty strings when it is used to split a string?

In the question above, the asker splits a string with a regular expression and gets several empty strings "" in the result.

The code is as follows:

'Zhang Sdf four-way ASDF Onfen aa33 net S'.split(/([\u4e00-\u9fa5]{1})/gi);

Output: ["", "Zhang", "SDF", "Four", "", ..., "Asdf", "", "Aa33", ...]

So what is the reason for these empty strings?

III. Problem analysis

A Google search turned up few relevant results, and those that exist give little detail: a brief remark followed by a link to the ECMAScript specification. To find the real cause, it seems there is no choice but to grit one's teeth and read the specification.

IV. Relevant standard

As is customary, let's start with the ECMAScript standard itself.

The code is as follows:

String.prototype.split(separator, limit)

That section describes the implementation steps of the split method in detail; read it step by step if you are interested. Here I pick out only the steps related to producing empty strings. If anything is explained incorrectly, corrections are welcome.

V. Related steps

An excerpt of the steps:

The most important part of the process is step 13, a loop that mainly does the following:

p and q are defined before the loop (that step is outside the loop) and start out equal; q is set back to p after every successful split;

SplitMatch(S, q, R) is called to test whether the separator matches the string at position q;

Different branches are taken depending on the result; the main one is branch iii;

Branch iii consists of 8 sub-steps that fill the predefined array A with the results.

Within these 8 sub-steps, step 1 takes the substring of the original string from position p (inclusive) to position q (exclusive). Note: this is the step that produces the empty strings; I will call it the substring step for easy reference.

Step 2 appends that substring to array A.

The remaining steps update the related variables and continue with the next iteration. (Step 7 stores the regular expression's capture groups in array A; it is not the source of the empty strings.)
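The loop described above can be sketched in plain JavaScript. This is a simplified illustration, not the specification text: it assumes a non-empty string separator, ignores limit, and the function name splitSketch is made up for this example. The variable names S, R, p, q, e and A follow the specification.

```javascript
// Simplified sketch of the step-13 loop for a non-empty string separator.
function splitSketch(S, R) {
  var A = [];
  var s = S.length;
  var p = 0; // start of the piece currently being collected
  var q = p; // position where the next match is attempted

  while (q < s) {
    // SplitMatch for a string separator: does R occur exactly at position q?
    var matched = S.slice(q, q + R.length) === R;
    if (!matched) {
      q = q + 1; // failure: try the next position
      continue;
    }
    var e = q + R.length; // endIndex of the match
    if (e === p) {
      // Branch ii (empty match at p). It cannot happen for a non-empty
      // string separator; kept only to mirror the specification.
      q = q + 1;
      continue;
    }
    // Branch iii, step 1: the substring step. It yields "" whenever p === q.
    var T = S.substring(p, q);
    A.push(T); // step 2
    p = e;     // step 5
    q = p;     // step 8
  }

  // Steps 14-15: the trailing piece, "" when the string ends with the separator.
  A.push(S.substring(p, s));
  return A;
}

console.log(splitSketch('abbbc', 'b')); // ["a", "", "", "c"]
```

In the 'abbbc' run, the second and third matches start at q === p, so the substring step has nothing between p and q to copy and pushes "".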

SplitMatch(S, q, R)

Next we need to know what SplitMatch(S, q, R) does. It is described just below the split algorithm in the specification. What it does depends on the type of the separator:

If the separator is a RegExp, the RegExp's internal method [[Match]] is called to match the string at position q. If the match fails, failure is returned; otherwise a result of type MatchResult is returned.

If the separator is a string, the string at position q is compared with the separator. On a mismatch failure is returned; on a match a result of type MatchResult is returned.

MatchResult

The steps above introduce a value of type MatchResult. According to the specification, it has two properties, endIndex and captures. endIndex is the position of the last matched character plus 1. captures can be thought of as an array: when the separator is a regular expression, its elements are the values of the capture groups; when the separator is a string, it is empty.
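As a rough illustration of SplitMatch and the shape of a MatchResult, here is a sketch in plain JavaScript. It is an approximation under stated assumptions: the function name splitMatch and the use of null for failure are choices made for this example, and the sticky flag y (ES2015 and later) stands in for the internal [[Match]] method, which user code cannot call directly.

```javascript
// Sketch of SplitMatch(S, q, R): try to match the separator R exactly at position q.
// Returns null for "failure", otherwise an object shaped like a MatchResult.
function splitMatch(S, q, R) {
  if (R instanceof RegExp) {
    // Rebuild the pattern with the sticky flag so the match must start at lastIndex.
    var flags = R.flags.replace(/[gy]/g, '') + 'y';
    var sticky = new RegExp(R.source, flags);
    sticky.lastIndex = q;
    var m = sticky.exec(S);
    if (m === null) return null;
    // After a successful sticky match, lastIndex is the end of the match.
    return { endIndex: sticky.lastIndex, captures: m.slice(1) };
  }
  var sep = String(R);
  if (S.slice(q, q + sep.length) !== sep) return null;
  return { endIndex: q + sep.length, captures: [] };
}

console.log(splitMatch('abc', 1, 'b'));        // { endIndex: 2, captures: [] }
console.log(splitMatch('abc', 0, 'b'));        // null
console.log(splitMatch('abc', 1, /(b)(x)?/));  // { endIndex: 2, captures: ["b", undefined] }
```

The last call shows why undefined can end up in the result of split: a capture group that does not take part in the match is reported as undefined in captures.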

Next

From the steps above, every element of the result is produced by the substring step (apart from the regular expression's capture groups). That step takes the string between the given start position (inclusive) and end position (exclusive), so when does it return ""? One special case is when the start and end positions are equal. This is only a conjecture, because the specification does not spell out the steps for taking a substring.

Having come this far, why not go one step further?

So I searched the V8 source code to see whether I could find the concrete implementation, and I did find the relevant code (source link).

Here is part of it:

The code is as follows:

function StringSplitJS(separator, limit) {
  ...
  ...

  // The separator is a string
  if (!IS_REGEXP(separator)) {
    var separator_string = TO_STRING_INLINE(separator);

    if (limit === 0) return [];

    // ECMA-262 says that if separator is undefined, the result should
    // be an array of size 1 containing the entire string.
    if (IS_UNDEFINED(separator)) return [subject];

    var separator_length = separator_string.length;

    // The separator is an empty string: return the array of characters directly
    if (separator_length === 0) return %StringToArray(subject, limit);

    var result = %StringSplit(subject, separator_string, limit);

    return result;
  }

  if (limit === 0) return [];

  // The separator is a regular expression: call StringSplitOnRegExp
  return StringSplitOnRegExp(subject, separator, limit, length);
}

(Some code is omitted here.)
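The early returns in the string branch of that function can be observed from ordinary JavaScript. A small sketch using only the built-in split:

```javascript
// limit === 0: an empty array, whatever the separator is.
var limited = 'abc'.split('b', 0);

// Undefined separator: a one-element array holding the whole string.
var whole = 'abc'.split(undefined);

// Empty-string separator: the array of characters.
var chars = 'abc'.split('');

// Ordinary non-empty string separator.
var pieces = 'abc'.split('b');

console.log(limited, whole, chars, pieces);
// [] ["abc"] ["a", "b", "c"] ["a", "c"]
```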

In the code I found that %_SubString is called to take the substring when the array is filled, but unfortunately I could not find its definition; if anyone finds it, please let me know. However, I did find that the StringSubstring function, which implements JavaScript's substring method, calls %_SubString and returns its result. So if 'abc'.substring(1, 1) returns "", then %_SubString returns "" when the start and end positions are equal. A quick test confirms that it does.
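The test mentioned above is easy to repeat. A minimal check that substring returns "" whenever the start and end positions are equal:

```javascript
var s = 'abc';

// Equal start and end positions select nothing, wherever they point.
var atStart = s.substring(0, 0);
var inMiddle = s.substring(1, 1);
var atEnd = s.substring(3, 3);

console.log(atStart === '', inMiddle === '', atEnd === ''); // true true true
```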

So when is the start position equal to the end position (that is, q == p)? Following the steps above one by one, I found these cases:

Right after the string S matches the separator once, the next position of S matches the separator again. For example: 'abbbc'.split('b'), 'abbbc'.split(/(b){1}/)

One or more characters at the beginning of the string match the separator. For example: 'abc'.split('a'), 'abc'.split(/ab/)

One or more characters at the end of the string match the separator; the relevant step here is step 14.

For example: 'abc'.split('c'), 'abc'.split(/bc/)
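Running the three cases above shows where each empty string lands. The variable names are only for this illustration:

```javascript
// Case 1: the separator matches again right after a previous match.
var middle = 'abbbc'.split('b');        // ["a", "", "", "c"]
var middleRe = 'abbbc'.split(/(b){1}/); // ["a", "b", "", "b", "", "b", "c"]

// Case 2: the separator matches at the beginning of the string.
var start = 'abc'.split('a');           // ["", "bc"]
var startRe = 'abc'.split(/ab/);        // ["", "c"]

// Case 3: the separator matches at the end of the string (step 14).
var end = 'abc'.split('c');             // ["ab", ""]
var endRe = 'abc'.split(/bc/);          // ["a", ""]

console.log(middle, middleRe, start, startRe, end, endRe);
```

In the regular expression version of case 1, the captured "b" values are the output of step 7; the "" entries between them still come from the substring step.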

In addition, when a regular expression is used as the separator, undefined may appear in the result.

For example: 'abc'.split(/(d)*/)
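A quick check of that example. The pattern matches the empty string between characters, and the group (d) never takes part in the match, so its capture is reported as undefined:

```javascript
var parts = 'abc'.split(/(d)*/);

console.log(parts);        // ["a", undefined, "b", undefined, "c"]
console.log(parts.length); // 5
```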

Looking back at the example at the beginning, doesn't it fall under the cases above?
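If the empty strings are simply unwanted, one common way to drop them is to filter the result afterwards. This is a sketch of one possible approach, not something the specification or the original question prescribes; the helper name splitNonEmpty is hypothetical:

```javascript
// Hypothetical helper: split, then discard "" and undefined entries.
function splitNonEmpty(text, separator) {
  return text.split(separator).filter(function (part) {
    return part !== undefined && part !== '';
  });
}

console.log(splitNonEmpty('abbbc', 'b'));    // ["a", "c"]
console.log(splitNonEmpty('abc', /(d)*/));   // ["a", "b", "c"]
```

Note that filtering also removes empty fields that may be meaningful, for example in comma-separated data, so whether it is appropriate depends on the input.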

VI. Digression

This is the first time I have read the ECMAScript standard so carefully. The process was painful, but understanding it afterwards felt great. Thanks also to the asker for raising the question and pressing for an answer.

By the way, when a regular expression is used as the separator, the global flag g is ignored. That was a bonus discovery.
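A small check of that last point: the same pattern gives the same result with and without the g flag, and a lastIndex set beforehand on the separator does not change the outcome either.

```javascript
var withoutG = 'a,b,c'.split(/,/);
var withG = 'a,b,c'.split(/,/g);

// Even a preset lastIndex on a global regular expression has no effect on split.
var re = /,/g;
re.lastIndex = 3;
var withLastIndex = 'a,b,c'.split(re);

console.log(withoutG, withG, withLastIndex);
// ["a", "b", "c"] ["a", "b", "c"] ["a", "b", "c"]
```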
