Use of split in Java

Source: Internet
Author: User

I ran into this point on a recent written test and got it wrong. Embarrassing; my fundamentals were weak. The question went roughly like this:

Java code
    String s2 = "this is a test";
    String sarray[] = s2.split("/s");
    System.out.println("sarray.length=" + sarray.length);

What does this print, or does it fail to compile? I reasoned that if the argument to split were "s", the output would certainly be 4, because the string would be divided into the array {"thi", " i", " a te", "t"}. But what does the argument "/s" in the original split call mean? When I tried it, the output was 1.

Finding out why took some further digging.

 

java.lang.String.split, i.e. the split method, splits a string into substrings and returns the result as a String array. It comes in two forms:

stringObj.split(regex)
stringObj.split(regex, limit)

stringObj is the String to be split; split does not modify it (Java strings are immutable). regex is required and is a regular expression, not a plain delimiter string: every position in stringObj where it matches is a split point, and the matched text is not included in any element of the returned array. If regex matches nothing, the result is a one-element array containing the entire string. limit is optional and controls how many elements the returned array can have (the exact rules are given further below). Keep in mind that the result of split is always a String array.
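A minimal sketch of the behavior just described, using only java.lang.String: the delimiter is dropped from the pieces, the source string is left unchanged, and a pattern that never matches yields a one-element array. The class name SplitBasics and the sample strings are illustrative only.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is illustrative only.
class SplitBasics {
    // Splits on a comma; the comma itself never appears in any element.
    static String[] byComma(String s) {
        return s.split(",");
    }

    public static void main(String[] args) {
        String src = "red,green,blue";
        String[] parts = byComma(src);
        System.out.println(Arrays.toString(parts)); // [red, green, blue]
        System.out.println(src);                    // red,green,blue (split did not modify it)

        // No match: the whole string comes back as the single element.
        System.out.println("no commas here".split(",").length); // 1
    }
}
```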

Example

Java code
    String srcstring = "This Is A about split test";
    String stringarray[] = srcstring.split(" ");
    // Split at each space character
    for (String stemp : stringarray) {
        System.out.println(stemp);
    }
    // Two spaces between each pair of words: with n spaces (and none trailing) the array has n + 1 elements
    String srcstring1 = "This  Is  A  about  split  test";
    // Between two adjacent spaces there are no characters, so that position in the result array holds an empty string "" (not null)
    String stringarray1[] = srcstring1.split(" ");
    for (String stemp : stringarray1) {
        System.out.println(stemp);
    }

The output of the first loop is:

    This
    Is
    A
    about
    split
    test

and the output of the second loop is (each empty string prints as a blank line):

    This

    Is

    A

    about

    split

    test
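The "n spaces give n + 1 elements" rule above holds only when there are no trailing spaces, because split(regex) discards trailing empty strings. The sketch below, with a hypothetical class EmptyTokens, shows that the gaps are empty strings rather than null, that trailing gaps disappear, and that a regex such as "\\s+" treats any run of whitespace as one delimiter.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is illustrative only.
class EmptyTokens {
    // Splits on exactly one space, so consecutive spaces produce empty strings.
    static String[] bySingleSpace(String s) {
        return s.split(" ");
    }

    // Treats any run of whitespace characters as a single delimiter.
    static String[] byWhitespaceRun(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        String[] a = bySingleSpace("This  Is");
        System.out.println(a.length);       // 3
        System.out.println(a[1].isEmpty()); // true: an empty string, not null
        // Trailing empty strings are dropped, so three spaces yield only 2 elements here.
        System.out.println(bySingleSpace("This Is  ").length); // 2
        System.out.println(Arrays.toString(byWhitespaceRun("This  Is   A"))); // [This, Is, A]
    }
}
```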

Another example

Java code
    String srcstring = "This Is A about split test";
    String stringarray[] = srcstring.split(" ", 2);
    // Split at space characters, but into at most 2 elements
    for (String stemp : stringarray) {
        System.out.println(stemp);
    }

The output is:

    This
    Is A about split test
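A common practical use of a limit of 2 is splitting a "key=value" line only at the first separator, so that a value which itself contains the separator stays intact. This is a sketch with a hypothetical class KeyValue and made-up input.

```java
// Hypothetical demo class; the name and input are illustrative only.
class KeyValue {
    // limit = 2: split at the first '=' only and keep the rest of the line as-is.
    static String[] parse(String line) {
        return line.split("=", 2);
    }

    public static void main(String[] args) {
        String[] kv = parse("url=http://example.com/?a=1");
        System.out.println(kv[0]); // url
        System.out.println(kv[1]); // http://example.com/?a=1
    }
}
```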

Take a look at the following

Java code
    String ipstring = "59.64.159.20";
    String iparray[] = ipstring.split(".");
    for (String stemp : iparray) {
        System.out.println(stemp);
    }

This prints nothing at all. Why?

The signature is public String[] split(String regex). The parameter is named regex, i.e. a regular expression: it is not a simple delimiter but a regular expression. The one-argument form behaves like split(regex, 0), and conceptually the two-argument form is implemented as follows (recent JDKs add a fast path for simple single-character delimiters, but the result is the same):

    public String[] split(String regex, int limit) {
        return Pattern.compile(regex).split(this, limit);
    }

So split simply delegates to Pattern.split. In a regular expression "." is a metacharacter that matches any single character, so every character of "59.64.159.20" is treated as a delimiter; all the pieces are empty strings, and because trailing empty strings are discarded, the returned array is empty. To split on a literal dot, escape it. Just change

Java code
    String iparray[] = ipstring.split(".");

to

Java code
    String iparray[] = ipstring.split("\\.");

and it works.
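Escaping with "\\." is not the only way to match a literal dot. The sketch below, using a hypothetical class DotSplit, compares the unescaped version with three equivalent fixes: a backslash escape, a one-character class "[.]", and Pattern.quote, which wraps any text so that none of its characters are treated as metacharacters.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is illustrative only.
class DotSplit {
    static String[] unescaped(String ip) { return ip.split("."); }            // '.' = any character
    static String[] escaped(String ip)   { return ip.split("\\."); }          // literal dot
    static String[] charClass(String ip) { return ip.split("[.]"); }          // '.' is literal inside []
    static String[] quoted(String ip)    { return ip.split(Pattern.quote(".")); } // quote the whole text

    public static void main(String[] args) {
        String ip = "59.64.159.20";
        System.out.println(unescaped(ip).length); // 0
        System.out.println(escaped(ip).length);   // 4
        System.out.println(charClass(ip).length); // 4
        System.out.println(quoted(ip).length);    // 4
    }
}
```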

Here are some commonly used escape sequences. Remember that inside a Java string literal each backslash must itself be escaped, so the regex \d is written "\\d".

\\ backslash
\t tab ('\u0009')
\n newline, i.e. line feed ('\u000A')
\r carriage return ('\u000D')
\d a digit, equivalent to [0-9]
\D a non-digit, equivalent to [^0-9]
\s a whitespace character [ \t\n\x0B\f\r]
\S a non-whitespace character [^\s]
\w a word character [a-zA-Z_0-9]
\W a non-word character [^\w]
\f form feed ('\u000C')
\e escape ('\u001B')
\b a word boundary
\B a non-word boundary
\G the end of the previous match
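A short sketch of a few of these escapes in use, with the doubled backslashes a Java string literal needs. The class EscapeDemo and its sample inputs are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is illustrative only.
class EscapeDemo {
    // "\\D+" is the regex \D+: one or more non-digit characters act as the delimiter.
    static String[] splitOnNonDigits(String s) { return s.split("\\D+"); }

    // "\\w+" is the regex \w+: the whole string must consist of word characters.
    static boolean isWord(String s) { return s.matches("\\w+"); }

    // "\\s+" is the regex \s+: any run of whitespace becomes a single space.
    static String collapseWhitespace(String s) { return s.replaceAll("\\s+", " "); }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnNonDigits("010-1234 ext 56"))); // [010, 1234, 56]
        System.out.println(isWord("user_01"));                 // true
        System.out.println(isWord("user-01"));                 // false
        System.out.println(collapseWhitespace("a \t b\n c"));  // a b c
    }
}
```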


Note: public String[] split(String regex, int limit) splits this string around matches of the given regular expression.

The array returned by this method contains each substring of this string that is terminated by another substring matching the given expression or by the end of the string. The substrings appear in the array in the order in which they occur in this string. If the expression does not match any part of the input, the resulting array has just one element, namely this string.

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero, the pattern is applied at most n - 1 times, the array's length is no greater than n, and the array's last entry contains all input beyond the last matched delimiter. If n is non-positive, the pattern is applied as many times as possible and the array can have any length. If n is zero, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded.
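The three cases of the limit rule can be seen side by side on one input that has an empty field in the middle and two at the end. This is a sketch with a hypothetical class LimitDemo.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is illustrative only.
class LimitDemo {
    static String[] split(String s, int limit) {
        return s.split(",", limit);
    }

    public static void main(String[] args) {
        String s = "a,b,,c,,";
        System.out.println(Arrays.toString(split(s, 2))); // [a, b,,c,,]: pattern applied once
        System.out.println(split(s, 0).length);  // 4: trailing empty strings dropped
        System.out.println(split(s, -1).length); // 6: trailing empty strings kept
        System.out.println(split(s, 1).length);  // 1: pattern never applied
    }
}
```

A negative limit is the usual choice when empty trailing fields matter, for example when parsing comma-separated records.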

 

Back to the original question. The regular expression used there is "/s". That is not the whitespace class: the whitespace class is \s, written "\\s" in a Java string literal, whereas "/s" is simply a slash followed by the letter s. Since "this is a test" contains no "/s", nothing matches, the original string is returned as the only element, and so the array length printed is 1.
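The three candidate arguments from the question can be compared directly; the class OriginalQuestion is a hypothetical wrapper. One related detail: writing "\s" with a single backslash in a Java string literal was a compile error before Java 15; since Java 15 it is a valid string escape meaning a single space character, which is still not the same thing as the regex \s.

```java
// Hypothetical demo class; the name is illustrative only.
class OriginalQuestion {
    static int count(String regex) {
        return "this is a test".split(regex).length;
    }

    public static void main(String[] args) {
        System.out.println(count("/s"));  // 1: the literal text "/s" never occurs
        System.out.println(count("\\s")); // 4: the regex \s matches each space
        System.out.println(count("s"));   // 4: "thi", " i", " a te", "t"
    }
}
```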

 

Here is some additional background on regular expressions in Java.

^ anchors the match to the start.
^Java: the text must start with Java
$ anchors the match to the end.
Java$: the text must end with Java
. matches any single character except a line terminator such as \n.
Java..: Java followed by any two characters other than line terminators
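A small sketch of the anchors and the dot, using a hypothetical class AnchorDemo. Matcher.find() searches anywhere in the input, so the anchors are what restrict the position; String.matches() must consume the whole input.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is illustrative only.
class AnchorDemo {
    // find() searches anywhere, so ^ is what forces the match to the start.
    static boolean startsWithJava(String s) {
        return Pattern.compile("^Java").matcher(s).find();
    }

    // $ forces the match to the end of the input.
    static boolean endsWithJava(String s) {
        return Pattern.compile("Java$").matcher(s).find();
    }

    // matches() must consume everything: "Java" plus exactly two non-line-terminator characters.
    static boolean javaPlusTwo(String s) {
        return s.matches("Java..");
    }

    public static void main(String[] args) {
        System.out.println(startsWithJava("Java rocks"));  // true
        System.out.println(startsWithJava("I like Java")); // false
        System.out.println(endsWithJava("I like Java"));   // true
        System.out.println(javaPlusTwo("Java11"));         // true
        System.out.println(javaPlusTwo("Java\n1"));        // false: '.' does not match \n
    }
}
```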

Square brackets "[]" specify a set of allowed characters:
[a-z] one character in the lowercase range a to z
[A-Z] one character in the uppercase range A to Z
[a-zA-Z] one character in the lowercase range a to z or the uppercase range A to Z
[0-9] one digit from 0 to 9
[0-9a-z] one character that is a digit from 0 to 9 or a lowercase letter from a to z
[0-9[a-z]] one character that is a digit from 0 to 9 or a lowercase letter from a to z (the union of the nested classes)

Putting ^ right after the opening bracket, "[^]", negates the set:
[^a-z] one character that is not a lowercase letter a to z
[^A-Z] one character that is not an uppercase letter A to Z
[^a-zA-Z] one character that is neither a lowercase nor an uppercase letter
[^0-9] one character that is not a digit 0 to 9
[^0-9a-z] one character that is neither a digit 0 to 9 nor a lowercase letter a to z
[^0-9[a-z]] intended as the negation of the union above; the way ^ combines with nested classes has differed between JDK versions, so the flat form [^0-9a-z] is the safer way to write it
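A sketch of these character classes, using a hypothetical class CharClassDemo. It also shows the && operator, which Java's regex syntax provides for a real intersection of classes (here: lowercase letters that are not vowels); nested brackets without && form a union.

```java
// Hypothetical demo class; the name is illustrative only.
class CharClassDemo {
    static boolean lower(String s)        { return s.matches("[a-z]"); }
    static boolean letter(String s)       { return s.matches("[a-zA-Z]"); }
    static boolean notDigit(String s)     { return s.matches("[^0-9]"); }
    // Nested brackets: union, i.e. a digit or a lowercase letter.
    static boolean digitOrLower(String s) { return s.matches("[0-9[a-z]]"); }
    // && is intersection: lowercase letters that are not vowels.
    static boolean consonant(String s)    { return s.matches("[a-z&&[^aeiou]]"); }

    public static void main(String[] args) {
        System.out.println(lower("Q"));        // false
        System.out.println(letter("Q"));       // true
        System.out.println(notDigit("7"));     // false
        System.out.println(digitOrLower("7")); // true
        System.out.println(consonant("a"));    // false
    }
}
```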

To allow a specific character zero or more times, use "*":
J* zero or more J
.* zero or more arbitrary characters
J.*D J and D with zero or more arbitrary characters between them

To require a specific character one or more times, use "+":
J+ one or more J
.+ one or more arbitrary characters
J.+D J and D with at least one arbitrary character between them

For zero or one occurrence, use "?":
Ja? J or Ja

To require an exact number of consecutive occurrences, use "{}":
J{2} JJ
J{3} JJJ
For at least a occurrences, use "{a,}":
J{3,} JJJ, JJJJ, JJJJJ, ... (three or more J)
For at least a and at most b occurrences, use "{a,b}":
J{3,5} JJJ, JJJJ, or JJJJJ
To accept either of two alternatives, use "|":
J|A J or A
Java|Hello Java or Hello
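The quantifiers and alternation above can be checked with String.matches, which requires the pattern to cover the whole input. The helper class QuantifierDemo is hypothetical.

```java
// Hypothetical demo class; the name is illustrative only.
class QuantifierDemo {
    // Whole-input match of s against regex.
    static boolean m(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(m("J*", ""));              // true: zero J is allowed
        System.out.println(m("J+", ""));              // false: at least one J required
        System.out.println(m("Ja?", "Ja"));           // true
        System.out.println(m("J{3,}", "JJJJ"));       // true
        System.out.println(m("J{3,5}", "JJJJJJ"));    // false: six is too many
        System.out.println(m("Java|Hello", "Hello")); // true
        System.out.println(m("J.*D", "JD"));          // true
        System.out.println(m("J.+D", "JD"));          // false: needs a character in between
    }
}
```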

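Parentheses "()" group part of a pattern into a unit and capture the text it matches.
For example, to capture the text of a link you can write <.*href=\".*\">(.+?)</a> (as it appears inside a Java string literal).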
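A sketch of reading captured groups with Matcher.group(n), using the link pattern above plus a second, made-up date-like pattern with two groups. The class GroupDemo and the inputs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and inputs are illustrative only.
class GroupDemo {
    // Returns the text captured by the first (...) group, or null if nothing matched.
    static String linkText(String html) {
        Matcher m = Pattern.compile("<.*href=\".*\">(.+?)</a>").matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(linkText("<a href=\"index.html\">homepage</a>")); // homepage

        // Groups are numbered by their opening parenthesis, left to right.
        Matcher d = Pattern.compile("(\\d{4})-(\\d{2})").matcher("released 1995-05");
        if (d.find()) {
            System.out.println(d.group(1) + " / " + d.group(2)); // 1995 / 05
        }
    }
}
```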

When calling Pattern.compile, you can pass flags that control how the regular expression matches:
Pattern.compile(String regex, int flags)

The available flags include:

Pattern.CANON_EQ: two characters are considered a match only if their full canonical decompositions are identical. For example, with this flag the expression "a\u030A" matches the string "\u00E5" (å). By default, canonical equivalence is not taken into account.

Pattern.CASE_INSENSITIVE (?i): makes matching ignore case. By default, case-insensitive matching applies only to the US-ASCII character set; to match Unicode characters case-insensitively, combine this flag with UNICODE_CASE.

Pattern.COMMENTS (?x): in this mode, whitespace in the pattern (spaces, tabs, line breaks) is ignored, and a comment starting with # runs to the end of the line. This mode can also be enabled with the embedded flag (?x).

Pattern.DOTALL (?s): in this mode the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.

Pattern.MULTILINE (?m): in this mode '^' and '$' also match just after and just before a line terminator, i.e. at the start and end of each line, in addition to the start and end of the input. By default these two expressions match only at the start and end of the entire input.

Pattern.UNICODE_CASE (?u): in this mode, if CASE_INSENSITIVE is also enabled, case-insensitive matching follows the Unicode standard. By default, case-insensitive matching applies only to the US-ASCII character set.

Pattern.UNIX_LINES (?d): in this mode only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
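A sketch of several of these flags, each passed either as the second argument of Pattern.compile or as an embedded flag inside the pattern. The class FlagDemo and its inputs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and inputs are illustrative only.
class FlagDemo {
    static boolean ci(String s) {
        return Pattern.compile("java", Pattern.CASE_INSENSITIVE).matcher(s).matches();
    }

    // Same effect via the embedded flag (?i).
    static boolean embeddedCi(String s) {
        return s.matches("(?i)java");
    }

    // MULTILINE: ^ can match at the start of every line, not only of the input.
    static int countLinesStartingWithHash(String text) {
        Matcher m = Pattern.compile("^#", Pattern.MULTILINE).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // DOTALL: '.' also matches a line terminator.
    static boolean dotAll(String s) {
        return Pattern.compile("a.b", Pattern.DOTALL).matcher(s).matches();
    }

    // COMMENTS: whitespace and the "# digits" comment are ignored, leaving \d+.
    static boolean comments(String s) {
        return Pattern.compile("\\d+ # digits", Pattern.COMMENTS).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(ci("JAVA"));                                 // true
        System.out.println(countLinesStartingWithHash("#a\nb\n#c"));    // 2
        System.out.println(dotAll("a\nb") + " " + "a\nb".matches("a.b")); // true false
        System.out.println(comments("123"));                            // true
    }
}
```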

Enough abstract concepts; here are a few simple Java regex use cases (the snippets assume import java.util.regex.Matcher; and import java.util.regex.Pattern;):

◆ Checking whether a string meets a condition

// Check whether the string starts with "Java" (".*" consumes the rest)
Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not a person");
boolean b = matcher.matches();
// true if the whole string matches the pattern, false otherwise
System.out.println(b);

◆ Splitting a string on multiple delimiters
// One or more commas, spaces, or vertical bars in a row count as a single delimiter
Pattern pattern = Pattern.compile("[, |]+");
String[] strs = pattern.split("Java Hello World Java, Hello, World | Sun");
for (int i = 0; i < strs.length; i++) {
    System.out.println(strs[i]);
}

◆ Text replacement (first occurrence only)
Pattern pattern = Pattern.compile("Regular Expression");
Matcher matcher = pattern.matcher("Regular Expression Hello World, Regular Expression Hello World");
// Replace only the first match
System.out.println(matcher.replaceFirst("Java"));

◆ Text replacement (all occurrences)
Pattern pattern = Pattern.compile("Regular Expression");
Matcher matcher = pattern.matcher("Regular Expression Hello World, Regular Expression Hello World");
// Replace every match
System.out.println(matcher.replaceAll("Java"));

◆ Text replacement (building the result with appendReplacement and appendTail)
Pattern pattern = Pattern.compile("Regular Expression");
Matcher matcher = pattern.matcher("Regular Expression Hello World, Regular Expression Hello World");
StringBuffer sbr = new StringBuffer();
while (matcher.find()) {
    // Append the text before this match, then the replacement
    matcher.appendReplacement(sbr, "Java");
}
// Append whatever follows the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());
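The appendReplacement/appendTail loop is most useful when each replacement depends on the matched text, which replaceAll with a fixed string cannot do. A sketch, with a hypothetical class AppendDemo that wraps every run of digits in brackets; Matcher.quoteReplacement keeps any '$' or '\' in the computed text from being read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and input are illustrative only.
class AppendDemo {
    // Wraps every run of digits in brackets; the replacement is computed per match.
    static String bracketNumbers(String s) {
        Matcher m = Pattern.compile("\\d+").matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement("[" + m.group() + "]"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bracketNumbers("from 1995 to 2007")); // from [1995] to [2007]
    }
}
```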

◆ Verifying an email address

String str = "ceponline@yahoo.com.cn";
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());

◆ Removing HTML tags
// "<.+?>" matches the shortest run from '<' to the next '>'
Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
Matcher matcher = pattern.matcher("<a href=\"index.html\">homepage</a>");
String string = matcher.replaceAll("");
System.out.println(string);

◆ Finding a matching substring in HTML
Pattern pattern = Pattern.compile("href=\"(.+?)\"");
Matcher matcher = pattern.matcher("<a href=\"index.html\">homepage</a>");
if (matcher.find()) {
    // group(1) is the text captured by the parentheses
    System.out.println(matcher.group(1));
}

◆ Extracting http:// addresses
// Extract URLs
Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
// The input string was cut off after "dsdsds" in the source; substitute any text that contains a URL
Matcher matcher = pattern.matcher("dsdsds…");
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    buffer.append(matcher.group());
    buffer.append("\r\n");
}
System.out.println(buffer.toString());

◆ Replacing {} placeholders

String str = "Java's development history so far spans {0}-{1}";
// Each pair is {regex matching a placeholder, replacement text}; braces must be escaped in a regex
String[][] object = { new String[] { "\\{0\\}", "1995" }, new String[] { "\\{1\\}", "2007" } };
System.out.println(replace(str, object));

public static String replace(final String sourceString, Object[] object) {
    String temp = sourceString;
    for (int i = 0; i < object.length; i++) {
        String[] result = (String[]) object[i];
        Pattern pattern = Pattern.compile(result[0]);
        Matcher matcher = pattern.matcher(temp);
        temp = matcher.replaceAll(result[1]);
    }
    return temp;
}

◆ Finding files in a directory whose names match a regular expression

// Caches the list of matching files
private ArrayList files = new ArrayList();
// Holds the directory path
private String _path;
// Holds the raw regular expression string
private String _regexp;

class MyFileFilter implements FileFilter {

    /**
     * Matches the file name against the regular expression
     */
    public boolean accept(File file) {
        try {
            Pattern pattern = Pattern.compile(_regexp);
            Matcher match = pattern.matcher(file.getName());

            return match.matches();
        } catch (Exception e) {
            return true;
        }
    }
}

/**
 * Parse the input stream
 * @param inpu
