About the use of split in Java

Last Update:2014-12-19 Source: Internet

Author: User

Tags explode


An earlier article (http://shukuiyan.iteye.com/blog/507915) already covered this problem, yet when this very point came up in a recent written exam I still got it wrong. How embarrassing! Clearly I had not studied it carefully enough. The question went roughly like this:

Java code

String s2 = "This is a test";
String[] sArray = s2.split("/s");
System.out.println("sArray.length=" + sArray.length);

Does this produce output, or a compile error? I knew that if the argument to split were "s", the output would be 4, because the original string would be divided into the array {"Thi", " i", " a te", "t"}. But the question passes "/s" to split, so what does that argument actually mean? After running the code, the output turned out to be 1.

Why is that? Let's dig into it.

java.lang.String.split splits a string into substrings and returns the result as an array of strings. It has two overloads:

stringObj.split(regex)
stringObj.split(regex, limit)
Here stringObj is the String to be split; it is not modified by the split method. regex is required: it is a regular expression (not a plain delimiter string) that identifies the one or more characters at which the string is separated. If regex matches nothing in stringObj, a single-element array containing the entire string is returned. limit is optional and restricts the number of elements in the returned array. Note that the result of split is an array of strings: stringObj is cut at each position where regex matches, and the matched separator is not returned as part of any array element.

An example

Java code

String srcString = "this is a about split test";
String[] stringArray = srcString.split(" ");
// Split at each space character
for (String sTemp : stringArray) {
    System.out.println(sTemp);
}
// With n single spaces, the resulting array has length n+1.
// When two spaces are adjacent, there is nothing between them,
// so the corresponding array element is an empty string.
String srcString1 = "this  is a about split test"; // two spaces after "this"
String[] stringArray1 = srcString1.split(" ");
for (String sTemp : stringArray1) {
    System.out.println(sTemp);
}

This results in an output of

Java code

this
is
a
about
split
test
and, for the second string:
this
(empty line)
is
a
about
split
test

An additional example

Java code

String srcString = "this is a about split test";
String[] stringArray = srcString.split(" ", 2);
// Split at the space character, producing at most 2 elements
for (String sTemp : stringArray) {
    System.out.println(sTemp);
}
The output is
this
is a about split test

Look at the following.

Java code

String ipString = "59.64.159.224";
String[] ipArray = ipString.split(".");
for (String sTemp : ipArray) {
    System.out.println(sTemp);
}
This prints nothing at all. Why?

Because the signature is public String[] split(String regex): the parameter is named regex, short for regular expression. It is not a simple delimiter character but a regular expression. The implementation of the split method is essentially:
public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}

So split simply delegates to the split method of the Pattern class. The character "." has a special meaning in regular expressions (it matches any character), so it must be escaped when it is meant literally. Just change

Java code

String[] ipArray = ipString.split(".");

to

Java code

String[] ipArray = ipString.split("\\.");

and it works.
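As a rough illustration of the fix above, the sketch below compares the unescaped dot with two ways of splitting on a literal dot. The class and method names (DotSplitDemo, splitEscaped, splitQuoted) are made up for this example; Pattern.quote is a standard alternative to escaping by hand.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DotSplitDemo {
    // Splits on a literal dot by escaping it inside the regex.
    static String[] splitEscaped(String s) {
        return s.split("\\.");
    }

    // Splits on a literal dot by letting Pattern.quote do the escaping.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String ip = "59.64.159.224";
        // Unescaped "." matches every character, so only empty pieces remain
        // and all of them are dropped as trailing empty strings.
        System.out.println(ip.split(".").length);              // 0
        System.out.println(Arrays.toString(splitEscaped(ip))); // [59, 64, 159, 224]
        System.out.println(Arrays.toString(splitQuoted(ip)));  // [59, 64, 159, 224]
    }
}
```

Pattern.quote is convenient when the delimiter comes from user input and may contain any regex metacharacter.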

Here is a list of some common escape sequences:

\\ backslash
\t tab ('\u0009')
\n line feed ('\u000A')
\r carriage return ('\u000D')
\d a digit, equivalent to [0-9]
\D a non-digit, equivalent to [^0-9]
\s a whitespace character [ \t\n\x0B\f\r]
\S a non-whitespace character [^ \t\n\x0B\f\r]
\w a word character [a-zA-Z_0-9]
\W a non-word character [^a-zA-Z_0-9]
\f form feed (page break)
\e escape
\b a word boundary
\B a non-word boundary
\G the end of the previous match
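To make a few of these escapes concrete, here is a small sketch. The class and helper names are invented for illustration; note that each backslash must be doubled inside a Java string literal.

```java
import java.util.Arrays;

public class EscapeClassDemo {
    // \s+ : split on runs of whitespace (spaces, tabs, line breaks).
    static String[] splitOnWhitespace(String s) {
        return s.split("\\s+");
    }

    // \D : remove every character that is not a digit.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \w+ : true when the whole string consists of word characters.
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnWhitespace("a \t b\nc"))); // [a, b, c]
        System.out.println(digitsOnly("tel: 010-1234"));                     // 0101234
        System.out.println(isWord("split_test1"));                           // true
        System.out.println(isWord("split test"));                            // false
    }
}
```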

Note: public String[] split(String regex, int limit) splits this string around matches of the given regular expression.
The array returned by this method contains each substring of this string that is terminated by another substring matching the given expression, or terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input, the resulting array has just one element, namely this string.

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero, the pattern is applied at most n-1 times, the array's length is no greater than n, and the array's last entry contains all input beyond the last matched delimiter. If n is negative, the pattern is applied as many times as possible and the array can have any length. If n is zero, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded.
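The three cases of limit can be seen side by side in a short sketch. The class name, the helper, and the sample input "a,b,c,," are chosen only for this illustration.

```java
import java.util.Arrays;

public class LimitDemo {
    // Thin wrapper so the three limit cases can be compared directly.
    static String[] split(String s, int limit) {
        return s.split(",", limit);
    }

    public static void main(String[] args) {
        String s = "a,b,c,,";
        // Positive limit: at most 2 elements, the last one keeps the rest.
        System.out.println(Arrays.toString(split(s, 2)));  // [a, b,c,,]
        // Zero: split everywhere, then drop trailing empty strings.
        System.out.println(Arrays.toString(split(s, 0)));  // [a, b, c]
        // Negative: split everywhere and keep trailing empty strings.
        System.out.println(split(s, -1).length);           // 5
    }
}
```

Calling split(regex) without a limit behaves like a limit of zero, which is why trailing empty strings silently disappear.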

Back to the original question. The regular expression there is "/s", which is a literal slash followed by the letter s; it is not "\\s", the expression that represents a whitespace character. Since nothing in the given string matches "/s", the result is an array holding only the original string, so the printed length is 1.
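The exam question can be checked with a minimal sketch like the one below, which counts the pieces produced by "/s", "s", and "\\s". The class and method names are hypothetical.

```java
public class QuizDemo {
    // Returns how many pieces split produces for the given regex.
    static int countParts(String input, String regex) {
        return input.split(regex).length;
    }

    public static void main(String[] args) {
        String s2 = "This is a test";
        System.out.println(countParts(s2, "/s"));   // 1: "/s" never occurs in the string
        System.out.println(countParts(s2, "s"));    // 4: "Thi", " i", " a te", "t"
        System.out.println(countParts(s2, "\\s"));  // 4: "This", "is", "a", "test"
    }
}
```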

Finally, here is some background on regular expressions in Java.

^ anchors the match at the start
^java matches only text that starts with java
$ anchors the match at the end
java$ matches only text that ends with java
. matches any single character except \n
java.. matches java followed by any two characters other than line breaks

Use "[]" to restrict a position to a set of characters
[a-z] one character in the lowercase range a to z
[A-Z] one character in the uppercase range A to Z
[a-zA-Z] one character in the lowercase range a to z or the uppercase range A to Z
[0-9] one digit in the range 0 to 9
[0-9a-z] one character in the range 0 to 9 or a to z
[0-9[a-z]] one character in the range 0 to 9 or a to z (union)

Put ^ right after the opening bracket to negate the set: "[^]"
[^a-z] one character outside the lowercase range a to z
[^A-Z] one character outside the uppercase range A to Z
[^a-zA-Z] one character outside both the lowercase range a to z and the uppercase range A to Z
[^0-9] one character that is not a digit 0 to 9
[^0-9a-z] one character outside both the range 0 to 9 and the range a to z
[^0-9[a-z]] a negated set with a nested set; how the negation applies to the nested set has differed between Java versions, so [^0-9a-z] is the safer way to write it
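A few of the character classes above can be tried out with String.matches, which requires the whole string to match. This is only a sketch; the class and helper names are invented.

```java
public class CharClassDemo {
    // True when the single-character string ch belongs to the class cls.
    static boolean inClass(String cls, String ch) {
        return ch.matches(cls);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[a-z]", "b"));       // true
        System.out.println(inClass("[a-z]", "B"));       // false
        System.out.println(inClass("[a-zA-Z]", "B"));    // true
        System.out.println(inClass("[0-9a-z]", "7"));    // true
        System.out.println(inClass("[0-9[a-z]]", "x"));  // true (union)
        System.out.println(inClass("[^a-z]", "Q"));      // true
        System.out.println(inClass("[^0-9]", "5"));      // false
    }
}
```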

Use "*" to allow a specific character zero or more times
J* zero or more J
.* zero or more of any character
J.*D zero or more of any character between J and D

Use "+" to require a specific character one or more times
J+ one or more J
.+ one or more of any character
J.+D one or more of any character between J and D

Use "?" to allow a specific character zero times or once
JA? J or JA

Use "{a}" to require exactly a consecutive occurrences
J{2} JJ
J{3} JJJ
Use "{a,}" to require a or more occurrences
J{3,} JJJ, JJJJ, JJJJJ, ... (J three or more times in a row)
Use "{a,b}" to require at least a and at most b occurrences
J{3,5} JJJ or JJJJ or JJJJJ
Use "|" to choose one of two alternatives
J|A J or A
Java|Hello Java or Hello
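The quantifiers and the alternation operator listed above can be checked with a short sketch. The class and method names are illustrative only; String.matches tests the whole input against the pattern.

```java
public class QuantifierDemo {
    // True when the entire input matches the regular expression.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("J.*D", "JD"));         // true: zero characters in between
        System.out.println(fullMatch("J.+D", "JD"));         // false: needs at least one
        System.out.println(fullMatch("JA?", "J"));           // true
        System.out.println(fullMatch("JA?", "JAA"));         // false
        System.out.println(fullMatch("J{3,5}", "JJ"));       // false
        System.out.println(fullMatch("J{3,5}", "JJJJ"));     // true
        System.out.println(fullMatch("J{3,}", "JJJJJJ"));    // true
        System.out.println(fullMatch("Java|Hello", "Hello")); // true
    }
}
```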

Use "()" to define a group
For example, to extract the text between <a href="index.html"> and </a> in <a href="index.html">index</a>, you can write <a.*href=\".*\">(.+?)</a>
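A minimal sketch of reading the captured group with Matcher.group(1) follows, using the pattern from the example above. The class and method names are made up, and the pattern is only suitable for a single link: the greedy ".*" parts would span across several links on the same line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // The pattern from the text; group 1 is the part in parentheses.
    private static final Pattern LINK =
            Pattern.compile("<a.*href=\".*\">(.+?)</a>");

    // Returns the link text of the first match, or null if there is none.
    static String firstLinkText(String html) {
        Matcher matcher = LINK.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstLinkText("<a href=\"index.html\">index</a>")); // index
        System.out.println(firstLinkText("no link here"));                     // null
    }
}
```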

When calling Pattern.compile, you can pass flags that control the matching behavior of the regular expression:
Pattern Pattern.compile(String regex, int flags)

The flags value can be any of the following:
Pattern.CANON_EQ Two characters match if and only if their full canonical decompositions match. With this flag, for example, the expression "a\u030A" matches the string "\u00E5". By default, canonical equivalence is not taken into account.
Pattern.CASE_INSENSITIVE (?i) Enables case-insensitive matching. By default, case-insensitive matching applies only to the US-ASCII character set; to match Unicode characters case-insensitively, combine this flag with UNICODE_CASE.
Pattern.COMMENTS (?x) In this mode, whitespace in the regular expression is ignored (translator's note: this does not mean "\\s" in the expression, but literal spaces, tabs, and line breaks inside it). Comments start at # and run to the end of the line. This mode can also be enabled with the embedded flag.
Pattern.DOTALL (?s) In this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
Pattern.MULTILINE (?m) In this mode, '^' and '$' match at the start and end of each line, in addition to the start and end of the whole string. By default, these two expressions match only at the beginning and end of the whole string.
Pattern.UNICODE_CASE (?u) In this mode, if the CASE_INSENSITIVE flag is also enabled, case-insensitive matching covers Unicode characters. By default, case-insensitive matching applies only to the US-ASCII character set.
Pattern.UNIX_LINES (?d) In this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
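The effect of a few of these flags can be seen in a small sketch. The class and method names are hypothetical; only the Pattern constants and methods themselves come from java.util.regex.

```java
import java.util.regex.Pattern;

public class FlagDemo {
    // Whole-input match with the given flags (0 means no flags).
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Looks for the pattern anywhere in the input with the given flags.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("java", 0, "JAVA"));                        // false
        System.out.println(matches("java", Pattern.CASE_INSENSITIVE, "JAVA")); // true
        System.out.println(matches("(?i)java", 0, "JAVA"));                    // true, embedded flag
        System.out.println(matches("a.b", 0, "a\nb"));                         // false
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb"));            // true
        System.out.println(finds("^b$", 0, "a\nb"));                           // false
        System.out.println(finds("^b$", Pattern.MULTILINE, "a\nb"));           // true
    }
}
```

Several flags can be combined with the bitwise OR operator, for example Pattern.CASE_INSENSITIVE | Pattern.DOTALL.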

Enough theory. Here are a few simple use cases for regular expressions in Java:

For example, checking whether a string matches a pattern

// The snippets below need these imports:
// import java.util.regex.Matcher;
// import java.util.regex.Pattern;
// Find a string that starts with Java and ends with anything
Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not human");
boolean b = matcher.matches();
// true when the condition is satisfied, otherwise false
System.out.println(b);

Splitting a string on several delimiters
Pattern pattern = Pattern.compile("[, |]+");
String[] strs = pattern.split("Java Hello World  Java,Hello,,World|Sun");
for (int i = 0; i < strs.length; i++) {
    System.out.println(strs[i]);
}

Text replacement (first occurrence only)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace the first match of the pattern
System.out.println(matcher.replaceFirst("Java"));

Text replacement (all occurrences)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace every match of the pattern
System.out.println(matcher.replaceAll("Java"));

Text replacement (match by match, with appendReplacement)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
StringBuffer sbr = new StringBuffer();
while (matcher.find()) {
    // Append the text before the match, then the replacement
    matcher.appendReplacement(sbr, "Java");
}
// Append whatever follows the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());

Verifying that a string is an e-mail address

String str = "[email protected]"; // the address was obscured in the source; substitute a real one to test
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());

Removing HTML tags
Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
Matcher matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
String string = matcher.replaceAll("");
System.out.println(string);

Finding a matching substring in HTML
Pattern pattern = Pattern.compile("href=\"(.+?)\"");
Matcher matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
if (matcher.find()) {
    // Group 1 is the value of the href attribute
    System.out.println(matcher.group(1));
}

Extracting http:// addresses
// Extract URLs
Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
// The rest of the sample input, which contained a URL, was lost in the source
Matcher matcher = pattern.matcher("dsdsds");
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    buffer.append(matcher.group());
    buffer.append("\r\n");
}
System.out.println(buffer.toString());

Replacing the placeholders written in {}

String str = "The current history of Java runs from {0} to {1}";
String[][] object = {new String[]{"\\{0\\}", "1995"}, new String[]{"\\{1\\}", "2007"}};
System.out.println(replace(str, object));

// Each entry of object is a String[] holding {regex, replacement}
public static String replace(final String sourceString, Object[] object) {
    String temp = sourceString;
    for (int i = 0; i < object.length; i++) {
        String[] result = (String[]) object[i];
        Pattern pattern = Pattern.compile(result[0]);
        Matcher matcher = pattern.matcher(temp);
        temp = matcher.replaceAll(result[1]);
    }
    return temp;
}

Querying the files in a specified directory with a regular expression

// Caches the file list
private ArrayList files = new ArrayList();
// Holds the file path
private String _path;
// Holds the regular expression before it is merged
private String _regexp;

class MyFileFilter implements FileFilter {

    /**
     * Match the file name
     */
    public boolean accept(File file) {
        try {
            Pattern pattern = Pattern.compile(_regexp);
            Matcher match = pattern.matcher(file.getName());
            return match.matches();
        } catch (Exception e) {
            // An invalid pattern accepts every file
            return true;
        }
    }
}

/**
* Parse input stream
* @param Inpu


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
