An Introduction to Learning Regular Expressions (Java)

Source: Internet
Author: User
Tags: expression, engine

Since June, I have been working on automatically capturing lecture information. The piece I rely on most is a good time-parsing engine, one that can parse as many time and date formats as possible. Unfortunately, I could not find a ready-made "wheel", so I had to spend a lot of time processing and parsing time strings myself. Java provides the DateFormat class and SimpleDateFormat in the java.text package, which make the problem a little simpler. However, anyone who has done time processing knows that it is only "a little simpler". Writing a general-purpose method for parsing time strings took me down many detours. After a month of exploration, I found that what really helps is Java's regular expression engine. The strategy is to pass each string through regular expression processing and convert it into one or two unified main formats. Once I understood this, the importance of regular expressions became clear to me; until then it had been mere book knowledge.
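The normalization strategy described above might look like the following minimal sketch. The class DateNormalizer, its method names, the two input variants, and the choice of yyyy-M-d as the unified format are all assumptions made for illustration; they are not taken from the author's engine.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class DateNormalizer {
    // Hypothetical helper: rewrite "2012/8/17" or "2012.8.17" as "2012-8-17".
    // Strings that are already in the unified form pass through unchanged.
    static String normalize(String raw) {
        return raw.trim()
                  .replaceAll("(\\d{4})[/.](\\d{1,2})[/.](\\d{1,2})", "$1-$2-$3");
    }

    // Parse only the unified format, so SimpleDateFormat needs a single pattern.
    static Date parse(String raw) throws ParseException {
        SimpleDateFormat unified = new SimpleDateFormat("yyyy-M-d");
        unified.setLenient(false); // reject impossible dates such as 2012-13-40
        return unified.parse(normalize(raw));
    }

    public static void main(String[] args) throws ParseException {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd");
        for (String raw : new String[] {"2012/8/17", "2012.8.17", "2012-8-17"}) {
            System.out.println(raw + " -> " + out.format(parse(raw)));
        }
    }
}
```

The point of the design is that the regular expression absorbs the variation between inputs, so the date parser itself only ever sees one format.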

The preface to "Mastering Regular Expressions" says that if the great inventions in the field of computer software were listed, there would be no more than 20 items, including well-known names such as packet-switched networks, the Web, Lisp, hash algorithms, UNIX, compiler technology, the relational model, object orientation, and XML, and that regular expressions should never be left off the list. The preface goes on to say: "For a lot of practical work, regular expressions are a panacea that can improve development efficiency and program quality by hundreds of times." The most classic use of regular expressions is in bioinformatics research and the mapping of the human genome.

From my recent experience building a time-parsing engine for the automatic parsing of lecture information, regular expressions are a powerful string-processing tool in any programming language, and they can greatly improve development efficiency. If regular expressions are not used, or not used to their full extent, much of the code becomes cumbersome and poor. Cutting, splicing, and performing array operations on large numbers of low-level strings is depressing for both the original author and later maintainers. When I started on time parsing, I had to chain together several methods built on String's split method, because at that time I had not mastered the advanced features of regular expressions.

All of the above can be summarized in one sentence: regular expressions are powerful, programmers need to master them and apply them in their work, and they should be a programmer's first instinct in string processing. Next, we will mainly discuss how to learn and master regular expressions in Java (because Java is the only language I use).

To truly learn regular expressions, it is essential to keep improving by solving concrete problems one at a time. As the problems go from simple to complex, your grasp of regular expression syntax goes from basic to advanced.

The book "Mastering Regular Expressions" is well written, but its examples are a pity for Java programmers who do not understand Perl, PHP, or .NET syntax: there is no way to follow advanced features that are demonstrated in non-Java languages, which objectively makes it harder to learn the advanced features of Java regular expressions. I am also still moving from beginner to more advanced use of regular expressions, so I am writing this blog to summarize and improve.

 

Case 1: Use a Java regular expression capture group to extract a specific string segment that meets the logical requirements

Specifically, from the string
The population of 2984444215 is growing
extract the population value.

 

Solving this problem requires only Java's basic regular expression syntax and its two main classes, java.util.regex.Pattern and java.util.regex.Matcher. It is the simplest kind of task.

There are two solutions for this string.

The first uses only \d+ to match the digits that appear in the string. If the string contains no number other than the 2984444215 to be extracted, this rough method is good enough.

The specific implementation method is as follows:

 

    String str = "The population of 2984444215 is growing";
    Pattern p = Pattern.compile("\\d+");
    Matcher m = p.matcher(str);
    if (m.find()) {
        String result = m.group();
        System.out.println("result is " + result);
    }

The second is more precise. The target is a run of digits that must be preceded by the string "population of"; in other words, we are matching XXX within the logical block "population of XXX", where XXX must be consecutive digits.

You can use a regular expression capture group, or the more advanced lookbehind feature. The code that solves the problem with a capture group is as follows:

    String str = "The population of 2984444215 is growing";
    Pattern p = Pattern.compile(".*(population\\s+of\\s+)(\\d+).*");
    Matcher m = p.matcher(str);
    if (m.find()) {
        // use a capture group to ensure that the digits come from the right position
        String result = m.group(2);
        System.out.println("result is " + result);
    }
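For comparison, the lookbehind alternative mentioned above could be sketched as follows. The class and method names are hypothetical, and the sketch assumes a single space between the words, which keeps the lookbehind at a fixed length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PopulationLookbehind {
    // (?<=population of ) checks the text before the digits without consuming it,
    // so the whole match is already the number and no group index is needed.
    private static final Pattern POPULATION =
            Pattern.compile("(?<=population of )\\d+");

    // Returns the digits that follow "population of ", or null if there are none.
    static String extract(String text) {
        Matcher m = POPULATION.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String str = "The population of 2984444215 is growing";
        System.out.println("result is " + extract(str)); // result is 2984444215
    }
}
```

The capture-group version tolerates variable whitespace through \s+, while this version trades that flexibility for a simpler call to group().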

 

 

Case 2: Use Java regular expressions to extract multiple string segments that meet the logical requirements

 

For example, use Java regular expressions to extract all the population values in the following text.

 

The population of China is 1295000000

The population of Japan is 135000000
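One possible way to collect every value is to call find() in a loop, as in this minimal sketch. The class name, method name, and the exact pattern are assumptions made for illustration; the pattern presumes each sentence has the form "population of NAME is DIGITS".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PopulationList {
    // Group 1 is the country name, group 2 is the population value.
    private static final Pattern ENTRY =
            Pattern.compile("population\\s+of\\s+(\\w+)\\s+is\\s+(\\d+)");

    // Each call to find() moves on to the next match in the text.
    static List<String> extractAll(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "The population of China is 1295000000\n"
                    + "The population of Japan is 135000000";
        System.out.println(extractAll(text)); // [1295000000, 135000000]
    }
}
```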

 

 

 

Case 3: Swap the positions of logical blocks in a string (groups)

This involves regular expression grouping and backreferences.

 

Goal: convert 2012-8-17,9:30-11:30 to 9:30-11:30,2012-8-17, which is really just a simple position swap.

 

There are at least two ways to implement this. The first is brute-force string cutting and splicing: cut the string into two parts, then join them again with + in the opposite order.
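The cutting-and-splicing idea might be sketched like this. The class and method names are hypothetical, and the sketch assumes the first comma is the boundary between the date and the time range.

```java
class SwapBySplit {
    // Cut at the first comma, then glue the two halves back in reverse order.
    static String swap(String s) {
        String[] parts = s.split(",", 2); // limit 2: at most two pieces
        if (parts.length != 2) {
            return s; // no comma found, leave the input untouched
        }
        return parts[1] + "," + parts[0];
    }

    public static void main(String[] args) {
        System.out.println(swap("2012-8-17,9:30-11:30")); // 9:30-11:30,2012-8-17
    }
}
```

This works, but it does not check that either half actually looks like a date or a time range.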

 

The other is to use a regular expression, which takes a single statement. The code is as follows:

    String str = "2012-8-17,9:30-11:30";
    str = str.replaceAll(
            "(\\d{4}-\\d{1,2}-\\d{1,2}),(\\d{1,2}:\\d{1,2}-\\d{1,2}:\\d{1,2})",
            "$2,$1");
    System.out.println(str);

 

Case 4:

 

In a time string such as "Tuesday, May 22nd, 2011", convert 22nd to 22; that is, remove the st of 21st, the nd of 22nd, the rd of 23rd, and the th of 24th.

This problem is trickier, because simply calling the String method replace("nd", "") would also replace the nd in Monday or Sunday. A plain replacement that removes every nd interferes with other words that contain nd, and excluding those words is difficult.

 

Take nd specifically. The lookaround feature of regular expressions comes in two forms, lookahead and lookbehind, and lookbehind can solve the problem here. If we require that nd be preceded by a number ending in 2 (such as 2, 22, or 32), then replacing nd in that position is exactly what we want, and the nd in Sunday is guaranteed not to be replaced.

 

The implementation is as follows. Strictly speaking it could be made more precise, but it is good enough.

    public static String handleStNdRdTh(String str) {
        String result = str;
        Pattern p1 = Pattern.compile("\\d{1,2}st");
        Pattern p2 = Pattern.compile("\\d{1,2}nd");
        Pattern p3 = Pattern.compile("\\d{1,2}rd");
        Pattern p4 = Pattern.compile("\\d{1,2}th");
        Matcher m = p1.matcher(str);
        if (m.find()) {
            String tmp = m.group();
            String numbStr = "";
            Pattern p = Pattern.compile("\\d{1,2}");
            Matcher m2 = p.matcher(tmp);
            if (m2.find()) {
                numbStr = m2.group();
                result = str.replace(tmp, numbStr);
            }
        } else if (m.reset().usePattern(p2).find()) {
            String tmp = m.group();
            String numbStr = "";
            Pattern p = Pattern.compile("\\d{1,2}");
            Matcher m2 = p.matcher(tmp);
            if (m2.find()) {
                numbStr = m2.group();
                result = str.replace(tmp, numbStr);
            }
        } else if (m.reset().usePattern(p3).find()) {
            String tmp = m.group();
            String numbStr = "";
            Pattern p = Pattern.compile("\\d{1,2}");
            Matcher m2 = p.matcher(tmp);
            if (m2.find()) {
                numbStr = m2.group();
                result = str.replace(tmp, numbStr);
            }
        } else if (m.reset().usePattern(p4).find()) {
            String tmp = m.group();
            String numbStr = "";
            Pattern p = Pattern.compile("\\d{1,2}");
            Matcher m2 = p.matcher(tmp);
            if (m2.find()) {
                numbStr = m2.group();
                result = str.replace(tmp, numbStr);
            }
        }
        return result;
    }
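The method above matches the digits together with the suffix rather than using lookbehind. A lookbehind-based version might look like the following sketch. It is a swapped-in alternative, not the author's code: the class and method names are hypothetical, and unlike the description above it accepts any preceding digit instead of checking that the digit agrees with the suffix.

```java
class OrdinalSuffix {
    // (?<=\d)  the suffix must directly follow a digit (checked, not consumed)
    // \b       the suffix must end the token, so longer words are left alone
    static String strip(String s) {
        return s.replaceAll("(?<=\\d)(?:st|nd|rd|th)\\b", "");
    }

    public static void main(String[] args) {
        // The nd in Sunday has no digit before it, so only 22nd is changed.
        System.out.println(strip("Sunday, May 22nd, 2011")); // Sunday, May 22, 2011
    }
}
```

Because replaceAll handles every match, this version also covers strings that contain more than one ordinal.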

 

 

Case 5: Insert a string at a specified position or delete a string at a specified position

 

Goal: insert a comma after every three digits, counting from the right, in the number in "The population of 2984444215 is growing".

 

This example actually comes from "Mastering Regular Expressions", where it is implemented in Perl, and there are several solutions. The first is the clumsiest: extract the run of consecutive digits from the string, split it into groups of three, and splice the groups back together one by one. The second uses the lookaround feature of regular expressions. Depending on the kind of lookaround, there are two implementations: lookahead and lookbehind.
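The first, manual approach is not shown in code here, so the following is only a rough sketch of what it could look like. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualGrouping {
    // Walk from the right end of the digit run and insert a comma every three digits.
    static String group(String digits) {
        StringBuilder sb = new StringBuilder(digits);
        for (int i = sb.length() - 3; i > 0; i -= 3) {
            sb.insert(i, ',');
        }
        return sb.toString();
    }

    // Find each run of digits, regroup it, and splice it back into the text.
    static String addCommas(String text) {
        Matcher m = Pattern.compile("\\d+").matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, group(m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addCommas("The population of 2984444215 is growing"));
        // The population of 2,984,444,215 is growing
    }
}
```

Even in this compact form the manual version needs a loop and index arithmetic, which is what the lookaround solutions avoid.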

 

The lookaround implementations are as follows:

 

Lookbehind version:

    String testStr = "The population of 2984444215 is growing";
    // use lookbehind, so the match is an empty position and no digit is consumed
    testStr = testStr.replaceAll("(?<=\\d)(?=(\\d\\d\\d)+(?!\\d))", ",");
    System.out.println(testStr);

Lookahead version:

    String testStr = "The population of 2984444215 is growing";
    // use lookahead only, and put the captured digit back with a backreference
    testStr = testStr.replaceAll("(\\d)(?=(\\d\\d\\d)+(?!\\d))", "$1,");
    System.out.println(testStr);
