Java Regular Expressions (Part 2)

Source: Internet
Author: User
3. Application examples

Below we look at some examples of applications of the Jakarta-ORO library.

3.1 Log file processing

Task: analyze a Web server log file to determine how long each user spends on the site. Analyzing a typical BEA WebLogic log record shows that there are two items to extract from the log file: the IP address and the page access time. You can extract both from a log record using grouping symbols (parentheses).

First let's look at the IP address. An IP address consists of 4 bytes, each with a value between 0 and 255, separated by periods. Each byte therefore has at least one and at most three digits. Figure 8 shows the regular expression written for the IP address:





Figure 8: Matching an IP address

The period characters in the IP address must be escaped (preceded by "\") because here a period stands for itself rather than carrying its special meaning in regular expression syntax. That special meaning is described earlier in this article.

The time portion of the log record is surrounded by a pair of square brackets. You can extract everything inside them by first matching the opening bracket ("["), then matching any run of characters that are not the closing bracket ("]"), and finally matching the closing bracket itself. Figure 9 shows the regular expression for this part.
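The two expressions described above can be sketched as follows. This is a minimal illustration that uses the JDK's java.util.regex package rather than Jakarta-ORO, and the patterns are reconstructed from the prose description, so they may differ in detail from the original figures. The class name, method name, and sample input are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IpAndTimePatterns {
    // Four runs of one to three digits, separated by escaped periods.
    static final Pattern IP =
        Pattern.compile("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}");

    // An opening bracket, one or more characters that are not "]", then "]".
    static final Pattern BRACKETED = Pattern.compile("\\[[^\\]]+\\]");

    // Returns the first substring of text matched by pattern, or null if none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical input, made up for illustration.
        String sample = "172.26.155.241 [26/Feb/2001:10:56:03 -0500]";
        System.out.println(firstMatch(IP, sample));
        System.out.println(firstMatch(BRACKETED, sample));
    }
}
```

Note that the digit-count pattern accepts values above 255 (for example 999); it checks the shape of an address, not its validity.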





Figure 9: Matching at least one character up to "]"

Now wrap the two regular expressions in grouping symbols (parentheses) and merge them into a single expression, so that the IP address and time can be extracted from the log record. Note that "\s-\s-\s" is added in the middle of the expression to match the " - - " between the two fields without extracting it. The complete regular expression is shown in Figure 10.
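As a rough sketch of the combined expression, the example below puts each part in its own group and joins them with "\s-\s-\s". It uses java.util.regex instead of Jakarta-ORO, and the log record, class name, and method name are hypothetical; the record only imitates the common access-log layout.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogRecordParser {
    // Group 1: IP address. Group 2: the text between the square brackets.
    // "\s-\s-\s" consumes the " - - " separator without capturing it.
    static final Pattern RECORD = Pattern.compile(
        "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s-\\s-\\s\\[([^\\]]+)\\]");

    // Returns {ip, time} for the first match in line, or null if nothing matches.
    static String[] parse(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Hypothetical record, made up for illustration.
        String line = "172.26.155.241 - - [26/Feb/2001:10:56:03 -0500] "
            + "\"GET /index.html HTTP/1.0\" 200 512";
        String[] fields = parse(line);
        System.out.println("IP: " + fields[0]);
        System.out.println("Time: " + fields[1]);
    }
}
```

Because the brackets sit outside the second group, the extracted time does not include them.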





Figure 10: Matching the IP address and time stamp

Now that the regular expression has been written, you can write the Java code that uses the regular expression library. To use the Jakarta-ORO library, first create the regular expression string and the log record string to be parsed.

The regular expression used here is almost exactly the same as the one in Figure 10, with one exception: in Java, you must escape each backslash ("\"). Figure 10 is not written as a Java string literal, so each "\" must be preceded by another "\" to avoid a compilation error. Unfortunately, this escaping is error-prone, so be careful. One approach is to type the regular expression without escaping first and then replace each "\" with "\\" from left to right. To double-check, print the string to the screen.

After the strings are initialized, instantiate a PatternCompiler object and use it to compile the regular expression into a Pattern object.

Now create a PatternMatcher object and call the contains() method of the PatternMatcher interface to check for a match.

Next, use the MatchResult object returned by the PatternMatcher interface to output the matched groups. Because the logEntry string contains matching content, the program prints the extracted IP address and time.

3.2 HTML processing example one

The next task is to parse all the attributes of the font tags within an HTML page. A typical font tag carries attributes such as face, size, and color, and the program should print the attributes of each font tag as name-value pairs.

In this case, I recommend using two regular expressions. The first, shown in Figure 11, extracts face="Arial, Serif" size="+2" color="red" from the font tag.
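The first of these two expressions might look like the sketch below. It uses java.util.regex rather than Jakarta-ORO, the pattern is an assumption reconstructed from the description, and the class name, method name, and HTML line are hypothetical. It also prints the pattern string, which is the quick escaping check suggested in section 3.1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FontAttributeBlock {
    // Written without Java escaping, the regex is: <\s*font\s+([^>]*)\s*>
    // Each "\" is doubled in the Java string literal below.
    static final String FONT_REGEX = "<\\s*font\\s+([^>]*)\\s*>";

    // CASE_INSENSITIVE lets the pattern match <FONT ...> as well as <font ...>.
    static final Pattern FONT =
        Pattern.compile(FONT_REGEX, Pattern.CASE_INSENSITIVE);

    // Returns the attribute text of the first font tag, or null if there is none.
    static String attributes(String html) {
        Matcher m = FONT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Printing the literal shows the regex the engine actually receives.
        System.out.println(FONT_REGEX);
        // Hypothetical line of HTML, made up for illustration.
        String html = "<p><FONT face=\"Arial, Serif\" size=\"+2\" color=\"red\">Hi</FONT></p>";
        System.out.println(attributes(html));
    }
}
```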





Figure 11: Matching all attributes of the font tag

The second regular expression, shown in Figure 12, splits the attributes into name-value pairs.





Figure 12: Matching a single attribute and splitting it into a name-value pair

Now let's look at the Java code that completes this task. First, create the two regular expression strings and compile them into Pattern objects using Perl5Compiler. When compiling, specify the Perl5Compiler.CASE_INSENSITIVE_MASK option so that matching is case-insensitive.

Next, create a Perl5Matcher object to perform the matching.

Suppose you have a variable html of type String that holds one line of an HTML file. If the html string contains a font tag, the matcher returns true. At that point, you can use the MatchResult object returned by the matcher to get the first group, which contains all the attributes of the font tag.

Next, create a PatternMatcherInput object. This object lets you continue matching from where the last match ended, so it is well suited to extracting the name-value pairs of the attributes within the font tag. Create the PatternMatcherInput object by passing in the string to be matched. Then extract each attribute by repeatedly calling the PatternMatcher object's contains() method with the PatternMatcherInput object (rather than a String) as the argument. On each iteration the PatternMatcherInput object advances its internal pointer, so the next search starts after the previous match. The example then outputs each name-value pair.

3.3 HTML processing example two

Let's look at another example that handles HTML. This time, assume that the Web server has moved from widgets.acme.com to newserver.acme.com, and you want to modify the links in some pages. The regular expression that performs this search is shown in Figure 13:
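Looking back at the attribute-splitting loop in section 3.2, a comparable iteration can be sketched with java.util.regex, where repeated calls to Matcher.find() play roughly the role that PatternMatcherInput plays in Jakarta-ORO. The pattern is an assumption based on the description of Figure 12, and the class name, method name, and input string are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FontAttributePairs {
    // A name, optional whitespace around "=", then a double-quoted value.
    static final Pattern PAIR = Pattern.compile(
        "([a-z]+)\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Collects the name-value pairs in the order they appear.
    static Map<String, String> split(String attributes) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(attributes);
        // Each find() resumes where the previous match ended.
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical attribute text, as extracted from a font tag.
        String attributes = "face=\"Arial, Serif\" size=\"+2\" color=\"red\"";
        for (Map.Entry<String, String> e : split(attributes).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A LinkedHashMap is used here only to keep the attributes in document order; a tag that repeats an attribute name would keep the last value.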





Figure 13: Matching the link before modification

If the regular expression matches, you can replace the link with one that points to the new server. Note the $1 following the # character in the replacement. Perl regular expression syntax uses $1, $2, and so on to refer to groups that have been matched and extracted. The expression in Figure 13 captures, as a group, everything that follows the # in the link, and $1 appends it to the end of the new link.

Now, back to Java. As before, you must create a test string, create the objects needed to compile the regular expression into a Pattern object, and create a PatternMatcher object.

Next, perform the replacement with the static substitute() method of the Util class in the com.oroinc.text.regex package, and output the resulting string.

The syntax for the Util.substitute() method is as follows:
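The signature of Util.substitute() is not reproduced here. Independently of it, the replacement step described in this section can be sketched with java.util.regex, where Matcher.replaceAll() also understands $1 in the replacement string. This is a stand-in, not the Jakarta-ORO call; the page name interface.html, the fragment, the pattern, and the class and method names are all assumptions made for the example. Only the two host names come from the text.

```java
import java.util.regex.Pattern;

class LinkRewriter {
    // Group 1 captures the fragment after "#" so it can be reused as $1.
    static final Pattern OLD_LINK = Pattern.compile(
        "<\\s*a\\s+href\\s*=\\s*\"http://widgets\\.acme\\.com/interface\\.html#([^\"]+)\"\\s*>",
        Pattern.CASE_INSENSITIVE);

    // "$1" in the replacement refers to group 1 of each match.
    static final String NEW_LINK =
        "<a href=\"http://newserver.acme.com/interface.html#$1\">";

    // Rewrites every matching link and leaves all other text untouched.
    static String rewrite(String html) {
        return OLD_LINK.matcher(html).replaceAll(NEW_LINK);
    }

    public static void main(String[] args) {
        // Hypothetical link, made up for illustration.
        String html = "<a href=\"http://widgets.acme.com/interface.html#How_To_Buy\">Buy</a>";
        System.out.println(rewrite(html));
    }
}
```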