Using regular expressions in Java to extract chapter titles from a text

Source: Internet
Author: User

To extract chapter headings from a text, first decide what marks the beginning of a heading. Chinese chapter headings generally look like "第一章" (Chapter 1), "第二章" (Chapter 2), and so on, so a heading begins with the character "第". The "^" anchor pins the match to the start of the input, but a heading is often preceded by whitespace, so allow for that and treat anything that then begins with "第" as a heading. The match is:

^\s*第

"\s" matches a whitespace character, and "*" means zero or more of them. Together with "^", the expression matches "第" at the start of the input, optionally preceded by whitespace. If the string to be matched is not at the beginning of a paragraph, remove "^".

The numbering format of a chapter is not uniform. It may be written in Arabic numerals or in Chinese numerals, and it is generally no longer than 9 characters. It is therefore matched with "any character":

.{1,9}

"." matches any character (by default in Java, any character except a line terminator). "{1,9}" means the preceding item is repeated at least 1 time and at most 9 times.

The chapter number is followed by a unit character:

[章节卷集部篇回]

"[]" matches exactly one character out of those listed inside it. This class therefore matches any one of "章" (chapter), "节" (section), "卷" (volume), "集" (collection), "部" (part), "篇" (article), or "回" (the chapter marker used in traditional Chinese novels).

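As an illustration of how the number and the unit character interact, here is a minimal sketch that pulls out only the chapter number. It assumes the heading marker is "第" and the unit characters are 章, 节, 卷, 集, 部, 篇, 回; the class and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ChapterNumberDemo {
    // 第, then 1 to 9 arbitrary characters (captured), then one unit character.
    private static final Pattern NUMBER =
            Pattern.compile("第(.{1,9})[章节卷集部篇回]");

    // Returns the captured chapter number, or null when nothing matches.
    static String chapterNumber(String heading) {
        Matcher matcher = NUMBER.matcher(heading);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(chapterNumber("第一回 风雪惊变")); // Chinese numeral
        System.out.println(chapterNumber("第12章 开端"));     // Arabic numerals
        System.out.println(chapterNumber("第一百二十回"));     // longer numeral
        System.out.println(chapterNumber("序言"));            // no chapter marker
    }
}
```

Because ".{1,9}" is greedy, the engine first takes as many characters as it can and then backs off until a unit character follows. If the title itself contains a unit character within those first nine characters, part of the title ends up in the number; a reluctant quantifier (".{1,9}?") is one possible adjustment.
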
The title is usually preceded by whitespace, although there may be none, so match zero or more whitespace characters:

\s*
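
One caveat may be worth checking here: by default, Java's "\s" covers only ASCII whitespace (space, tab, and line breaks), so the full-width ideographic space (U+3000) that often appears in Chinese text is not consumed unless Pattern.UNICODE_CHARACTER_CLASS is set. The sketch below shows the difference on the gap between the unit character and the title; the class and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class TitleWhitespaceDemo {
    private static final String REGEX = "[章节卷集部篇回](\\s*)(.*)";

    // Default mode: \s is [ \t\n\x0B\f\r].
    private static final Pattern ASCII_WS = Pattern.compile(REGEX);
    // With this flag, \s follows the Unicode White_Space property.
    private static final Pattern UNICODE_WS =
            Pattern.compile(REGEX, Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the title part (group 2), or null when nothing matches.
    static String title(String heading, boolean unicodeWhitespace) {
        Pattern pattern = unicodeWhitespace ? UNICODE_WS : ASCII_WS;
        Matcher matcher = pattern.matcher(heading);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        String fullWidth = "第一章\u3000开端"; // U+3000 between unit and title
        System.out.println("[" + title("第一章 开端", false) + "]");
        System.out.println("[" + title(fullWidth, false) + "]"); // space stays in the title
        System.out.println("[" + title(fullWidth, true) + "]");  // space is consumed by \s*
    }
}
```
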

The title itself can consist of any characters. Any character is ".", so zero or more arbitrary characters is:

.*

A heading is normally followed by a line break, so the expression must end by matching one:

\n|\r\n
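
To see how the title and the line break fit together, here is a minimal sketch, with made-up class and method names. It relies on the fact that "." does not match line terminators by default in Java, so ".*" stops at the end of the line and the alternation then consumes either "\n" or "\r\n".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LineEndingDemo {
    // Any characters up to the end of the line, then a Unix or Windows line break.
    private static final Pattern TITLE_LINE = Pattern.compile("(.*)(\n|\r\n)");

    // Returns the text before the first line break, or null when there is none.
    static String firstLine(String text) {
        Matcher matcher = TITLE_LINE.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstLine("风雪惊变\r\n钱塘江")); // Windows line break
        System.out.println(firstLine("风雪惊变\n钱塘江"));   // Unix line break
        System.out.println(firstLine("风雪惊变"));           // no line break at all
    }
}
```

The last call finds nothing, which suggests that a heading on the final line of a file, with no line break after it, would be missed by an expression that requires one.
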

Combine all of the expressions above into one:

(^\s*第)(.{1,9})[章节卷集部篇回](\s*)(.*)(\n|\r\n)

This regular expression contains six groups. The entire expression is the first group (group 0), and each "()" is one further group. [章节卷集部篇回] is not wrapped in parentheses, so it is not a group. With the regular expression complete, we can use Java to obtain the chapter title.

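import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is chosen here for illustration only.
public class ChapterTitle {
    public static void main(String[] args) {
        String text = "第一回 风雪惊变\r\n钱塘江浩浩江水，日日夜夜无穷无休的从临安牛家村边绕过，东流入海。";
        Pattern pattern = Pattern.compile("(^\\s*第)(.{1,9})[章节卷集部篇回](\\s*)(.*)(\n|\r\n)");
        Matcher matcher = pattern.matcher(text);
        if (matcher.find()) {
            // Group 0 is the whole match; groups 1 to 5 are the parenthesized parts.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                System.out.println("group" + i + ": " + matcher.start(i) + "-" + matcher.end(i) + " " + matcher.group(i));
            }
        }
    }
}

Output result (group 0 and group 5 end with the "\r\n" line break and group 3 is a single space; this invisible whitespace is not shown below):

group0: 0-10 第一回 风雪惊变
group1: 0-1 第
group2: 1-2 一
group3: 3-4
group4: 4-8 风雪惊变
group5: 8-10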
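
To collect every chapter title in a longer text rather than only the first, one option is to compile the same expression with Pattern.MULTILINE, so that "^" matches at the start of each line, and to loop over the matches. The following is a minimal sketch under that assumption; the class name, method name, and sample text are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class ChapterTitleScanner {
    // MULTILINE lets ^ match after every line break, not only at the start of the input.
    private static final Pattern HEADING = Pattern.compile(
            "(^\\s*第)(.{1,9})[章节卷集部篇回](\\s*)(.*)(\n|\r\n)",
            Pattern.MULTILINE);

    // Returns the title part (group 4) of every heading found in the text.
    static List<String> findTitles(String text) {
        List<String> titles = new ArrayList<>();
        Matcher matcher = HEADING.matcher(text);
        while (matcher.find()) {
            titles.add(matcher.group(4));
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up sample: two headings, each followed by one line of body text.
        String text = "第一回 风雪惊变\r\n正文一。\r\n第二回 江南七怪\r\n正文二。\r\n";
        System.out.println(findTitles(text));
    }
}
```

Note that any body line that happens to start with "第" and contains a unit character within the next few characters (for example "第二天回家了") would also be reported, so the results may need filtering.
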

 
