To extract chapters from a text, first decide what marks the beginning of a chapter. Chinese chapter headings usually start with 第, as in 第一章 (Chapter 1), 第二章 (Chapter 2), and so on. The "^" character anchors the match to the start of the text, but headings are often preceded by whitespace, so allow optional whitespace and then require 第 as the first character of the heading:
^\s*第
"\s" matches a whitespace character, and "*" means zero or more of the preceding item. "^" requires the match to begin at the start of the text, so the whole expression reads: start of text, optional whitespace, then 第. If the string to be matched is not at the beginning of the paragraph, remove the "^".
Chapter numbers do not follow a single format. They may be Arabic numerals or Chinese numerals, and they are generally no longer than 9 characters, so the number is matched as 1 to 9 arbitrary characters:
.{1,9}
"." matches any character. "{1,9}" means at least 1 and at most 9 repetitions.
The chapter number is followed by a unit character:
[章节卷集部篇回]
"[]" matches exactly one character out of those listed, so this is equivalent to matching 章 (chapter), 节 (section), 卷 (volume), 集 (collection), 部 (part), 篇 (piece) or 回 (episode).
There is usually whitespace before the title, but sometimes there is none, so match zero or more whitespace characters:
\s*
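The pieces introduced so far (prefix, number, unit character, optional whitespace) can be tried together. The following is a minimal sketch, not the author's code: the class name `ChapterNumber`, the method `parse` and the sample lines are illustrative assumptions, and the unit character is wrapped in an extra group only so that it can be inspected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChapterNumber {
    // 第, then a 1-9 character number (group 1), then one unit character (group 2),
    // then optional whitespace before the title.
    static final Pattern HEADING =
            Pattern.compile("^\\s*第(.{1,9})([章节卷集部篇回])\\s*");

    // Returns {number, unit}, or null when the line does not look like a heading.
    static String[] parse(String line) {
        Matcher m = HEADING.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("  第十二章 风起");
        System.out.println(parts[0] + " / " + parts[1]); // 十二 / 章
        System.out.println(parse("序言") == null);        // true
    }
}
```

Because `.{1,9}` is greedy, a line such as 第一回 回家 would capture everything up to the last unit character in range (here "一回 "). If that matters for your text, the reluctant form `.{1,9}?` stops at the first unit character instead.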
The title itself can consist of any characters. "." matches any character, so zero or more arbitrary characters is:
.*
A chapter heading is normally followed by a line break, so the expression must end by matching one:
\n|\r\n
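In Java, "." does not match line terminators by default, so the title part stops just before the line break and the alternation then consumes either "\n" or "\r\n". The sketch below illustrates this; the class name `LineEnd`, the method `lineBreakLength` and the sample strings are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineEnd {
    // Group 1 is the title text, group 2 is the line break that follows it.
    static final Pattern TITLE_LINE = Pattern.compile("(.*)(\\n|\\r\\n)");

    // Length of the matched line break (1 for \n, 2 for \r\n), or -1 if there is none.
    static int lineBreakLength(String text) {
        Matcher m = TITLE_LINE.matcher(text);
        return m.find() ? m.group(2).length() : -1;
    }

    public static void main(String[] args) {
        System.out.println(lineBreakLength("风雪惊变\n正文"));   // 1
        System.out.println(lineBreakLength("风雪惊变\r\n正文")); // 2
        System.out.println(lineBreakLength("风雪惊变"));         // -1
    }
}
```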
Combine all of the above into a single regular expression:
(^\s*第)(.{1,9})[章节卷集部篇回](\s*)(.*)(\n|\r\n)
This regular expression contains six groups: group 0 is the entire match, and each pair of parentheses defines one further group, numbered 1 to 5 from left to right. The character class [章节卷集部篇回] is not enclosed in parentheses, so it is not a group of its own. With the regular expression complete, we can use Java to extract the chapter title.
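As a small check on the group numbering, `Matcher.groupCount()` reports the number of parenthesised groups in the pattern and does not count group 0. The class name `GroupCount` and the method `explicitGroups` below are illustrative only.

```java
import java.util.regex.Pattern;

class GroupCount {
    static final Pattern CHAPTER = Pattern.compile(
            "(^\\s*第)(.{1,9})[章节卷集部篇回](\\s*)(.*)(\\n|\\r\\n)");

    // Number of capturing groups declared in the pattern (group 0 is not included).
    static int explicitGroups() {
        return CHAPTER.matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Five parenthesised groups, plus group 0 for the whole match.
        System.out.println(explicitGroups()); // 5
    }
}
```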
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "第一回 风雪惊变\r\n钱塘江浩浩江水，日日夜夜无穷无休的从临安牛家村边绕过，东流入海。";
Pattern pattern = Pattern.compile("(^\\s*第)(.{1,9})[章节卷集部篇回](\\s*)(.*)(\\n|\\r\\n)");
Matcher matcher = pattern.matcher(text);
if (matcher.find()) {
    // Print the start offset, end offset and text of every group, including group 0.
    for (int i = 0; i <= matcher.groupCount(); i++) {
        System.out.println("group" + i + ": " + matcher.start(i) + "-" + matcher.end(i) + " " + matcher.group(i));
    }
}
Output (the line break in group0 and group5 is written out here as \r\n, and group3 is a single space):
group0: 0-10 第一回 风雪惊变\r\n
group1: 0-1 第
group2: 1-2 一
group3: 3-4
group4: 4-8 风雪惊变
group5: 8-10 \r\n
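To collect every chapter title in a longer text rather than only the first, one possible approach is to compile the same expression with `Pattern.MULTILINE`, so that "^" matches at the start of each line, and to call `find()` in a loop. This is a sketch under those assumptions: the class name `ChapterTitles`, the method `titles` and the placeholder body lines (正文一, 正文二) are not from the original.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChapterTitles {
    // MULTILINE lets ^ match at the start of every line, not only at the start of the input.
    static final Pattern CHAPTER = Pattern.compile(
            "(^\\s*第)(.{1,9})[章节卷集部篇回](\\s*)(.*)(\\n|\\r\\n)",
            Pattern.MULTILINE);

    // Returns the title text (group 4) of every heading found in the input.
    static List<String> titles(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = CHAPTER.matcher(text);
        while (m.find()) {
            result.add(m.group(4));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "第一回 风雪惊变\r\n正文一\r\n第二回 江南七怪\r\n正文二\r\n";
        System.out.println(titles(text)); // [风雪惊变, 江南七怪]
    }
}
```

Note that Java's "\s" does not match the full-width space (U+3000) unless `Pattern.UNICODE_CHARACTER_CLASS` is set, which may matter for headings indented with full-width spaces.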