Python Regular Expression Tutorial 2: Capturing
Preface
In the previous article, we introduced the basics of Python regular expressions. This article summarizes how to use regular expressions for capturing. Let's go through the details.
Capture
Capturing is closely tied to grouping in regular expressions. Both are generally done with parentheses (which is why parentheses are special characters in regular expressions and must be escaped when you mean them literally):
(...)    Group and capture
(?:...)  Group, but do not capture
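As a minimal sketch of the difference between the two forms, the snippet below runs the same match with a capturing and a non-capturing group, then matches literal parentheses by escaping them. The sample strings and variable names are made up for illustration.

```python
import re

# Capturing group: the parenthesised part is stored as group 1.
# With a repeated group, only the last repetition is kept.
capturing = re.search(r'(ab)+c', 'ababc')
print(capturing.group(0))   # ababc
print(capturing.groups())   # ('ab',)

# Non-capturing group: same grouping for the + quantifier, nothing stored.
non_capturing = re.search(r'(?:ab)+c', 'ababc')
print(non_capturing.group(0))   # ababc
print(non_capturing.groups())   # ()

# Literal parentheses must be escaped; the inner pair still captures.
literal = re.search(r'\((\d+)\)', 'call (42) now')
print(literal.group(0))   # (42)
print(literal.group(1))   # 42
```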
For example, suppose we need to match a landline number:
>>> import re
>>> m = re.search(r'^(\d{3,4}-)?(\d{7,8})$', '020-82228888')
>>> m.group(0)
'020-82228888'
>>> m.group(1)
'020-'
>>> m.group(2)
'82228888'
Here, group(0) is the complete match, and the remaining groups are numbered in the order in which their opening parentheses appear in the pattern.
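Because the area-code group is optional, it may not take part in a match at all, in which case it is reported as None. The sketch below shows one way to handle that with groups(default=...). The helper name split_landline and the constant LANDLINE are hypothetical, chosen only for this example.

```python
import re

# Same pattern as above, compiled once for reuse.
LANDLINE = re.compile(r'^(\d{3,4}-)?(\d{7,8})$')

def split_landline(number):
    """Return (area_code, local_number), or None if the text is not a landline."""
    m = LANDLINE.search(number)
    if m is None:
        return None
    # groups(default='') substitutes '' for groups that did not participate.
    area, local = m.groups(default='')
    return area.rstrip('-'), local

print(split_landline('020-82228888'))  # ('020', '82228888')
print(split_landline('4227865'))       # ('', '4227865')
print(split_landline('12345'))         # None
```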
Next, suppose we want to find all landline numbers in a block of text. We can use re.findall:
>>> re.findall(r'(\d{3,4}-)?(\d{7,8})', '020-82228888\n0357-4227865')
[('020-', '82228888'), ('0357-', '4227865')]
findall has a special behavior: if the pattern contains capturing groups, it returns the captured groups (as a tuple when there is more than one) instead of the full match. Knowing this, we can get the result we actually want by using the non-capturing group syntax mentioned above:
>>> re.findall(r'(?:\d{3,4}-)?\d{7,8}', '020-82228888\n0357-4227865')
['020-82228888', '0357-4227865']
>>> re.findall(r'(?:\d{3,4}-)?\d{7,8}', '020-82228888\n4227865')
['020-82228888', '4227865']
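If both the full number and its parts are needed, one option is re.finditer, which yields match objects rather than plain strings or tuples. The following is a sketch on made-up sample text; note that findall reports a group that did not participate as an empty string, while a match object reports it as None.

```python
import re

text = '020-82228888\n0357-4227865\n4227865'
pattern = re.compile(r'(\d{3,4}-)?(\d{7,8})')

# Each match object exposes the whole match and every group.
results = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer(text)]
for full, area, local in results:
    print(full, area, local)

# findall on the same pattern returns only the groups, with '' for the
# missing area code in the last number.
print(pattern.findall(text))
```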
Inside a regular expression, \1, \2, and so on can be used to refer back to the text captured by an earlier group. This is often used to match single and double quotes correctly:
>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'["\'](.*?)["\']', sentence)
['why?', 'I don']
>>> re.findall(r'(["\'])(.*?)\1', sentence)
[('"', 'why?'), ('"', "I don't know")]
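Backreferences are not limited to quotes. As another small illustration (not taken from the article's examples), the sketch below uses \1 to find a word that is immediately repeated and then collapses the repetition with re.sub. The sample sentence is invented.

```python
import re

# \b(\w+) captures a whole word; \s+\1\b requires the same word again.
DOUBLED = re.compile(r'\b(\w+)\s+\1\b')

text = 'the the cat sat on on the mat'

repeated = DOUBLED.findall(text)
print(repeated)   # ['the', 'on']

# In the replacement string, \1 stands for the captured word.
cleaned = DOUBLED.sub(r'\1', text)
print(cleaned)    # the cat sat on the mat
```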
In addition, if \1 and \2 are not readable enough, you can give each capturing group a name. The following example converts dates from one format to another:
>>> sentence = "from 12/22/1629 to 11/14/1643"
>>> re.sub(r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})', r'\g<year>-\g<month>-\g<day>', sentence)
'from 1629-12-22 to 1643-11-14'
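Named groups can also be read directly from a match object, by name or by number. The sketch below reuses the date pattern; the constant name DATE is an assumption made for this example.

```python
import re

DATE = re.compile(r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})')

m = DATE.search('from 12/22/1629 to 11/14/1643')

# groupdict() maps every group name to its captured text.
fields = m.groupdict()
print(fields)            # {'month': '12', 'day': '22', 'year': '1629'}

# A named group is still numbered, so both lookups agree.
print(m.group('year'))   # 1629
print(m.group(3))        # 1629
```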
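However, the \g<name> reference syntax only works in the replacement string of re.sub. Inside a pattern passed to findall or search it is not a backreference: in the Python version used here it simply fails to match, and newer Python versions reject \g in a pattern with re.error. Within a pattern, use a numbered reference such as \1, or the named form (?P=quote), instead: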
>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'(?P<quote>["\'])(.*?)\g<quote>', sentence)
[]
>>> re.search(r'(?P<quote>["\'])(.*?)\g<quote>', sentence)
>>> re.search(r'(?P<quote>["\'])(.*?)\1', sentence)
<_sre.SRE_Match object; span=(9, 15), match='"why?"'>
>>> re.search(r'(?P<quote>["\'])(.*?)\1', sentence).groupdict()
{'quote': '"'}
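For a named backreference inside the pattern itself, the re module provides the (?P=name) form. Below is a minimal sketch using the same sentence; the group name body is added here only for illustration.

```python
import re

sentence = """You said "why?" and I say "I don't know"."""

# (?P=quote) matches whatever text the group named "quote" captured.
pattern = re.compile(r'(?P<quote>["\'])(?P<body>.*?)(?P=quote)')

bodies = [m.group('body') for m in pattern.finditer(sentence)]
print(bodies)   # ['why?', "I don't know"]

first = pattern.search(sentence)
print(first.groupdict())   # {'quote': '"', 'body': 'why?'}
```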
Summary
That covers group capturing in Python regular expressions. I hope this article helps you learn or use Python. If you have any questions, please leave a message. In the next article, I will go on to summarize the greedy and non-greedy behavior of regular expression matching. Please stay tuned.