An Introduction to the Basic Use of Python Regular Expressions

Source: Internet
Author: User

Python is a powerful, object-oriented, interpreted programming language. How can we use it correctly to implement the functionality we need? Regular expressions are a good place to start.

Python has included the re module since version 1.5. It provides Perl-style regular expression patterns. Versions earlier than Python 1.5 provided Emacs-style patterns through the regex module. Emacs-style patterns are slightly less readable and offer fewer features, so avoid the regex module when writing new code, although you may still come across it occasionally in old code.

In essence, regular expressions, or REs, are a small, highly specialized programming language embedded in Python and made available through the re module. In this small language you specify rules for the set of strings you want to match; that set might contain English sentences, e-mail addresses, TeX commands, or anything you like. You can then ask questions such as "Does this string match this pattern?" or "Is there a part of this string that matches this pattern?". You can also use REs to modify or split strings in various ways.
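As a minimal sketch of those questions, the snippet below uses functions from the standard re module. The sample text and the e-mail pattern are illustrative assumptions, not taken from the article; the pattern is deliberately simplistic and should not be used for real address validation.

```python
import re

# Hypothetical sample text and a deliberately simple e-mail-like pattern.
text = "Contact alice@example.com or bob@example.org for details."
email = r"[a-z]+@[a-z]+\.[a-z]+"

# "Does this string match this pattern?"
# re.match only succeeds if the pattern matches at the start of the string.
starts_with_email = re.match(email, text) is not None

# "Is there a part of the string that matches this pattern?"
first = re.search(email, text)
first_email = first.group(0) if first else None

# Modify a string: replace every match with a placeholder.
redacted = re.sub(email, "<hidden>", text)

# Split a string wherever the pattern matches.
pieces = re.split(r"\s+or\s+", "red or green or blue")

print(starts_with_email, first_email)
print(redacted)
print(pieces)
```

Here re.match answers the first question and re.search the second; re.sub and re.split cover the modifying and splitting cases.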

Regular expression patterns are compiled into a series of bytecodes, which are then executed by a matching engine written in C. For advanced use, you may need to pay attention to how the engine executes a given RE and write the RE in a particular way so that the resulting bytecode runs faster. This article does not cover optimization, because that requires a good understanding of the matching engine's internals.
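The compilation step can be made explicit with re.compile, which returns a reusable pattern object. The following is only a sketch; the pattern and the sample lines are made up for illustration.

```python
import re

# Compile the pattern once; the resulting object can be reused for many strings.
word = re.compile(r"[a-z]+")

# Hypothetical input lines.
lines = ["tempo 120", "  allegro", "42"]

# search() returns a match object, or None when nothing matches.
matches = [word.search(line) for line in lines]
found = [m.group(0) if m else None for m in matches]

print(found)
```

Module-level functions such as re.search compile the pattern for you, so an explicit re.compile mainly helps when the same pattern is applied many times.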

The regular expression language is relatively small and restricted, so not every string processing task can be done with regular expressions. Some tasks can be done with them, but the resulting expressions become very complicated. In these cases, you may be better off writing Python code to do the processing; although Python code is slower than an elaborate regular expression, it will probably be easier to understand.

Simple Patterns

We will start with the simplest regular expressions. Because regular expressions are used to operate on strings, we begin with the most common task: matching characters.

For a detailed explanation of the computer science underlying regular expressions (deterministic and non-deterministic finite automata), consult almost any textbook on writing compilers.

Character matching

Most letters and characters simply match themselves. For example, the regular expression "test" matches the string "test" exactly. (You can also enable a case-insensitive mode that lets this RE match "Test" or "TEST" as well; more on this later.)
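A short sketch of literal matching, with and without the case-insensitive mode. It assumes the re.IGNORECASE flag and the fullmatch method (available in Python 3.4 and later), neither of which the text has introduced yet.

```python
import re

candidates = ["test", "Test", "TEST"]

# Plain literal pattern: each character matches only itself.
exact = re.compile(r"test")

# Same pattern with the case-insensitive flag enabled.
relaxed = re.compile(r"test", re.IGNORECASE)

# fullmatch() requires the whole string to match the pattern.
exact_hits = [s for s in candidates if exact.fullmatch(s)]
relaxed_hits = [s for s in candidates if relaxed.fullmatch(s)]

print(exact_hits)
print(relaxed_hits)
```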

There are exceptions to this rule: some characters are special metacharacters and do not match themselves. Instead, they signal that something out of the ordinary should be matched, or they affect other parts of the RE by repeating them. Much of this article is devoted to discussing the various metacharacters and what they do.
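To make the difference between an ordinary character and a metacharacter concrete, here is a small sketch using ".", which is chosen purely as an illustration and is not discussed in this part of the article. Unescaped, it matches any character except a newline; preceded by a backslash, it matches only a literal dot.

```python
import re

# "." as a metacharacter: matches any single character except a newline.
loose = re.fullmatch(r"a.c", "abc") is not None

# "\." is an escaped dot: it matches only a literal ".".
strict_abc = re.fullmatch(r"a\.c", "abc") is not None
strict_dot = re.fullmatch(r"a\.c", "a.c") is not None

# re.escape() builds the escaped form of arbitrary text for you.
escaped = re.escape("a.c")

print(loose, strict_abc, strict_dot, escaped)
```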

Repeating Things

The first thing regular expressions can do that the ordinary string methods cannot is match varying sets of characters. If that were their only additional capability, however, they would not be much of an advance. Another capability is specifying how many times a portion of the RE may be repeated.

The first metacharacter for repeating things that we will look at is *. * does not match the literal character "*"; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once.

For example, ca*t will match "ct" (0 "a" characters), "cat" (1 "a"), "caaat" (3 "a" characters), and so on. The RE engine has internal limits, stemming from the size of C's int type, that prevent it from matching more than about 2 billion "a" characters; you probably do not have enough memory to build a string that large, so you should not run into that limit.
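The examples above can be checked with a few lines of Python. This is a minimal sketch; the extra candidate "cut" is added here only to show a string that does not match.

```python
import re

# "a*" means: zero or more "a" characters at this position.
pattern = re.compile(r"ca*t")

candidates = ["ct", "cat", "caaat", "cut"]

# fullmatch() checks whether the entire string fits the pattern.
results = {s: pattern.fullmatch(s) is not None for s in candidates}

for s, ok in results.items():
    print(s, "matches" if ok else "does not match")
```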

Repetitions such as * are greedy: when repeating a RE, the matching engine tries to repeat it as many times as possible. If later portions of the pattern do not match, the engine backs up and tries again with fewer repetitions.
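A small sketch of this greedy behaviour and the backing up that follows. The second pattern uses a character class, [bcd], which this part of the article has not introduced; assume it matches any one of the characters "b", "c", or "d".

```python
import re

# Greedy: "a*" consumes as many "a" characters as it can.
greedy = re.match(r"a*", "aaab").group(0)

# Backing up: "[bcd]*" first consumes "bcbd" (everything to the end of the
# string), but the pattern still needs a final "b". The engine gives back
# one character at a time until the trailing "b" can match.
m = re.match(r"a[bcd]*b", "abcbd")
backtracked = m.group(0)
end_position = m.end()

print(greedy, backtracked, end_position)
```

The final match ends before the trailing "d": the engine settled for fewer repetitions of [bcd] so that the rest of the pattern could succeed.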
