Quick Start for regular expressions

Source: Internet
Author: User

Before getting into regular expressions, consider the following problem:

Problem description:
Find the words in a given string.
A word is made up of uppercase and lowercase letters. Every non-alphabetic character (spaces, question marks, digits, and so on) is treated as a separator between words, and a single letter does not count as a word.
After the words are found, sort them in descending order by length (words of the same length keep the order in which they appear) and write them to a new string.
If a word occurs more than once, output it only once. If no word is found in the entire input string, output an empty string.
The output words are separated by a single space.
Example:

Input: "some local buses, some1234123drivers"
Output: "drivers local buses some"
Input: "%a^123 t 3453i* ()"
Output: ""

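This is a string manipulation problem. Seeing phrases in the problem such as "sort in descending order by length" and "if a word occurs more than once, output it only once", my first reaction was that it could be solved with Python's set de-duplication and the sort() function.
My initial idea was to extract the words with string functions, put them in a set, remove duplicates with set(), sort them with sort(), and finally output the result.
As the author is new to Python and has only a rough impression of these functions, there is still a lot to learn. For example:
1. How to make sort() order the items the way we want (by length, descending). If you are interested, see my first blog post, which is about the sort() function.
2. How to extract the words. My idea was to extract the first and last words separately and to use find() to locate spaces, commas, and other characters for the words in between. In short, it is very cumbersome.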
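
As a minimal sketch of the first point, assuming a hand-written sample list named words (hypothetical data, not taken from the problem input), sort() can be given key=len and reverse=True to order items by length, longest first:

```python
words = ["some", "local", "buses", "some", "drivers"]

# set() removes duplicates, but it does not preserve the original order.
unique_words = list(set(words))

# key=len compares words by length; reverse=True puts the longest first.
unique_words.sort(key=len, reverse=True)

print(unique_words[0])   # drivers
print(unique_words[-1])  # some
```

Because a set is unordered, the relative order of "local" and "buses" (both five letters) is not guaranteed here.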

Therefore, you can try to solve this problem with regular expressions.


Regular expressions (abbreviated RE):

A regular expression is a string made up of ordinary characters and special characters, where the special characters describe how the ordinary ones may vary or repeat.
Its role is to match, search for, and replace patterns in text.


Regular expressions can be used to filter text and extract the information you want, or to modify text so that its format meets your requirements.

For example:
The simplest regular expression is an ordinary string. The pattern "goodmorning" matches the string "goodmorning", and "abc123" matches "abc123".

A more complex regular expression combines ordinary and special characters to match a whole family of strings, moving from the concrete to the abstract. For example, "\w+@\w+\.com" matches email addresses made up of word characters, an "@", more word characters, and ".com".
So let us first introduce the meanings of these special symbols.

[Table of special symbols and their meanings: not preserved in this copy]

The important symbols are explained in detail further on, together with the functions of the re module. First, two important functions:




1. re.match function
re.match tries to match a pattern at the start of the string. If the match fails, match() returns None.
Function syntax:
re.match(pattern, string)
Function parameters:
Parameter    Description
pattern      The regular expression to match
string       The string to match against
If the match succeeds, re.match returns a match object; otherwise it returns None.
We can call group(num) or groups() on the match object to get the matched text.
For example:

import re
m = re.match("foo", "food")
print(m)
print(m.group())

The result is:

<_sre.SRE_Match object; span=(0, 3), match='foo'>
foo

The first line is the match object, which shows that "foo" was found at the start of "food".
The second line is the matched string returned by group().

Here is a counterexample:

import re
m = re.match("foo", "foggh")
print(m)
print(m.group())

The result is:
None
Traceback (most recent call last):
  File "C:/Users/DELL/PycharmProjects/untitled1/python.py", line ..., in <module>
    print(m.group())
AttributeError: 'NoneType' object has no attribute 'group'

"foo" was not found at the start of "foggh", so match() returned None, and calling group() on None raised an AttributeError.

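One way to avoid this error is to check the result before calling group(). The sketch below wraps that check in a small helper; first_match is a hypothetical name used only for illustration, not part of the re module:

```python
import re

def first_match(pattern, text):
    """Return the matched text, or None when the pattern does not match at the start."""
    m = re.match(pattern, text)
    # re.match returns None on failure, so check before calling group().
    if m is None:
        return None
    return m.group()

print(first_match("foo", "food"))   # foo
print(first_match("foo", "foggh"))  # None
```
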


2. re.search function

re.search scans the entire string and returns the first successful match.
Function syntax:
re.search(pattern, string)
Function parameters:
Parameter    Description
pattern      The regular expression to match
string       The string to search
If the match succeeds, re.search returns a match object; otherwise it returns None.
We can call group(num) or groups() on the match object to get the matched text.
For example:

import re
m = re.search("foo", "dddfood")
print(m)
print(m.group())

The result is:

<_sre.SRE_Match object; span=(3, 6), match='foo'>
foo

A counterexample:

import re
m = re.search("foo", "fdfoggh")
print(m)
print(m.group())

The result is:

None
Traceback (most recent call last):
  File "C:/Users/DELL/PycharmProjects/untitled1/python.py", line 60, in <module>
    print(m.group())
AttributeError: 'NoneType' object has no attribute 'group'


The difference between re.match and re.search
re.match only tries to match at the beginning of the string. If the beginning of the string does not conform to the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match, and returns None only if no position matches.
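
The difference can be seen by running both functions on the same input. This is a small sketch that reuses the "dddfood" string from the earlier example:

```python
import re

text = "dddfood"

# re.match only tries the pattern at position 0, where "ddd" is not "foo".
print(re.match("foo", text))   # None

# re.search tries each position in turn and stops at the first match.
found = re.search("foo", text)
print(found.span())            # (3, 6)
print(found.group())           # foo
```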

Next, we introduce the important characters:

1. "."
"." stands for any single character except the line break \n. A regular expression that uses it is written as ".s", where s is the rest of the pattern.
Example:

import re
RE = ".end"
m = re.match(RE, "bend")
if m is not None:
    print(m.group())

The result is:
bend
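
To illustrate the two limits of "." mentioned above (it needs exactly one character, and that character cannot be a line break), here is a short sketch with inputs chosen for illustration:

```python
import re

print(re.match(".end", "bend").group())  # bend
print(re.match(".end", "\nend"))         # None, "." does not match "\n"
print(re.match(".end", "end"))           # None, "." must consume one character
```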

2. "|"
The regular expression is written as "s1|s2|s3" and matches any string that matches s1, s2, or s3.
Example:

import re
RE = "bat|bet|bit"
m = re.match(RE, "bit")
if m is not None:
    print(m.group())

The result is:

bit



3. "[]"
A regular expression in square brackets matches any one of the characters inside the brackets.

import re
RE = "[cr][23][dp][o2]"
m = re.match(RE, "c3po")
if m is not None:
    print(m.group())

The result is:

c3po



In addition, you can write [a-z] to match any lowercase letter from a to z, and likewise [A-Z], [0-9], or even [a-zA-Z0-9].
There is also a shorter way: [a-zA-Z0-9] can be replaced with "\w", which stands for the set of letters and digits (note that "\w" also matches the underscore "_"), and [0-9] can be written as "\d". For more special characters, refer to the table above.
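
The following sketch compares the explicit character class with the shorthand forms. The input strings are made up for illustration:

```python
import re

# [a-zA-Z0-9] accepts ASCII letters and digits only, so it stops at "_".
print(re.match("[a-zA-Z0-9]+", "abc_123").group())  # abc

# \w also accepts the underscore, so it matches further.
print(re.match(r"\w+", "abc_123").group())          # abc_123

# \d matches a digit, like [0-9] for ASCII input.
print(re.match(r"\d+", "2024abc").group())          # 2024
```

The patterns containing a backslash are written as raw strings (r"...") so that Python passes the backslash through to the re module unchanged.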


Special symbols that represent repetition:
Some special symbols in regular expressions express repetition. To introduce them, let's look at an example:

import re
RE = r"\w+@\w+\.com"
m = re.match(RE, "110471959@qq.com")
if m is not None:
    print(m.group())

The result is:

110471959@qq.com

The regular expression "\w+@\w+\.com" in this code matches email addresses made up of word characters, an "@", more word characters, and ".com".
As mentioned, "\w" stands for the set of letters and digits. "+" means that the "\w" to its left is repeated one or more times, so it can match a long number such as 110471959 as well as a long run of letters. The "\" in front of "." is an escape character. We know that "." can stand for any character, so what if we want to match a literal "."? Just put the escape character "\" in front of it.

Similarly:

"*" repeats the character to its left zero or more times,
"?" repeats the character to its left zero or one time.
Note that these symbols apply only to the single character on their left. To repeat a sequence of several characters, wrap the sequence in parentheses ().
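
A short sketch of these repetition symbols, with and without parentheses. The patterns and inputs are chosen only for illustration:

```python
import re

# "*" repeats the preceding character zero or more times.
print(re.match("ab*", "a").group())           # a
print(re.match("ab*", "abbb").group())        # abbb

# "?" makes the preceding character optional.
print(re.match("colou?r", "color").group())   # color
print(re.match("colou?r", "colour").group())  # colour

# Without parentheses only "b" is repeated; with them, "ab" is repeated.
print(re.match("ab+", "ababab").group())      # ab
print(re.match("(ab)+", "ababab").group())    # ababab
```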

Use of parentheses in regular expressions:

In fact, () is useful for more than repeating several characters.
Let's look at a piece of code first:

import re
RE = r"(\w{3})-(\d{3})"
m = re.match(RE, "abc-123456")
if m is not None:
    print(m.group())
    print(m.group(0))
    print(m.group(1))
    print(m.group(2))
    print(m.groups())

The result is:

abc-123
abc-123
abc
123
('abc', '123')

{3} means that the "\w" to its left is repeated exactly three times. The parentheses divide the regular expression into groups, so passing a group number to group() extracts the text matched by that group: group() and group(0) return the whole match, group(1) and group(2) return the first and second groups, and groups() returns all groups as a tuple.



Let's end with two more important functions.
1. sub() function
The Python re module provides re.sub to replace matches in a string.
Syntax:
re.sub(pattern, repl, string, count=0)
It returns the string obtained by replacing the leftmost non-overlapping matches of pattern in string with repl. If the pattern is not found, the string is returned unchanged.
The optional parameter count is the maximum number of matches to replace; it must be a non-negative integer. The default value 0 means that all matches are replaced.

We use sub() to change the format or content of a string into what we want. Usually only three arguments are needed: sub(s1, s2, s) replaces every match of s1 in s with s2, where s1 is a regular expression.
For example:

import re
RE1 = r"\d"
s = "h1e2l3l4o5 w6o7r8l9d"
print(re.sub(RE1, " ", s))

The result is:

h e l l o  w o r l d

As you can see, this code replaces every digit in the string "h1e2l3l4o5 w6o7r8l9d" with a space.
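
The optional count parameter described earlier limits how many matches are replaced. As a small sketch on the same string, this time replacing digits with an empty string:

```python
import re

s = "h1e2l3l4o5 w6o7r8l9d"

# count=0 (the default) replaces every match.
print(re.sub(r"\d", "", s))           # hello world

# count=2 replaces only the first two matches, scanning left to right.
print(re.sub(r"\d", "", s, count=2))  # hel3l4o5 w6o7r8l9d
```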

There is also a function called subn(). It differs from sub() in that it returns a tuple containing the new string and the number of replacements made.
For example:

import re
RE1 = r"\d"
s = "h1e2l3l4o5 w6o7r8l9d"
print(re.subn(RE1, " ", s))

The result is:

('h e l l o  w o r l d', 9)

2. split() function
An example makes it clear:

import re
RE = ":"
print(re.split(RE, "a:b:c:d"))

The result is:

['a', 'b', 'c', 'd']

Like the split() method of strings, it splits the string and returns a list.
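
Unlike str.split(), re.split() takes a regular expression as the separator. A brief sketch, with made-up input strings, of what that allows:

```python
import re

# A character class lets one call split on several different separators.
print(re.split("[:,;]", "a:b,c;d"))  # ['a', 'b', 'c', 'd']

# str.split only accepts one fixed separator string.
print("a:b,c;d".split(":"))          # ['a', 'b,c;d']

# "+" treats a run of separators as one, avoiding empty strings in the result.
print(re.split(r"\s+", "a  b   c"))  # ['a', 'b', 'c']
```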




Finally, let's use regular expressions to solve the problem from the beginning.
Here is the code:

import re
s = input("Input the string:")
re1 = r"[^a-zA-Z\s]"
s1 = re.sub(re1, " ", s)  # replace every character that is not a letter or whitespace with a space
re2 = r"\s+"
s2 = re.sub(re2, " ", s1)  # collapse runs of whitespace into a single space
re3 = " "
s3 = re.split(re3, s2)  # split on spaces to get the list of words
s4 = set(s3)  # use set() to remove duplicates
s5 = list(s4)  # convert the set back to a list
s5.sort(key=len, reverse=True)
s6 = ""
for i in s5:  # finally, turn the list into a string
    s6 = s6 + i + " "
print(s6)

Results:

Input the string:some local
local some
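
The author's code does not drop single letters, and because set() is unordered, words of equal length may not come out in order of appearance. The sketch below is one possible way to cover those requirements, not the author's method. It assumes Python 3.7 or later (where dict keeps insertion order) and uses re.findall, which this article does not introduce; extract_words is a hypothetical helper name.

```python
import re

def extract_words(text):
    """Return unique words of two or more letters, longest first."""
    # [a-zA-Z]{2,} matches runs of at least two letters, so single
    # letters and all non-letter characters are skipped.
    words = re.findall("[a-zA-Z]{2,}", text)
    # dict.fromkeys removes duplicates while keeping first-appearance order.
    unique = list(dict.fromkeys(words))
    # sorted() is stable, so words of equal length stay in that order.
    unique = sorted(unique, key=len, reverse=True)
    return " ".join(unique)

print(extract_words("some local buses, some1234123drivers"))  # drivers local buses some
print(repr(extract_words("%a^123 t 3453i* ()")))              # ''
```

Joining with " ".join() also avoids the trailing space that the string concatenation loop leaves behind.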
