Matching Chinese characters with Python regular expressions

Source: Internet
Author: User
A regular expression is a special character sequence that lets you check whether a string matches a given pattern. This article shows how to match Chinese characters with Python regular expressions. Regular expressions are not part of Python itself. They are a powerful tool for processing strings, with their own syntax and an independent processing engine; they may be less efficient than the built-in str methods, but they are far more expressive. Because of this independence, regular expression syntax is largely the same across the languages that provide it. What differs is how much of the syntax each language supports, and the unsupported parts are usually rarely used.

Introduction to Python regular expressions

A regular expression is a special character sequence that helps you easily check whether a string matches a certain pattern.

Python has included the re module since version 1.5. It provides Perl-style regular expression patterns.

The re module gives Python full regular expression functionality.

The compile function builds a regular expression object from a pattern string and optional flag arguments. This object has a set of methods for regular expression matching and replacement.
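As a minimal sketch of the compile workflow described above, written for Python 3: the pattern, the sample text, and the variable names are illustrative assumptions, not taken from the original article.

```python
import re

# Compile a pattern string plus an optional flag into a pattern object.
# The pattern and sample text are made up for illustration.
word_pattern = re.compile(r"python", re.IGNORECASE)

sample = "Python and python and PYTHON"

# The compiled object carries the matching and replacement methods.
all_hits = word_pattern.findall(sample)     # every non-overlapping match
first_hit = word_pattern.search(sample)     # first match object, or None
replaced = word_pattern.sub("re", sample)   # replace each match

print(all_hits)
print(first_hit.group())
print(replaced)
```

Compiling once is mainly useful when the same pattern is applied many times, since the pattern object can be reused.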

The re module also provides module-level functions that do the same work as these methods. These functions take a pattern string as their first argument.
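The equivalence between the two calling styles can be sketched as follows (Python 3; the pattern and sample text are assumptions chosen only for illustration):

```python
import re

pattern_text = r"\d+"          # illustrative pattern: one or more digits
sample = "room 12, floor 3"    # illustrative text

# Method on a compiled pattern object.
compiled = re.compile(pattern_text)
via_method = compiled.findall(sample)

# Module-level function: the pattern string is the first argument.
via_function = re.findall(pattern_text, sample)

print(via_method)
print(via_method == via_function)
```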

All of the above sets the stage for the main topic. Let's look at how Python regular expressions match Chinese characters.

#-*-Coding: UTF-8-*-import redef findPart (regex, text, name): res = re. findall (regex, text) if res: print "There are % d % s parts: \ n" % (len (res), name) for r in res: print "\ t", r. encode ("utf8") printtext = "# who # helloworld # a Chinese x #" usample = unicode (text, 'utf8 ') findPart (u "# [\ w \ u2E80-\ u9FFF] + #", usample, "unicode chinese ")

Note:

Main non-English character ranges

2E80 ~ 33FFh: CJK symbols area. Contains Kangxi dictionary radicals, CJK supplementary radicals, Bopomofo phonetic symbols, Japanese kana, Hangul letters, CJK symbols and punctuation, circled or parenthesized letters, numbers, and months, and combined forms for Japanese kana, units, era names, months, dates, and times.

3400 ~ 4DFFh: CJK Unified Ideographs Extension A area, 6,582 CJK ideographs in total.

4E00 ~ 9FFFh: CJK Unified Ideographs area, 20,902 CJK ideographs in total.

A000 ~ A4FFh: Yi area. Contains the syllables and radicals of the Yi script used in southern China.

AC00 ~ D7FFh: Hangul syllables area. Contains characters composed from Hangul letters.

F900 ~ FAFFh: CJK Compatibility Ideographs area, 302 CJK ideographs in total.

FB00 ~ FFFDh: Presentation forms area. Contains Latin ligatures, Hebrew and Arabic presentation forms, CJK vertical punctuation, small form variants, and halfwidth and fullwidth forms.

    #!/usr/bin/python3
    # -*- coding: UTF-8 -*-
    import re

    # Encode both the text and the pattern to UTF-8 bytes before searching.
    # Sample strings are assumed: the pattern must occur in the message.
    message = u'天人合一'.encode('utf8')
    print(re.search(u'人'.encode('utf8'), message).group())

Example in interactive mode:

    >>> import re
    >>> s = 'phone No. 010-87654321'
    >>> r = re.compile(r'(\d+)-(\d+)')
    >>> m = r.search(s)
    >>> m
    <_sre.SRE_Match object at 0x010EE218>
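The interactive session stops at the match object. As a hedged follow-up in Python 3, the captured groups can be read from it as shown below; the variable names whole, area_code, and number are illustrative and do not appear in the original.

```python
import re

s = "phone No. 010-87654321"        # sample text from the session above
r = re.compile(r"(\d+)-(\d+)")      # two digit groups joined by a hyphen

m = r.search(s)
if m is not None:                   # search returns None when nothing matches
    whole = m.group(0)              # entire matched text
    area_code, number = m.groups()  # the two captured groups
    print(whole, area_code, number)
```

Checking for None before calling group() avoids an AttributeError when the pattern is not found.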

That covers how to match Chinese characters with Python regular expressions. I hope it helps!
