How to match Chinese with Python regular expressions

Source: Internet
Author: User
If you search the Web for Python regular expressions, you get thousands of results, and it seems that everyone needs to learn them. So what is a regular expression? A regular expression is a special sequence of characters that helps you easily check whether a string matches a pattern. The following uses examples to give a brief introduction to matching Chinese with Python regular expressions; readers who need it can use it as a reference.

Regular expressions are not part of the Python programming language (http://www.maiziedu.com/course/python/) itself. They are a powerful tool for working with strings, with their own syntax and an independent processing engine. They may not be as efficient as str's own methods, but they are very powerful. Because the engine is independent, the regular expression syntax is largely the same in every language that provides it; what differs is how much of the syntax each programming language supports. Don't worry, though: the unsupported syntax is usually the less common part.

Introduction to Python regular expressions

Python has included the re module since version 1.5. It provides Perl-style regular expression patterns and gives Python full regular expression functionality. The compile function generates a regular expression object from a pattern string and an optional flags parameter. The object has a series of methods for regular expression matching and substitution. The re module also provides functions that are fully consistent with these methods; they take a pattern string as their first parameter.

With that background in place, here is how to match Chinese with Python regular expressions.
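As a brief aside before the Chinese examples, the relationship between compile and the module-level functions described above can be sketched as follows. The sample text and variable names are made up for illustration and do not come from the original article; only re.compile, re.findall, the pattern method sub, and the re.IGNORECASE flag are standard library names.

```python
import re

# Hypothetical sample data, not taken from the article.
text = "order 66 shipped, order 7 pending"

# Compile once, then call methods on the resulting pattern object.
number = re.compile(r"\d+")
compiled_result = number.findall(text)

# Equivalent module-level call: the pattern string is the first argument.
module_result = re.findall(r"\d+", text)

# The optional flags parameter changes how the pattern matches.
word = re.compile(r"ORDER", re.IGNORECASE)
replaced = word.sub("item", text)

print(compiled_result)
print(module_result)
print(replaced)
```

Compiling is mostly a convenience when the same pattern is reused many times; for a one-off match the module-level function is usually enough.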
#-*-Coding:utf-8-*-import redef findpart (regex, text, name): Res=re.findall (regex, text) if Res:print "There is%d%s p arts:\n "% (len (res), name) for R in Res:print" \ T ", R.encode (" UTF8 ") Printtext =" #who #helloworld#a Chinese x# "Usample=unicode ( Text, ' UTF8 ') findpart (U "#[\w\u2e80-\u9fff]+#", Usample, "Unicode Chinese") Note: several major non-English language character ranges 2E80~33FFH: CJK Symbol Area. Host Kangxi Radical, CJK Auxiliary radicals, phonetic symbols, Japanese kana,Korean notes, CJK symbols, punctuation, circled or rune numbers, month, and Japanese kana combination, unit, era name, month, date, time, etc. 3400~4DFFH: China, Japan and South Korea agree ideographs to expand a area, a total of 6,582 Japanese and Korean characters. 4E00~9FFFH: CJK Identity Ideographs District, a total of 20,902 Japanese and Korean characters. A000~a4ffh: Yi Writing area, accommodating the Chinese Southern Yi text and the word root. AC00~D7FFH: The Korean phonetic combination word area, to accommodate the text that is spelled with the Korean note. F900~FAFFH: CJK Compatible Ideographs area, with a total of 302 Chinese and Japanese Korean characters. FB00~FFFDH: Text representation area, accommodating combination Latin text, Hebrew, Arabic, CJK Straight punctuation, small symbol, half-width symbol, full-width (#!/usr/bin/python3#-*-coding:utf-8-*-import remessage = U ' Unity of heaven and Man '. Encode (' UTF8 ') print (Re.search (U ' encode (' UTF8 '), message). Group ()) Interactive mode example >>> import re> >> s= ' Phone No. 010-87654321 ' >>> >>> r=re.compile (R ' (\d+)-(\d+) ') >>> M=r.search (s) >>> M<_sre. Sre_match object at 0x010ee218>) above is a small series to introduce the Python regular expression matching Chinese method, we hope to help!
This thread was reviewed by Beckham on 2016-5-17 13:27