Study Notes on Dive Into Python

Last updated: 2018-12-07 Source: Internet


Dive Into Python: http://woodpecker.org.cn/diveintopython/toc/index.html

Most of the content of chapters 1 to 6 is already covered in A Byte of Python (《簡明Python教程》).

Topics not covered there:

Private functions: any function whose name starts with (but does not end with) two underscores is private.
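A minimal sketch of how the double-underscore rule behaves for names defined inside a class. The class Account and its members are hypothetical, chosen only for illustration. Python does not block access outright; it rewrites ("mangles") the name, so the privacy is a convention rather than a guarantee.

```python
class Account:
    def __init__(self, balance):
        self.__balance = balance      # private attribute (the name is mangled)

    def __audit(self):                # private method
        return self.__balance >= 0

    def is_valid(self):               # a public method may call the private one
        return self.__audit()


acct = Account(10)
print(acct.is_valid())                # True

try:
    acct.__audit()                    # not reachable under its written name
except AttributeError as exc:
    print("AttributeError:", exc)

# The mangled name still exists and can be called from outside.
print(acct._Account__audit())         # True
```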

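Raw strings: prefix a string with r and the backslashes inside it no longer need to be written as \\. For example, '\\b' can be written as r'\b'. Use raw strings for regular expressions, otherwise the expressions become hard to read.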
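The difference is easy to check by comparing string lengths. This small sketch uses only variable names invented for the illustration.

```python
plain = '\b'        # one character: backspace (0x08)
escaped = '\\b'     # two characters: a backslash followed by b
raw = r'\b'         # the same two characters, written as a raw string

print(len(plain), len(escaped), len(raw))   # 1 2 2
print(escaped == raw)                       # True
```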

Chapter 7: Regular Expressions

import re

\b word boundary: r'\bROAD\b' matches ROAD as a standalone word

$ end of string: r'\bROAD$' matches the word ROAD at the end of the string

^ start of string

Example: replace the word 'ROAD' in a string with 'RD.':

s = '100 BROAD ROAD APT. 3'; re.sub(r'\bROAD\b', 'RD.', s)

Result: '100 BROAD RD. APT. 3'
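To see why the word boundaries matter, the following sketch runs three variants of the substitution on the same address string.

```python
import re

s = '100 BROAD ROAD APT. 3'

# No anchors: the ROAD inside BROAD is replaced too.
print(re.sub('ROAD', 'RD.', s))          # 100 BRD. RD. APT. 3

# End-of-string anchor: nothing matches, because ROAD is not at the end.
print(re.sub('ROAD$', 'RD.', s))         # 100 BROAD ROAD APT. 3

# Word boundaries: only the standalone word is replaced.
print(re.sub(r'\bROAD\b', 'RD.', s))     # 100 BROAD RD. APT. 3
```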

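A ? after a character means the character appears 0 or 1 times. For example, 'M?M?M?$' can match '', 'M', 'MM' and 'MMM': re.search('M?M?M?$', 'MMM'). Because only the end is anchored, re.search also succeeds on longer strings such as 'MMMM'; add a leading ^ to require that the whole string matches.

A + after a character means the character appears 1 or more times.

{ } sets how many times a character may appear: 'M?M?M?$' can be written as 'M{0,3}$'

| means "or", for example 'A|B'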
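A short sketch of the quantifiers and alternation listed above. The helper matches is hypothetical, and every pattern is anchored at both ends with ^ and $ (an addition to the patterns in these notes) so that the whole string has to match.

```python
import re

def matches(pattern, text):
    """Return True when pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

print(matches('^M?M?M?$', 'MM'))      # True: up to three optional M's
print(matches('^M?M?M?$', 'MMMM'))    # False: four M's is one too many
print(matches('^M{0,3}$', 'MMMM'))    # False: same rule written with { }
print(matches('^M+$', ''))            # False: + needs at least one M
print(matches('^(A|B)$', 'B'))        # True: either A or B
```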

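Example: validating Roman numerals: pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'

\d any single digit

\D any non-digit character

By default, Python regular expressions are compact and hard to read. They can be written in the verbose form shown here: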

pattern = """
    ^                   # beginning of string
    M{0,3}              # thousands - 0 to 3 M's
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #            or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #        or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #        or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
    """

A verbose pattern requires one extra argument, for example re.search(pattern, 'MMMDCCCLXXXVIII', re.VERBOSE)
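As a rough check that the two styles are equivalent, this sketch compiles the compact Roman numeral pattern and a verbose version with shortened comments, then runs both on a few sample strings. The names compact, verbose, compact_re and verbose_re are chosen for the illustration only.

```python
import re

compact = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'

verbose = """
    ^                   # beginning of string
    M{0,3}              # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    """

compact_re = re.compile(compact)
verbose_re = re.compile(verbose, re.VERBOSE)

# Both compiled patterns should agree on every sample.
for numeral in ('MMMDCCCLXXXVIII', 'MCMXC', 'IIII', 'MMMM'):
    print(numeral, bool(compact_re.search(numeral)), bool(verbose_re.search(numeral)))

# Without re.VERBOSE the whitespace and comments are treated as literal text.
print(re.search(verbose, 'MCMXC'))    # None
```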

Example: parsing phone numbers

>>> phonePattern = re.compile(r'''
                # don't match beginning of string, number can start anywhere
    (\d{3})     # area code is 3 digits (e.g. '800')
    \D*         # optional separator is any number of non-digits
    (\d{3})     # trunk is 3 digits (e.g. '555')
    \D*         # optional separator
    (\d{4})     # rest of number is 4 digits (e.g. '1212')
    \D*         # optional separator
    (\d*)       # extension is optional and can be any number of digits
    $           # end of string
    ''', re.VERBOSE)
>>> phonePattern.search('work 1-(800) 555.1212 #1234').groups()
('800', '555', '1212', '1234')

Note that the parentheses in (x) define a remembered group. Only the parts wrapped in parentheses can be retrieved with groups().
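The effect of the parentheses can be seen by compiling the same phone pattern with and without them. This is a compact sketch; the names with_groups and without_groups are invented for the comparison.

```python
import re

with_groups = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')
without_groups = re.compile(r'\d{3}\D*\d{3}\D*\d{4}\D*\d*$')

text = 'work 1-(800) 555.1212 #1234'

print(with_groups.search(text).groups())      # ('800', '555', '1212', '1234')
print(without_groups.search(text).groups())   # (): it matches, but remembers nothing
print(with_groups.search(text).group(1))      # 800
```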

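Chapter 8: HTML Processing (parsing HTML files and extracting data)

Download the HTML content with urllib, then parse it with sgmllib (SGMLParser). Note that sgmllib is a Python 2 module and was removed in Python 3.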
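Since sgmllib is not available in Python 3, the sketch below swaps in html.parser.HTMLParser from the standard library, which follows the same subclass-and-override idea. It is not the book's code: the class LinkCollector and the sample HTML string are hypothetical, and the download step is left out so the example runs offline.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in the spirit of an SGMLParser subclass."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        if tag == 'a':
            self.urls.extend(value for name, value in attrs if name == 'href')


# In real use the HTML text would be downloaded first, e.g. with urllib.request.
html = '<html><body><a href="index.html">Home</a> <a href="toc.html">TOC</a></body></html>'

parser = LinkCollector()
parser.feed(html)
parser.close()
print(parser.urls)   # ['index.html', 'toc.html']
```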

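Note that from module import and import module are different.

import module keeps the module's namespace: you access its functions and attributes through the module name.

from module import brings the specified functions and attributes into your own namespace, so you can use them directly without the module name.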
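A small sketch of the two import styles, using the standard math module as an arbitrary example.

```python
import math                  # binds the name math; its contents stay inside it
from math import sqrt        # binds the name sqrt directly in this namespace

print(math.sqrt(16))         # 4.0, accessed through the module name
print(sqrt(16))              # 4.0, accessed directly
print(sqrt is math.sqrt)     # True: both names refer to the same function

try:
    floor(2.5)               # floor was never imported by name
except NameError as exc:
    print('NameError:', exc)

print(math.floor(2.5))       # 2, still reachable through the module
```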

from xml.dom import minidom: xml is a package, that is, a directory that contains the special file __init__.py

Formatting strings with a dictionary: '%(key)s'
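A brief sketch of dictionary-based formatting. The dictionary params and its values are sample data for illustration only.

```python
params = {'server': 'mpilgrim', 'database': 'master', 'uid': 'sa'}

# Each %(key)s placeholder is filled with the value stored under that key.
print('%(database)s on %(server)s' % params)     # master on mpilgrim

# A key may be used more than once in the same template.
print('%(uid)s connects as %(uid)s' % params)    # sa connects as sa
```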

Parsing XML

Use ElementTree from the Python standard library

import xml.etree.ElementTree as etree

tree = etree.parse('aaa.xml')

root = tree.getroot()

root.tag

for child in root:
    print(child.tag, child.attrib)  # for example, show each child's tag and attributes
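The same traversal can be tried without a file on disk by parsing from a string. In this sketch the XML text is a hypothetical stand-in for the contents of 'aaa.xml', and etree.fromstring replaces etree.parse.

```python
import xml.etree.ElementTree as etree

# Hypothetical document, standing in for the contents of 'aaa.xml'
xml_text = """
<library>
    <book id="1"><title>Dive Into Python</title></book>
    <book id="2"><title>A Byte of Python</title></book>
</library>
""".strip()

root = etree.fromstring(xml_text)      # parse from a string instead of a file
print(root.tag)                        # library

# Iterating over an element yields its direct children.
for child in root:
    print(child.tag, child.attrib['id'], child.find('title').text)

titles = [book.find('title').text for book in root.findall('book')]
print(titles)   # ['Dive Into Python', 'A Byte of Python']
```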

Text encoding: Unicode
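The notes give no detail here, so the following is only a minimal sketch of the encode and decode round trip, assuming Python 3, where str holds Unicode text and bytes holds encoded data. The book itself targets Python 2, whose string types differ.

```python
text = '深入Python'                  # str holds Unicode code points

utf8_bytes = text.encode('utf-8')    # encode: str -> bytes
print(len(text), len(utf8_bytes))    # 8 12

print(utf8_bytes.decode('utf-8') == text)   # True, decode: bytes -> str

try:
    text.encode('ascii')             # ASCII cannot represent the Chinese characters
except UnicodeEncodeError as exc:
    print('UnicodeEncodeError:', exc.reason)
```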

The network programming material in this book seems rather dated, so I stopped here.

Python, study notes
