1. Comparing three ways to extract hyperlinks (URLs) from a page in Python (HTMLParser, pyquery, regular expressions)
2. Python provides raw strings. As the name implies, a raw string preserves the literal meaning of its characters: a backslash and the character after it are not treated as an escape sequence. To declare a raw string, prefix the literal with r or R.
3. Can findall be used directly with a regular expression, without worrying about escaping?
4. re.X (verbose: whitespace and comments are allowed in the pattern) and re.I (ignore case).
5. (?i): inline flag for case-insensitive matching.
6. The most commonly used functions for reading keyboard input in Python 2 are raw_input() and input(). Prefer the former, which always returns a string (input() evaluates what is typed).
7. print output can be preceded by '
8. After urlopen(), read() returns a str; urlopen() also accepts a timeout argument.
9. In except clauses, it is best to catch errors through one unified exception type, so that unexpected errors do not slip through.
10. For the Python error "'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)", try decode(); if the text cannot be written out, try encode() to turn it into a byte stream.
11. Comparison of 3 ways to extract hyperlinks (URLs) from a page in Python (HTMLParser, pyquery, regular expressions): http://www.myexception.cn/html-css/639814.html
12. To test for "null" in Python, use either if xx is None or if not xx. The latter applies more broadly, since it is also true for empty strings, empty containers and 0.
13. Read URLs line by line and remove the trailing \n: for line in file.readlines(): line = line.strip('\n')
14. Detailed descriptions of urlsplit, urlparse and urlunparse: http://www.cnblogs.com/huangcong/archive/2011/08/31/2160633.html and http://hi.baidu.com/springemp/item/64613c7457731517d0dcb3a7
15. One way to get the page status code uses the requests module: http://www.oschina.net/code/snippet_862981_23032
16. "local variable 'xx' referenced before assignment": declare the variable global inside the function.
17. For URLs that stay the same while the content redirects (a kind of anti-scanning measure), you can open the URL directly with urllib and catch the error. Example: http://segmentfault.com/q/1010000000095769 (nginx configuration)
18. The response returned by urllib2.urlopen() has a geturl() method, which gives the final URL after redirects (for example a 302).
19. How to get the page status code:
import urllib
import httplib

# ----- with urllib -----
f = urllib.urlopen("xxxxxx")
print f.getcode()

# ----- with requests -----
import requests

def getstatuscode(url):
    r = requests.get(url, allow_redirects=False)
    return r.status_code
# requests is not part of the Python 2.6/2.7 standard library; it must be installed separately.

# ----- with httplib -----
# params (form data) and headers (request headers) are assumed to be defined beforehand
conn = httplib.HTTPConnection("192.168.1.212")
# GET can also be used to submit data
conn.request(method="POST", url="/newsadd.asp?action=newnew", body=params, headers=headers)
# returns the processed data
response = conn.getresponse()
# check whether the submission succeeded; here a 302 redirect is taken as success
if response.status == 302:
    print "submitted successfully"
20. httplib request usage: getresponse() returns the response data.
21. When using get_header to probe whether a remote file exists, check carefully whether the header value comes back empty.
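Notes 1 and 11 compare HTMLParser, pyquery and regular expressions for pulling links out of a page. Below is a minimal Python 3 sketch of two of the three (pyquery is a third-party package and is left out); the sample HTML and the function names are made up for illustration.

```python
import re
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of (name, value)
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_with_parser(html):
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Regex version: only handles quoted href values inside <a ...> tags
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']+)["']""", re.I)


def links_with_regex(html):
    return HREF_RE.findall(html)


page = '<p><a href="http://example.com/a">A</a> <A HREF=\'/b\'>B</A> <a name="x">no link</a></p>'
print(links_with_parser(page))  # ['http://example.com/a', '/b']
print(links_with_regex(page))   # ['http://example.com/a', '/b']
```

The regex is shorter but more brittle: it misses unquoted href values and would also match links inside HTML comments, which the parser skips.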
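Notes 2 to 5 cover raw strings and regular-expression flags. A small Python 3 sketch tying them together; the sample strings are invented for illustration.

```python
import re

# Note 2: in a raw string the backslash stays a literal character.
escaped = "\\d"         # ordinary string: the backslash has to be doubled
raw = r"\d"             # raw string: the same two characters, backslash and d
same = escaped == raw   # True

# Note 3: with a raw-string pattern, findall needs no extra escaping.
numbers = re.findall(r"\d{3}-\d{4}", "tel: 010-1234, fax: 010-5678")
# ['010-1234', '010-5678']

# Note 4: re.X (VERBOSE) ignores whitespace and allows # comments in the
# pattern; re.I (IGNORECASE) ignores case. Flags are combined with |.
href_pattern = re.compile(r"""
    href \s* = \s*   # attribute name and equals sign
    "([^"]*)"        # double-quoted value, captured
""", re.X | re.I)
links = href_pattern.findall('<A HREF="http://a.example/">x</A>')
# ['http://a.example/']

# Note 5: the inline flag (?i) at the start of a pattern works like re.I.
words = re.findall(r"(?i)python", "Python PYTHON python")
# ['Python', 'PYTHON', 'python']
```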
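Note 10's error comes from decoding non-ASCII bytes with the ASCII codec. This Python 3 sketch reproduces it and then applies the decode/encode fix; the two Chinese characters are only sample data, chosen because their UTF-8 form starts with byte 0xe5. In Python 3 the split between text (str) and bytes is explicit, so the fix is to decode with the real encoding and encode again before writing bytes.

```python
import io

raw = "\u5b66\u4e60".encode("utf-8")  # the characters 学习 as UTF-8 bytes

# Decoding with the wrong codec reproduces the error from note 10.
error_text = ""
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_text = str(exc)
    print(error_text)
    # 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)

# decode: bytes -> text, using the encoding the bytes were written in
text = raw.decode("utf-8")

# encode: text -> bytes, before writing to a byte stream
stream = io.BytesIO()
stream.write(text.encode("utf-8"))
```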
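Notes 12 and 13 in Python 3 form. io.StringIO stands in for a real file of URLs (an assumption so the example runs without a file on disk); iterating over the file object reads one line at a time instead of building the whole list that readlines() returns.

```python
import io

values = [None, "", [], 0, "x"]
# Note 12: "is None" only catches None; "not value" also catches
# empty strings, empty containers and 0.
is_none = [v is None for v in values]   # [True, False, False, False, False]
is_empty = [not v for v in values]      # [True, True, True, True, False]

# Note 13: read URLs line by line and drop the trailing newline.
url_file = io.StringIO("http://a.example/\nhttp://b.example/\n")
urls = [line.strip("\n") for line in url_file]
# ['http://a.example/', 'http://b.example/']
```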
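For note 14, a quick look at what urlparse, urlsplit and urlunparse return. In Python 3 these live in urllib.parse (in Python 2 they are in the urlparse module); the URL is a made-up example.

```python
from urllib.parse import urlparse, urlsplit, urlunparse

url = "http://www.example.com:8080/path/page;params?q=1&lang=en#top"

# urlparse: six parts (scheme, netloc, path, params, query, fragment)
p = urlparse(url)
# p.scheme 'http', p.netloc 'www.example.com:8080', p.path '/path/page',
# p.params 'params', p.query 'q=1&lang=en', p.fragment 'top'
# p.hostname 'www.example.com', p.port 8080

# urlsplit: five parts; ";params" stays inside the path
s = urlsplit(url)
# s.path '/path/page;params'

# urlunparse: the inverse of urlparse; _replace swaps out one part first
rebuilt = urlunparse(p._replace(query="q=2"))
# 'http://www.example.com:8080/path/page;params?q=2#top'
```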
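Note 16 in a runnable form: assigning to a name anywhere inside a function makes it local, so reading it first fails. The counter and function names are hypothetical. Recent Python 3 releases word the UnboundLocalError message differently from the one quoted in the note, so the sketch checks the exception type rather than its text.

```python
counter = 0


def broken_increment():
    # The assignment makes counter local to this function, so the read
    # on the same line raises UnboundLocalError.
    counter += 1
    return counter


def increment():
    global counter  # refer to the module-level name instead
    counter += 1
    return counter


raised = False
try:
    broken_increment()
except UnboundLocalError:
    raised = True

first = increment()   # 1
second = increment()  # 2
```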
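Notes 8, 9 and 17 to 21 deal with timeouts, status codes, redirects and headers. This Python 3 sketch (urllib2 became urllib.request, httplib became http.client) starts a hypothetical local server whose /old path answers 302 to /new, so it runs without network access; the paths and payload are assumptions. The third-party requests module from note 15 is not used.

```python
import http.client
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical local test server: /old answers 302 -> /new."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.end_headers()
        elif self.path == "/new":
            payload = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_error(404)

    def log_message(self, format, *args):  # silence per-request logging
        pass


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % port

# Empty ProxyHandler: talk to the local server directly even if http_proxy is set
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# Notes 8/18/19: the opener follows the 302; geturl() is the final URL,
# getcode() its status, and timeout limits how long we wait.
with opener.open(base + "/old", timeout=5) as resp:
    final_url = resp.geturl()
    status = resp.getcode()
    body = resp.read()  # bytes in Python 3 (str in Python 2)

# Note 9: error statuses raise HTTPError, a subclass of URLError.
missing_status = None
try:
    opener.open(base + "/missing", timeout=5)
except urllib.error.HTTPError as exc:
    missing_status = exc.code

# Notes 20/21: http.client (httplib) does not follow redirects, and
# getheader() returns None for an absent header, so check before use.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/old")
raw_resp = conn.getresponse()
raw_status = raw_resp.status                      # 302
location = raw_resp.getheader("Location")         # '/new'
missing_header = raw_resp.getheader("X-Missing")  # None
raw_resp.read()
conn.close()

server.shutdown()
server.server_close()
```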
Python Learning Notes (IV) [reposted]