Getting Taobao commodity prices with a Python web crawler

Source: Internet
Author: User
Tags: python web crawler

1. Python web crawler code for getting Taobao commodity prices:

# -*- coding: utf-8 -*-
'''
Created on March 17, 2017
@author: Lavi
'''
import re

import requests


def getHTMLText(url):
    # Fetch a page and return its text, or an empty string if the request fails.
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


def parserPage(goodslist, html):
    # Collect the '"view_price":"..."' and '"raw_title":"..."' fields from the page.
    prices = re.findall(r'"view_price":"[\d.]*"', html)
    titles = re.findall(r'"raw_title":".*?"', html)  # the question mark requests a minimal match
    for price_field, title_field in zip(prices, titles):
        # eval() is very powerful: it evaluates a string as a valid expression
        # and returns the result. Here it turns '"128.00"' into '128.00'.
        price = eval(price_field.split(':', 1)[1])
        title = eval(title_field.split(':', 1)[1])
        goodslist.append([price, title])


def printPage(goodslist):
    tplt = "{:6}\t{:8}\t{:16}"
    print(tplt.format("Serial number", "Price", "Product name"))
    for i in range(len(goodslist)):
        goods = goodslist[i]
        print(tplt.format(i + 1, goods[0], goods[1]))


def main():
    goods = "schoolbag"
    depth = 2  # number of result pages to fetch
    url = "https://s.taobao.com/search?q="
    goodslist = []
    for i in range(depth):
        html = getHTMLText(url + goods + "&s=" + str(i * 44))
        parserPage(goodslist, html)
    printPage(goodslist)


main()

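
The parsing step above can also be written without eval() and split(). The following is only a sketch: SAMPLE_HTML is a made-up fragment that imitates the fields the crawler looks for, and parse_goods is a hypothetical helper name, not part of the original code. Capture groups return the quoted values directly, so a title that itself contains a colon is handled correctly.

```python
import re

# Hypothetical fragment imitating the fields found in a search result page.
SAMPLE_HTML = (
    '{"raw_title":"Canvas schoolbag","view_price":"128.00"},'
    '{"raw_title":"Leather schoolbag: large","view_price":"59.90"}'
)


def parse_goods(html):
    """Return [price, title] pairs, using capture groups instead of eval()."""
    prices = re.findall(r'"view_price":"([\d.]*)"', html)
    titles = re.findall(r'"raw_title":"(.*?)"', html)  # non-greedy title match
    return [[price, title] for price, title in zip(prices, titles)]


goods = parse_goods(SAMPLE_HTML)
for price, title in goods:
    print(price, title)
```

Because nothing from the page is evaluated as code, this variant avoids the eval() risk discussed in the next section.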
2. More on using the eval() function

The eval() function is powerful. As the official documentation explains, it evaluates a string as a valid expression and returns the result. Combined with the math module, it works well as a calculator. In addition, strings that contain list, tuple, or dict literals can be converted into the corresponding objects:

a = "[[1,2], [3,4], [5,6], [7,8], [9,0]]"
b = eval(a)
b
Out[3]: [[1, 2], [3, 4], [5, 6], [7, 8], [9, 0]]
type(b)
Out[4]: list
a = "{1: 'a', 2: 'b'}"
b = eval(a)
b
Out[7]: {1: 'a', 2: 'b'}
type(b)
Out[8]: dict
a = "([1,2], [3,4], [5,6], [7,8], (9,0))"
b = eval(a)
b
Out[11]: ([1, 2], [3, 4], [5, 6], [7, 8], (9, 0))
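
The calculator use mentioned above might look like the following sketch. The names ALLOWED_NAMES and calculate are made up for this illustration. Passing explicit namespaces to eval() limits which names an expression can see, but it should not be treated as a real security boundary.

```python
import math

# Only these math names are visible to the evaluated expression (illustrative choice).
ALLOWED_NAMES = {name: getattr(math, name) for name in ("sqrt", "sin", "cos", "pi")}


def calculate(expression):
    """Evaluate an arithmetic expression string with a restricted namespace."""
    # An empty __builtins__ hides built-ins such as open() and __import__().
    return eval(expression, {"__builtins__": {}}, ALLOWED_NAMES)


result = calculate("sqrt(16) + 2 * 3")
print(result)
```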
As shown above, eval() is powerful, but its lack of security is a fatal weakness. Consider a program that asks the user to enter an expression and then evaluates it. If a malicious user enters:

__import__('os').system('dir')
then eval() runs the command and the files in the current directory are listed for the user. The user can then enter:
open('filename').read()
and the contents of the code have been read. Once that is done, a single delete command makes the file disappear.
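
When the goal is only to turn a string into a list, tuple, or dict, the standard library's ast.literal_eval is a commonly used safer alternative, since it accepts Python literals only. This is a minimal sketch and not part of the original crawler:

```python
import ast

# A literal string is parsed into the corresponding object.
parsed = ast.literal_eval("[[1,2], [3,4], (5,6)]")

# The malicious input from above is a function call, not a literal,
# so literal_eval refuses it instead of running it.
malicious = "__import__('os').system('dir')"
try:
    ast.literal_eval(malicious)
    rejected = False
except ValueError:
    rejected = True

print(parsed, rejected)
```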

3. Minimal (non-greedy) matching in Python regular expressions

By default, Python's regular expression module, re, uses greedy matching, that is, it matches as much as possible. Sometimes, however, we need a minimal match. The following shows what a minimal match is and how to get one:

The shortest match applies when a pattern could match text of several lengths and you want the shortest possible match, not the longest.
Example
Suppose there is an HTML fragment, '<a>this is the label</a><a>the second label</a>', and we want to match the content of each a tag. The difference between the shortest and the longest match is shown below.
Code

>>> import re
>>> text = '<a>this is the label</a><a>the second label</a>'

>>> print(re.findall(r'<a>(.*?)</a>', text))  # shortest match
['this is the label', 'the second label']

>>> print(re.findall(r'<a>(.*)</a>', text))  # longest match
['this is the label</a><a>the second label']
Explanation
In this example, the pattern r'<a>(.*?)</a>' is meant to match the text contained in each a tag. The * operator is greedy in regular expressions, so on its own it finds the longest possible match.
Adding the ? operator after * makes the match non-greedy, which produces the shortest match.
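
The same effect explains why the crawler's title pattern needs the question mark. The sketch below uses a made-up sample string that imitates two records from a search page:

```python
import re

# Hypothetical sample with two "raw_title" fields.
sample = '"raw_title":"bag A","view_price":"10.00","raw_title":"bag B"'

# Greedy: .* runs to the last quotation mark, swallowing both records.
greedy = re.findall(r'"raw_title":".*"', sample)

# Non-greedy: .*? stops at the first closing quotation mark of each title.
lazy = re.findall(r'"raw_title":".*?"', sample)

print(len(greedy), len(lazy))
```

The greedy pattern returns one oversized match, while the non-greedy pattern returns one match per title.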


Resources:

1. China University MOOC, Python Web Crawler and Information Extraction course

2. http://www.tuicool.com/articles/BBVNQBQ

3. http://www.cnblogs.com/jhao/p/5989241.html
