[Multithreaded] Amazon book ranking query

Source: Internet
Author: User

Version: Python 3.6

Libraries: atexit, re, threading, time, urllib3, bs4 (BeautifulSoup, used with the lxml parser)

Amazon has anti-crawler mechanisms, so the request must at least carry some header information. This example adds a User-Agent (UA) header, but requests still fail fairly often and have to be retried.
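Because blocked requests have to be repeated, a small retry wrapper can help. The sketch below is a minimal version under some assumptions. `retry` is a hypothetical helper and is not part of the original script. The callable it receives is assumed to return None, or to raise, when Amazon serves its anti-crawler page instead of the product page. For example, the script's `regex.findall(res)[0]` raises IndexError when the rank is missing.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar('T')


def retry(fetch: Callable[[], Optional[T]], attempts: int = 3,
          delay: float = 1.0) -> Optional[T]:
    """Call fetch() up to `attempts` times; return the first non-None result.

    Hypothetical helper: fetch is assumed to return None or raise (e.g.
    IndexError when the rank regex finds no match) when Amazon blocks us.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
        except Exception as exc:  # treat any failure as "try again"
            print('attempt %d failed: %r' % (attempt, exc))
            result = None
        if result is not None:
            return result
        if attempt < attempts:
            time.sleep(delay)  # wait a little before the next request
    return None  # every attempt failed


if __name__ == '__main__':
    # Hypothetical stand-in for two blocked requests followed by a success.
    responses = iter([None, None, '674,874'])
    print(retry(lambda: next(responses), attempts=5, delay=0))
```

In the script, the request function could be wrapped as `retry(lambda: httpget(isbn))`, where `httpget` is the script's function that fetches one ISBN's page. Catching every exception is deliberately coarse. A stricter version might retry only on IndexError and urllib3's network errors.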

# _*_ coding: utf-8 _*_
# created by Zhang Q.L. on 2018/5/7 0007
from atexit import register
from re import compile
from threading import Thread
from time import ctime

import urllib3
import bs4

header = {'User-Agent': 'AppleWebKit/537.36 (KHTML, like Gecko)'}
headersample = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36'}
regex = compile(r'#([\d,]+) in Books')  # captures the rank, e.g. "674,874"
url = 'https://item.jd.com/7081550.html'
urltest = 'https://www.amazon.com//dp/'
urltest2 = 'https://www.amazon.com//dp/0132269937'
ISBNs = {
    '0132269937': 'Core Python Programming',
    '0132356139': 'Python Web Development with Django',
    '0137143419': 'Python Fundamentals',
}


def httpget(isbn):
    http = urllib3.PoolManager()  # first create a PoolManager instance
    urllib3.disable_warnings()  # suppress warnings about invalid HTTPS certificates
    # page = http.request('GET', '%s' % urltest2, headers=header)  # send a GET request
    page = http.request('GET', '%s%s' % (urltest, isbn), headers=header)  # send a GET request
    print(page.status)  # HTTP status code returned by the server
    # print(page.data)  # raw response body returned by the server (bytes)
    # print(page.data.decode())  # decode using the default 'utf-8' encoding
    res = bs4.BeautifulSoup(page.data, 'lxml')  # parse the page with the lxml parser
    res = str(res)
    # print(res)
    return regex.findall(res)[0]  # raises IndexError if the page contains no rank


def _showranking(isbn):
    print('- %r ranked %s' % (ISBNs[isbn], httpget(isbn)))


def _main():
    print('At', ctime(), 'on Amazon...')
    for isbn in ISBNs:
        Thread(target=_showranking, args=(isbn,)).start()  # one thread per ISBN


@register
def _atexit():
    print('all DONE at:', ctime())


if __name__ == '__main__':
    _main()

Output:

D:\installed software\python3.6\python3.exe C:/Users/Administrator/PycharmProjects/python core programming/multithreaded programming/amazon-Nothread.py
At Tue May  8 15:10:44 2018 on Amazon...
200
200
200
- 'Python Fundamentals' ranked 4,517,952
- 'Python Web Development with Django' ranked 1,243,459
- 'Core Python Programming' ranked 674,874
all DONE at: Tue May  8 15:10:50 2018

Process finished with exit code 0
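When Amazon's anti-crawler check kicks in, the returned page has no "in Books" rank. In that case the script's `regex.findall(res)[0]` raises IndexError inside the worker thread. A minimal sketch of a more forgiving extraction step follows. It keeps the script's pattern but returns None when nothing matches. `extract_rank` is a hypothetical helper, and the sample HTML strings are made up for illustration.

```python
import re
from typing import Optional

# Same pattern as the script: the digits and commas after "#" in text
# such as "#674,874 in Books".
RANK_RE = re.compile(r'#([\d,]+) in Books')


def extract_rank(html: str) -> Optional[str]:
    """Return the first "in Books" rank found in html, or None if absent."""
    match = RANK_RE.search(html)
    return match.group(1) if match else None


if __name__ == '__main__':
    sample = '<li>Best Sellers Rank: #674,874 in Books (See Top 100)</li>'  # made-up HTML
    print(extract_rank(sample))               # 674,874
    print(extract_rank('<p>Robot Check</p>'))  # None
```

A None result can then be treated as "blocked, try again" rather than crashing the thread.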
    

Compared with the version that does not use threads, there are two main differences:

1. Because the requests are processed concurrently, the total run time is shorter.

2. With threads, results are printed in the order in which the requests finish. In the single-threaded version, the order is fixed by the iteration order of the dictionary's keys.
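Both effects can be reproduced without contacting Amazon. In the sketch below, the HTTP request is replaced by `fake_lookup()`, a hypothetical stand-in that sleeps for a made-up per-ISBN delay and then returns the ISBN. The sequential run visits the keys in dictionary order, and its time is the sum of the delays. The threaded run finishes in roughly the longest single delay, and its results arrive in completion order.

```python
import threading
import time

# Hypothetical per-ISBN delays standing in for network latency.
FAKE_DELAYS = {'0132269937': 0.3, '0132356139': 0.2, '0137143419': 0.1}


def fake_lookup(isbn):
    time.sleep(FAKE_DELAYS[isbn])  # simulate a slow HTTP request
    return isbn


def run_sequential(isbns):
    # One request after another: output order == dictionary key order.
    return [fake_lookup(isbn) for isbn in isbns]


def run_threaded(isbns):
    order = []
    lock = threading.Lock()

    def worker(isbn):
        result = fake_lookup(isbn)
        with lock:  # record results in the order the threads finish
            order.append(result)

    threads = [threading.Thread(target=worker, args=(isbn,)) for isbn in isbns]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for all requests before returning
    return order


if __name__ == '__main__':
    for runner in (run_sequential, run_threaded):
        start = time.perf_counter()
        order = runner(FAKE_DELAYS)
        print('%s: %s in %.2fs' % (runner.__name__, order, time.perf_counter() - start))
```

Dictionary iteration follows insertion order. This is guaranteed from Python 3.7 and is a CPython implementation detail in 3.6, which is why the sequential order matches the order the ISBNs were written in.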
