Version: Python 3.6
Libraries: atexit, re, threading, time, urllib3, bs4
Amazon has anti-crawler mechanisms, so a request needs at least some header information. This example adds a User-Agent header, but requests are still often rejected and have to be retried.
# _*_ coding:utf-8 _*_
# created by Zhang Q.L. on 2018/5/7 0007
from atexit import register
from re import compile
from threading import Thread
from time import ctime
import urllib3
import bs4

header = {'user-agent': 'AppleWebKit/537.36 (KHTML, like Gecko)'}
headersample = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36'}
regex = compile(r'#([\d,]+) in Books ')
url = 'https://item.jd.com/7081550.html'
urltest = 'https://www.amazon.com//dp/'
urltest2 = 'https://www.amazon.com//dp/0132269937'
ISBNs = {
    '0132269937': 'Core Python Programming',
    '0132356139': 'Python Web Development with Django',
    '0137143419': 'Python Fundamentals',
}


def httpGet(isbn):
    http = urllib3.PoolManager()  # first create a PoolManager instance
    urllib3.disable_warnings()  # suppress HTTPS invalid-certificate warnings
    # page = http.request('GET', '%s' % urltest2, headers=header)  # send the GET request
    page = http.request('GET', '%s%s' % (urltest, isbn), headers=header)  # send the GET request
    print(page.status)  # status code returned by the server
    # print(page.data)  # data returned by the server
    # print(page.data.decode())  # decode with the default 'utf-8' encoding
    res = bs4.BeautifulSoup(page.data, 'lxml')  # parse with the lxml parser
    res = str(res)
    # print(res)
    # raises IndexError when the ranking is missing, e.g. when Amazon blocks the request
    return regex.findall(res)[0]


def _showRanking(isbn):
    print('- %r ranked %s' % (ISBNs[isbn], httpGet(isbn)))


def _main():
    print('At', ctime(), 'on Amazon...')
    # one thread per book; the threads are not joined, the atexit hook reports the end time
    for isbn in ISBNs:
        Thread(target=_showRanking, args=(isbn,)).start()


@register
def _atexit():
    print('all DONE at:', ctime())


if __name__ == '__main__':
    _main()
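The note above says that rejected requests have to be retried, but the listing does not show how. A minimal sketch of one way to do it follows. The helper name fetch_with_retry and its attempts and delay parameters are hypothetical and not part of the original script; the helper takes the lookup function as an argument, so it can wrap the script's own lookup function or, as in the demo, a stand-in that fails twice before succeeding.

```python
import time


def fetch_with_retry(fetch, isbn, attempts=3, delay=1.0):
    """Call fetch(isbn) until it returns a value or the attempts run out.

    fetch is any callable that returns the ranking string and raises an
    exception (for example IndexError when the ranking is not on the page)
    if the request was blocked. All names here are illustrative.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(isbn)
        except Exception as exc:  # broad on purpose: blocked pages fail in several ways
            last_error = exc
            if attempt < attempts:
                time.sleep(delay * attempt)  # wait a little longer after each failure
    raise RuntimeError(
        'no ranking for %s after %d attempts' % (isbn, attempts)
    ) from last_error


if __name__ == '__main__':
    calls = []

    def flaky_fetch(isbn):
        # Stand-in for the real lookup: fails twice, then succeeds.
        calls.append(isbn)
        if len(calls) < 3:
            raise IndexError('ranking not found')
        return '674,874'

    print(fetch_with_retry(flaky_fetch, '0132269937', delay=0))
```

In the threaded script, each worker could call the helper with the real lookup function in place of the direct call, so that a blocked request no longer ends the thread with an uncaught IndexError.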
Output:
D:\installed software\python3.6\python3.exe C:/users/administrator/pycharmprojects/python core programming/multithreaded programming/amazon-Nothread.py
At Tue May  8 15:10:44 2018 on Amazon...
200
200
200
- 'Python Fundamentals' ranked 4,517,952
- 'Python Web Development with Django' ranked 1,243,459
- 'Core Python Programming' ranked 674,874
all DONE at: Tue May  8 15:10:50 2018

Process finished with exit code 0
Compared with the version that does not use threads, there are two main differences:
1. Because the requests run concurrently, the total processing time is shorter;
2. With threads, the results are printed in the order in which the requests complete, whereas in the single-threaded version the order is determined by the iteration order of the dictionary keys.
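Both differences can be reproduced without touching the network. The sketch below replaces the HTTP request with time.sleep and records the order in which the lookups finish. The delays in DELAYS are made-up values, not measurements, and the function names are hypothetical. Unlike the script above, it joins the threads so that the elapsed time can be returned.

```python
import threading
import time

# Simulated response times in seconds (made-up values, not measurements).
DELAYS = {
    '0132269937': 0.40,
    '0132356139': 0.25,
    '0137143419': 0.10,
}


def fake_lookup(isbn, results, lock):
    time.sleep(DELAYS[isbn])  # stands in for the HTTP round trip
    with lock:
        results.append(isbn)  # record the completion order


def run_sequential():
    results, lock = [], threading.Lock()
    start = time.perf_counter()
    for isbn in DELAYS:  # one lookup after another, in dictionary order
        fake_lookup(isbn, results, lock)
    return results, time.perf_counter() - start


def run_threaded():
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=fake_lookup, args=(isbn, results, lock))
        for isbn in DELAYS
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every lookup before stopping the clock
    return results, time.perf_counter() - start


if __name__ == '__main__':
    for name, run in (('sequential', run_sequential), ('threaded', run_threaded)):
        order, elapsed = run()
        print('%-10s %.2fs %s' % (name, elapsed, order))
```

The sequential run takes roughly the sum of the delays and keeps the dictionary order; the threaded run takes roughly as long as the slowest lookup and lists the fastest lookup first.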
[multi-threaded] Amazon book ranking query