A Python crawler for the Securities Star website

Source: Internet
Author: User

Bored on a weekend, so here is a bit of fun ...

# coding:utf-8
import requests
from bs4 import BeautifulSoup
import random
import time

# Pool of User-Agent strings; one is picked at random for each request
user_agent = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64)",
    "Mozilla/5.0 (Windows NT 6.3; WOW64)",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20070309 Firefox/2.0.0.3",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20070803 Firefox/1.5.0.12",
    "Opera/9.27 (Windows NT 5.2; U; zh-cn)",
    "Mozilla/5.0 (Macintosh; PPC Mac OS X; U; en) Opera 8.0",
    "Opera/8.0 (Macintosh; PPC Mac OS X; U; en)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080219 Firefox/2.0.0.12 Navigator/9.0.0.6",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Maxthon/4.0.6.2000 Chrome/26.0.1410.43 Safari/537.1",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C; .NET4.0E; QQBrowser/7.3.9825.400)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.92 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; BIDUBrowser 2.x)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/3.0 Safari/536.11",
]

# Rank-list modules to crawl, mapped to their page counts
moduledic = {'ranklist_a': 111, 'ranklist_b': 4}

for module in moduledic:
    for page in range(1, moduledic[module]):
        url = 'http://quote.stockstar.com/stock/' + module + '_3_1_' + str(page) + '.html'
        try:
            # Customized request header: rotate the User-Agent on every request
            response = requests.post(url, headers={"User-Agent": random.choice(user_agent)})
        except Exception:
            print("Continue")
            continue
        response.encoding = 'gb2312'
        html = response.text
        soup = BeautifulSoup(html, 'lxml')  # requires the lxml parser
        # Sleep a random few seconds after each page; adjust to suit
        time.sleep(random.randrange(1, 5))

        for i in soup.find_all('tr'):
            datalist = []  # reset per row so cells never carry over
            for j in i.find_all('td'):
                datalist.append(j.string)
            try:
                # Join the first 12 cells of the row; short or malformed rows
                # (e.g. headers) raise an exception here and are skipped
                data = "    ".join(datalist[k] for k in range(12))
                print(data)
            except Exception:
                continue
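One weak spot in the fetch step above: there is no timeout and no HTTP status check, so a hung connection stalls the whole crawl. Below is a minimal hardened variant, a sketch of my own rather than part of the original script; the fetch_page name, retry count, and backoff are assumptions, and it keeps the original's POST request as-is.

import random
import time
import requests

def fetch_page(url, agents, retries=3, timeout=10):
    """Fetch one page with a rotated User-Agent, a timeout, and simple retries."""
    for attempt in range(retries):
        try:
            response = requests.post(
                url,
                headers={"User-Agent": random.choice(agents)},
                timeout=timeout,
            )
            response.raise_for_status()  # treat HTTP 4xx/5xx as failures
            response.encoding = 'gb2312'
            return response.text
        except requests.RequestException:
            time.sleep(2 ** attempt)  # back off before retrying
    return None  # caller skips the page

In the main loop you would call html = fetch_page(url, user_agent) and continue when it returns None, instead of calling requests.post directly.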

Partial results:

[Screenshot: "Securities star.png" — a sample of the crawler's output (https://s1.51cto.com/wyfs02/M01/9D/D5/wKioL1mHIZTCzPNoAAGZqOmLUQM367.png-wh_500x0-wm_3-wmp_4-s_2482180387.png)]

I originally meant to save the data into a database for later analysis, but I suddenly lost interest in that part.
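For reference, the storage step could look something like the sketch below, using the standard-library sqlite3 module. This is my own assumption, not from the original post: the quotes table, its column layout, and the save_rows helper are all made up for illustration.

import sqlite3

def save_rows(rows, db_path="stockstar.db"):
    """Persist scraped rows (lists of 12 cell strings) into a local SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quotes ("
        "c0 TEXT, c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT, c5 TEXT, "
        "c6 TEXT, c7 TEXT, c8 TEXT, c9 TEXT, c10 TEXT, c11 TEXT)"
    )
    conn.executemany(
        "INSERT INTO quotes VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    conn.close()

In the row loop you would append each 12-cell datalist to a buffer instead of printing it, then call save_rows(buffer) once per page.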

Just to make a point: the site has essentially no anti-crawler measures in place. If I wanted to, I could probably pull the entire site down in a day or two; the run above took only half an hour. Isn't data money? Crawling everything down would amount to an indirect dump of their database, wouldn't it?

This article is from the "Shangwei Super" blog; please retain the source link: http://9399369.blog.51cto.com/9389369/1954076
