Python crawler: getting the JSESSIONID to log in to a website

Source: Internet
Author: User
Tags: tomcat server

When you collect data from websites with Python, you often need to log in first. If you log in with a browser such as Firefox and open its developer tools (shortcut key F12), you can see the information the page submits to the server at login. You can extract that information and use Python's urllib2 library together with a cookie handler to simulate the login and then collect data, as in the following code:

    # coding=utf-8
    # Python 2 code: urllib2 and cookielib are Python 2 module names
    import urllib
    import urllib2
    import httplib
    import cookielib

    url = 'http://www.xxx.net/'
    cookie = cookielib.CookieJar()
    cj = urllib2.HTTPCookieProcessor(cookie)
    # Set the login parameters, obtained with the browser's debugger or another packet capture tool
    postdata = urllib.urlencode({'JSESSIONID': '1f616774d9548c1e8af12a65b470b663', 'username': 'admin', 'password': 'admin'})
    # Build the request
    request = urllib2.Request(url, postdata)
    # Set the proxy
    request.set_proxy('xx.xx.xx.xx:xx', 'http')
    # Log in
    opener = urllib2.build_opener(cj)
    urllib2.install_opener(opener)
    html = opener.open(request)
    print html.read()
    # Open the data page and start collecting data
    s = urllib2.urlopen('http://www.xx.net').read()
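
The listing above targets Python 2, where urllib2 and cookielib exist. As a rough sketch of the same fixed-JSESSIONID login on Python 3, the standard library's urllib.request and http.cookiejar could be used instead. The helper names build_opener_with_cookies and build_login_request are made up for this illustration, and the URL, session ID, and credentials are the article's placeholders:

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_opener_with_cookies():
    """Return (opener, jar); the jar stores cookies set by any response."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, session_id, username, password, proxy=None):
    """Build the POST request that carries the login form fields."""
    form = {
        "JSESSIONID": session_id,  # field name taken from the article's listing
        "username": username,
        "password": password,
    }
    # In Python 3 the POST body must be bytes, not str.
    data = urllib.parse.urlencode(form).encode("utf-8")
    request = urllib.request.Request(url, data=data)
    if proxy:
        # proxy is a "host:port" string, as in the original set_proxy call
        request.set_proxy(proxy, "http")
    return request


# Placeholder values copied from the article; nothing is sent at this point.
opener, jar = build_opener_with_cookies()
request = build_login_request(
    "http://www.xxx.net/", "1f616774d9548c1e8af12a65b470b663", "admin", "admin"
)
```

Calling opener.open(request).read() would then submit the login, and later requests made through the same opener reuse whatever cookies the server set.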

Note that the submitted data contains a JSESSIONID parameter. A quick search (on Baidu, for example) shows that a Tomcat server usually generates this ID when it creates a new session and includes it in the response headers of the login page, for example:

[Image: Jsessionid.png (http://s3.51cto.com/wyfs02/M02/5B/29/wKiom1UAUKXyY5ueAAHUFoPW_H4458.jpg)]
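
To make the header concrete, here is a small Python 3 sketch that parses one Set-Cookie value with the standard library's http.cookies.SimpleCookie. The sample header is hypothetical: it reuses the session ID that appears in a comment later in the article, and the Path and HttpOnly attributes are assumed for illustration. The function name parse_set_cookie is likewise invented here:

```python
from http.cookies import SimpleCookie

# Hypothetical header value; only the ID itself comes from the article.
SAMPLE_SET_COOKIE = "JSESSIONID=3037dcdf69a6454fc525e38c41e6b611; Path=/; HttpOnly"


def parse_set_cookie(header_value):
    """Return {cookie name: cookie value} for one Set-Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    # Attributes such as Path and HttpOnly live on the morsel, not as cookies.
    return {name: morsel.value for name, morsel in cookie.items()}


cookies = parse_set_cookie(SAMPLE_SET_COOKIE)
print(cookies["JSESSIONID"])
```

Looking the cookie up by name does not depend on where JSESSIONID sits in the header, which a positional split on ';' and '=' does.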


Some servers accept repeated logins with a fixed JSESSIONID and some do not; this presumably depends on how the server is configured. If a fixed JSESSIONID works, the code above is enough. If the ID changes dynamically, you first need to obtain the JSESSIONID of the current session and then submit the login:

    # Get the JSESSIONID generated by the Tomcat server
    # (url is the login page URL defined in the earlier listing)
    request = urllib2.Request(url)
    set_cookie = urllib2.urlopen(request).info()['set-cookie']
    json_id = set_cookie.split(';')[0]  # JSESSIONID=3037dcdf69a6454fc525e38c41e6b611
    json_id = json_id.split('=')[-1]
    print json_id
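
Putting the two steps together, the following is a minimal Python 3 sketch of the dynamic case: request the login page, read the JSESSIONID from its Set-Cookie headers, then post it with the credentials. The function names fetch_jsessionid, login, and make_opener are hypothetical, and the form field names are assumed to match the article's first listing; a real site may expect different fields:

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.cookies import SimpleCookie


def make_opener():
    """Build an opener whose cookie jar resends session cookies automatically."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )


def fetch_jsessionid(opener, login_url):
    """GET the login page; return the JSESSIONID from Set-Cookie, or None."""
    response = opener.open(login_url)
    # A response may carry several Set-Cookie headers, so check each one.
    for header in response.headers.get_all("Set-Cookie") or []:
        cookie = SimpleCookie()
        cookie.load(header)
        if "JSESSIONID" in cookie:
            return cookie["JSESSIONID"].value
    return None


def login(opener, login_url, username, password):
    """Fetch a fresh JSESSIONID, then POST it together with the credentials."""
    session_id = fetch_jsessionid(opener, login_url)
    if session_id is None:
        raise RuntimeError("server did not send a JSESSIONID cookie")
    form = {"JSESSIONID": session_id, "username": username, "password": password}
    data = urllib.parse.urlencode(form).encode("utf-8")
    return opener.open(urllib.request.Request(login_url, data=data))
```

A call such as login(make_opener(), 'http://www.xxx.net/', 'admin', 'admin') would run both steps against the article's placeholder URL. Because the opener keeps a cookie jar, the session cookie from the first request is also sent back in the Cookie header of the second one; whether the server additionally needs it as a form field depends on the site.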



This article is from the "Free Self" blog; please keep this source link when reposting: http://hhuayuan.blog.51cto.com/1630327/1619513
