Tianluo website: A Preliminary Exploration of Python Crawlers
Prepare the Python Environment
We use Python 2.7 for development. Pay attention to configuring the environment variables.
IDE
We use PyCharm for development. It comes from JetBrains, the company also behind the well-known IntelliJ IDEA, on which Android Studio is based.
There are two shameless posts about cracking its license:
Username: yueting3527, registration code
"Always asking of you, but never saying thank you": I have absorbed a lot of knowledge from Cnblogs and the sites above, and I will keep growing here; it is a good place, so thank you, Cnblogs and Zhihu. Today I am also sharing my own work from this project, hoping it helps friends. Enough nonsense, let's go!
Requirements: the project is a dating website. The main technologies involved are nginx, a server cluster, a Redis cache, MySQL master-slave replication, Amoeba read/write splitting, and so on. I mainly use Ra
Python study note 23: setting up a simple blog website with Django (1)
1. Create a project. Command:
django-admin startproject mysite
# Some installations need to enter:
django-admin.py startproject mysite
You will find that a folder named mysite has been generated in the current directory, with this structure:
mysite/
    manage.py
    mysite/
        __init__.py
        settings.py
        urls.py
        wsgi.py
Where:
manage.py: a command-line utility that lets you interact with the Django project in various ways.
Download an entire website with Python.
A tool for downloading an entire website, written in Python.
The core process is simple:
1. Enter the website address
2. Fetch the URL and read the response content.
3. Inspect the HTTP headers of the response; if the content type is HTML, the process starts over on the links the page contains.
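The three steps above can be sketched with the Python 3 standard library. (The snippets in this collection mostly target Python 2's urllib2; urllib.request is its modern equivalent. The URLs below are placeholders, and the parsing step is demonstrated offline on a sample string.)

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen  # used only when actually crawling

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links

def crawl(start_url, max_pages=10):
    """Steps 1-3 from the text: take a URL, fetch it, and if the
    response is HTML, queue the links it contains."""
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        with urlopen(url) as resp:                       # step 2
            ctype = resp.headers.get("Content-Type", "")
            if "html" in ctype:                          # step 3
                body = resp.read().decode("utf-8", "replace")
                queue.extend(extract_links(body, url))
    return seen

# Offline demonstration of the HTML-parsing step:
sample = '<a href="/about">About</a> <a href="http://example.com/x">X</a>'
print(extract_links(sample, "http://example.com/"))
# ['http://example.com/about', 'http://example.com/x']
```

Relative links are resolved with urljoin so that the queue only ever holds absolute URLs; a real downloader would also restrict the crawl to the starting host.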
Effect: I did not create folders to save into, because skins and heroes correspond one to one, which makes things easier to operate. After the skins are downloaded, a JSON file is automatically downloaded from the website, so the software updates itself when a new hero or new skin appears. But for some new skins the official website does not provide data yet; to download newly found skins, cl
This article mainly introduces how to use pip to install third-party Python libraries that are not hosted on the official PyPI site. Out of security considerations, recent versions of pip (1.5 and later) do not allow installation from non-PyPI URLs by default. This article provides workarounds; you can refer to the following methods of installing a non-built-in Python m
This article mainly introduces an example of implementing log analysis for an Apache website using Python. The example is a maintenance script, written loosely, just to demonstrate how to use the tool to achieve the goal quickly:
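The article's own script is not reproduced here, but a minimal sketch in the same spirit might parse Apache's Common Log Format and count status codes. (The regex and the sample lines below are illustrative, not taken from the article.)

```python
import re
from collections import Counter

# Matches the Apache Common Log Format, e.g.:
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def status_counts(lines):
    """Count HTTP status codes across the whole log."""
    return Counter(r["status"] for r in map(parse_line, lines) if r)

sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.5 - - [10/Oct/2000:13:56:01 -0700] "GET /missing HTTP/1.0" 404 209',
]
print(status_counts(sample))  # Counter({'200': 1, '404': 1})
```

In practice the lines would come from iterating over the access log file rather than from an in-memory list.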
Application: shell and Python data interaction, data capture, and code
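The line above only names the topic; as an illustration of shell/Python data interaction, Python's standard subprocess module can capture a shell command's output and also feed data in. (The echo and sort commands here are stand-ins, not taken from the article; this assumes a Unix-like system.)

```python
import subprocess

# Run a shell command and capture its standard output as text.
result = subprocess.run(
    ["echo", "hello from the shell"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # hello from the shell

# Data can also flow the other way: feed Python data to a shell tool.
sorted_out = subprocess.run(
    ["sort"], input="b\na\nc\n", capture_output=True, text=True,
).stdout
print(sorted_out.split())  # ['a', 'b', 'c']
```

Passing the command as a list avoids invoking a shell, which sidesteps quoting and injection problems when the arguments come from data.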
I heard that Python is very convenient for writing web crawlers, and just these days my unit had such a need: visit the XX website and download some documents. So I tested it personally, and the results are good.
In this example, the website to be logged in to requires a username, password, and verification code; it uses Python's urllib2 to log in directly.
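A sketch of the kind of login request described, using Python 3's urllib.request (the Python 2 urllib2 API used in the article maps onto it almost one-to-one). The URL, field names, and credentials below are placeholders, and the captcha value would normally be typed in by the user after viewing the image:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen  # urlopen used at crawl time

login_url = "http://example.com/login"      # placeholder URL
form = {
    "username": "alice",                    # placeholder credentials
    "password": "secret",
    "captcha": "7k3p",                      # value the user reads off the image
}
data = urlencode(form).encode("utf-8")      # POST bodies must be bytes
req = Request(login_url, data=data,
              headers={"User-Agent": "Mozilla/5.0"})

# A Request constructed with a data payload defaults to the POST method;
# submitting it would be: resp = urlopen(req)
print(req.get_method())  # POST
```

The field names must match what the browser's debugger shows the real login form submitting, which is exactly what the excerpt below about F12 describes.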
I feel that the tutorials on Liao Xuefeng's official website, http://www.liaoxuefeng.com/, are good, so I am studying them and excerpting what I need to review. The following is mainly for my own review; for details, please visit Liao Xuefeng's official website. Python has built-in map() and reduce() functions. Let's look at map first. The map() function receives two parameters
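As a quick illustration of the pair of functions the excerpt introduces (note that in Python 3, reduce has moved into functools; the examples below are mine, not Liao Xuefeng's):

```python
from functools import reduce

# map() applies a function to every element of an iterable.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))
print(squares)  # [1, 4, 9, 16]

# reduce() folds a sequence down to a single value; here it rebuilds
# the integer 1349 from its digits, one digit at a time.
digits = [1, 3, 4, 9]
number = reduce(lambda acc, d: acc * 10 + d, digits)
print(number)  # 1349
```

map returns a lazy iterator in Python 3, which is why the result is wrapped in list() before printing.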
A simple introduction to urllib2 was given earlier. The following describes how to use urllib2 in more detail.
1. Proxy Settings
By default, urllib2 uses the environment variable http_proxy to set HTTP proxy.
If you want to explicitly control the proxy in the program without being affected by environment variables, you can use a ProxyHandler.
Create test14 to implement a simple proxy Demo:
import urllib2

enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": 'http://some-proxy.com:8080'})
null_proxy_handler = urllib2.ProxyHandler({})

if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)

# install_opener sets the global opener used by urllib2.urlopen
urllib2.install_opener(opener)
                ...!")
                break
            else:
                logging.error("Questionnaire Star login failed!")
        except:
            logging.error("Exception: Questionnaire Star login failed!")
        time.sleep(1)  # the wait time at the end of each loop pass; define it yourself

    def test_name(self):
        self.user_login('18392868125', '855028741616')
        self.check_user_login()

if __name__ == "__main__":
    unittest.main()

The run results are printed in log form: [2017-05-05 16:10:59,174] [line:48] [INFO]
When you use Python to collect data from some websites, you often encounter pages that require logging in. In these cases, log in with a browser such as Firefox and open its debugger (shortcut key F12) to see the information the web page submits to the server at login time; this information can then be replayed from Python's urllib2 library, together with a cookie, to simulate the login and then collect
Cookies are data (usually encrypted) stored on the user's local terminal by certain websites in order to identify users and perform session tracking. For example, some sites require login before a page can be accessed; before you log in, crawling that page's content is not allowed. We can therefore use the urllib2 library to save the cookies from our login, and then crawl the other pages to achieve the goal.
[email protected ~]# cat cscook.py
#!/usr/bin/python
# -*-
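A minimal sketch of the cookie-saving setup the excerpt describes, written against Python 3's http.cookiejar (the modern name for the cookielib module that urllib2-era code uses; the login URL in the comments is a placeholder):

```python
from http import cookiejar
from urllib.request import HTTPCookieProcessor, build_opener

# A jar that keeps every Set-Cookie the server sends during login;
# later requests made through the same opener send them back.
jar = cookiejar.CookieJar()
opener = build_opener(HTTPCookieProcessor(jar))

# At this point you would log in once, e.g.:
#   opener.open("http://example.com/login", data=...)   # placeholder URL
# and then fetch the protected pages with the same opener:
#   html = opener.open("http://example.com/protected").read()

print(len(jar))               # 0 -- no requests made yet, so the jar is empty
print(type(opener).__name__)  # OpenerDirector
```

A MozillaCookieJar can be substituted for CookieJar to persist the cookies to a file between runs, which is what the cscook.py script above appears to set up.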
classmates = ('Michael', 'Bob', 'Tracy')
print('classmates =', classmates)
print('classmates[0] =', classmates[0])
print('classmates[1] =', classmates[1])
print('classmates[2] =', classmates[2])
print('classmates[-1] =', classmates[-1])
# classmates[0] = 'Adam'  # a tuple's elements cannot be modified
# print('classmates1', classmates)
t = (1, 2)  # the elements of a tuple must be determined when it is defined
print('t1 =', t)
t = ()      # define an empty tuple
print('t2 =', t)
t = (1)     # this is just the number 1, not a one-element tuple
print('t3 =', t)
t = (1,)    # define a tuple with only one element
This Python module can be used to serve a pile of Markdown files as a website. Reference: http://www.mkdocs.org/
Installation and configuration:
pip install mkdocs
mkdocs new my-project
cd my-project
Just drop the Markdown files into the corresponding directory and start the service:
mkdocs serve
Access effect: you can also DIY the theme. Theme: readthedocs. I like the mkdocs-material theme.
Install the latest version of Material with pip:
pip install mkdocs-material
Append the following
1. Preface
This small program crawls novels from a novel website. Pirate novel sites are generally easy to crawl, because such sites basically have no anti-crawling mechanisms, so they can be crawled directly. This program takes downloading "Full-Time Mage" from http://www.126shu.com/15/ as an example.
2. The requests library
Documentation: http://www.python-requests.or
The content of this page is sourced from the Internet and does not represent Alibaba Cloud's opinion; the products and services mentioned on this page have no relationship with Alibaba Cloud. If the content of this page confuses you, please write us an email, and we will handle the problem within 5 days of receiving it.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.