Python basics: a regular expression crawler, the configparser module, and the subprocess module

Source: Internet
Author: User

Regular expression crawler application (the xiaohuar.com campus photo site)
    import requests
    import re
    import json

    # Return the page's HTML as a string
    def getpage_str(url):
        page_string = requests.get(url)
        return page_string.text

    hua_dic = {}

    def run_re(url):
        # Scrape the name, school, and number of likes
        hua_str = getpage_str(url)
        hua_list = re.finditer(
            r'<span class="price">(?P<name>.*?)</span>'
            r'.*?class="img_album_btn">(?P<school>.*?)</a>'
            r'.*?<em class.*?>(?P<like>\d+?)</em>',
            hua_str, re.S)
        for n in hua_list:
            # Store the name, school, and like count in the dictionary
            hua_dic[n.group('name')] = [n.group('school'), n.group('like')]

    def url():
        # Generate the URL of each list page
        for i in range(0, 43):
            urls = "http://www.xiaohuar.com/list-1-%s.html" % i
            yield urls

    # Run the crawl
    for i in url():
        run_re(i)

    print(hua_dic)

    # with open('aaa', 'w', encoding='utf-8') as f:
    #     f.write(str(hua_dic))
    data = json.dumps(hua_dic)  # serialize the scraped dictionary
    print(data)
    # Write in 'w' mode so the file always holds a single valid JSON document
    with open('hua.json', 'w') as f:
        f.write(data)

    # Deserialize
    # f1 = open('hua.json', 'r')
    # new_data = json.load(f1)
    # print(new_data)
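
To see how the named groups and re.S work without touching the network, here is a minimal sketch that runs the same kind of pattern against a small inline HTML string. The HTML fragment and the names in it are made up for illustration; the markup of the real page may differ.

```python
import re

# Hypothetical HTML fragment, invented for illustration only.
sample_html = '''
<span class="price">Alice</span>
<a href="#" class="img_album_btn">Example University</a>
<em class="bold">12</em>
<span class="price">Bob</span>
<a href="#" class="img_album_btn">Sample College</a>
<em class="bold">7</em>
'''

pattern = re.compile(
    r'<span class="price">(?P<name>.*?)</span>'
    r'.*?class="img_album_btn">(?P<school>.*?)</a>'
    r'.*?<em class.*?>(?P<like>\d+)</em>',
    re.S,  # let "." match newlines, since the fields sit on separate lines
)

result = {}
for m in pattern.finditer(sample_html):
    # Each match exposes the captured fields by group name
    result[m.group('name')] = [m.group('school'), m.group('like')]

print(result)
```

Note that the like count is captured as a string; convert it with int() if you need to sort or sum it.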

Configparser Module

This module handles configuration files in the conf format commonly used on Linux, which is similar to the Windows INI format. A file can contain one or more sections, and each section can hold multiple parameters (key = value).

For example:

    [DEFAULT]
    ServerAliveInterval = 45
    Compression = yes
    CompressionLevel = 9
    ForwardX11 = yes

    [bitbucket.org]
    User = hg

    [topsecret.server.com]
    Port = 50022
    ForwardX11 = no

Generating the file:

    import configparser

    config = configparser.ConfigParser()  # create a parser object

    # Key/value pairs for the DEFAULT section. DEFAULT is special:
    # every other section also sees the values defined here.
    config['DEFAULT'] = {'ServerAliveInterval': '45',
                         'Compression': 'yes',
                         'CompressionLevel': '9',
                         'ForwardX11': 'yes'}

    config['bitbucket.org'] = {'User': 'hg'}  # an ordinary section
    config['topsecret.server.com'] = {'Port': '50022', 'ForwardX11': 'no'}  # an ordinary section

    with open('example.ini', 'w') as configfile:  # write to the file
        config.write(configfile)
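
The special role of the DEFAULT section can be checked without creating a file. The sketch below writes a small configuration to an in-memory buffer (io.StringIO is used here only to avoid touching the disk) and then reads a DEFAULT value through another section.

```python
import configparser
import io

config = configparser.ConfigParser()
config['DEFAULT'] = {'Compression': 'yes'}
config['bitbucket.org'] = {'User': 'hg'}

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
config.write(buffer)
text = buffer.getvalue()
print(text)

# Values from DEFAULT are visible through every other section
inherited = config['bitbucket.org']['Compression']
print(inherited)
```

The printed text also shows that configparser lowercases option names by default, which is why later lookups are case-insensitive.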

Reading values from the file:

    import configparser

    config = configparser.ConfigParser()

    # Look up the file contents; access works much like a dictionary
    print(config.sections())            # []
    config.read('example.ini')
    print(config.sections())            # ['bitbucket.org', 'topsecret.server.com']
    print('bytebong.com' in config)     # False
    print('bitbucket.org' in config)    # True

    print(config['bitbucket.org']['User'])                # hg
    print(config['DEFAULT']['Compression'])               # yes
    print(config['topsecret.server.com']['ForwardX11'])   # no
    print(config['bitbucket.org'])                        # <Section: bitbucket.org>

    for key in config['bitbucket.org']:   # note: the keys from DEFAULT are included too
        print(key)

    print(config.options('bitbucket.org'))   # same as the for loop: all keys under 'bitbucket.org'
    print(config.items('bitbucket.org'))     # all key/value pairs under 'bitbucket.org'
    print(config.get('bitbucket.org', 'compression'))   # yes; get() looks up a value by section and key
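
All values come back as strings. As a minimal sketch, the typed getters and the fallback argument can be tried on an in-memory configuration loaded with read_string(); the 'Timeout' key below is hypothetical and only serves to show the fallback.

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[DEFAULT]
ServerAliveInterval = 45
Compression = yes

[topsecret.server.com]
Port = 50022
ForwardX11 = no
""")

section = config['topsecret.server.com']

port = section.getint('Port')                   # converted to int
forward_x11 = section.getboolean('ForwardX11')  # 'no' is converted to False
# Inherited from DEFAULT, read through the parser-level getter
interval = config.getint('topsecret.server.com', 'ServerAliveInterval')
# 'Timeout' is a hypothetical key that is not defined, so the fallback is returned
timeout = section.get('Timeout', fallback='30')

print(port, forward_x11, interval, timeout)
```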

Subprocess Module

When you need to run a system command, the os module is usually the first thing that comes to mind, through os.system() and os.popen(). Both are too simple for more involved tasks, such as feeding input to a running command, reading its output, checking whether it is still running, or managing several commands in parallel. The Popen class in subprocess handles these cases.

The subprocess module lets a process create a new child process, connect to the child's stdin/stdout/stderr through pipes, and obtain the child's return code. The core of the module is a single class: Popen.

Simple command
    import subprocess
    import sys

    # Popen starts a new process that runs asynchronously with the main process
    if sys.platform.startswith('win'):
        s = subprocess.Popen('dir', shell=True)   # Windows: dir is a shell built-in
    else:
        s = subprocess.Popen('ls')                # Linux
    s.wait()             # s is a Popen instance; wait() blocks until the child process finishes
    print('ending...')
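
A platform-independent way to try this is to launch the Python interpreter itself as the child. This is only a sketch: sys.executable and the one-line child program are choices made for illustration, not part of the original example.

```python
import subprocess
import sys

# Launch the current Python interpreter as the child process
child = subprocess.Popen([sys.executable, '-c', 'print("hello from child")'])

# wait() blocks until the child exits and returns its exit code
return_code = child.wait()
print('child exited with', return_code)
```

An exit code of 0 conventionally means the command succeeded.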

Command with options (the approach is the same on Windows and Linux)

    import subprocess

    subprocess.Popen('ls -l', shell=True)
    # subprocess.Popen(['ls', '-l'])

Controlling child processes

    s.poll()            # check the child's status (None while it is still running)
    s.kill()            # kill the child process
    s.send_signal(sig)  # send the signal sig to the child process
    s.terminate()       # terminate the child process
    s.pid               # the child's process ID
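
The following sketch puts these calls together. The child is a Python one-liner that sleeps, chosen only so that the example behaves the same on Windows and Linux.

```python
import subprocess
import sys
import time

# Child that would sleep for 30 seconds if left alone
child = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(30)'])

time.sleep(0.5)
status_while_running = child.poll()   # None: the child has not exited yet

child.terminate()                     # ask the child to stop
child.wait()                          # reap it so the exit status is available
status_after = child.poll()           # now an integer exit status

print(child.pid, status_while_running, status_after)
```

The exact value of the final status depends on the platform, so the code should only rely on it no longer being None.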

Redirecting child process input and output

You can redirect standard input, standard output, and standard error when creating a Popen object, and you can use subprocess.PIPE to connect the output of one child process to the input of another, forming a pipeline:

    import subprocess

    # s1 = subprocess.Popen(["ls", "-l"], stdout=subprocess.PIPE)
    # print(s1.stdout.read())
    # s2.communicate()

    # Linux: the equivalent of "cat /etc/passwd | grep 0:0"
    s1 = subprocess.Popen(["cat", "/etc/passwd"], stdout=subprocess.PIPE)
    s2 = subprocess.Popen(["grep", "0:0"], stdin=s1.stdout, stdout=subprocess.PIPE)
    out = s2.communicate()
    print(out)

    # Windows: read the output of dir and decode it as GBK
    s = subprocess.Popen("dir", shell=True, stdout=subprocess.PIPE)
    print(s.stdout.read().decode("gbk"))

subprocess.PIPE provides a buffer for the stream. s1 writes its stdout into the buffer, and s2 reads its stdin from that pipe. The output of s2 is also held in a pipe until the communicate() method reads it.
Note: communicate() is a method of the Popen object. It blocks the parent process until the child process finishes.
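
The same pipeline can be sketched in a way that runs on any platform by using two small Python child programs in place of cat and grep. The two programs and their sample lines are assumptions made for illustration only.

```python
import subprocess
import sys

# Stand-in for "cat /etc/passwd": prints two made-up lines
producer_code = 'print("root:x:0:0"); print("user:x:1000:1000")'

# Stand-in for "grep 0:0": echoes only the lines that contain "0:0"
filter_code = '''
import sys
for line in sys.stdin:
    if "0:0" in line:
        sys.stdout.write(line)
'''

p1 = subprocess.Popen([sys.executable, '-c', producer_code],
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen([sys.executable, '-c', filter_code],
                      stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()   # the parent no longer needs its copy of the pipe

# communicate() blocks until p2 finishes and returns (stdout, stderr)
out, err = p2.communicate()
p1.wait()

lines = out.decode().splitlines()
print(lines)
```

Because stderr was not redirected, the second element returned by communicate() is None, which explains the tuple printed in the example above.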
