Python Development Foundation, Day 15: regular expression crawler application, the configparser module, and the subprocess module

Source: Internet
Author: User
Tags stdin

Regular expression crawler applications (Campus beauty net)

    import requests
    import re
    import json

    # Return the page's HTML as a string
    def getpage_str(url):
        page_string = requests.get(url)
        return page_string.text

    hua_dic = {}

    def run_re(url):  # Scrape the name, school, and number of likes
        hua_str = getpage_str(url)
        hua_list = re.finditer(
            r'<span class="price">(?P<name>.*?)</span>'
            r'.*?class="img_album_btn">(?P<school>.*?)</a>'
            r'.*?<em class.*?>(?P<like>\d+?)</em>',
            hua_str, re.S)
        for n in hua_list:  # Store the name, school, and likes in the dictionary
            hua_dic[n.group('name')] = [n.group('school'), n.group('like')]

    def url():  # Generate the URL of each list page
        for i in range(0, 43):
            urls = "http://www.xiaohuar.com/list-1-%s.html" % i
            yield urls

    # Run the crawl
    for i in url():
        run_re(i)

    print(hua_dic)

    # with open('aaa', 'w', encoding='utf-8') as f:
    #     f.write(str(hua_dic))
    data = json.dumps(hua_dic)  # Serialize the scraped dictionary
    print(data)
    # 'w' rather than 'a': appending a second JSON document would make the file unparseable
    with open('hua.json', 'w') as f:
        f.write(data)
    # Deserialize
    # f1 = open('hua.json', 'r')
    # new_data = json.load(f1)
    # print(new_data)
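
The extraction step can be tried without touching the network. The sketch below runs the same kind of named-group pattern over a small inline HTML snippet and then round-trips the result through JSON. The snippet, the names in it, and the `parse_entries` helper are made up for illustration; they are not taken from the real site's markup.

```python
import json
import re

# Hypothetical markup, shaped the way the pattern expects it.
SAMPLE_HTML = """
<div><span class="price">Alice</span>
<a class="img_album_btn">First University</a>
<em class="bold">12</em></div>
<div><span class="price">Bob</span>
<a class="img_album_btn">Second College</a>
<em class="bold">7</em></div>
"""

PATTERN = re.compile(
    r'<span class="price">(?P<name>.*?)</span>'
    r'.*?class="img_album_btn">(?P<school>.*?)</a>'
    r'.*?<em class.*?>(?P<like>\d+?)</em>',
    re.S,  # let "." match newlines so one match can span several lines
)


def parse_entries(html):
    """Return {name: [school, likes]} for every match found in html."""
    return {
        m.group("name"): [m.group("school"), m.group("like")]
        for m in PATTERN.finditer(html)
    }


entries = parse_entries(SAMPLE_HTML)
print(entries)

# Round-trip through JSON, as the crawler does before writing to disk.
restored = json.loads(json.dumps(entries))
print(restored == entries)
```

Note that the like counts stay strings, because a regex group always captures text; convert with `int()` if numbers are needed.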

Configparser Module

This module reads and writes configuration files in a format similar to Windows INI files, which is also common for conf files on Linux. A file can contain one or more sections, and each section can hold multiple parameters (key = value).

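For example:

    [DEFAULT]
    ServerAliveInterval = 45
    Compression = yes
    CompressionLevel = 9
    ForwardX11 = yes

    [bitbucket.org]
    User = hg

    [topsecret.server.com]
    Host Port = 50022
    ForwardX11 = no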

Example of generating the file:

    import configparser

    config = configparser.ConfigParser()  # Create a parser object

    # Key-value pairs for the DEFAULT section. DEFAULT is a special section:
    # its entries are also visible in every other section.
    config["DEFAULT"] = {'ServerAliveInterval': '45',
                         'Compression': 'yes',
                         'CompressionLevel': '9',
                         'ForwardX11': 'yes'
                         }

    config['bitbucket.org'] = {'User': 'hg'}  # An ordinary section

    config['topsecret.server.com'] = {'Host Port': '5022', 'ForwardX11': 'no'}  # An ordinary section

    with open('example.ini', 'w') as configfile:  # Write the file
        config.write(configfile)
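
To see what `write()` produces without creating a file, the same call can be pointed at an in-memory buffer. This is a small sketch using a trimmed-down configuration; the `io.StringIO` buffer stands in for the file object and is not part of the original example.

```python
import configparser
import io

config = configparser.ConfigParser()
config["DEFAULT"] = {"Compression": "yes"}
config["bitbucket.org"] = {"User": "hg"}

buffer = io.StringIO()  # in-memory stand-in for the file object
config.write(buffer)
text = buffer.getvalue()
print(text)
```

The printed text shows the keys in lowercase: `ConfigParser` normalizes option names to lowercase by default, which is why lookups are case-insensitive.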

Reading the file contents:

    import configparser

    config = configparser.ConfigParser()
    # ---- Look up file contents with dictionary-style access ----
    print(config.sections())  # []
    config.read('example.ini')
    print(config.sections())  # ['bitbucket.org', 'topsecret.server.com']
    print('bytebong.com' in config)  # False
    print('bitbucket.org' in config)  # True

    print(config['bitbucket.org']["User"])  # hg
    print(config['DEFAULT']['Compression'])  # yes
    print(config['topsecret.server.com']['ForwardX11'])  # no
    print(config['bitbucket.org'])  # <Section: bitbucket.org>
    for key in config['bitbucket.org']:  # Note: keys inherited from DEFAULT are included
        print(key)
    print(config.options('bitbucket.org'))  # Same as the for loop: all keys under 'bitbucket.org'
    print(config.items('bitbucket.org'))  # All key-value pairs under 'bitbucket.org'
    print(config.get('bitbucket.org', 'Compression'))  # yes; get() looks up a key within a section
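
The way DEFAULT values show up inside other sections is easy to check in isolation. The sketch below parses a short INI string with `read_string()` instead of reading `example.ini`; the `INI_TEXT` content and the `port` lookup are illustrative assumptions.

```python
import configparser

# Hypothetical configuration text, parsed from memory instead of a file.
INI_TEXT = """
[DEFAULT]
Compression = yes

[bitbucket.org]
User = hg
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

section = config["bitbucket.org"]
keys = list(section)  # the section's own keys plus those inherited from DEFAULT
compression = config.get("bitbucket.org", "compression")  # value comes from DEFAULT
port = config.get("bitbucket.org", "port", fallback="22")  # key absent everywhere

print(keys)
print(compression)
print(port)
```

Passing `fallback` avoids a `NoOptionError` when a key exists neither in the section nor in DEFAULT.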

Subprocess Module

When we need to call a system command, the first thing that comes to mind is the os module, with os.system() and os.popen(). However, these two functions are too limited for more complex operations, such as feeding input to a running command while also reading its output, checking the command's running state, or managing several commands in parallel. In those cases, the Popen class in the subprocess module can do what we need.
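
As a rough illustration of feeding input to a command and reading its output in one step, the sketch below starts a child Python interpreter that upper-cases whatever it receives on stdin. The child one-liner is a hypothetical stand-in for a real command, chosen so the example behaves the same on Windows and Linux.

```python
import subprocess
import sys

# Hypothetical child: a Python one-liner that upper-cases its stdin.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,  # exchange str instead of bytes (Python 3.7+)
)

# communicate() sends the input, closes stdin, and collects the output.
output, _ = proc.communicate(input="hello from the parent\n")
print(output)
print(proc.returncode)
```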

The subprocess module allows a process to create a new child process, connect to the child's stdin/stdout/stderr through pipes, and get the child's return value. The module is built around a single class: Popen.

Simple command

    import subprocess

    # Create a new process that runs asynchronously with the main process. On Windows:
    s = subprocess.Popen('dir', shell=True)
    # Create a new process that runs asynchronously with the main process. On Linux:
    s = subprocess.Popen('ls')
    s.wait()  # s is a Popen instance; wait() blocks until the child process finishes
    print('ending ...')
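
`wait()` also hands back the child's return value. In this minimal sketch, a child Python interpreter that exits with status 3 stands in for a real command; the exit code 3 is an arbitrary choice for illustration.

```python
import subprocess
import sys

# Hypothetical child that exits with status 3.
s = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])

exit_code = s.wait()  # blocks until the child exits, then returns its exit status
print(exit_code)
print(s.returncode)   # the same value is kept on the Popen object
```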

Command with options (the same on Windows and Linux)

    import subprocess

    subprocess.Popen('ls -l', shell=True)
    # subprocess.Popen(['ls', '-l'])
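
The two forms above differ in who splits the command line: with `shell=True` the shell parses the string, while the list form passes each argument directly. When a command arrives as a single string, the standard-library `shlex.split()` can produce the list form. The command strings below are hypothetical examples and are only split, not executed.

```python
import shlex

# Hypothetical command lines; they are split but never run.
simple = shlex.split("ls -l /tmp")
quoted = shlex.split('grep "hello world" notes.txt')

print(simple)  # ['ls', '-l', '/tmp']
print(quoted)  # ['grep', 'hello world', 'notes.txt']
```

The list form avoids shell quoting problems and is the safer choice when any part of the command comes from user input.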

Controlling child processes

    s.poll()            # Check the child process's status
    s.kill()            # Kill the child process
    s.send_signal(sig)  # Send a signal to the child process
    s.terminate()       # Terminate the child process
    s.pid               # The child's process ID
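
A short sketch of these calls working together, assuming a child that would otherwise sleep for 30 seconds (the sleeping one-liner is a hypothetical stand-in for a long-running command):

```python
import subprocess
import sys

# Hypothetical long-running child: sleeps for 30 seconds.
s = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

child_pid = s.pid
status_while_running = s.poll()  # None while the child is still running
s.terminate()                    # ask the child to stop
s.wait(timeout=10)               # reap the child so its exit status is recorded
status_after = s.poll()          # now an integer exit status

print(child_pid, status_while_running, status_after)
```

On POSIX systems the final status is the negative signal number; on Windows it is the exit code passed to the process termination call, so portable code should only test whether it is `None`.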

Sub-process output flow control

You can redirect standard input, standard output, and standard error when a Popen object is created, and you can use subprocess.PIPE to connect the inputs and outputs of multiple child processes into a pipeline:

    import subprocess

    # s1 = subprocess.Popen(["ls", "-l"], stdout=subprocess.PIPE)
    # print(s1.stdout.read())
    # s2.communicate()
    s1 = subprocess.Popen(["cat", "/etc/passwd"], stdout=subprocess.PIPE)
    s2 = subprocess.Popen(["grep", "0:0"], stdin=s1.stdout, stdout=subprocess.PIPE)
    out = s2.communicate()
    print(out)

    s = subprocess.Popen("dir", shell=True, stdout=subprocess.PIPE)
    print(s.stdout.read().decode("gbk"))

subprocess.PIPE actually provides a buffer for the text stream. s1 writes its stdout into the buffer, and s2's stdin then reads the text from that pipe. The output of s2 is also held in a pipe until the communicate() method reads it.
Note: communicate() is a method of the Popen object that blocks the parent process until the child process finishes.
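
The same two-stage pipeline can be sketched in a platform-independent way by replacing `cat` and `grep` with small child Python programs. The producer and filter one-liners and the sample lines are assumptions made for illustration; the Popen wiring follows the pattern described above.

```python
import subprocess
import sys

# Hypothetical stand-ins for "cat /etc/passwd" and "grep 0:0".
producer_code = "print('root:x:0:0'); print('user:x:1001:1001')"
filter_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if '0:0' in line:\n"
    "        sys.stdout.write(line)\n"
)

s1 = subprocess.Popen([sys.executable, "-c", producer_code],
                      stdout=subprocess.PIPE)
s2 = subprocess.Popen([sys.executable, "-c", filter_code],
                      stdin=s1.stdout, stdout=subprocess.PIPE)
s1.stdout.close()  # the parent drops its copy so s2 sees end-of-file when s1 exits

out, err = s2.communicate()  # blocks until s2 finishes; err is None (stderr not piped)
s1.wait()

matched = out.decode().split()
print(matched)
```

communicate() returns a `(stdout, stderr)` tuple, which is why printing its result directly shows a pair rather than just the output.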
