Difference between Python XPath and Regex
When crawling webpage information, we often need to use Regex or XPath. The differences between the two:
Regex is itself a text-matching tool. Because a pattern may have to be matched many times, it suits short, concentrated information, which it can match and capture precisely. For large, scattered content such as HTML, however, the efficiency will b…
On the second day, busy with some things at home, I casually helped someone crawl the Douban Books Top 250.
1. Construct the URL list: urls = ['https://book.douban.com/top250?start={}'.format(str(i)) for i in range(0, 226, 25)]
2. Use the requests module to get the page source, lxml to parse the page, and XPath to extract.
3. Extract the information.
4. This could be encapsulated into a function; here it is called without encapsulation.
Python code:
# coding: utf-8
import sys
reload(sys)
sys.setdefaultencoding
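Step 1 above can be sketched on its own in Python 3. The start offsets (0 to 225 in steps of 25) come from the snippet; the `BASE` constant is a name introduced here for illustration, and nothing is actually fetched.

```python
# Build the page URLs for the Douban Books Top 250 list (25 entries per page).
BASE = 'https://book.douban.com/top250?start={}'  # BASE is a hypothetical helper name

# range(0, 226, 25) yields 0, 25, ..., 225: one offset per page.
urls = [BASE.format(i) for i in range(0, 226, 25)]

for url in urls:
    print(url)
```

Formatting each offset directly keeps the list comprehension to a single expression, and `format` converts the integer itself, so no `str(i)` call is needed.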
To find specific content in an HTML file, you first need to observe what the content is and where it sits, so that you can locate it.
Assume the HTML file is named "1.html" and that every href attribute is inside an a tag.
Regex version:
# coding: utf-8
import re

with open('1.html', 'r') as f:
    data = f.read()
    result = re.findall(r'href="(.*?)"', data)
    for each in result:
        print(each)
XPath version:
# coding: utf-8
from lxml import etree

with open('1.html', 'r') as f:
    data = f.read()
    selector = etree.HTML(data)
    result = select…
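To compare the two approaches side by side without needing a file on disk, here is a minimal Python 3 sketch. The inline `data` string is a hypothetical stand-in for the contents of "1.html", and the XPath expression `//a/@href` is one plausible way to select the attribute, not necessarily the one the original article used.

```python
import re
from lxml import etree

# Hypothetical stand-in for the contents of "1.html".
data = '''
<html><body>
  <a href="/page/1">First</a>
  <a href="/page/2">Second</a>
</body></html>
'''

# Regex version: capture whatever sits between href=" and the next double quote.
regex_links = re.findall(r'href="(.*?)"', data)

# XPath version: parse the document, then select the href attribute of every a tag.
selector = etree.HTML(data)
xpath_links = selector.xpath('//a/@href')

print(regex_links)
print(xpath_links)
```

The regex treats the page as flat text, so it would also match an `href="..."` that appears inside a comment or script; the XPath version only sees attributes of parsed `a` elements.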
XPath syntax:
// locates the root tag
/ searches down one level
/text() extracts text content
/@xxx extracts the content of attribute xxx
Sample:
import requests
from lxml import etree

for i in range(1):
    url = "http://www.xxx.com/topic/tv/page/{}".format(i)
    req = requests.get(url).content
    html = etree.HTML(req)
    # extract text
    text = html.xpath('/html/body/section/div[1]/div/article[*]/header/h2/a/text()')
    for each in text:
from lxml import etree
import requests

url = 'https://movie.douban.com/chart'
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"}
response = requests.get(url, headers=headers)
html_str = response.content.decode()
# print(html_str)
# use etree to process the data
html = etree.HTML(html_str)
# get the URL addresses of the movies
url_list = html.xpath("//div[@class='indent']/div/table//div[@class='pl2']/a/@href")
# print
Common statements:
1. starts-with(@attribute_name, shared leading part of the attribute value)
Use case: attributes that start with the same characters
selector = etree.HTML(html)
content = selector.xpath('//div[starts-with(@id, "test")]/text()')
2. string(.)
Use case: tags nested inside tags
selector = etree.HTML(html)
data = selector.xpath('//div[@id="test3"]')[0]  # select the outer element first, then narrow down
info = data.xpath('string(.)')
content = info.replace('\n', '').replace(' ', '')  # remove newlines and tabs/spaces
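Both statements can be tried on a small inline document. The sketch below assumes lxml is installed; the markup and the ids `test-1`, `test-2` and `nested` are hypothetical and chosen only to make the behaviour visible.

```python
from lxml import etree

# Hypothetical markup: two divs whose ids share a prefix, and one div with nested tags.
html = '''
<div id="test-1">first</div>
<div id="test-2">second</div>
<div id="nested">outer <span>inner <b>deep</b></span> tail</div>
'''

selector = etree.HTML(html)

# starts-with(): match every div whose id begins with "test".
prefixed = selector.xpath('//div[starts-with(@id, "test")]/text()')

# string(.): pick one element first, then flatten all the text beneath it.
nested = selector.xpath('//div[@id="nested"]')[0]
flattened = nested.xpath('string(.)')

print(prefixed)
print(flattened)
```

Note that `/text()` on the nested div would return only its direct text nodes, while `string(.)` concatenates the text of every descendant as well.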
Python decorator usage examples and practical application examples
Test 1
Deco is running, but myfunc is not running
The code is as follows:
def deco(func):
    print('before func')
    return func

def myfunc():
    print('myfunc() called')

myfunc = deco(myfunc)
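Test 1 shows the decorator body running when the decorator is applied, not when the function is called. For contrast, here is a minimal Python 3 sketch of the common wrapper form, in which extra code runs on every call. The `events` list and the `wrapper` name are assumptions added for illustration and do not come from the original article.

```python
import functools

events = []  # hypothetical log used to show when each piece of code runs

def deco(func):
    events.append('deco ran')         # executes once, when the decorator is applied

    @functools.wraps(func)            # keep myfunc's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        events.append('before func')  # executes on every call
        return func(*args, **kwargs)
    return wrapper

@deco
def myfunc():
    events.append('myfunc() called')

snapshot = list(events)  # only the decorator body has run at this point
myfunc()

print(snapshot)
print(events)
```

Recording events in a list instead of printing them makes the ordering easy to inspect after the fact.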
Test 2
Reference: http://www.jb51.net/article/57183.htm
I have also tidied it up a little and corrected some errors in it. These errors are related to the choice of Scrapy version; I personally use Python 2.7 + Scrapy 1.1.
The other example URL (http://www.dmoz.org/Computers/Programming/Languages/Python/Books/) is often inaccessible. Please keep this in mind and do not assume the script has a problem.
Enough preamble; the followin…
the document. Therefore, the first instantiated selector must be related to the root node or the entire directory.
In Scrapy, selectors have four basic methods:
xpath(): returns a list of selectors, each representing a node selected by the XPath expression given as the argument
css(): returns a list of selectors, each representing a node selected by the CSS expression given as the argument
extract(): returns a Unicode string with the selected data
re(): returns a list of Unicode strings extracted by applying the regular expression given as the argument
Boost.Python compilation and example
Reprints are welcome; when reprinting, please cite the original address: http://blog.csdn.net/majianfei1023/article/details/46781581
Compiling Boost on Linux: http://blog.csdn.net/majianfei1023/article/details/46761029
Yesterday, we compiled and installed boost.
Examples of how to call and define functions in Python
Call the function:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# function call
>>> abs(100)
100
>>> abs(-110)
110
>>> abs(12.34)
12.34
>>> abs(1, 2)
Traceback (most recent call last):
  File "
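The same calls can be run as a script rather than in the interactive prompt. This sketch catches the error from the two-argument call so that the script finishes; the `results` and `error` names are introduced here for illustration.

```python
# Valid calls: abs() takes exactly one numeric argument.
results = [abs(100), abs(-110), abs(12.34)]
print(results)

# Passing two arguments raises TypeError instead of returning a value.
error = None
try:
    abs(1, 2)
except TypeError as exc:
    error = exc
    print('TypeError:', exc)
```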
Define functions:
#!/usr/bi
In this tutorial, we assume that you have already installed Scrapy. If you have not, you can refer to the installation guide.
We will use the Open Directory Project (DMOZ) as our example site to crawl.
This tutorial will take you through the following tasks:
Create a new Scrapy project
Define the item that you will extract
Write a spider to crawl the site and extract items.
Write an item pipeline to store the extracted items
Scr
Python: a simple method for removing duplicate elements from a list
This example describes how to remove duplicate elements from a list in Python. We share it here for your reference. The details are
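The article's own method is cut off above, so the following is only one common way to do it in Python 3, not necessarily the author's. The function name `remove_duplicates` is hypothetical, and the approach assumes the list elements are hashable.

```python
def remove_duplicates(items):
    """Return a new list with duplicates dropped, keeping first occurrences in order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))

numbers = [3, 1, 3, 2, 1, 5]
unique = remove_duplicates(numbers)
print(unique)
```

Using `set(items)` would also remove duplicates, but it does not guarantee the original order.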
A detailed Python mail-sending example
Sending mail from Python needs two modules: smtplib and email. Being able to import these modules in our day-to-day work is what makes such tasks easier to handle. Today, let's take a good look at sending em
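As a rough Python 3 sketch of how the two modules divide the work: `email` builds the message and `smtplib` delivers it. The addresses, the helper names `build_message` and `send_message`, and the default port are placeholders assumed for illustration; `send_message` is defined but never called here, so no network access takes place.

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    """Assemble a plain-text message with the email module."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject
    msg.set_content(body)
    return msg

def send_message(msg, host, port=25):
    """Deliver the message with smtplib; host and port are placeholders."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)

# Build only; sending would need a reachable SMTP server.
msg = build_message('alice@example.com', 'bob@example.com', 'Hello', 'Test body')
print(msg['Subject'])
```

A real server will usually also require `server.starttls()` and `server.login(...)` before sending; those details depend on the mail provider.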
Python Composite pattern and Chain of Responsibility pattern programming examples
Composite pattern
We can regard the Composite pattern as a complex tree structure. In fact, there are basically three roles: the trunk (which defines some operations for the leaves), the branches (the trunk has many branches), and the leaf (the object
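The three roles can be sketched in Python 3 as follows. The class names `Component`, `Branch` and `Leaf` and the `count` operation are assumptions chosen for illustration; the original article's code is not shown in this excerpt.

```python
class Component:
    """Common interface shared by branches and leaves (the "trunk" role)."""
    def __init__(self, name):
        self.name = name

    def count(self):
        raise NotImplementedError

class Leaf(Component):
    """An end node with no children."""
    def count(self):
        return 1

class Branch(Component):
    """A node that holds other components and delegates to them."""
    def __init__(self, name):
        super().__init__(name)
        self.children = []

    def add(self, child):
        self.children.append(child)
        return self  # returning self allows chained add() calls

    def count(self):
        # A branch counts the leaves beneath it, however deeply nested.
        return sum(child.count() for child in self.children)

tree = Branch('root')
tree.add(Branch('branch-a').add(Leaf('leaf-1')).add(Leaf('leaf-2')))
tree.add(Leaf('leaf-3'))
print(tree.count())
```

Client code calls `count()` the same way on a single leaf or on a whole subtree, which is the point of the pattern.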
Python decorator usage example summary
This document describes how to use the Python decorator. We will share this with you for your reference. The details are as follows:
I. What is the decorator?
The pyt
Python list and tuple definition and usage examples
This document describes how to define and use Python lists and tuples. We share it here for your reference. The details are as follows:
# coding=utf8
print ''' the list and ele
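Since the original listing is cut off, here is a minimal Python 3 sketch of defining and using a list and a tuple. All variable names are hypothetical.

```python
# A list is mutable: elements can be added or replaced after creation.
fruits = ['apple', 'banana']
fruits.append('cherry')
fruits[0] = 'apricot'

# A tuple is immutable: item assignment raises TypeError.
point = (3, 4)
tuple_is_immutable = False
try:
    point[0] = 10
except TypeError:
    tuple_is_immutable = True

# Both support indexing, slicing and unpacking.
x, y = point
print(fruits, point, x, y)
```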