2017.07.26 Python web crawler: the Scrapy crawler framework

Source: Internet
Author: User
Tags: xpath, python web crawler, xpath contains

1. Installing Scrapy on Windows: open a cmd command line, cd to Python's Scripts directory, then run the pip install command.

After that, Scrapy is available in the PyCharm IDE.

Running the scrapy command in cmd, however, raises an error.

Workaround:

Create a new file named sitecustomize.py in Python's Lib\site-packages folder with the following content:

import sys
sys.setdefaultencoding('gb2312')  # Python 2 only; the function does not exist in Python 3

Running scrapy again in cmd now succeeds.

2. Scrapy selectors, XPath and CSS: a selector picks out part of an HTML file using a specific XPath or CSS expression.

(1) XPath is a language for selecting nodes and finding information in an XML document; it can also be used on HTML. XPath can traverse the elements and attributes of a document.

The XPath 2.0 function library contains more than 100 built-in functions for string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, and more. Scrapy's selectors are built on lxml, which implements XPath 1.0 and therefore offers only a smaller core subset of these functions.

(2) In XPath there are 7 types of nodes: element, attribute, text, namespace, processing-instruction, comment, and document (or root) nodes. An XML document is treated as a tree of nodes, and the root of the tree is called the document node or root node.

Take a simple XML file as an example:

<superhero>
  <class>
    <name lang="en">Tony Stark</name>
    <alias>Iron Man</alias>
    <sex>male</sex>
    <birthday>1969</birthday>
    <age>47</age>
  </class>
  <class>
    <name lang="en">Peter Benjamin Parker</name>
    <alias>Spider Man</alias>
    <sex>male</sex>
    <birthday>unknown</birthday>
    <age>unknown</age>
  </class>
  <class>
    <name lang="en">Steven Rogers</name>
    <alias>Captain America</alias>
    <sex>male</sex>
    <birthday>19200704</birthday>
    <age>96</age>
  </class>
</superhero>
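
To make the idea of a node tree concrete, here is a minimal sketch that parses a shortened copy of the file above and prints one line per element node, with its attributes and text. It uses Python's standard-library xml.etree.ElementTree rather than Scrapy; the helper name describe is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the superhero file (two entries, two fields each).
XML = """<superhero>
  <class>
    <name lang="en">Tony Stark</name>
    <alias>Iron Man</alias>
  </class>
  <class>
    <name lang="en">Peter Benjamin Parker</name>
    <alias>Spider Man</alias>
  </class>
</superhero>"""


def describe(element, depth=0):
    """Return one indented line per element: tag, attributes, text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {element.attrib} {text!r}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(XML)  # root is the <superhero> element
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of the tree: superhero at the top, the class elements below it, and name and alias as leaves.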

(3) XPath uses path expressions to select nodes in an XML document. The most common path expressions are:

nodename: Selects all child elements of the current node that have this name

/: Selects from the root node

//: Selects matching nodes anywhere in the document from the current node down, regardless of their location

.: Selects the current node

..: Selects the parent of the current node

@: Selects attributes

*: Matches any element node

@*: Matches any attribute node

node(): Matches nodes of any type

(4) How the XPath selector collects data:

(5) Nested selector:

3. CSS (Cascading Style Sheets) selectors: a CSS rule consists of two main parts: a selector, and one or more declarations.

selector {declaration1; declaration2; ... declarationN}

Common CSS selectors, each with an example:

.class (example: .intro): Selects all elements with class="intro"

#id (example: #firstname): Selects the element with id="firstname"

* (example: *): Selects all elements

element (example: p): Selects all <p> elements

element,element (example: div,p): Selects all <div> elements and all <p> elements

element element (example: div p): Selects all <p> elements inside <div> elements

[attribute] (example: [target]): Selects all elements with a target attribute

[attribute=value] (example: [target=_blank]): Selects all elements with target="_blank"

4. CSS selector test:

5. Additional selectors:

Selectors also have a .re() method, which extracts data using regular expressions. Unlike .xpath() or .css(), however, .re() returns a list of Unicode strings rather than selectors, so .re() calls cannot be nested.
