Python Crawler Framework Scrapy Learning Note 7 ------- scrapy.Item Source Code Analysis


From the previous example we know that defining an item class is as simple as inheriting from scrapy.Item and adding several scrapy.Field objects as class attributes, as in the following:

import scrapy

class Product(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()
    stock = scrapy.Field()
    last_updated = scrapy.Field(serializer=str)


Using an item is also very simple: you use it just like a dict. For example, in the original spider:

torrent = TorrentItem()
torrent['url'] = response.url
torrent['name'] = response.xpath("//h1/text()").extract()
torrent['description'] = response.xpath("//div[@id='description']").extract()
torrent['size'] = response.xpath("//div[@id='specifications']/p[2]/text()[2]").extract()
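
Since an item behaves like a dict, the usual dictionary operations also work on it. A quick illustrative sketch (assuming TorrentItem declares url and name as fields, and does not declare rating):

torrent = TorrentItem(url='http://example.com/t/1', name='Example')  # dict-style constructor
print(torrent['name'])    # 'Example'
print(torrent.keys())     # the fields that have been set: ['url', 'name'] (order may vary)
print(dict(torrent))      # convert back to a plain dict
torrent['rating'] = 5     # raises KeyError: TorrentItem does not support field: rating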


Here are a few questions:

    1. Are name, price, stock, and last_updated above really class attributes?

    2. Why can an instance of Item be used like a dictionary?

    3. What is Field used for?


Hold the Ctrl key, place the mouse over scrapy.Item, and click to jump into the source; the answers become apparent immediately.

item.py Source

"" "Scrapy itemsee documentation in docs/topics/item.rst" "" From pprint import  pformatfrom UserDict import DictMixinfrom scrapy.utils.trackref import  Object_refclass baseitem (object_ref):     "" "base class for all  Scraped items. "" "     passclass field (dict):     "" "Container of field  metadata "" "Class itemmeta (Type):     def __new__ (Mcs, class_name,  bases, attrs):        fields = {}         new_attrs = {}         For n, v in attrs.iteritems ():             if isinstance (V, field):                 fields[n] = v             else:                new _attrs[n] = v        cls = super (ItemMeta,  MCS). __new__ (Mcs, class_name, bases, new_attrs)          Cls.fields = cls.fields.copy ()         cls.fields.update ( Fields)         return clsclass dictitem (DictMixin,  Baseitem):     fields = {}    def __init__ (self, * Args, **kwargs):        self._values = {}         if args or kwargs:  # avoid creating  dict for moSt common case            for k,  v in dict (*args, **kwargs). Iteritems ():                 self[k] = v    def __ getitem__ (Self, key):        return self._values[key]     def __setitem__ (Self, key, value):         if key in self.fields:             self._values[key] = value        else:             raise keyerror ("%s does not  support field: %s " %                  (Self.__cLass__.__name__, key))     def __delitem__ (Self, key):         del self._values[key]    def __getattr__ (Self,  name):        if name in self.fields:             raise attributeerror ("Use item[%r]  To get field value " % name)         raise  attributeerror (name)     def __setattr__ (self, name, value):         if not name.startswith ('_'):             raise attributeerror ("Use item[%r] = %r to  set field value " %                  (NAMe, value)         super (dictitem, self). __SETATTR__ (Name,  value)     def keys (self):         return  self._values.keys ()     def __repr__ (self):         return pformat (Dict (self))     def copy (self):         return self.__class__ (self) class item (dictitem):     __metaclass__ = itemmeta


The class Item inherits from DictItem, and the Item class object itself is created by the metaclass ItemMeta. (For more on Python metaclasses, see <<Core Python Programming>>.)
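
As a reminder of how a metaclass works (a minimal standalone sketch, not Scrapy code): the metaclass's __new__ receives the class name, the base classes and the attribute dictionary, and whatever it returns becomes the class object, so it can rewrite the attributes before the class is built.

class Meta(type):
    def __new__(mcs, name, bases, attrs):
        # inspect or modify the attribute dict before the class object is created
        attrs['created_by_meta'] = True
        return super(Meta, mcs).__new__(mcs, name, bases, attrs)

class Demo(object):
    __metaclass__ = Meta   # Python 2 syntax, matching the source above

print(Demo.created_by_meta)   # True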


The class DictItem implements a few of dict's methods itself and inherits from DictMixin, which fills in the rest and gives it a dictionary-like API.
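
DictMixin (from the Python 2 UserDict module; collections.abc.MutableMapping plays a similar role in Python 3) derives the rest of the mapping interface from the few methods the class defines itself. A minimal sketch of the idea, not Scrapy code:

from UserDict import DictMixin   # Python 2

class MiniItem(DictMixin):
    def __init__(self):
        self._values = {}
    def __getitem__(self, key):
        return self._values[key]
    def __setitem__(self, key, value):
        self._values[key] = value
    def __delitem__(self, key):
        del self._values[key]
    def keys(self):
        return self._values.keys()

m = MiniItem()
m['a'] = 1
print('a' in m)        # True  -- __contains__ is provided by DictMixin
print(m.items())       # [('a', 1)]  -- items() is provided by DictMixin
print(m.get('b', 0))   # 0  -- get() is provided by DictMixin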


ItemMeta's __new__ method does two things: 1. It puts attributes of type scrapy.Field into the fields dictionary. 2. It puts all remaining attributes into the new_attrs dictionary, which is what the class is actually built from.

So name, price, stock, and last_updated are no longer class attributes; instead they are collected into the class attribute fields.
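
You can check this interactively with the Product class defined above; roughly what you would see (dict ordering may differ):

>>> Product.fields
{'last_updated': {'serializer': <type 'str'>}, 'name': {}, 'price': {}, 'stock': {}}
>>> Product.name                    # no longer a class attribute
AttributeError: type object 'Product' has no attribute 'name'
>>> p = Product(name='Desktop PC')
>>> p.name                          # attribute access is blocked on instances too
AttributeError: Use item['name'] to get field value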


Now look at scrapy.Field again. What exactly is it?

class Field(dict):
    """Container of field metadata"""

It is nothing special: it is just a dict under another name.
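
Because Field is just a subclass of dict, the keyword arguments you pass to it are simply stored as dictionary entries. A quick sketch ('default' here is an arbitrary, made-up metadata key):

>>> f = scrapy.Field(serializer=str, default='N/A')
>>> isinstance(f, dict)
True
>>> f['serializer']
<type 'str'>
>>> Product.fields['last_updated']['serializer']
<type 'str'>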


The role of Field (from the official documentation):


Field objects are used to specify metadata for each field. For example, in the Product example above, the Field for last_updated specifies the serialization function for that field.

You can specify any kind of metadata for each field. Field objects place no restriction on the values they accept, so the documentation cannot provide a complete reference list of all available metadata keys. Each key stored in a Field object may be used by a different component, and only those components know about it. You can also define and use any other Field key your project needs. The main goal of Field objects is to provide a single place to define all field metadata. Typically, components whose behaviour depends on a field use certain keys to configure that behaviour, so you must read each component's documentation to see which metadata keys it uses.
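
For example, a component that serializes items could look up a 'serializer' key in each field's metadata. The following is an illustrative sketch of the idea, not actual Scrapy exporter code (although the built-in exporters use the 'serializer' key in a similar way):

def serialize_item(item):
    # Apply each field's 'serializer' metadata, if any, to the stored value.
    result = {}
    for name, value in item.items():
        serializer = item.fields[name].get('serializer', lambda v: v)
        result[name] = serializer(value)
    return result

p = Product(name='Desktop PC', price=1000, last_updated=1429018383.0)
print(serialize_item(p))   # last_updated goes through str(), the other values are unchanged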


Official documentation: http://scrapy-chs.readthedocs.org/zh_CN/latest/topics/items.html

