In the previous example, we saw that defining an item class is as simple as inheriting from scrapy.Item and adding several scrapy.Field objects as class attributes, as follows:
    import scrapy

    class Product(scrapy.Item):
        name = scrapy.Field()
        price = scrapy.Field()
        stock = scrapy.Field()
        last_updated = scrapy.Field(serializer=str)
Using an item is also very simple, just like using a dict. For example, in the earlier spider:
    torrent = TorrentItem()
    torrent['url'] = response.url
    torrent['name'] = response.xpath("//h1/text()").extract()
    torrent['description'] = response.xpath("//div[@id='description']").extract()
    torrent['size'] = response.xpath("//div[@id='specifications']/p[2]/text()[2]").extract()
This raises a few questions:
1. Are name, price, stock, and last_updated above really class attributes?
2. Why can an instance of an item be used like a dictionary?
3. What is Field for?
Hold down the CTRL key, place the mouse on scrapy.Item, and click through to its definition; the answers are immediately apparent.
item.py source:

    """
    Scrapy Item

    See documentation in docs/topics/item.rst
    """

    from pprint import pformat
    from UserDict import DictMixin

    from scrapy.utils.trackref import object_ref


    class BaseItem(object_ref):
        """Base class for all scraped items."""
        pass


    class Field(dict):
        """Container of field metadata"""


    class ItemMeta(type):

        def __new__(mcs, class_name, bases, attrs):
            fields = {}
            new_attrs = {}
            for n, v in attrs.iteritems():
                if isinstance(v, Field):
                    fields[n] = v
                else:
                    new_attrs[n] = v

            cls = super(ItemMeta, mcs).__new__(mcs, class_name, bases, new_attrs)
            cls.fields = cls.fields.copy()
            cls.fields.update(fields)
            return cls


    class DictItem(DictMixin, BaseItem):

        fields = {}

        def __init__(self, *args, **kwargs):
            self._values = {}
            if args or kwargs:  # avoid creating dict for most common case
                for k, v in dict(*args, **kwargs).iteritems():
                    self[k] = v

        def __getitem__(self, key):
            return self._values[key]

        def __setitem__(self, key, value):
            if key in self.fields:
                self._values[key] = value
            else:
                raise KeyError("%s does not support field: %s" %
                    (self.__class__.__name__, key))

        def __delitem__(self, key):
            del self._values[key]

        def __getattr__(self, name):
            if name in self.fields:
                raise AttributeError("Use item[%r] to get field value" % name)
            raise AttributeError(name)

        def __setattr__(self, name, value):
            if not name.startswith('_'):
                raise AttributeError("Use item[%r] = %r to set field value" %
                    (name, value))
            super(DictItem, self).__setattr__(name, value)

        def keys(self):
            return self._values.keys()

        def __repr__(self):
            return pformat(dict(self))

        def copy(self):
            return self.__class__(self)


    class Item(DictItem):

        __metaclass__ = ItemMeta
Class Item inherits from DictItem, and the Item class object itself is created by the metaclass ItemMeta. (For more on Python metaclasses, refer to the book "Core Python Programming".)
Class DictItem implements some of dict's methods and inherits from DictMixin, which gives it a dictionary-like API.
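The dictionary-like behaviour can be sketched in a few lines. This is only an illustration, not Scrapy's code: it assumes Python 3, where collections.abc.MutableMapping plays the role that UserDict.DictMixin plays in the Python 2 source (and additionally requires __iter__ and __len__). The class name SketchItem and its hard-coded fields dict are made up for the example.

```python
from collections.abc import MutableMapping


class SketchItem(MutableMapping):
    """Hypothetical stand-in for DictItem, written for Python 3."""

    # Declared field names; in Scrapy this dict is filled in by ItemMeta.
    fields = {"name": {}, "price": {}}

    def __init__(self, *args, **kwargs):
        self._values = {}
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        # Only declared fields may be stored.
        if key not in self.fields:
            raise KeyError("%s does not support field: %s"
                           % (self.__class__.__name__, key))
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


item = SketchItem(name="Desktop PC")
item["price"] = 1000

# get(), items(), "in" and conversion to dict come from the mixin.
as_dict = dict(item)
missing = item.get("stock", "not set")
```

Only the handful of special methods is written by hand; the rest of the dict API is derived from them by the mixin, which is the same division of labour DictItem relies on.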
ItemMeta's __new__ method does two things: 1. It places the attributes of type scrapy.Field into the dictionary fields. 2. It places all other attributes into the dictionary new_attrs, which is then used to create the class.
So name, price, stock, and last_updated are no longer class attributes; they are stored in the class attribute fields instead.
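The effect of the metaclass can be checked with a small experiment. The sketch below is a Python 3 rendering of the same idea and does not import Scrapy; SketchMeta is a hypothetical name, and the metaclass=... syntax replaces the Python 2 __metaclass__ attribute used in item.py.

```python
class Field(dict):
    """Container of field metadata (same definition as in item.py)."""


class SketchMeta(type):
    """Hypothetical Python 3 rendering of the ItemMeta idea."""

    def __new__(mcs, class_name, bases, attrs):
        fields = {}
        new_attrs = {}
        for name, value in attrs.items():
            if isinstance(value, Field):
                fields[name] = value      # step 1: collect Field attributes
            else:
                new_attrs[name] = value   # step 2: keep everything else
        cls = super().__new__(mcs, class_name, bases, new_attrs)
        # Copy so a subclass extends, rather than mutates, its parent's dict.
        cls.fields = dict(getattr(cls, "fields", {}))
        cls.fields.update(fields)
        return cls


class Product(metaclass=SketchMeta):
    name = Field()
    price = Field()
    last_updated = Field(serializer=str)


declared = sorted(Product.fields)
has_name_attribute = hasattr(Product, "name")
```

Here declared lists the three field names, while has_name_attribute is False: the Field objects were removed from the class namespace before the class was created.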
Now look at scrapy.Field again. What exactly is it?
    class Field(dict):
        """Container of field metadata"""
It is nothing special: just a dict under a different name.
The role of Field is (see the official documentation):
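Because the class body is empty, everything a Field can do is inherited from dict. A quick check, redefining the two-line class locally so the snippet runs without Scrapy installed:

```python
class Field(dict):
    """Container of field metadata"""


last_updated = Field(serializer=str)

# Keyword arguments simply become ordinary dictionary keys.
serializer = last_updated["serializer"]
is_plain_dict = isinstance(last_updated, dict)
empty_field = Field()
```

A Field created without arguments is just an empty dictionary.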
Field objects specify the metadata for each field. For example, in the Product example above, the metadata of last_updated specifies the serializer function for that field.
You can specify any kind of metadata for each field; Field objects place no restrictions on the values they accept. For the same reason, the documentation cannot provide a reference list of all available metadata keys. Each key stored in a Field object may be used by a different component, and only that component knows about the key. You can also define and use other Field keys for your own needs. The main purpose of Field objects is to define all field metadata in one place. In general, a component whose behaviour depends on a field uses specific keys, so you must consult that component's documentation to see which metadata keys it uses.
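To make the idea of "a component that knows about a key" concrete, here is a minimal sketch. The function serialize_value is hypothetical and is not Scrapy's exporter code; it only illustrates how a component might look up the serializer key and fall back to the unchanged value when the key is absent.

```python
class Field(dict):
    """Container of field metadata"""


# Plain dict standing in for the `fields` attribute of an item class.
fields = {
    "name": Field(),
    "last_updated": Field(serializer=str),
}


def serialize_value(field, value):
    """Hypothetical component: applies the field's 'serializer' key if set."""
    serializer = field.get("serializer", lambda v: v)
    return serializer(value)


values = {"name": "Desktop PC", "last_updated": 20150101}
serialized = {key: serialize_value(fields[key], value)
              for key, value in values.items()}
```

Any other key, for instance a made-up one such as Field(unit="kg"), would be stored just the same and simply ignored by components that do not look for it.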
Official documentation: http://scrapy-chs.readthedocs.org/zh_CN/latest/topics/items.html
Python crawler framework Scrapy study notes 7: scrapy.Item source code analysis