Using pipelines in Scrapy, the Python crawling framework

Source: Internet
Author: User

The pipeline is one of Scrapy's most important modules. Its main job is to write the items returned by a spider to persistent storage such as a database or files. The case below gives a brief look at how pipelines are used.

Case one:

  

Items:

import scrapy


class ZhihuuserItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    id = scrapy.Field()
    name = scrapy.Field()
    avatar_url = scrapy.Field()
    headline = scrapy.Field()
    description = scrapy.Field()
    url = scrapy.Field()
    url_token = scrapy.Field()
    gender = scrapy.Field()
    cover_url = scrapy.Field()
    type = scrapy.Field()
    badge = scrapy.Field()
    answer_count = scrapy.Field()
    articles_count = scrapy.Field()
    commercial_question = scrapy.Field()
    favorite_count = scrapy.Field()
    favorited_count = scrapy.Field()
    follower_count = scrapy.Field()
    following_columns_count = scrapy.Field()
    following_count = scrapy.Field()
    pins_count = scrapy.Field()
    question_count = scrapy.Field()
    thank_from_count = scrapy.Field()
    thank_to_count = scrapy.Field()
    thanked_count = scrapy.Field()
    vote_from_count = scrapy.Field()
    vote_to_count = scrapy.Field()
    voteup_count = scrapy.Field()
    following_favlists_count = scrapy.Field()
    following_question_count = scrapy.Field()
    following_topic_count = scrapy.Field()
    marked_answers_count = scrapy.Field()
    mutual_followees_count = scrapy.Field()
    participated_live_count = scrapy.Field()
    locations = scrapy.Field()
    educations = scrapy.Field()
    employments = scrapy.Field()

Basic configuration for writing to the MongoDB database

settings.py:

# Connection information for the MongoDB database
MONGO_URL = '172.16.5.239'
MONGO_PORT = 27017
MONGO_DB = 'zhihuuser'

# When False, Scrapy does not read the list of disallowed paths in each
# site's root directory (for example www.baidu.com/robots.txt) and crawls
# whatever the spider requests.
ROBOTSTXT_OBEY = False

ITEM_PIPELINES = {
    # The priority value is unreadable in the source; 300 is an assumed placeholder.
    'zhihuuser.pipelines.MongoDBPipeline': 300,
}
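The number assigned to each entry in ITEM_PIPELINES controls the order in which pipelines run: Scrapy passes items through lower-valued pipelines first, and values are customarily chosen in the 0-1000 range. A minimal sketch of that ordering using a plain dictionary follows. The second pipeline path and both numbers are hypothetical and serve only as illustration.

```python
# Hypothetical ITEM_PIPELINES setting; paths and numbers are illustrative only.
ITEM_PIPELINES = {
    'zhihuuser.pipelines.MongoDBPipeline': 300,
    'zhihuuser.pipelines.CleanupPipeline': 100,  # hypothetical extra pipeline
}


def pipeline_order(item_pipelines):
    """Return pipeline paths sorted by value, lowest first (the order items pass through)."""
    ranked = sorted(item_pipelines.items(), key=lambda entry: entry[1])
    return [path for path, _priority in ranked]


order = pipeline_order(ITEM_PIPELINES)
print(order)
```

With these assumed values, the cleanup pipeline would see each item before the MongoDB pipeline writes it.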

pipelines.py:
1. Read the database address, port and database name from the settings file (MongoDB creates the database automatically if it does not exist).
2. Use that information to connect to the database.
3. Write the data to the database.
4. Close the database connection.
Note: The connection is opened and closed only once; the write operation runs once for every item.
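The note above can be illustrated without Scrapy or MongoDB. The sketch below is only an approximation: `RecordingPipeline` and the `run` driver are hypothetical names, and `run` merely imitates the order in which Scrapy calls the pipeline hooks.

```python
class RecordingPipeline:
    """Hypothetical pipeline that records which hook was called."""

    def __init__(self):
        self.calls = []

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection here.
        self.calls.append('open')

    def process_item(self, item, spider):
        # Called once per item: write the item here, then pass it on.
        self.calls.append('write')
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: release the connection here.
        self.calls.append('close')


def run(pipeline, items, spider=None):
    """Stand-in for the Scrapy engine: open, process every item, close."""
    pipeline.open_spider(spider)
    try:
        for item in items:
            pipeline.process_item(item, spider)
    finally:
        pipeline.close_spider(spider)


pipeline = RecordingPipeline()
run(pipeline, [{'url_token': 'a'}, {'url_token': 'b'}, {'url_token': 'c'}])
print(pipeline.calls)
```

The pipeline for this case follows the same hook structure: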
import pymongo


class MongoDBPipeline(object):
    """
    1. Database connection operations
    """

    def __init__(self, mongourl, mongoport, mongodb):
        '''
        Initialize the MongoDB URL, port number and database name.
        :param mongourl:
        :param mongoport:
        :param mongodb:
        '''
        self.mongourl = mongourl
        self.mongoport = mongoport
        self.mongodb = mongodb

    @classmethod
    def from_crawler(cls, crawler):
        """
        1. Read the MongoDB URL, port and database name from the settings.
        :param crawler:
        :return:
        """
        return cls(
            mongourl=crawler.settings.get("MONGO_URL"),
            mongoport=crawler.settings.get("MONGO_PORT"),
            mongodb=crawler.settings.get("MONGO_DB"),
        )

    def open_spider(self, spider):
        '''
        1. Connect to MongoDB.
        :param spider:
        :return:
        '''
        self.client = pymongo.MongoClient(self.mongourl, self.mongoport)
        self.db = self.client[self.mongodb]

    def process_item(self, item, spider):
        '''
        1. Write the data to the database.
        :param item:
        :param spider:
        :return:
        '''
        name = item.__class__.__name__
        # self.db[name].insert_one(dict(item))
        # Upsert keyed on url_token. update_one replaces the older
        # Collection.update, which was removed in PyMongo 4; the item is
        # converted to a plain dict before it is sent to MongoDB.
        self.db['user'].update_one(
            {'url_token': item['url_token']},
            {'$set': dict(item)},
            upsert=True,
        )
        return item

    def close_spider(self, spider):
        '''
        1. Close the database connection.
        :param spider:
        :return:
        '''
        self.client.close()
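The write in process_item is an upsert keyed on url_token: if a document with that url_token already exists its fields are overwritten, otherwise a new document is inserted, so crawling the same user twice does not create duplicates. A minimal in-memory sketch of that behavior follows. The `FakeCollection` class is hypothetical; it only mimics `$set` with upsert semantics and is not PyMongo.

```python
class FakeCollection:
    """Hypothetical stand-in for a MongoDB collection, keyed on one field."""

    def __init__(self, key):
        self.key = key
        self.docs = {}

    def upsert(self, item):
        # Mimic a '$set' update with upsert enabled: merge the fields into
        # the matching document, creating that document if it is missing.
        doc = self.docs.setdefault(item[self.key], {})
        doc.update(item)


users = FakeCollection(key='url_token')
users.upsert({'url_token': 'alice', 'follower_count': 10})
users.upsert({'url_token': 'alice', 'follower_count': 12, 'headline': 'hi'})
users.upsert({'url_token': 'bob', 'follower_count': 3})

print(len(users.docs))
print(users.docs['alice'])
```

The second write for 'alice' updates the existing record instead of adding a new one, which is why the pipeline can be rerun safely over users it has already stored.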

  
