Scrapy_redis can only use Redis's db0?

Tags: redis, db2
Background:

As we all know, Redis's default configuration provides 16 databases, db0 through db15. One purpose of having 16 dbs is to let different projects use different dbs, which keeps their data from getting mixed together and makes the data easier to inspect. When connecting to Redis from Python, db0 is used by default if you do not specify a db. Anyone who has used the scrapy_redis module also knows that the request (seed) queue and the dupefilter set both live in db0 by default.

I recently inherited a distributed crawler, based on Scrapy and Redis, from a colleague. Originally I did not think there was any difficulty in having scrapy_redis store its requests and dupefilter in db2, and I did not care how his code achieved it; I only knew that his crawler put both the seeds and the dedup fingerprints in db2. Yesterday, while adding a feature, I copied the code off the server and ran a test, and found the data being written to db0. I read through the code carefully and could not find anywhere that switched it to db2, yet the same code running on the server does write to db2. The same code, yet different storage locations. Great confusion.

Puzzled, I looked at the scrapy_redis source installed on the server, and found that the Redis connection code in connection.py had been modified directly.

Speechless. Modifying a Python module's source in place is a big taboo. The change is simple and quick, but the module is shared code: if other projects on the same machine also use scrapy_redis, things will get messy, and it hurts the project's portability as well. It is no different from drinking poison to quench thirst.

Reading the scrapy_redis module's source confirms it: when connecting to Redis, scrapy_redis does not specify a db, so it defaults to db0, and there is no interface for the user to choose one. So the question is: if scrapy_redis should use Redis's db2, what do we do?



Solution:

1. My colleague's change is indeed simple and quick (as in the screenshot above), but you must not edit the module's source directly. Instead, copy the source out into the project directory and modify the copy there. When the project imports scrapy_redis, Python finds the copy in the project directory first, so that is the one that gets used.
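As a sketch of what the edited connection code in that project-local copy might look like: the helper below builds the keyword arguments for redis.Redis() from the Scrapy settings, with an added REDIS_DB setting. The setting name and the helper function are my own illustration, not part of stock scrapy_redis:

```python
# Sketch of a project-local connection helper (option 1): same logic as
# scrapy_redis's connection code, plus an assumed REDIS_DB setting.
DEFAULTS = {'REDIS_HOST': 'localhost', 'REDIS_PORT': 6379, 'REDIS_DB': 0}

def redis_kwargs(settings):
    """Build the keyword arguments for redis.Redis() from Scrapy settings."""
    return {
        'host': settings.get('REDIS_HOST', DEFAULTS['REDIS_HOST']),
        'port': settings.get('REDIS_PORT', DEFAULTS['REDIS_PORT']),
        'db': settings.get('REDIS_DB', DEFAULTS['REDIS_DB']),  # the new knob
    }

# In the copied connection.py you would then do:
#   import redis
#   server = redis.Redis(**redis_kwargs(settings))
print(redis_kwargs({'REDIS_DB': 2}))
# -> {'host': 'localhost', 'port': 6379, 'db': 2}
```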

2. The second way is inheritance. As the screenshot above shows, the job of that code is to instantiate a Redis connection object; normally it returns a redis.Redis() object. This function is called by the from_settings() method in scheduler.py. The code is as follows:

We can subclass it, override this method, and so work around the limitation.
How: create a new file scheduleroverwrite.py in the same directory as settings.py and fill in the code below. Then set SCHEDULER = 'scheduleroverwrite.SchedulerSon' in settings.py, and specify the db with REDIS_DB = xxx in settings.py.

```python
import redis
from scrapy_redis.scheduler import Scheduler
from scrapy.utils.misc import load_object

# default values
SCHEDULER_PERSIST = False
QUEUE_KEY = '%(spider)s:requests'
QUEUE_CLASS = 'scrapy_redis.queue.SpiderPriorityQueue'
DUPEFILTER_KEY = '%(spider)s:dupefilter'
IDLE_BEFORE_CLOSE = 0
REDIS_URL = None
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
REDIS_DB = 0


def from_settings(settings):
    url = settings.get('REDIS_URL', REDIS_URL)
    host = settings.get('REDIS_HOST', REDIS_HOST)
    port = settings.get('REDIS_PORT', REDIS_PORT)
    db = settings.get('REDIS_DB', REDIS_DB)
    # REDIS_URL takes precedence over the host/port specification.
    if url:
        return redis.from_url(url)
    else:
        return redis.Redis(host=host, port=port, db=db)


class SchedulerSon(Scheduler):
    @classmethod
    def from_settings(cls, settings):
        persist = settings.get('SCHEDULER_PERSIST', SCHEDULER_PERSIST)
        queue_key = settings.get('SCHEDULER_QUEUE_KEY', QUEUE_KEY)
        queue_cls = load_object(settings.get('SCHEDULER_QUEUE_CLASS', QUEUE_CLASS))
        dupefilter_key = settings.get('DUPEFILTER_KEY', DUPEFILTER_KEY)
        idle_before_close = settings.get('SCHEDULER_IDLE_BEFORE_CLOSE', IDLE_BEFORE_CLOSE)
        server = from_settings(settings)
        return cls(server, persist, queue_key, queue_cls,
                   dupefilter_key, idle_before_close)
```
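One more thing worth noting about the from_settings() above: REDIS_URL takes priority, and redis-py's from_url() reads the database index from the URL path, so setting REDIS_URL = 'redis://localhost:6379/2' would also land the data in db2. A stdlib-only sketch of that URL convention (the helper name is my own illustration):

```python
from urllib.parse import urlparse

def db_from_redis_url(url, default=0):
    """Return the db index encoded in a redis:// URL's path.

    Mirrors the convention redis-py's from_url() follows:
    redis://localhost:6379/2 targets db 2; an empty path
    falls back to the default db.
    """
    path = urlparse(url).path.lstrip('/')
    return int(path) if path else default

print(db_from_redis_url('redis://localhost:6379/2'))  # 2
print(db_from_redis_url('redis://localhost:6379'))    # 0
```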



Please credit the source when reprinting, thank you. (Original link: http://blog.csdn.net/bone_ace/article/details/54139500)
