Chapter 2: scrapy-redis Distributed Crawler

9-1 Key Points of distributed crawling

1. Advantages of distributed architecture

  • Make full use of the bandwidth of multiple machines to accelerate crawling
  • Make full use of the IP addresses of multiple hosts to accelerate crawling

Q: Why doesn't Scrapy support distributed crawling out of the box?

A: In Scrapy, the scheduler keeps its request queue in the memory of a single machine. Crawlers running on other hosts have no way to read from or write to that in-memory queue, so Scrapy by itself cannot be distributed.

2. Distributed Problems to be Solved

  • Centralized management of the request queue
  • Centralized management of request deduplication

Both problems can be solved with Redis: move the queue and the deduplication set into Redis, which every host can share.
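To make the shared-queue idea concrete, here is a minimal sketch in pure Python. The class and names are hypothetical, and a `deque` stands in for the Redis list; in a real scrapy-redis deployment every crawler host would issue LPUSH/RPOP against the same Redis key over the network instead of touching local memory.

```python
from collections import deque

class SharedRequestQueue:
    """Stand-in for a Redis list shared by all crawler hosts.

    In real scrapy-redis, each host would call LPUSH/RPOP on one
    Redis key, so the queue lives outside any single host's memory.
    """

    def __init__(self):
        self._items = deque()

    def lpush(self, url):
        # Push a new request URL onto the left end of the queue.
        self._items.appendleft(url)

    def rpop(self):
        # Pop the oldest request from the right end (None if empty).
        return self._items.pop() if self._items else None

# One central queue, drained by two different "hosts".
queue = SharedRequestQueue()
for url in ["http://example.com/1", "http://example.com/2", "http://example.com/3"]:
    queue.lpush(url)

host_a = [queue.rpop()]   # first host takes the oldest request
host_b = [queue.rpop()]   # second host takes the next one
```

Because both hosts pop from the same queue, each URL is crawled exactly once, which is the core of the scrapy-redis design.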

9-2 ~ 3 Basic Knowledge of Redis

I. Installing Redis (Windows 64-bit)

1. Search for "redis for windows" and find the installation package on GitHub.

2. Click to download it.

3. Open a cmd window and switch to the download directory.

4. Start the server (for the Windows build this is typically redis-server.exe with its config file).

Once the server is running, you can open redis-cli and enter commands to test it.

II. Redis Data Types
  • String
  • Hash
  • List
  • Set
  • Sorted set (zset)

1. String commands

set mykey "cnblogs" — create a key with a string value

get mykey — read the value back

getrange mykey start end — get a substring; for example, getrange mykey 2 5 returns characters 2 through 5 (inclusive)

strlen mykey — get the length of the value

incr/decr mykey — increment/decrement by one (the value must be an integer)

append mykey "com" — append a string to the end of the value
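As a rough illustration of these semantics in Python (a plain dict standing in for the Redis keyspace; this is not real client code):

```python
# A dict plays the role of the Redis keyspace.
store = {"mykey": "cnblogs"}               # set mykey "cnblogs"

value = store["mykey"]                     # get mykey
sub = store["mykey"][2:6]                  # getrange mykey 2 5 (end is inclusive)
length = len(store["mykey"])               # strlen mykey

store["counter"] = "0"
store["counter"] = str(int(store["counter"]) + 1)   # incr counter

store["mykey"] += "com"                    # append mykey "com"
```

Note that Redis stores counters as strings but interprets them as integers for incr/decr, which the `int(...)` round-trip above mimics.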

2. Hash commands

hset myhash name "cnblogs" — create a hash; myhash is like a variable name, name is the field (key), and "cnblogs" is the value

hgetall myhash — get all fields and values

hget myhash name — get the value of one field

hexists myhash name — check whether the field exists

hdel myhash name — delete the field

hkeys myhash — list the fields

hvals myhash — list the values
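The same hash semantics, sketched with a dict of dicts (again a stand-in, not client code):

```python
# Outer dict = keyspace, inner dict = one Redis hash.
hashes = {}
hashes.setdefault("myhash", {})["name"] = "cnblogs"   # hset myhash name "cnblogs"

all_pairs = dict(hashes["myhash"])         # hgetall myhash
name = hashes["myhash"]["name"]            # hget myhash name
exists = "name" in hashes["myhash"]        # hexists myhash name
keys = list(hashes["myhash"].keys())       # hkeys myhash
vals = list(hashes["myhash"].values())     # hvals myhash
del hashes["myhash"]["name"]               # hdel myhash name
```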

3. List commands

lpush/rpush mylist "cnblogs" — push a value onto the left/right end

lrange mylist 0 10 — view the elements at positions 0 through 10

blpop/brpop key1 [key2] timeout — pop one element from the left/right; if the list is empty, block for up to timeout seconds before giving up

lpop/rpop key — pop from the left/right without waiting

llen key — get the length of the list

lindex key index — get the element at index (indexes start from 0)
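A `collections.deque` behaves much like a Redis list, which makes the left/right push and pop semantics easy to illustrate (a stand-in, not client code):

```python
from collections import deque

mylist = deque()
mylist.appendleft("a")      # lpush mylist "a"
mylist.append("b")          # rpush mylist "b"
mylist.appendleft("c")      # lpush mylist "c"  -> ["c", "a", "b"]

view = list(mylist)[0:11]   # lrange mylist 0 10
first = mylist.popleft()    # lpop mylist
last = mylist.pop()         # rpop mylist
length = len(mylist)        # llen mylist
head = mylist[0]            # lindex mylist 0
```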

4. Set commands (no duplicate members)

sadd myset "cnblogs" — add a member; returns 1 if the member was new, 0 if it already existed

scard key — get the number of members in the set

sdiff key1 [key2] — set difference: members of key1 that are not in key2

sinter key1 [key2] — set intersection: members present in both sets

spop key — remove and return a random member

srandmember key [count] — return random member(s) without removing them

smembers key — get all members
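Python's built-in `set` maps almost one-to-one onto these commands, which makes a quick sketch possible (a stand-in, not client code):

```python
myset = set()
added = "cnblogs" not in myset   # sadd returns 1 only when the member is new
myset.add("cnblogs")             # sadd myset "cnblogs"
myset.add("python")

count = len(myset)               # scard myset
other = {"python", "redis"}
diff = myset - other             # sdiff myset other
inter = myset & other            # sinter myset other
members = set(myset)             # smembers myset
popped = myset.pop()             # spop: remove an arbitrary member
```

This no-duplicates property is exactly why scrapy-redis uses a Redis set of request fingerprints for deduplication.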

5. Sorted set commands

zadd myset 0 "project1" [1 "project2"] — add members with scores; the brackets just mean further score/member pairs are optional (you don't type them)

zrangebyscore myset 0 100 — get the members whose score is between 0 and 100

zcount key min max — count the members whose score is between min and max
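A sorted set is essentially a member-to-score mapping kept ordered by score. A rough sketch of the two range commands (hypothetical helper names, not client code):

```python
# member -> score, as in: zadd myset 0 project1 1 project2 150 project3
zset = {"project1": 0, "project2": 1, "project3": 150}

def zrangebyscore(z, lo, hi):
    # Members whose score falls in [lo, hi], ordered by score.
    return [m for m, s in sorted(z.items(), key=lambda kv: kv[1]) if lo <= s <= hi]

def zcount(z, lo, hi):
    # Number of members whose score falls in [lo, hi].
    return len(zrangebyscore(z, lo, hi))
```

Real Redis keeps this ordering incrementally (a skip list plus a hash) rather than re-sorting on every query, but the observable semantics match.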

III. Redis Documentation

9-4 ~ 9: these sections mainly explain scrapy-redis.

Usage instructions for scrapy-redis are on its GitHub page.

A Bloom filter can be integrated into scrapy-redis for request deduplication.

I haven't fully worked through the source code yet, so I won't explain it here.

For more information about the code, see my GitHub: scrapy-redis application project.

Author: Jin Xiao
Source: http://www.cnblogs.com/jinxiao-pu/p/6838011.html
The copyright of this article is shared by the author and the blog. You are welcome to repost it, but without the author's consent you must keep this statement and provide a link to the original article on the reposted page.

If you think it is good, click a recommendation!
