Python crawler (III) -- Python set()

Source: Internet
Author: User

If you have not yet mastered the basics of crawlers, read my earlier introductory articles first and then come back to this one.
This article focuses on the Python set, a data structure that a crawler cannot do without. If you are already familiar with sets, you can skip it.

In a crawler, to avoid crawling pages that have already been crawled, we put the URLs of crawled pages into a set.
Before crawling a URL, we check whether it is already in the set. If it is, we skip the URL. If it is not,
we add the URL to the set and then crawl the page.
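
The check-then-add loop described above can be sketched as follows. This is a minimal illustration, not a real crawler: FAKE_WEB, crawl and the example.com URLs are hypothetical names made up for this sketch, and an in-memory dictionary stands in for downloading and parsing pages.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: URL -> links found on that page.
# A real crawler would download and parse each page instead.
FAKE_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/", "http://example.com/b"],
    "http://example.com/b": ["http://example.com/a", "http://example.com/c"],
    "http://example.com/c": [],
}


def crawl(start_url):
    """Visit every reachable URL once and return the URLs in crawl order."""
    visited = set()            # URLs that have already been crawled
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        if url in visited:     # already crawled: skip this URL
            continue
        visited.add(url)       # record the URL first, then crawl the page
        order.append(url)
        for link in FAKE_WEB.get(url, []):
            queue.append(link)
    return order


crawled = crawl("http://example.com/")
print(crawled)
```

Even though the pages link back to each other, every URL appears in the result only once, because the membership test on the set filters out the repeats.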

Python provides exactly such a data structure: the set. A set is an unordered structure that contains no duplicate elements. It is generally used to test whether
an element is already present, or to remove duplicates from a group of elements. It supports intersection, union, difference, and symmetric difference. Like all
containers, a set supports
x in set, len(set), for x in set
As an unordered structure, a set does not record the position of an element or the order in which elements were inserted, so it does not support
indexing, slicing, and similar operations.
A set itself is mutable: its contents can be changed with add() and remove(). Because it is mutable, it has no hash value, so
a set cannot be used as a dictionary key or as an element of another set. For the same reason, the elements of a set must themselves be hashable.
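
A short sketch of these properties, using made-up values. The last two lines use frozenset, the immutable and hashable counterpart of set, which is not discussed in this article and is shown only as one possible workaround.

```python
s = {"a", "b", "c"}

# Membership, length, and iteration work as for any container.
print("a" in s, len(s), sorted(s))

# Indexing is not supported because a set has no positions.
try:
    s[0]
    index_error = None
except TypeError as exc:
    index_error = exc
print("indexing failed:", index_error)

# A set is mutable and therefore unhashable, so it cannot be a dict key.
try:
    {s: "value"}
    key_error = None
except TypeError as exc:
    key_error = exc
print("dict key failed:", key_error)

# frozenset is immutable and hashable, so it can serve as a key.
table = {frozenset(s): "value"}
print(table[frozenset({"c", "b", "a"})])
```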

A set can be created with the set() function or with curly braces {}. An empty set, however, cannot be created with curly braces, only with
the set() function, because empty curly braces create a dictionary.
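
A quick way to confirm the difference between the two empty literals (the variable names are only for illustration):

```python
empty_dict = {}                    # empty braces create a dict, not a set
empty_set = set()                  # the only way to create an empty set
from_list = set(["a", "b", "a"])   # set() also accepts any iterable
literal = {"a", "b"}               # non-empty braces without colons create a set

print(type(empty_dict).__name__)   # dict
print(type(empty_set).__name__)    # set
print(from_list == literal)        # True
```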

Let's look at some examples:
1.1
s1 = {'girl', 'boy', 'woman', 'man', 'older', 'child', 'man'}
print(s1)
# Result (the order may differ from run to run):
# {'woman', 'boy', 'child', 'man', 'girl', 'older'}
# This first example shows two things: first, a set removes duplicates automatically; second, its elements are unordered.

1.2
s1 = {'girl', 'boy', 'woman', 'man', 'older', 'child', 'man'}
print('girl in s1?', 'girl' in s1)
print('girls in s1?', 'girls' in s1)
# Output:
# girl in s1? True
# girls in s1? False

1.3
len(s)
s1 = {'girl', 'boy', 'woman', 'man', 'older', 'child', 'man'}
print('Number of elements in s1:', len(s1))
# Output: Number of elements in s1: 6
Clearly, len() counts the elements after duplicates have been removed.

1.4
The issubset() and issuperset() methods
s1 = {'girl', 'boy', 'woman', 'man', 'older', 'child', 'man'}
s2 = {'boy', 'woman'}
s3 = {'boy', 'people'}
print(s2.issubset(s1))    # Do all elements of s2 belong to s1? True
print(s3.issubset(s1))    # Do all elements of s3 belong to s1? False
print(s1.issuperset(s2))  # Does s1 contain every element of s2? True
print(s1.issuperset(s3))  # Does s1 contain every element of s3? False

1.5
s2 = {'boy', 'woman'}
s3 = {'boy', 'people'}
s2 |= s3
print(s2)  # prints the same elements as print(s2.union(s3)); duplicates are removed
Use set1.union(set2, set3, ...) or set1 | set2 | set3 | ...
to take the union.

1.6
s2 = {'boy', 'woman'}
s3 = {'boy', 'people'}
s2 &= s3
print(s2)  # prints the same elements as print(s2.intersection(s3))
Use set1.intersection(set2, set3, ...) or set1 & set2 & set3 ...
to take the intersection.

1.7
s2 = {'boy', 'woman'}
s3 = {'boy', 'people'}
s2 -= s3
print(s2)  # prints the same elements as print(s2.difference(s3))
Use set1.difference(set2, set3, ...) or set1 - set2 - set3 ...
to get the elements that exist in set1 but in none of the other sets.

1.8
s2 = {'boy', 'woman'}
s3 = {'boy', 'people'}
s2 ^= s3
print(s2)  # prints the same elements as print(s2.symmetric_difference(s3))
symmetric_difference() returns a new set holding the elements that appear in exactly one of the two sets.
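
As a rough illustration of how the four operations from examples 1.5 to 1.8 might be used in a crawler, the sketch below works on two made-up sets of URL paths. The names visited and found, and the paths themselves, are assumptions for this example only.

```python
visited = {"/", "/a", "/b"}          # hypothetical URLs already crawled
found = {"/a", "/b", "/c", "/d"}     # hypothetical links found on a new page

to_crawl = found - visited           # difference: links not crawled yet
seen_again = found & visited         # intersection: links already crawled
all_known = found | visited          # union: every URL seen so far
only_one_side = found ^ visited      # symmetric difference: in exactly one set

print(sorted(to_crawl))        # ['/c', '/d']
print(sorted(seen_again))      # ['/a', '/b']
print(sorted(all_known))       # ['/', '/a', '/b', '/c', '/d']
print(sorted(only_one_side))   # ['/', '/c', '/d']

# The plain operators return new sets; the augmented forms
# (|=, &=, -=, ^=) update the left-hand set in place.
before = visited
visited |= to_crawl
print(before is visited)       # True
```

The results are passed through sorted() only to make the printed order predictable, since a set by itself has no defined order.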

1.9
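Finally, the most basic operations: adding and removing elements
s1 = {'girl', 'boy', 'woman', 'man', 'older', 'child', 'man'}
print(s1)
s1.add('hello')      # add one element
print(s1)
s1.remove('hello')   # remove an element; raises KeyError if it is missing
print(s1)
s1.discard('man')    # remove an element; does nothing if it is missing
print(s1)
s1.pop()             # remove and return an arbitrary element
print(s1)
s1.clear()           # remove all elements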
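
The practical difference between remove() and discard(), and the behavior of pop() on an empty set, can be checked with a small sketch like this one (the values are made up for illustration):

```python
s = {"girl", "boy"}

s.discard("man")            # element is missing: discard() does nothing
try:
    s.remove("man")         # element is missing: remove() raises KeyError
    removed = True
except KeyError:
    removed = False
print(removed)              # False

item = s.pop()              # removes and returns an arbitrary element
print(item in {"girl", "boy"}, len(s))   # True 1

s.clear()
try:
    s.pop()                 # pop() on an empty set raises KeyError
    popped_empty = True
except KeyError:
    popped_empty = False
print(popped_empty)         # False
```

In a crawler, discard() is often the more convenient choice when it does not matter whether the URL was present in the first place.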

With these basics in hand, you can move on to the next part of the series, which covers Python regular expressions.

Zhongzhiyuan Nanjing 904727147, Jiangsu
