Sesame HTTP: How to Find an Entry Point for Your Crawler

Source: Internet
Author: User


Searching for Crawler Entry Points
1. The search-engine entry point. A good entry point for this kind of task is an ordinary search engine. Although there are many search engines, they all do the same thing: index web pages, process them, and provide search services. In everyday use we simply type in keywords, but there are many search operators as well. For example, for this task we can search like this to find the data we want:

site:zybang.com

Now let's try it on Baidu, Google, Sogou, 360, and Bing respectively:

From the figures above, we can see that the number of returned results is in the millions or even tens of millions.

Search-engine results are therefore clearly a good entry point for this task. As for the sites' anti-crawler measures, those will test your individual skills.
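Instead of typing the operator by hand, you can build such a query URL programmatically. The sketch below uses only Python's standard library; the Bing endpoint and `q` parameter are real, but treat this as a minimal illustration of constructing the query, not a scraping client (search engines rate-limit and block automated requests).

```python
from urllib.parse import urlencode

def build_site_query(engine_url, domain, keywords=""):
    """Build a search URL restricted to one domain via the site: operator."""
    query = f"site:{domain} {keywords}".strip()
    return engine_url + "?" + urlencode({"q": query})

# Restrict results to zybang.com on Bing:
url = build_site_query("https://www.bing.com/search", "zybang.com")
print(url)  # https://www.bing.com/search?q=site%3Azybang.com
```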

2. Other entry points. (1) The mobile portal. Obtaining data through a website's mobile portal often lets you get the data more quickly.

The simplest way to find a mobile portal is to use the developer tools in Google's Chrome browser: click the device-toolbar icon (the one shaped like a phone) and refresh the page.

This method is not foolproof. Sometimes you can instead send the URL to your phone and open it in a mobile browser to check whether the page displayed on the phone differs from the one on the computer; if it does, copy the URL from your mobile browser and send it back to your computer.
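Many sites decide between their desktop and mobile versions based on the User-Agent header, so a crawler can request the mobile page directly. A minimal standard-library sketch (the User-Agent string below is just an example value; any current phone UA works, and whether the site actually serves a mobile page is up to the site):

```python
import urllib.request

# Example mobile User-Agent string (an assumption for illustration).
MOBILE_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
             "AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148")

def mobile_request(url):
    """Prepare a GET request that asks the server for its mobile page."""
    return urllib.request.Request(url, headers={"User-Agent": MOBILE_UA})

req = mobile_request("https://www.zybang.com/")
# urllib.request.urlopen(req) would then fetch the page; if the site
# sniffs the User-Agent, you get the mobile version.
print(req.get_header("User-agent"))
```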

(2) Sitemaps. A sitemap is a file that a website administrator publishes to tell search engines which pages can be crawled. Using these sitemaps, you can efficiently and conveniently collect URLs to serve as further entry points.

(3) Modifying values in the URL. First, a disclaimer: this technique does not always work. It is mainly used to obtain all the required data in a single request by changing the values of certain fields in the URL. This reduces the number of requests, lowers the risk of being banned by the website, and so improves crawler efficiency. The following example captures all the songs of a singer on QQ Music; the request URL has the following format:

https://xxxxxxxxx&singermid=xxxx&order=listen&begin={begin}&num={num}&songstatus=1

The returned data packet is as follows:

Some of the field values are replaced with xxx. Note the begin and num fields: when a singer has many songs, the results are paginated, so begin is the offset of the first entry on each page and num is the number of records on that page. Normally we fetch the data one page at a time, and QQ Music's default page size is 30. So, for this singer's 96 songs, do we have to make at least four requests to get the complete data?

Of course not. We can try changing some values in the URL and see whether the returned result changes. Here we change num and begin: set num to the singer's total number of songs and begin to 0. Requesting the modified URL again returns the following data:

As shown above, 96 data records are returned.

In this way we can get all the data with just two requests: the first request obtains the total count, then we modify the URL and request again to fetch everything at once. Fields such as pagesize often behave the same way.
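The two-request strategy can be sketched as follows. The URL template and the begin/num fields follow the QQ Music example above (the xxxx parts are elided placeholders from the original request); the `fetch` callable is a stand-in for your HTTP call, assumed here to return a parsed response containing a 'total' count and a 'list' of songs.

```python
# Template matching the example above; xxxx parts are elided placeholders.
URL_TEMPLATE = ("https://xxxxxxxxx&singermid=xxxx&order=listen"
                "&begin={begin}&num={num}&songstatus=1")

def build_url(begin, num):
    """Fill the pagination fields of the request URL."""
    return URL_TEMPLATE.format(begin=begin, num=num)

def fetch_all_songs(fetch, page_size=30):
    """Two-request strategy: learn the total first, then grab everything.

    `fetch` is a stand-in for your HTTP call; the response shape
    ('total' and 'list' keys) is an assumption for illustration.
    """
    first = fetch(build_url(0, page_size))   # request 1: default page
    total = first["total"]
    if total <= page_size:
        return first["list"]
    full = fetch(build_url(0, total))        # request 2: everything at once
    return full["list"]

# Fake fetch simulating a singer with 96 songs, for demonstration:
def fake_fetch(url):
    num = int(url.split("num=")[1].split("&")[0])
    return {"total": 96, "list": list(range(min(num, 96)))}

print(len(fetch_all_songs(fake_fetch)))  # 96
```

With 96 songs and a page size of 30, naive paging needs four requests; this approach needs only two.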

To summarize: the tips above for finding crawler entry points can help you get twice the result with half the effort, sometimes obtaining the data at minimal cost.
