Analyze spider crawl times from IIS logs to achieve instant indexing and protect original content in time

Source: Internet
Author: User
Keywords: instant indexing, scraping, spider crawling


Webmasters often complain that high-weight sites scrape their hard-earned original content, so that their own originals end up credited to other sites. Today I will use an example to share how to solve this problem.

Consider how intellectual property is recognized in real life. In the simplest case, if A publishes an original article in a magazine and B copies it unchanged and publishes it in another magazine, then when A sues B the court can easily rule from the publication dates that B copied A, because A's work was published first. (If B reworked the text before republishing it, the outcome depends on the court's assessment and the evidence from both sides.)

Back in the online world, and especially where Baidu sets the rules for deciding who is the original author, suppose Baidu finds the same article on two different sites. Who is the original? Very simple: whoever Baidu indexed first, not whoever published first. Some webmasters say: "I published my article first, but Baidu only indexed it N hours later. Another site scraped it before Baidu had indexed mine and was indexed immediately, so my article became the non-original one." Yes, the problem lies right here: indexing time!

Since Baidu is slow to index our pages, how do we solve this? There are generally two ways to get Baidu to index a page right away. The first is the ping service: immediately after publishing an article, you ping Baidu to tell it the article's address (for an introduction to the ping service and how to use it, see the Baidu Webmaster Platform, or contact the author). This generally works for authoritative news-source sites; Baidu seems to ignore small sites. The second method is the focus of this article: choosing the right time to publish.

I. Baidu spider crawl interval and regularity

Baidu Spider is simply a program run by Baidu that automatically visits web pages and fetches their content. It works on the same principle as what we commonly call a news scraper, except that this thief is one we welcome. A spider does not stay on one site all the time. On a large site, many spiders may be visiting many different pages, so that spiders are active somewhere on the site every second. But even on such a site, visits to one specific page (such as the home page) are generally separated by some interval, ranging from a few seconds to a few hours or even a few days. This is the spider's crawl interval.
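To make the idea of a crawl interval concrete, here is a minimal sketch that computes the gaps between consecutive visits to one page. The function name `crawl_intervals` and the sample timestamps are hypothetical and are not taken from the author's logs.

```python
from datetime import datetime, timedelta

def crawl_intervals(visits):
    """Return the gaps between consecutive visits, in chronological order."""
    ordered = sorted(visits)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

# Hypothetical visit times for a single page (illustration only).
visits = [
    datetime(2013, 1, 1, 8, 12),
    datetime(2013, 1, 1, 8, 27),
    datetime(2013, 1, 1, 8, 43),
    datetime(2013, 1, 1, 8, 58),
]

for gap in crawl_intervals(visits):
    print(int(gap.total_seconds() // 60), "minutes")
```

If the gaps cluster around one value, the page is probably being crawled on a fairly fixed cycle.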

Regularity means that a specific site (or page) is crawled on a relatively fixed cycle, for example every few minutes or every few hours. The example below illustrates this (the data was analyzed with Web Log Explorer and exported to Excel for subtotals).
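The author used Web Log Explorer and Excel for this step. As an alternative, the same data could be pulled straight from the IIS log with a short script. The sketch below assumes the log is in the W3C extended format with a `#Fields:` header and that the `date`, `time`, `cs-uri-stem`, and `cs(User-Agent)` fields are enabled. The function name `spider_hits` and the sample lines are hypothetical.

```python
from datetime import datetime

def spider_hits(lines, path="/", marker="baiduspider"):
    """Yield the timestamp of each request for `path` whose user agent contains `marker`."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns of the rows that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        agent = row.get("cs(User-Agent)", "").lower()
        if marker in agent and row.get("cs-uri-stem") == path:
            yield datetime.strptime(row["date"] + " " + row["time"], "%Y-%m-%d %H:%M:%S")

# Hypothetical log excerpt (illustration only, not the author's data).
sample = """#Software: Microsoft Internet Information Services 7.5
#Fields: date time cs-method cs-uri-stem c-ip cs(User-Agent) sc-status
2013-01-01 09:10:03 GET / 192.0.2.10 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) 200
2013-01-01 09:11:40 GET /about.html 192.0.2.10 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) 200
2013-01-01 09:12:15 GET / 198.51.100.7 Mozilla/5.0+(Windows+NT+6.1) 200
2013-01-01 09:26:48 GET / 192.0.2.10 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) 200
""".splitlines()

for hit in spider_hits(sample):
    print(hit)
```

Two caveats: IIS writes W3C log times in UTC by default, so they may need converting to local time before looking for patterns, and matching on the user-agent string alone cannot tell a genuine spider from a client that merely claims to be one.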

The figure above shows the author's statistics on when the spider crawled the home page. (I originally wanted to list two full days of hourly data, but that was too much to publish as an image, so I chose one day's data from 8:00 in the morning to 18:00 in the afternoon.)

The time column in the figure shows when the spider actually crawled the home page, and the summary column is my rough grouping of those times (individual outliers can be discarded). From the figure, the spider's approximate crawl pattern is:

In the morning, each hour usually splits into four periods: minutes 10-15, 25-30, 40-45, and 55-60.

In the afternoon, each hour also splits into four periods, but they fall roughly on the quarter hours: minutes 0, 15, 30, and 45. I also analyzed the second day's data and found it basically the same, which confirmed my belief in the spider's regularity. In fact, I tallied nearly 10 days of data, and all of it showed a similar pattern.
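The grouping described above can be approximated by counting visits per minute-of-hour slot. The following is a rough sketch rather than the author's actual Excel procedure. The function name `minute_slots`, the 5-minute slot width, and the sample times are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime

def minute_slots(visits, width=5):
    """Count visits per minute-of-hour slot; with width=5, slot 10 covers minutes 10-14."""
    return Counter((v.minute // width) * width for v in visits)

# Hypothetical morning visits to the home page (illustration only).
visits = [
    datetime(2013, 1, 1, 8, 12), datetime(2013, 1, 1, 8, 27),
    datetime(2013, 1, 1, 8, 42), datetime(2013, 1, 1, 8, 57),
    datetime(2013, 1, 1, 9, 11), datetime(2013, 1, 1, 9, 28),
    datetime(2013, 1, 1, 9, 43), datetime(2013, 1, 1, 9, 56),
]

for slot, count in sorted(minute_slots(visits).items()):
    print(f"minute {slot:02d}: {count} visits")
```

Slots that collect hits hour after hour, across several days, are the periods worth publishing just before. Running the count separately for morning and afternoon hours would show a shift like the one described above.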

II. Applying the spider's regularity in practice

Once we know the spider's crawl pattern, we can prepare its food a little in advance. In my test, I published an article and finished updating the home page at 17:43; the spider crawled the home page on schedule at about 17:44 and indexed the article.
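Choosing a publishing time then comes down to finding the next expected crawl and publishing shortly before it. The sketch below assumes the afternoon pattern reported above (minutes 0, 15, 30, and 45) and treats each slot start as the expected crawl time, which is a simplification. The function name `next_crawl` and the date used are hypothetical.

```python
from datetime import datetime, timedelta

def next_crawl(now, slot_minutes):
    """Return the first slot start strictly after `now`, rolling into the next hour if needed."""
    base = now.replace(minute=0, second=0, microsecond=0)
    for hour_offset in (0, 1):
        for minute in sorted(slot_minutes):
            candidate = base + timedelta(hours=hour_offset, minutes=minute)
            if candidate > now:
                return candidate
    raise ValueError("slot_minutes must not be empty")

# Afternoon slots as reported in the article; the date itself is illustrative.
afternoon_slots = [0, 15, 30, 45]
now = datetime(2013, 1, 1, 17, 40)
print("Publish before:", next_crawl(now, afternoon_slots))
```

With these inputs the function returns 17:45, which matches the test above, where an article published at 17:43 was picked up at about 17:44. In practice the slots are approximate, so it makes sense to leave a margin of a few minutes.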

III. Summary

This article has explained the concept of "second collection" (instant indexing) in detail. Second collection simply means that a newly published article happens to be fetched by the spider at a specific time. In this sense, as long as the spider crawls the site and the content is original, any site can achieve second collection regardless of its weight. How Baidu filters and ranks pages after the spider has pulled them into its index is another topic.

This article also offers a way of protecting original work for webmasters who insist on writing original articles yet are scraped and then wrongly treated as the copier. Webmasters with the ability can add their own statistics tool to the home page to record when a specific search engine crawls it, and then use the observed pattern to choose publishing times deliberately, so that the spider finds food on every visit. Gradually the spider will raise its crawl frequency, until articles published at any time are indexed within seconds. On the author's site, for example, the spider crawls the home page about every 15 minutes, so an article published at almost any time can be called "second collected".

Original text by the Chinese Agricultural Talent Network (Http://www.5ajob.com), the first talent network established in China for the agriculture, forestry and fishery industry, written on New Year's Day 2013. If you reprint it, please keep the link. Exchanges are welcome. I wish all webmasters success in the new year, and may your sites stay clear of being "K'd" (dropped from the index)!
