Today I would like to talk about search engine spiders. As we all know, pages on the Internet are crawled by spiders, and a spider is simply a program. When a new page appears on the Internet, a spider comes to crawl it. Because an enormous number of pages is generated every day, a single spider cannot crawl them all in a short time, so a search engine runs a large number of spiders to cover as much of the Internet as possible. Different spiders are said to serve different purposes. How do we know which spider crawls the homepage and which ones crawl the inner pages?
This article is for reference only. It aims to give you a better understanding of the IP addresses that search engine spiders use, as they appear in IIS logs, because the state of a website can be analyzed from which IP addresses visit it. The following uses the Baidu spider IP addresses found in IIS logs as an example:
123.125.68.*: If this segment visits often while other segments rarely appear, the website may be about to enter the sandbox or be downgraded.
220.181.68.*: If daily visits from this segment only increase and never decrease, the site is likely to enter the sandbox or be de-indexed ("K station").
220.181.7.* and 123.125.66.*: These are Baidu spider IP addresses. A visit from them means the spider is about to crawl your content.
121.14.89.*: This segment is associated with the new-site inspection period.
203.208.60.*: This segment appears on new sites and on sites that show abnormal behavior.
210.72.225.*: This segment patrols sites continuously.
125.90.88.*: This Guangdong Maoming Telecom segment is also a main part of the Baidu spider IP pool. Visits from it are mostly triggered by newly launched sites, by the use of webmaster tools, or by comprehensive SEO checks.
220.181.108.95: This is the dedicated IP address Baidu uses to crawl the homepage. If addresses in the 220.181.108 segment visit your website, its snapshot will be updated overnight every day. This is certain; I can vouch for it.
220.181.108.92: Crawls the homepage more than 98% of the time and may also crawl other pages (not inner pages). The 220.181 segment is a high-weight segment; articles or homepages crawled by it are generally released within 24 hours.
123.125.71.106: Crawls inner pages and has a low weight. Inner pages crawled by this segment are not released quickly, because they are non-original or scraped articles.
220.181.108.91: A general-purpose address. It mainly crawls the homepage, inner pages, and other pages, and belongs to the high-weight segment. Articles or homepages crawled by it are usually released within 24 hours.
220.181.108.75: Mainly crawls updated article pages: about 90% content pages, 8% homepage, and 2% other pages. It belongs to the high-weight segment; articles or homepages crawled by it are shown within 24 hours.
220.181.108.86: Crawls the homepage and belongs to the high-weight segment. The returned code is usually 304 0, which indicates that the site has not been updated.
123.125.71.95: Crawls inner pages and has a low weight. Inner pages crawled by this segment are not released quickly, because they are non-original or scraped articles.
123.125.71.97: Crawls inner pages and has a low weight. Inner pages crawled by this segment are not released quickly, because they are non-original or scraped articles.
220.181.108.89: Crawls the homepage and belongs to the high-weight segment. The returned code is usually 304 0, which indicates that the site has not been updated.
220.181.108.94: Crawls the homepage and belongs to the high-weight segment. The returned code is usually 304 0, which indicates that the site has not been updated.
220.181.108.97: Crawls the homepage and belongs to the high-weight segment. The returned code is usually 304 0, which indicates that the site has not been updated.
220.181.108.80: Crawls the homepage and belongs to the high-weight segment. The returned code is usually 304 0, which indicates that the site has not been updated.
220.181.108.77: Crawls the homepage and belongs to the high-weight segment. The returned code is usually 304 0, which indicates that the site has not been updated.
123.125.71.117: Crawls inner pages and has a low weight. Inner pages crawled by this segment are not released quickly, because they are non-original or scraped articles.
Note: there are many more IP addresses, but any address in the 123.125.71.* segment indicates inner-page crawling with a relatively low indexing weight, possibly because the articles are scraped or have only been indexed provisionally (exact meaning to be determined).
220.181.108.83: Crawls the homepage and belongs to the high-weight segment. The returned code is usually 304 0, which indicates that the site has not been updated.
220.181.108.*: Addresses in this segment mainly crawl the homepage (roughly 80%) and inner pages (roughly 30%). Articles or homepages crawled by this segment are released within 24 hours, with snapshots updated overnight. I can guarantee this!
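The segment notes above can be turned into a simple lookup. The following is a minimal Python sketch, assuming the segment meanings listed in this article, which are webmaster observations and not anything documented by Baidu. The names SEGMENT_NOTES and describe_ip are hypothetical and introduced only for illustration.

```python
import ipaddress

# Segment meanings as listed in this article (webmaster observations,
# not official Baidu documentation). Each "x.y.z.*" segment is a /24 network.
SEGMENT_NOTES = {
    "220.181.108.0/24": "high weight: mostly homepage, released within 24 hours",
    "123.125.71.0/24": "low weight: inner pages, slow to be released",
    "123.125.68.0/24": "possible sandbox or downgrade if it visits alone",
    "220.181.68.0/24": "possible sandbox or de-indexing",
    "220.181.7.0/24": "Baidu spider about to crawl",
    "123.125.66.0/24": "Baidu spider about to crawl",
    "121.14.89.0/24": "new-site inspection period",
    "210.72.225.0/24": "continuous patrol",
    "125.90.88.0/24": "new sites, webmaster tools, or SEO checks",
}

# Parse the networks once so each lookup is a simple membership test.
NETWORKS = [(ipaddress.ip_network(cidr), note) for cidr, note in SEGMENT_NOTES.items()]


def describe_ip(ip: str) -> str:
    """Return the article's note for the segment containing ip, if any."""
    addr = ipaddress.ip_address(ip)
    for network, note in NETWORKS:
        if addr in network:
            return note
    return "unknown segment"


print(describe_ip("220.181.108.95"))
print(describe_ip("123.125.71.106"))
```

Any address outside the listed segments falls through to "unknown segment", so the table can be extended as new segments are observed.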
Generally, code 200 is returned when a page is crawled successfully. If code 304 is returned, the page has not been updated since the spider's last visit.
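To see how often each segment receives 200 versus 304, the log can be tallied with a short script. This is only a sketch: it assumes the IIS log is in the W3C extended format with a "#Fields:" header line, that the c-ip, cs(User-Agent) and sc-status columns are enabled, and that the spider identifies itself with "Baiduspider" in its user agent. The function name tally_baidu_status is hypothetical, and the sample lines are invented for illustration, not taken from a real log.

```python
from collections import Counter


def tally_baidu_status(lines):
    """Count (IP segment, status code) pairs for Baiduspider requests."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns of the entries that follow it.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        # IIS writes spaces in the user agent as "+", so it stays one token.
        if "baiduspider" not in row.get("cs(User-Agent)", "").lower():
            continue
        segment = row.get("c-ip", "").rsplit(".", 1)[0] + ".*"
        counts[(segment, row.get("sc-status", "?"))] += 1
    return counts


# Invented sample entries, for illustration only.
sample = [
    "#Fields: date time cs-method cs-uri-stem c-ip cs(User-Agent) sc-status sc-substatus",
    "2013-01-01 00:00:01 GET / 220.181.108.95 Mozilla/5.0+(compatible;+Baiduspider/2.0) 304 0",
    "2013-01-01 00:05:09 GET /post/1.html 123.125.71.106 Mozilla/5.0+(compatible;+Baiduspider/2.0) 200 0",
    "2013-01-01 00:06:00 GET / 10.0.0.5 Mozilla/5.0+(Windows) 200 0",
]

for (segment, status), n in sorted(tally_baidu_status(sample).items()):
    print(segment, status, n)
```

In this layout the "304 0" mentioned above would correspond to the sc-status and sc-substatus columns. Matching on the user agent alone does not prove a request came from Baidu, since the string can be spoofed.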