Yisou search engine submission guide (Yahoo Search)

Source: Internet
Author: User
Yisou (yisou.com) is an independent search portal launched in China by Yahoo Inc. It is built on Yahoo Search Technology (YST).

How does Yisou crawl my website?
Yisou uses Yahoo Search Technology (Yahoo! Search Technology, abbreviated YST), an algorithm-based web-indexing crawler that discovers content on the network automatically. The YST crawler collects documents from the Internet and builds a searchable index. Your site's files are found and crawled mainly because other web pages on the Internet link directly to them.
The YST crawler strictly follows the robots.txt standard, so it will not crawl anything you have excluded from search engine results. Any file that robots.txt marks as off-limits is neither fetched nor entered into the search engine database.
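The robots.txt check described above can be illustrated with Python's standard urllib.robotparser module. This is only a sketch of how a compliant crawler might decide whether to fetch a URL. It is not YST's actual code, and the robots.txt content and example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before every fetch.
allowed_public = parser.can_fetch("Slurp", "http://example.com/index.html")
allowed_private = parser.can_fetch("Slurp", "http://example.com/private/report.html")
# Crawlers without their own section fall back to the "*" rules.
allowed_other_tmp = parser.can_fetch("OtherBot", "http://example.com/tmp/cache.html")
```

Here the crawler named Slurp may fetch the home page but not anything under /private/, so those files would never reach the index.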
 
 
How can I reduce the search engine's requests to my site?
Because we crawl pages numbering in the billions, we use many systems for web crawling, so your web server will log requests from several different YST crawler client IP addresses. The crawler systems coordinate to limit the activity directed at any single web server. A single web server is identified by its IP address, so if your host has more than one IP address, it may see a higher level of crawler activity.
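To make the per-IP limit concrete, here is a minimal sketch of a throttle keyed by server IP address. It is an assumption about how such a limit could work, not a description of YST internals. The class name PerIpThrottle, the 10-second interval, and the documentation-range IP addresses are all hypothetical.

```python
from typing import Dict


class PerIpThrottle:
    """Allow at most one request per IP address every `min_interval` seconds."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_request: Dict[str, float] = {}

    def try_acquire(self, ip: str, now: float) -> bool:
        """Return True and record the request if `ip` may be contacted at time `now`."""
        last = self._last_request.get(ip)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_request[ip] = now
        return True


throttle = PerIpThrottle(min_interval=10.0)
first = throttle.try_acquire("192.0.2.1", now=0.0)         # first request to this IP
too_soon = throttle.try_acquire("192.0.2.1", now=5.0)      # same IP, inside the interval
second_ip = throttle.try_acquire("192.0.2.2", now=5.0)     # different IP, tracked separately
after_wait = throttle.try_acquire("192.0.2.1", now=10.0)   # interval has elapsed
```

Because each IP address is tracked separately, a host that answers on two addresses can receive requests on both within the same interval, which is why multi-IP hosts may see more activity.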
YST supports a specific robots.txt extension that lets you set a lower request rate for our crawlers.
You can add a "Crawl-delay: xx" directive, where "xx" is the minimum delay, in seconds, between two successive crawler requests to your site. If the crawl rate is a burden on your server, set this delay to whatever value you consider appropriate, such as 60 or 300.
For example, to set a 20-second delay, use the following statements:
User-agent: Slurp
Crawl-delay: 20
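As a rough illustration of how a crawler could read this directive, the sketch below parses the same two lines with Python's standard urllib.robotparser and computes how long to wait before the next request. The helper seconds_to_wait and the timestamps are hypothetical, and this is not YST's own implementation.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 20
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay in seconds, or None if no delay is set.
delay = parser.crawl_delay("Slurp")


def seconds_to_wait(last_fetch: float, now: float, crawl_delay: Optional[int]) -> float:
    """Return how much longer the crawler should wait before its next request."""
    if crawl_delay is None:
        return 0.0
    return max(0.0, crawl_delay - (now - last_fetch))


# Hypothetical timestamps: the last request was 5 seconds ago.
remaining = seconds_to_wait(last_fetch=100.0, now=105.0, crawl_delay=delay)
```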
 
How do I make Yisou index only certain pages?
Yisou's YST search engine honors the NOINDEX (do not index) meta tag. You can place the following near the top of a web document, in its head section:
<meta name="robots" content="noindex">
With this tag in place, YST will still fetch the document, but it will not index it or add it to the search engine's database.
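The sketch below shows one way an indexer might detect a robots NOINDEX meta tag after fetching a page, using Python's standard html.parser module. The class RobotsMetaParser and the function is_indexable are hypothetical names chosen for illustration. How YST actually parses pages is not described in this guide.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def is_indexable(html: str) -> bool:
    """Return False when the page asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives


blocked = is_indexable('<html><head><META NAME="robots" CONTENT="noindex"></head></html>')
normal = is_indexable("<html><head><title>Welcome</title></head></html>")
```

The page is still downloaded in both cases. Only the decision to add it to the index changes.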
 
 
I see repeated download requests. What is going on?
Typically, the YST crawler downloads each file on your site only once during a crawl. Occasionally, when the crawler is stopped and restarted, it re-crawls pages that it fetched shortly before stopping. This does not happen often and will not cause problems for your system.
