User-agent: *
Disallow:
The above text allows all search robots to access every file under the site www.shijiazhuangseo.com.cn. Syntax analysis: text after # is descriptive information (a comment), User-agent names the search robot the rules apply to, and * stands for all search robots. Disallow lists the files or directories the robot may not access; left empty, as here, it blocks nothing.
The file takes effect only when both of the above conditions are met, that is, the file name is correct and the path is correct. Otherwise robots.txt is just an ordinary file and does not work.
Now you know how to name this file and where to upload it. Next you will learn how to write commands in this file; search engines follow the Robots Exclusion Protocol. Its format is in fact very simple and can meet most control needs. First, a User-agent row identifies which robot the rules apply to, followed by one or more Disallow rows listing the paths that robot may not crawl.
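A minimal sketch of that structure (the /admin/ and /old/ directory names are placeholders, not taken from the original article):
User-agent: *
Disallow: /admin/
Disallow: /old/
The * row addresses every robot, and each Disallow row names one path those robots should stay out of.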
In SEO, robots.txt matters because the first step a search engine takes when crawling a site is to fetch the robots.txt file, which guides how the site is crawled. Using the robots file properly helps concentrate the site's weight and keeps files you do not want search engines to see from being crawled, so it is a very necessary SEO technique. Yet many people doing SEO do not understand the file particularly well; at most they know not to add unnecessary content to the page.
That covers the three file changes. Finally, one very important point: Google has added page speed to its ranking algorithm, partly to improve the user experience. Difen SEO therefore suggests keeping the file size of the home page as small as possible, so choosing a good template is also a necessary factor for success.
Yes, Difen SEO said that optimizing zblog is very simple, but there is one more point worth noting.
Most of you have heard of robots.txt files, and some may have written your own. In fact, I have not written a robots.txt file so far; it is not that I refuse to, I simply feel there is nothing on the blog that needs to be kept from spiders. And as everyone presumably knows, the probability of dead links on a personal independent blog is very small, so it does not need much dead-link handling and I do not feel the file is necessary for me. However, writing robots.txt is still a skill every personal webmaster must master, one that can bring out all the details of the entire site.
What's in a robots.txt file? Each robots.txt file contains one or more records. A record consists of a robot user-agent string you want to address and the instructions that apply to it. Don't worry about knowing every robot user-agent string roaming the web, because you can use the wildcard * to apply a rule to all robots. The following is an example of a record:
User-agent: *
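On its own that line only names the robots addressed; a complete record pairs it with at least one Disallow line, for instance (the /search/ path is a placeholder):
Disallow: /search/
Leaving the Disallow value empty instead means nothing is blocked for those robots.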
To prevent search engines from showing a snapshot of your site, place this meta tag in the <head> section of the page. To allow other search engines to display snapshots and only prevent Baidu from doing so, use a Baiduspider-specific tag instead. Note: that tag only prohibits Baidu from displaying a snapshot of the webpage; Baidu will continue to index the page and display a summary of the page in the search results.
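The tags themselves are omitted above; as an assumption based on the standard noarchive directive, they would look like this:
<meta name="robots" content="noarchive">  (no engine may show a cached snapshot)
<meta name="Baiduspider" content="noarchive">  (only Baidu is prevented from showing a snapshot)
The second form is the Baiduspider-specific tag referred to in the note.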
How do you prevent Baidu image search from including certain pictures? To prohibit Baiduspider from crawling all pictures on the site, or to prohibit or allow it to crawl only pictures of a specific format, the following kind of statement can do it:
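A sketch of such a statement for Baidu, assuming Baidu's documented Baiduspider-image crawler name (the blanket Disallow: / keeps it away from every URL; a narrower path or file pattern can be used instead):
User-agent: Baiduspider-image
Disallow: /
Google's image robot can be handled the same way; if your images are not needed in online search and you just want to save the bandwidth of the server your site runs on, the following statement can do it: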
User-agent:googlebot-image
Disallow: /
3) The following code does not allow any search engine or robot to mine directory and page information:
User-agent: *
Disallow:/cgi-bin/
Disallow:/privatedir/
Disallow: /TUTORIALS/BLANK.HTM
4) You can also set different rules for different robots, as sketched below.
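A sketch of a file with separate records for two robots (the robot name and paths are illustrative only):
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
Under the Robots Exclusion Protocol a robot obeys the most specific record that matches it, so here Googlebot is only kept out of /private/ while every other robot is blocked from the whole site.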
The robots.txt file must be placed in the root directory of a site, and the file name must be entirely lowercase. For example:
Website URL -> URL of robots.txt
http://www.w3.org/ -> http://www.w3.org/robots.txt
http://www.w3.org:80/ -> http://www.w3.org:80/robots.txt
http://www.w3.org:1234/ -> http://www.w3.org:1234/robots.txt
http://w3.org/ -> http://w3.org/robots.txt
2. robots.txt syntax
The "robots.txt" file contains one or more records separated by blank lines (terminated by CR, CR/NL, or NL). The format of each record is described below.
In this file, to keep robots away from all pages ending in .html, the rule should be written: Disallow: *.html.
Sometimes the rules we write contain problems we have not noticed. They can now be tested with Baidu Webmaster Tools (zhanzhang.baidu.com) and Google Webmaster Tools.
Relatively speaking, the Baidu webmaster tool is the simpler of the two:
The Baidu robots tool can only detect whether each command line conforms to the grammar; it does not judge whether the rules themselves are reasonable.
User-agent: *
Disallow: /wp-admin
Disallow: /wp-content/plugins
Disallow: /wp-content/themes
Disallow: /wp-includes
Disallow: /?s=
Sitemap: http://www.lesishu.cn/sitemap.xml
This allows all search engines to crawl, lists the blocked directories one by one, and restricts crawling of search result pages.
It also contains the sitemap.xml address (the wiki has a special note on this, but the Google webmaster tool will prompt 'invalid sitemap reference', so its validity is open to question).
which files are not to be accessed by the robot. This file is placed under the root directory of the site, namely http://.../robots.txt. 2. Robots META tag: A webpage author can use a special HTML meta tag to indicate whether a webpage can be indexed, analyzed, or linked. These methods are suitable for most web robots. Whether they are actually implemented in software depends on the robot developers, and full compliance is not guaranteed.
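A minimal sketch of such a tag, using the widely documented noindex and nofollow values (shown as an illustration, not quoted from the original):
<meta name="robots" content="noindex,nofollow">
Placed in a page's <head>, it asks compliant robots neither to index the page nor to follow the links on it.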
The following is an example of robots.txt:
# http://www.yoursite.com/robots.txt
User-agent: *
Disallow: /tmp/ # these files will soon be deleted
Disallow: /test.html
User-agent: InfoSeek Robot 1.0
Disallow: /
The content after "#" is a comment. The User-agent command specifies the robot to which the Disallow commands under it apply, and a single User-agent line can be followed by several Disallow lines. The details are as follows:
User-agent: The value of this item describes the name of the search engine robot. If there are multiple User-agent records in the "robots.txt" file, multiple robots are restricted by the protocol, and there must be at least one User-agent record in the file. If the value is set to *, the protocol is valid for all robots.
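For instance, a record that restricts one named robot rather than all of them might look like this (the robot name and path are illustrative only):
User-agent: Baiduspider
Disallow: /logs/
A record whose User-agent value is * would instead apply to every robot that finds no more specific record.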
The robots.txt file must be placed in the root directory of the web site. When a robot visits a website, it first reads this file, analyzes its content, and avoids accessing certain files according to the web administrator's rules, as in the example shown above.
In short, User-agent names the robot to which the following rules apply, and Disallow names the web pages to intercept. There are practical reasons to provide the file: 1. Every time a user or spider tries to access a nonexistent URL, the server records a 404 error (file cannot be found) in the log. 2. Website administrators must keep spider programs away from certain directories on the server to ensure server performance. For example, the log service on the HiChina network keeps its programs in a cgi-bin directory, so it is a good idea to add a rule for that directory to robots.txt, as sketched below.
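A sketch of the kind of rule meant here (the cgi-bin path follows the example in the text; adjust it to the real server layout):
User-agent: *
Disallow: /cgi-bin/
This keeps compliant spiders out of the script directory so they do not waste server resources there.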
It is provided by an organization and put under the root directory of the website. For example, the following disables all robots:
User-agent: *
Disallow: /
Allow all robots access:
User-agent: *
Disallow:
Do not allow robots to access files in the /cyberworld/map/ directory:
User-agent: *
Disallow: /cyberworld/map/
But allow the single robot named cybermapper to go everywhere:
User-agent: cybermapper
Disallow: