The simplest robots.txt file uses two rules:
- User-agent: the crawler (bot) to which the following rules apply
- Disallow: the URL you want to block
These two lines are considered a single entry in the file. You can include as many entries as you need, and an entry can contain multiple Disallow lines and multiple User-agents.
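For illustration, the following hypothetical entry (the directory names /tmp/ and /logs/ are placeholders) uses one User-agent line and two Disallow lines to block all bots from two directories:
User-agent: *
Disallow: /tmp/
Disallow: /logs/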
Each entry in the robots.txt file is independent and does not build on earlier entries. For example:
User-agent: *
Disallow: /folder1/

User-agent: Googlebot
Disallow: /folder2/
In this example, only URLs matching /folder2/ are disallowed for Googlebot. Because Googlebot has its own entry, the /folder1/ rule in the wildcard entry does not apply to it.
User-agents and bots
A user-agent is a specific search engine bot. The Web Robots Database lists many common bots. You can set an entry to apply to a specific bot (by listing its name) or to all bots (by listing an asterisk). An entry that applies to all bots looks like this:
User-agent: *
Google uses several different bots (user-agents). The bot used for our web search is Googlebot. Other bots, such as Googlebot-Mobile and Googlebot-Image, follow the rules you set for Googlebot, but you can also set separate rules for these specific bots.
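As an illustrative sketch (the directory names here are placeholders), the following file blocks Googlebot from /no-crawl/ and gives Googlebot-Image its own rule blocking /photos/; once Googlebot-Image has its own entry, it follows that entry rather than the Googlebot rules:
User-agent: Googlebot
Disallow: /no-crawl/

User-agent: Googlebot-Image
Disallow: /photos/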
Blocking user-agents
The Disallow line lists the pages you want to block. You can list a specific URL or a pattern. The entry should begin with a forward slash (/).
- To block the entire site, use a forward slash.
Disallow: /
- To block a directory and everything in it, follow the directory name with a forward slash.
Disallow: /useless-directory/
- To block a page, list that page.
Disallow: /private_file.html
- To remove a specific image from Google Images, add the following:
User-agent: Googlebot-Image
Disallow: /picture/dog.jpg
- To remove all images on your site from Google Images, use the following:
User-agent: Googlebot-Image
Disallow: /
- To block files of a specific type (for example, .gif), use the following:
User-agent: Googlebot
Disallow: /*.gif$
- To block crawling of pages on your site while still showing AdSense ads on those pages, disallow all bots other than Mediapartners-Google. This keeps the pages out of the search results but lets the Mediapartners-Google bot analyze them to decide which ads to show. The Mediapartners-Google bot does not share pages with the other Google user-agents. For example:
User-agent: *
Disallow: /
User-agent: Mediapartners-Google
Allow: /
Note that directives are case-sensitive. For example, Disallow: /junk_file.asp would block http://www.example.com/junk_file.asp but not http://www.example.com/Junk_file.asp. Googlebot ignores whitespace (in particular, blank lines) and unknown directives in robots.txt.
Googlebot supports submission of Sitemap files through the robots.txt file.
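For example, a Sitemap can be referenced with a line such as the following (the URL is a placeholder):
Sitemap: http://www.example.com/sitemap.xml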
Pattern Matching
Googlebot (but not all search engines) follows certain pattern matching principles.
- To match a sequence of characters, use an asterisk (*). For example, to block access to all subdirectories that begin with "private", use the following:
User-agent: Googlebot
Disallow: /private*/
- To block access to every URL that includes a question mark (?) (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string), use the following:
User-agent: Googlebot
Disallow: /*?
- To match the end of a URL, use $. For example, to block all URLs that end with .xls, use the following:
User-agent: Googlebot
Disallow: /*.xls$
You can use pattern matching in combination with the Allow directive. For instance, if a ? indicates a session ID, you may want to exclude all URLs that contain one so that Googlebot does not crawl duplicate pages. However, URLs that end with a ? may be the version of the page that you do want included. In that case, you can set up your robots.txt file as follows:
User-agent: *
Allow: /*?$
Disallow: /*?
The Disallow: /*? directive blocks any URL that includes a ? (more specifically, it blocks any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).
The Allow: /*?$ directive allows any URL that ends with a ? (more specifically, it allows any URL that begins with your domain name, followed by a string, followed by a ?, with no characters after the ?).
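For illustration, with these rules a hypothetical URL such as http://www.example.com/page?sessionid=123 would be blocked, while http://www.example.com/page? would be allowed, since the Allow rule specifically matches URLs that end with a ?.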