Some basic skills related to search engines

If you want your website and search engines to work well together and return the information users need as accurately as possible, it is worth mastering a few basic search engine skills when designing pages. The basic rules are the same whether you are dealing with the internal search engine of a site built on commercial software or a public search engine such as Google. This article describes how to make your website more accessible to search engines. You will learn some basic methods for controlling search results, as well as techniques that ensure a site's pages can be found by search engines and that users receive more accurate results.

Use meta tags to control search results

The most basic way to control how a search engine handles a page, whether it is an internal or an external search engine, is to add a meta tag named robots to the page. Its content should include either index or noindex, and either follow or nofollow. These simple values tell the search engine how to process the page; both internal and external search engines follow the meta tag's instructions, as shown below:
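
A minimal sketch of the tag as it might appear in a page's head section; the combination of values shown here is just one possibility:

<!-- Allow the page to be indexed and its links to be followed -->
<meta name="robots" content="index, follow">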

index tells the search engine to add the page to its index, while noindex tells it not to. This is the most critical parameter: if noindex is set, the page will not appear in the search results. For example, if the robots meta tag is set to noindex on a discontinued-product page of an e-commerce website, that page will not show up in the site's search results. You can still keep these old products in the catalog so that users who need the information can reach it through the product category pages, without cluttering search results with large numbers of obsolete product pages. For current products, set the tag to index so that the search engine displays them in the results.
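
For instance, a discontinued-product page might carry the tag below; the file name in the comment is only illustrative:

<!-- e.g. old-widget.aspx: keep this discontinued product out of search results -->
<meta name="robots" content="noindex">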

follow tells the search engine to follow the other hyperlinks on the page, while nofollow tells it not to follow them. If a page contains links to other websites, you can set it to nofollow so that your site's internal search engine does not list content from those other sites in its results. On a forum page, for example, you should set nofollow to keep the search engine from following links in posts to external sites. In another case, if you create a page whose purpose is to link to other pages, set it to noindex, follow; the search engine will then follow the links on the page instead of listing the page itself in the results.
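
A sketch of the two cases just described, with hypothetical page types in the comments:

<!-- Forum page: list the page itself, but do not follow links inside posts -->
<meta name="robots" content="index, nofollow">

<!-- Link-collection page: follow its links, but do not list the page itself -->
<meta name="robots" content="noindex, follow">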

Create an index list to improve search capability

The biggest problem in building a website with good search capability is letting the search index know which pages should fall within the search scope. A search engine typically starts at the website's home page and then traverses the rest of the site by following the links in each page. This works well for sites that use the a href tag as the link marker, but many sites now use JavaScript-based links. The problem is that the search engine cannot find those links on the page and therefore cannot traverse the whole site; the index ends up containing only the few standard links found on the home page.
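
The difference can be sketched as follows: a crawler can follow the first form, but usually cannot discover the target of the second (the page name is only a placeholder):

<!-- A standard link the search engine can follow -->
<a href="products.aspx">Products</a>

<!-- A JavaScript-based link whose target the search engine cannot see -->
<a href="javascript:void(0)" onclick="location.href='products.aspx'">Products</a>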

The solution is to create a page on the website that contains links to every page you want to be searchable. On an e-commerce site this page might link to all the product pages; on a community site it might link to all the discussion pages. Writing such a page requires no special scripting language: it is a plain HTML page whose content consists almost entirely of a href tags representing the links. Its only function is to let the search engine quickly find all the pages on the site that you want searched.

When the website itself has no site index, such a page can also serve as one. Alternatively, you can generate a similar list from the server's file system or from an IIS virtual directory. A list built that way may include every file under the site, however, so it can lead the search engine to long-forgotten orphan pages and files.

In any case, such a search index start page should carry a meta robots tag telling the search engine to follow all the links on the page but not to include the page itself in the results. As described above, the page should be set to noindex, follow so that the search engine can traverse the whole site as intended.
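
Putting these pieces together, the index start page might look like the minimal sketch below; the file name searchcrawler.aspx matches the link example later in this article, and the individual page links are only illustrative:

<!-- searchcrawler.aspx: a plain page of links for the search engine to traverse -->
<html>
<head>
<meta name="robots" content="noindex, follow">
<title>Site index for search engines</title>
</head>
<body>
<a href="products/widget-100.aspx">Widget 100</a>
<a href="products/widget-200.aspx">Widget 200</a>
<a href="forum/thread-1042.aspx">Forum thread 1042</a>
<!-- ... one link for every page that should be searchable -->
</body>
</html>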

Some search engines, especially a website's internal search engine, let you point them directly at the site index list. More often, though, you cannot specify an index page for the search engine. In that case, simply create a standard link on the home page that points to the index page. Since the goal is only for the search engine to follow this link, the link needs no visible text on the home page. For example, add the following statement to the home page:

<a href="./searchcrawler.aspx"></a>

Eliminate page interference

Once the search engine can reach every page, the next step is to streamline the results so that users get the most valuable information. Start by removing items that distract the search engine. Navigation menus, for example, are useless to a search engine because they appear on every page with the same content. Worse, a user who cannot give an exact keyword for the content they want may enter a vague word that happens to appear in the site's navigation menu, which makes searching and locating content even harder.

There is, however, a way to address this. When a search engine visits a website, it first checks whether the site's root directory contains a plain-text file called robots.txt. This file uses User-agent lines to limit a search engine's access to the site, that is, to tell the crawler which files it is allowed to retrieve. By keeping pages or files that contain only menus, advertisements, and other material unrelated to the page's subject outside the allowed scope, you can prevent them from being fetched when a search engine sends its requests.
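
A minimal robots.txt sketch along these lines, assuming the shared menu and advertisement fragments are kept in directories of their own (the directory names are only illustrative):

# robots.txt, placed in the site's root directory
User-agent: *
Disallow: /menus/
Disallow: /ads/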

With this mechanism in place, what the search engine indexes stays closely tied to users' needs: information irrelevant to the core content is not retrieved, and keywords that appear only in the navigation menu do not show up in the search results.

Add the correct title to the page

To optimize search engine performance, give every page a correct title, since most search engines list page titles in their results. Similarly, using the keywords parameter of the meta tag can improve how the corresponding keywords rank in the search results.
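
Both elements go in the page's head section; the title and keyword values here are only placeholders:

<!-- The title is what most search engines display in the result list -->
<title>Widget 100 - Product details</title>
<!-- Keywords the page should be associated with in search results -->
<meta name="keywords" content="widget, widget 100, hardware">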
