Analysis of the sitemap on a website
The so-called sitemap is generally called a "website map". The sitemap file lists the valid links on a website, making them easy for search engines to crawl and index. Without a sitemap file, a spider has to follow the hyperlinks on our website one by one; with a sitemap, the search engine can read the file directly, which makes crawling our pages much more efficient.
The most common sitemap is its simplest form: an XML file in which we list the site's URLs together with some metadata about each one. This metadata is usually the last update time, the update frequency, the relative importance, and so on; it allows search engines to crawl more intelligently. Baidu, for example, supports three sitemap formats: plain text (txt), XML, and the sitemap index format.
The official definition from www.sitemaps.org is as follows:
Sitemaps allow administrators to easily inform search engines about pages on their websites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists the URLs of a site along with additional metadata about each URL (the last update time, how often it usually changes, and its importance relative to other URLs on the site), so that search engines can crawl the site more intelligently. Web crawlers usually discover pages through links within the site and from other sites. Sitemaps supplement this data, allowing crawlers that support Sitemaps to pick up all the URLs provided in the Sitemap and to learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages will be included in search engines, but it provides hints that help crawlers crawl a site more effectively. Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and is widely adopted, with support from Google, Yahoo!, and many other vendors, including Microsoft.
When writing sitemap.xml, we generally follow this format:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.xinxingjiaocheng.com/</loc>
    <lastmod></lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  ... other url entries ...
</urlset>
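Writing this file by hand is error-prone, so here is a minimal sketch of building the same structure with Python's standard library. The page data below is an illustrative example (including the lastmod value), not a real site's URL list:

```python
# Minimal sketch: building a sitemap.xml string with the standard library.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: iterable of dicts with loc, lastmod, changefreq, priority keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            ET.SubElement(url, tag).text = page[tag]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

xml_text = build_sitemap([{
    "loc": "http://www.xinxingjiaocheng.com/",
    "lastmod": "2015-06-01",   # example value
    "changefreq": "monthly",
    "priority": "1.0",
}])
print(xml_text)
```

A side benefit of going through ElementTree is that special characters in the text are escaped automatically.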
Here we need to explain the meanings of several tags:
(1) changefreq: how often the page content is updated
(2) lastmod: the last modification time of the page
(3) loc: the permanent link address of the page
(4) priority: the priority of this page relative to other pages
(5) url: the parent tag of the first four tags
(6) urlset: the parent tag of the first five tags
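As a sanity check, the tags above can be read back out with the standard library. A sketch, where the sitemap snippet being parsed is a made-up example; note that find/findall must be told about the sitemap namespace:

```python
# Sketch: extracting <loc> values from a sitemap with ElementTree.
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.xinxingjiaocheng.com/</loc>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>"""

root = ET.fromstring(sample)
# Each <url> is a child of <urlset>; pull the <loc> text out of each one.
locs = [u.findtext("sm:loc", namespaces=NS) for u in root.findall("sm:url", NS)]
print(locs)
```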
Note the following two points:
(1) xmlns defines the namespace of the XML document; for a sitemap it points to the sitemap protocol's schema.
(2) Also, special characters in the loc tag must be escaped; for example, a greater-than sign (>) must be converted to &gt;.
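If you build the file as plain strings rather than through an XML library, this escaping can be done with the standard library. A sketch, using a made-up URL containing characters that would otherwise break the XML:

```python
# Sketch: escaping a URL's special characters before placing it in <loc>.
from xml.sax.saxutils import escape

raw = "http://www.a.com/search?q=a>b&page=2"
# escape() converts &, < and > into &amp;, &lt; and &gt;
print(escape(raw))
```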
Lastmod description:
(1) lastmod is the last update time of the page.
(2) Before indexing a link, the robot first compares this value with the last update time recorded at the previous index.
(3) If the time is unchanged, the robot skips re-indexing the link.
(4) If the link's content has changed since the last index, the value should be updated accordingly.
(5) The time can be expressed using the format specified in ISO 8601.
(6) The most complete form is YYYY-MM-DDThh:mm:ssTZD, for example 2015-06-01T19:02:00+08:00.
(7) Here TZD is the time-zone designator; +08:00, for example, denotes the GMT+8 time zone.
For changefreq, the general guidelines are:
(1) For the homepage of a website, we generally use always, indicating that it is updated constantly.
(2) We can use yearly for links to content from long ago.
(3) The commonly used frequency values are: always, hourly, daily, weekly, monthly, and yearly.
Priority is described as follows:
(1) It specifies the priority of this link relative to other links.
(2) The value ranges from 0.0 to 1.0; the higher the value, the greater the weight.
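Both fields are easy to get wrong when entries are generated programmatically, so a small validation sketch may help. The frequency names come from the list above (the sitemap protocol additionally allows a "never" value, included here); the function name is hypothetical:

```python
# Sketch: validating changefreq and priority before emitting a <url> entry.
# The protocol also defines "never" in addition to the six values listed above.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}

def valid_entry(changefreq, priority):
    """True if changefreq is a known value and priority is within 0.0-1.0."""
    return changefreq in VALID_CHANGEFREQ and 0.0 <= float(priority) <= 1.0

print(valid_entry("monthly", "1.0"))    # a well-formed entry
print(valid_entry("sometimes", "0.5"))  # unknown changefreq value
```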
The following are some tips for sitemap:
(1) In general, the sitemap of a large website can be split into several sitemap files.
(2) Each sitemap file can contain at most 50,000 URLs and must not exceed 10 MB before compression.
(3) Sitemap files can be compressed; gzip compression is recommended to save bandwidth.
(4) You can add a line to robots.txt specifying the location of the sitemap, for example: Sitemap: http://www.a.com/sitemap.xml
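Tips (1)–(3) can be combined in a short sketch: split the URL list into chunks of at most 50,000 and gzip each resulting file. The file names and helper functions are illustrative, and the XML serialization is left as a placeholder:

```python
# Sketch: splitting a large URL list into <=50,000-URL sitemap files
# and gzip-compressing each one. Names here are made-up examples.
import gzip

MAX_URLS = 50_000  # per-file limit from the sitemap protocol

def chunk(urls, size=MAX_URLS):
    """Yield successive slices of at most `size` URLs."""
    for i in range(0, len(urls), size):
        yield urls[i:i + size]

def write_sitemaps(urls, prefix="sitemap"):
    """Write one gzipped file per chunk and return the file names."""
    names = []
    for n, part in enumerate(chunk(urls), start=1):
        data = "\n".join(part).encode("utf-8")  # placeholder for real XML
        name = f"{prefix}-{n}.xml.gz"
        with gzip.open(name, "wb") as f:
            f.write(data)
        names.append(name)
    return names
```

The resulting sitemap-1.xml.gz, sitemap-2.xml.gz, and so on would then be tied together by a sitemap index file, as mentioned earlier.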
However, in my own tests of whether a sitemap is useful, the result was: not necessarily. The major search engines all have strong page-crawling capabilities, so as long as our internal linking is not badly broken, indexing generally works fine. Moreover, when a website's hierarchy is complex and its content is updated frequently, the sitemap itself has to change frequently as well. So personally, I do not find it very effective.