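sitemap.xml is a file that follows the Sitemaps protocol, which complements the older robots.txt protocol. Submitting a sitemap.xml to search engines makes it easier for their crawlers to discover and index a site's pages, which improves the efficiency and accuracy of indexing.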
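The protocol defines six tags:
changefreq: how often the page content is expected to change (optional);
lastmod: when the page was last modified (optional);
loc: the permanent URL of the page (required);
priority: the priority of the page relative to other pages on the same site (optional);
url: the parent tag of the four tags above, one per page;
urlset: the root tag that wraps all url entries.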
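You can submit multiple Sitemap files to a search engine, but each file may list no more than 50,000 URLs and must not exceed 10 MB uncompressed (the current sitemaps.org protocol raises the size limit to 50 MB).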
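A minimal sketch of how a long URL list could be divided to respect the per-file limit described above. The function name `split_into_sitemaps` and the example URLs are hypothetical, chosen only for illustration; the size limit in bytes is not checked here.

```python
from typing import Iterator, List

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit stated in the text


def split_into_sitemaps(urls: List[str], limit: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `limit` entries."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    urls = [f"http://www.example.com/page{i}.html" for i in range(120_000)]
    for number, chunk in enumerate(split_into_sitemaps(urls), start=1):
        print(f"sitemap{number}.xml would hold {len(chunk)} URLs")
```

Each chunk would then be written to its own Sitemap file and submitted separately.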
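Submitting a Sitemap to Google: submit and manage it through http://www.google.com/webmasters;
Submitting a Sitemap to Yahoo!: submit and manage it through http://siteexplorer.search.yahoo.com;
Submitting a Sitemap to MSN: submit directly by URL: http://api.moreover.com/ping?u=http%3A//your.domainname/sitemap.xml. This is a backdoor URL for submitting a sitemap straight to MSN. Note that ":" is replaced with %3A.
Submitting a Sitemap to ASK: submit directly by URL: http://submissions.ask.com/ping?sitemap=http%3A//your.domainname/sitemap.xml. Note that ":" is replaced with %3A.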
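The %3A substitution in the ping URLs above is ordinary percent-encoding. A small sketch, assuming only the endpoint and parameter name quoted in the text for ASK; the helper `build_ping_url` is a hypothetical name, and the "/" characters are deliberately left unencoded to match the URLs shown above.

```python
from urllib.parse import quote


def build_ping_url(endpoint: str, param: str, sitemap_url: str) -> str:
    """Percent-encode the sitemap URL (":" becomes %3A, "/" is kept) and append it as a query parameter."""
    encoded = quote(sitemap_url, safe="/")
    return f"{endpoint}?{param}={encoded}"


if __name__ == "__main__":
    print(build_ping_url(
        "http://submissions.ask.com/ping",
        "sitemap",
        "http://your.domainname/sitemap.xml",
    ))
    # prints http://submissions.ask.com/ping?sitemap=http%3A//your.domainname/sitemap.xml
```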
The sitemap.xml file format is as follows:
<?xml version="1.0" encoding="UTF-8" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.grzz.com.cn/</loc>
<lastmod>2009-04-27</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://www.grzz.com.cn/index.html</loc>
<lastmod>2009-04-27</lastmod>
<changefreq>weekly</changefreq>
</url>
</urlset>
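As a rough illustration, the same two-entry file can be produced programmatically rather than typed by hand. This sketch uses Python's standard `xml.etree.ElementTree`; the function name `build_sitemap` and the tuple layout of the entries are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default namespace (no prefix).
ET.register_namespace("", NS)


def build_sitemap(entries):
    """Return sitemap XML (bytes) for an iterable of (loc, lastmod, changefreq) tuples."""
    urlset = ET.Element(f"{{{NS}}}urlset")
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = loc
        ET.SubElement(url, f"{{{NS}}}lastmod").text = lastmod
        ET.SubElement(url, f"{{{NS}}}changefreq").text = changefreq
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("http://www.grzz.com.cn/", "2009-04-27", "daily"),
        ("http://www.grzz.com.cn/index.html", "2009-04-27", "weekly"),
    ])
    print(xml_bytes.decode("utf-8"))
```

Letting the library write the elements also takes care of escaping characters such as "&" inside URLs.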
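So how do you create sitemap.xml? The crudest way is to write it by hand, following the rules for these six tags.
If the site has many pages, this turns into tedious manual labor. That is why quite a few sitemap.xml generators have appeared. Most of them, however, take a URL entered on the client side and crawl the site for links on their own; Rookie finds this approach inefficient, and it gives no control over which links are generated. Eventually a better method turned up online, suited to sites that publish their content as static pages: someone has written the sitemap.xml generation as ASP and PHP pages, in which you can control which links are produced. Adjust the page to your needs and upload it to your web space; requesting the page returns the sitemap.xml content you need. Save that output as an XML file, upload it to your space, and submit its URL to the search engines that support sitemap.xml.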
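The idea described above, walking the site's own directory tree and filtering what gets listed, can be sketched in a few lines. This is an illustrative Python version, not the ASP or PHP scripts themselves; `collect_pages`, its parameters, and the default filter values are assumptions for the example.

```python
import os
from datetime import datetime, timezone


def collect_pages(root_dir, base_url, allowed_ext=(".html",), excluded_dirs=("admin", "upload")):
    """Return (url, lastmod) pairs for files under root_dir that pass the filters."""
    pages = []
    for current, dirs, files in os.walk(root_dir):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d.lower() not in excluded_dirs]
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() not in allowed_ext:
                continue
            full_path = os.path.join(current, name)
            relative = os.path.relpath(full_path, root_dir).replace(os.sep, "/")
            modified = datetime.fromtimestamp(os.path.getmtime(full_path), tz=timezone.utc)
            pages.append((f"{base_url.rstrip('/')}/{relative}", modified.strftime("%Y-%m-%d")))
    return pages


if __name__ == "__main__":
    # Lists the static pages under the current directory.
    for loc, lastmod in collect_pages(".", "http://www.grzz.com.cn"):
        print(loc, lastmod)
```

Because the script reads the file system directly, it sees exactly the files that exist and can skip whole directories, which is the control a link-crawling generator does not offer.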
ASP file: copy the code below into a text file, save it as sitemap.asp, adjust the settings, upload it to the server, and open it in a browser.
<%
' Change http://www.grzz.com.cn to your own domain name
session("server") = "http://www.grzz.com.cn"
' Directory to build the SiteMap from
vDir = "/"

Set objfso = CreateObject("Scripting.FileSystemObject")
root = Server.MapPath(vDir)

response.ContentType = "text/xml"
response.write "<?xml version='1.0' encoding='UTF-8'?>"
response.write "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>"

' List the files in the root directory, then walk the subfolders
Set objFolder = objfso.GetFolder(root)
Set colFiles = objFolder.Files
For Each objFile In colFiles
    response.write getfilelink(objFile.Path, objFile.dateLastModified)
Next
ShowSubFolders objFolder

response.write "</urlset>"
Set objfso = Nothing

' Recursively list the files of every permitted subfolder
Sub ShowSubFolders(objFolder)
    Dim colFolders, objSubFolder, colFiles, objFile
    Set colFolders = objFolder.SubFolders
    For Each objSubFolder In colFolders
        If Folderpermission(objSubFolder.Path) Then
            response.write getfilelink(objSubFolder.Path, objSubFolder.dateLastModified)
            Set colFiles = objSubFolder.Files
            For Each objFile In colFiles
                response.write getfilelink(objFile.Path, objFile.dateLastModified)
            Next
            ShowSubFolders objSubFolder
        End If
    Next
End Sub

' Build one <url> entry, or return nothing if the extension is not allowed
Function getfilelink(file, datafile)
    ' Possible changefreq values: always, hourly, daily, weekly, monthly, yearly, never
    Dim filedatem, filedated, filedate, urlBase
    file = Replace(file, root, "")
    file = Replace(file, "\", "/")
    If FileExtensionIsBad(file) Then Exit Function
    ' Zero-pad month and day so lastmod reads YYYY-MM-DD
    If Month(datafile) < 10 Then filedatem = "0"
    If Day(datafile) < 10 Then filedated = "0"
    filedate = Year(datafile) & "-" & filedatem & Month(datafile) & "-" & filedated & Day(datafile)
    ' file already starts with "/", so drop a trailing "/" from the base to avoid "//"
    urlBase = session("server") & vDir
    If Right(urlBase, 1) = "/" Then urlBase = Left(urlBase, Len(urlBase) - 1)
    getfilelink = "<url><loc>" & server.htmlencode(urlBase & file) & "</loc><lastmod>" & filedate & "</lastmod><changefreq>weekly</changefreq></url>"
    Response.Flush
End Function

Function Folderpermission(pathName)
    ' Directories to filter out (not listed in the SiteMap)
    Dim PathExclusion, PathExcluded
    PathExclusion = Array("\ad", "\admin", "\aspnet_client", "\Count", "\data", "\Inc", "\upload", "\template")
    Folderpermission = True
    For Each PathExcluded In PathExclusion
        If InStr(UCase(pathName), UCase(PathExcluded)) > 0 Then
            Folderpermission = False
            Exit For
        End If
    Next
End Function

Function FileExtensionIsBad(sFileName)
    Dim sFileExtension, bFileExtensionIsValid, sFileExt, Extensions
    ' Allowed extensions; files with any other extension are left out of the SiteMap
    Extensions = Array("html")
    If Len(Trim(sFileName)) = 0 Then
        FileExtensionIsBad = True
        Exit Function
    End If
    sFileExtension = Right(sFileName, Len(sFileName) - InStrRev(sFileName, "."))
    bFileExtensionIsValid = False ' assume extension is bad
    For Each sFileExt In Extensions
        If UCase(sFileExt) = UCase(sFileExtension) Then
            bFileExtensionIsValid = True
            Exit For
        End If
    Next
    FileExtensionIsBad = Not bFileExtensionIsValid
End Function
%>
PHP file: copy the code below into a text file, save it as sitemap.php, adjust the settings, upload it to the server, and open it in a browser.
<?php
header('Content-type: application/xml; charset="UTF-8"', true);

/* Change http://www.grzz.com.cn to your own domain name */
$website = "http://www.grzz.com.cn";
/* Change to the filesystem directory of your site (no trailing slash) */
$page_root = "/";

/* changefreq can be set as you like */
$changefreq = "weekly"; // "always", "hourly", "daily", "weekly", "monthly", "yearly" and "never".

/* Modification time */
$last_modification = date('c');

/* Directories to include */
$allow_dir[] = "web";

/* Directories to filter out (not listed in the SiteMap) */
$disallow_dir[] = "admin";
$disallow_dir[] = "_notes";

/* File name patterns to exclude; a file whose name contains one of these is left out of the SiteMap */
$disallow_file[] = ".inc";
$disallow_file[] = ".old";
$disallow_file[] = ".save";
$disallow_file[] = ".txt";
$disallow_file[] = ".js";
$disallow_file[] = "~";
$disallow_file[] = ".LCK";
$disallow_file[] = ".zip";
$disallow_file[] = ".ZIP";
$disallow_file[] = ".CSV";
$disallow_file[] = ".csv";
$disallow_file[] = ".css";
$disallow_file[] = ".class";
$disallow_file[] = ".jar";
$disallow_file[] = ".mno";
$disallow_file[] = ".bak";
$disallow_file[] = ".lck";
$disallow_file[] = ".BAK";

/* simple compare function: equals */
function ar_contains($key, $array) {
    foreach ($array as $val) {
        if ($key == $val) {
            return true;
        }
    }
    return false;
}

/* better compare function: contains */
function fl_contains($key, $array) {
    foreach ($array as $val) {
        if (strpos($key, $val) !== false) {
            return true;
        }
    }
    return false;
}

/* this function changes a substring ($old_offset) of each array element to $offset */
function changeOffset($array, $old_offset, $offset) {
    $res = array();
    foreach ($array as $val) {
        $res[] = str_replace($old_offset, $offset, $val);
    }
    return $res;
}

/* this walks recursively through all directories starting at page_root and
   adds all files that fit the filter criteria */
// taken from Lasse Dalegaard,
function getFiles($directory, $directory_orig = "", $directory_offset = "") {
    global $disallow_dir, $disallow_file, $allow_dir;
    if ($directory_orig == "") $directory_orig = $directory;
    // Create an array for all files found
    $tmp = array();
    if ($dir = opendir($directory)) {
        // Add the files; compare against false so an entry named "0" does not end the loop
        while (($file = readdir($dir)) !== false) {
            // Skip ".", ".." and hidden entries
            if ($file != "." && $file != ".." && $file[0] != '.') {
                // If it's a directory, list all files within it
                if (is_dir($directory . "/" . $file)) {
                    $disallowed_abs = fl_contains($directory . "/" . $file, $disallow_dir); // handle directories with paths
                    $disallowed = ar_contains($file, $disallow_dir); // handle directories only, without paths
                    $allowed_abs = fl_contains($directory . "/" . $file, $allow_dir);
                    $allowed = ar_contains($file, $allow_dir);
                    if ($disallowed || $disallowed_abs) continue;
                    if ($allowed_abs || $allowed) {
                        $tmp2 = changeOffset(getFiles($directory . "/" . $file, $directory_orig, $directory_offset), $directory_orig, $directory_offset);
                        if (is_array($tmp2)) {
                            $tmp = array_merge($tmp, $tmp2);
                        }
                    }
                } else { // files
                    if (fl_contains($file, $disallow_file)) continue;
                    array_push($tmp, str_replace($directory_orig, $directory_offset, $directory . "/" . $file));
                }
            }
        }
        // Finish off the function
        closedir($dir);
    }
    return $tmp;
}

$a = getFiles($page_root);
echo '<?xml version="1.0" encoding="UTF-8"?>';
?>
<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>
<?php foreach ($a as $file) { ?>
<url>
<loc><?php echo htmlspecialchars($website . $file, ENT_QUOTES, 'UTF-8'); ?></loc>
<lastmod><?php echo date('c', filectime($page_root . $file)); ?></lastmod>
<changefreq><?php echo htmlspecialchars($changefreq, ENT_QUOTES, 'UTF-8'); ?></changefreq>
</url>
<?php } ?>
</urlset>