Using PHP to generate sitemap files for Baidu

Source: Internet
Author: User
The Baidu sitemap is a very practical feature of the Baidu webmaster tools: it lets a website hand its pages to Baidu so that they are crawled regularly and promptly. This article looks at how to generate the sitemap XML documents with PHP.

The company's website is a question-and-answer encyclopedia site. The SEO engineers asked for sitemap XML files generated from the site's questions, with 5000 sitemap entries per file. There are currently about 140 million questions online, so many XML files are generated, together with an index file. The XML files are named with numbers (0.xml, 1.xml, and so on), and the index file contains the path and name of each XML file.

Why store 5000 records per file? The limit comes from MySQL: fetching more rows per query may slow the site down or affect online users. In fact, even 5000 rows cannot be fetched in a single MySQL SELECT, so although each file holds 5000 records, only 1000 are currently fetched and written at a time, which makes the logic a little more complicated.
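The batching rule can be sketched as a small helper. This is only an illustration: the function name nextBatchSize and its parameters are hypothetical, and only the values 5000 and 1000 come from the text.

```php
<?php
// Hypothetical helper: how many rows to fetch next so that a file never
// holds more than $maxPerFile records. $written is the number of records
// already in the current file.
function nextBatchSize(int $written, int $maxPerFile = 5000, int $batch = 1000): int
{
    $remaining = $maxPerFile - $written;
    if ($remaining <= 0) {
        return $batch; // the file is full, so the next batch starts a new file
    }
    return min($batch, $remaining);
}

echo nextBatchSize(0), "\n";    // 1000
echo nextBatchSize(4500), "\n"; // 500
echo nextBatchSize(5000), "\n"; // 1000 (first batch of the next file)
```

Capping the batch at the remaining room keeps every file at exactly 5000 entries even if the batch size is later changed to a value that does not divide 5000.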

The implementation works as follows.

First, fetch 1000 records (the batch size is a variable, so it is easy to change later) and generate the XML entries in a loop; file_put_contents writes the data to the file. Then record the generated XML file name, the maximum id fetched, the minimum id fetched, and the number of records in a txt file that serves as a lookup index. A row looks roughly like this:

0,3146886, 3145887,1000
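Reading and writing such a row could look like the sketch below. The helper names are hypothetical, and the field order (file number, larger id, smaller id, record count) is an assumption taken from the sample row, where the larger id comes first.

```php
<?php
// Hypothetical helpers for one row of the index txt file.
// Assumed field order: file number, max id, min id, record count.
function parseIndexRow(string $line): array
{
    $parts = array_map('trim', explode(',', $line));
    return [
        'filename' => (int) $parts[0],
        'maxid'    => (int) $parts[1],
        'minid'    => (int) $parts[2],
        'count'    => (int) $parts[3],
    ];
}

function formatIndexRow(array $row): string
{
    return implode(',', [$row['filename'], $row['maxid'], $row['minid'], $row['count']]) . "\r\n";
}

$row = parseIndexRow("0,3146886, 3145887,1000");
echo $row['maxid'], "\n";   // 3146886
echo formatIndexRow($row);  // 0,3146886,3145887,1000
```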

Notice that the last field is 1000. The first SELECT fetches 1000 records and writes them to 0.xml, and the file name, the id range, and the record count are written to the index txt. For the second query, the SELECT condition becomes WHERE id > (the largest id fetched so far) with LIMIT 1000. The query currently scans ids in ascending order; for a descending scan the comparison would be < instead. Another 1000 records are fetched, the ids in the last row of the index txt are updated, and the record count rises to 2000. This repeats until the count reaches 5000, at which point a new row is appended to the index txt for the next file, like this:

0,3146886, 3145887,5000

1, 3148886, 3147887,1000
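The update-or-append rule for the index rows can be sketched as follows. The function names, the table name question, and the column list in the SQL are hypothetical; the sketch assumes ids arrive in ascending order and that the caller has already capped the batch so a file never exceeds 5000 records.

```php
<?php
// Hypothetical sketch: fold one fetched batch into the index rows.
// $rows is a list of ['filename', 'maxid', 'minid', 'count'] arrays;
// $ids are the question ids of the batch.
function applyBatch(array $rows, array $ids, int $maxPerFile = 5000): array
{
    if ($ids === []) {
        return $rows; // nothing fetched, nothing to record
    }
    $last = end($rows); // false when there are no rows yet
    if ($last !== false && $last['count'] < $maxPerFile) {
        // The current file still has room: update its row in place.
        $i = count($rows) - 1;
        $rows[$i]['maxid'] = max($ids);
        $rows[$i]['count'] += count($ids);
    } else {
        // The file is full (or none exists yet): add a row for the next file.
        $rows[] = [
            'filename' => $last === false ? 0 : $last['filename'] + 1,
            'maxid'    => max($ids),
            'minid'    => min($ids),
            'count'    => count($ids),
        ];
    }
    return $rows;
}

// Keyset pagination for the ascending scan: resume after the largest id seen.
// Table and column names here are placeholders, not the site's real schema.
function nextBatchSql(): string
{
    return 'SELECT id, lasttime FROM question WHERE id > :maxid ORDER BY id ASC LIMIT :limit';
}

$rows = applyBatch([], range(1, 1000));        // starts file 0 with 1000 records
$rows = applyBatch($rows, range(1001, 2000));  // same file, now 2000 records
echo $rows[0]['count'], "\n";                  // 2000
```

Resuming from the last id instead of using an OFFSET keeps each query cheap no matter how far the scan has progressed, which matches the goal of keeping load on the database low.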

Working in small batches reduces the load on the server. The implementation below is a bit messy; the code is as follows:

// NOTE: the start of this listing is missing from the source (the class and
// method headers, the code that reads the last index row into $arr, and the
// beginning of the first statement below). The XML tags inside the string
// literals were also lost; they are restored here from the sitemap protocol
// and the sample files, so treat them as a best guess.
            ... $psize ? $psize : ($maxXml - $arr[3]);
            $bs = 0;
        } else {
            $filename = $arr[0] + 1;
            $bs = 1;
        }
    }
    $maxid = empty($arr[1]) ? 0 : $arr[1];
    $minid = empty($arr[2]) ? 0 : $arr[2];
    echo "File name: " . $filename . ".xml" . "\n";
    echo "maximum id: " . $maxid . "\n";
    echo "minimum id: " . $minid . "\n";
    echo "maximum records per xml file: " . $maxXml . "\n";
    echo "records read from the database each time: " . $psize . "\n";

    $list = self::$questionObj->getQuestionSetMap($where, $maxid, $psize);
    if (count($list) <= 0) {
        echo 1;
        exit;
    }
    $record = $arr[3] + count($list); // number of records written to the index file
    $indexArr = array('filename' => $filename, 'maxid' => $maxid, 'minid' => $minid, 'maxxml' => $record);
    $start = '<?xml version="1.0" encoding="UTF-8"?>' . chr(10);
    $start .= "<urlset>" . chr(10) . "</urlset>"; // the last line is replaced on every write
    $xml = '';
    foreach ($list as $k => $qinfo) {
        if ($k == 0) $indexArr['minid'] = $qinfo['id'];
        $qinfo['lastmod'] = substr($qinfo['lasttime'], 0, 10);
        $qinfo['mobilelurl'] = self::$askMobileUrl . $qinfo['id'] . '.html'; // mobile link
        $qinfo['pcurl'] = self::$askPcUrl . $qinfo['id'] . '-p1.html';       // URL of the PC version
        $xml .= $this->askMapMobileUrl($qinfo); // mobile version
        $xml .= $this->askMapPcUrl($qinfo);     // PC version
    }
    $maxid = end($list);
    $indexArr['maxid'] = $maxid['id'];

    // update the index txt file
    if ($bs == 0) {
        // update the last row
        $txt = file($index);
        $txt[count($txt) - 1] = $indexArr['filename'] . ',' . $indexArr['maxid'] . ',' . $indexArr['minid'] . ',' . $indexArr['maxxml'] . "\r\n";
        $str = join($txt);
        if (is_writable($index)) {
            if (!$handle = fopen($index, 'w')) {
                echo "cannot open the file $index";
                exit;
            }
            if (fwrite($handle, $str) === FALSE) {
                echo "cannot write to the file $index";
                exit;
            }
            echo "successfully written to the file $index";
            fclose($handle);
        } else {
            echo "file $index is not writable";
            exit;
        }
    } elseif ($bs == 1) {
        // add a new row
        $fp = fopen($index, 'a');
        $num = count($list);
        $string = $indexArr['filename'] . ',' . $indexArr['maxid'] . ',' . $indexArr['minid'] . ',' . $num . "\r\n";
        if (fwrite($fp, $string) === false) {
            echo "failed to append the new row...";
            exit;
        } else {
            echo "append successfully\n";
            // update the sitemap index file
            $xmlData = '<?xml version="1.0" encoding="UTF-8"?>' . chr(10);
            $xmlData .= "<sitemapindex>" . chr(10) . "</sitemapindex>";
            if (!file_exists($askXml)) file_put_contents($askXml, $xmlData);
            $fileList = file($askXml);
            $fileCount = count($fileList);
            // URL garbled in the source; rebuilt from the index file example below
            $setmapxml = "http://www.phprm.com/ask/setmapxml/{$filename}.xml"; // normal question link
            $txt = $this->setMapIndex($setmapxml);
            $fileList[$fileCount - 1] = $txt . "</sitemapindex>";
            $newContent = '';
            foreach ($fileList as $v) {
                $newContent .= $v;
            }
            if (!file_put_contents($askXml, $newContent)) exit('data cannot be written');
            echo 'written to ' . $askXml;
        }
        fclose($fp);
    }

    $filename = APP_PATH . 'setmapxml/' . $filename . '.xml';
    // update the xml file and append the closing tag
    if (!file_exists($filename)) file_put_contents($filename, $start);
    $xmlList = file($filename);
    $xmlCount = count($xmlList);
    $xmlList[$xmlCount - 1] = $xml . "</urlset>";
    $newXml = '';
    foreach ($xmlList as $v) {
        $newXml .= $v;
    }
    if (!file_put_contents($filename, $newXml)) exit("data writing error");
    else echo "data writing successful\n";
}

// Question-and-answer entry, mobile version
private function askMapMobileUrl($data) {
    $xml = '';
    if (is_array($data) && !empty($data)) {
        $xml .= "<url>" . chr(10);
        if ($data['id']) $xml .= '<loc>' . $data['mobilelurl'] . '</loc>' . chr(10); // link of the mobile version
        $xml .= "<mobile:mobile/>" . chr(10); // assumed: Baidu's mobile marker (tag lost in the source)
        if ($data['lastmod']) $xml .= '<lastmod>' . $data['lastmod'] . '</lastmod>' . chr(10);
        $xml .= '<changefreq>daily</changefreq>' . chr(10);
        $xml .= '<priority>0.8</priority>' . chr(10);
        $xml .= "</url>" . chr(10);
    }
    return $xml;
}

// Question-and-answer entry, PC version
private function askMapPcUrl($data) {
    $xml = '';
    if (is_array($data) && !empty($data)) {
        $xml .= '<url>' . chr(10);
        if ($data['id']) $xml .= '<loc>' . $data['pcurl'] . '</loc>' . chr(10); // link to the PC version
        if ($data['lastmod']) $xml .= '<lastmod>' . $data['lastmod'] . '</lastmod>' . chr(10);
        $xml .= '<changefreq>daily</changefreq>' . chr(10);
        $xml .= '<priority>0.8</priority>' . chr(10);
        $xml .= '</url>' . chr(10);
    }
    return $xml;
}

// One entry of the sitemap index file
private function setMapIndex($filename) {
    $xml = '';
    $xml .= "<sitemap>" . chr(10);
    $xml .= "<loc>{$filename}</loc>" . chr(10);
    $xml .= "<lastmod>" . date("Y-m-d", time()) . "</lastmod>" . chr(10);
    $xml .= "</sitemap>" . chr(10);
    return $xml;
}
}
?>


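The XML index file format is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex>
  <sitemap>
    <loc>http://www.phprm.com/ask/setmapxml/0.xml</loc>
    <lastmod>2014-05-12</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.phprm.com/ask/setmapxml/1.xml</loc>
    <lastmod>2014-05-12</lastmod>
  </sitemap>
</sitemapindex>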
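An index document of this kind could be produced with a helper along the following lines. This is a sketch, not the article's code: the function name buildSitemapIndex is hypothetical, the element names and the xmlns value follow the sitemaps.org protocol, and URLs are escaped so that characters such as & do not break the XML.

```php
<?php
// Hypothetical helper: build a sitemap index document from a list of
// [url, lastmod] pairs, using the sitemaps.org element names.
function buildSitemapIndex(array $files): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($files as [$url, $lastmod]) {
        $xml .= "  <sitemap>\n";
        $xml .= '    <loc>' . htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</loc>\n";
        $xml .= '    <lastmod>' . htmlspecialchars($lastmod, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</lastmod>\n";
        $xml .= "  </sitemap>\n";
    }
    return $xml . "</sitemapindex>\n";
}

echo buildSitemapIndex([
    ['http://www.phprm.com/ask/setmapxml/0.xml', '2014-05-12'],
    ['http://www.phprm.com/ask/setmapxml/1.xml', '2014-05-12'],
]);
```

Rebuilding the whole index from the list of files is simpler than replacing the last line of an existing file, at the cost of rewriting the file on every run.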

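XML file format. Each file stores 5000 records; a single entry is shown here as an example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset>
  <url>
    <loc>http://m.xxx.cn/ask/7460.html</loc>
    <!-- mobile marker; the element name is assumed, it is not preserved in the source -->
    <mobile:mobile/>
    <lastmod>2013-01-11</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>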
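A single entry like the one above could be rendered with a small function such as this. The name buildUrlEntry is hypothetical, the element names follow the sitemaps.org protocol, and the defaults mirror the sample values (daily, 0.8). The Baidu-specific mobile marker is left out because its exact form is not preserved in the article.

```php
<?php
// Hypothetical helper: render one <url> entry with escaped values.
function buildUrlEntry(string $loc, string $lastmod, string $changefreq = 'daily', string $priority = '0.8'): string
{
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    return "<url>\n"
        . '  <loc>' . $esc($loc) . "</loc>\n"
        . '  <lastmod>' . $esc($lastmod) . "</lastmod>\n"
        . '  <changefreq>' . $esc($changefreq) . "</changefreq>\n"
        . '  <priority>' . $esc($priority) . "</priority>\n"
        . "</url>\n";
}

// The date part of a "Y-m-d H:i:s" timestamp is its first 10 characters.
$lastmod = substr('2013-01-11 08:30:00', 0, 10);
echo buildUrlEntry('http://m.xxx.cn/ask/7460.html', $lastmod);
```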


