Scraping article content with PHP

Source: Internet
Author: User
Could an expert help me scrape the leaderboard data from this page, http://sports.sohu.com/zhongchao.shtml (both the league standings and the top scorers list)?



Replies (solutions)

For scraping, look into phpQuery.

$url = 'http://sports.sohu.com/zhongchao.shtml';
$s = file_get_contents($url);
preg_match_all('/(?<= ... )\s ... /isu', $s, $m);
print_r(preg_grep('/rank/', $m[0]));

Array ( [2] => ...

You can use preg_match to extract the relevant HTML and then filter out the data you want.
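As a rough illustration of that two-step approach, the sketch below first cuts a table out of the markup with preg_match and then filters the cells out of each row with preg_match_all. The sample HTML, the `rank` class name, and the `extractRows` helper are all made up for the example; the real Sohu markup may well differ.

```php
<?php
// Hypothetical sample markup; the real page's structure is an assumption.
$html = <<<HTML
<div><table class="rank">
<tr><th>Rank</th><th>Team</th><th>Matches</th><th>Points</th></tr>
<tr><td>01</td><td><a href="#">Team A</a></td><td>20</td><td>45</td></tr>
<tr><td>02</td><td><a href="#">Team B</a></td><td>19</td><td>40</td></tr>
</table></div>
HTML;

/**
 * Step 1: grab the table's HTML. Step 2: filter the cells out of each row.
 * Returns a list of [rank, name, matches, points] rows.
 */
function extractRows(string $html): array
{
    // Step 1: isolate the first table whose class is "rank" (non-greedy, dot matches newlines).
    if (!preg_match('/<table[^>]*class="rank"[^>]*>(.*?)<\/table>/is', $html, $table)) {
        return [];
    }

    // Step 2: one match per <tr> that holds exactly four <td> cells.
    $cell = '<td[^>]*>(.*?)<\/td>\s*';
    $pattern = '/<tr[^>]*>\s*' . str_repeat($cell, 4) . '<\/tr>/is';
    preg_match_all($pattern, $table[1], $rows, PREG_SET_ORDER);

    $result = [];
    foreach ($rows as $row) {
        // $row[0] is the whole <tr>; $row[1..4] are the cells. strip_tags drops inner links.
        $result[] = array_map(static fn($c) => trim(strip_tags($c)), array_slice($row, 1));
    }
    return $result;
}

print_r(extractRows($html));
```

PREG_SET_ORDER is used here so that each match already corresponds to one table row; the header row is skipped simply because it contains th cells rather than td cells.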

I recommend a class for this: simple_html_dom




include "simple_html_dom.class.php";
$url = "http://sports.sohu.com/zhongchao.shtml";
$dom = new simple_html_dom();
$dom->load(file_get_contents($url));
$res = $dom->find("div#turnidb div.turn");
// Standings
echo $res[0]->outertext;
// Top scorers list
echo $res[1]->outertext;
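If adding a third-party class is not an option, the same select-by-path idea can be sketched with PHP's bundled DOM extension. This is a different technique, not the simple_html_dom API; the sample markup and the `tablesFromTurnBlocks` helper are assumptions modeled on the selector used above.

```php
<?php
// Hypothetical markup modeled on the selector "div#turnidb div.turn".
$html = <<<HTML
<div id="turnidb">
  <div class="turn"><table><tr><td>01</td><td>Team A</td></tr></table></div>
  <div class="turn"><table><tr><td>01</td><td>Player X</td></tr></table></div>
</div>
HTML;

/** Returns the cell text of every table row, grouped per div.turn block. */
function tablesFromTurnBlocks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // XPath counterpart of the CSS selector div#turnidb div.turn
    $query = '//div[@id="turnidb"]'
           . '//div[contains(concat(" ", normalize-space(@class), " "), " turn ")]';

    $blocks = [];
    foreach ($xpath->query($query) as $div) {
        $rows = [];
        foreach ($xpath->query('.//tr', $div) as $tr) {
            $cells = [];
            foreach ($xpath->query('./td', $tr) as $td) {
                $cells[] = trim($td->textContent);
            }
            $rows[] = $cells;
        }
        $blocks[] = $rows;
    }
    return $blocks;
}

$blocks = tablesFromTurnBlocks($html);
print_r($blocks[0]); // first block: standings
print_r($blocks[1]); // second block: top scorers
```

Working on parsed nodes instead of raw strings means the result does not depend on attribute order or whitespace in the markup, which is the usual reason to prefer a DOM approach over a regular expression.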
$str = file_get_contents("http://sports.sohu.com/zhongchao.shtml");
preg_match_all('/ ... (.+?)<\/td>\s* ... (.+?)<\/td>\s* ... (\d+)<\/td>\s* ... (.+?)<\/td>\s*<\/tr>/i', $str, $match1);
foreach ($match1 as $k => $v) {
    if ($k != 0) { // skip the full-match column
        foreach ($v as $k1 => $v1) {
            if ($k1 <= 15) {
                $jifen[$k][] = $v1;   // first 16 rows: standings
            } else {
                $sheshou[$k][] = $v1; // remaining rows: top scorers
            }
        }
    }
}
echo "<pre>";
print_r($jifen);
print_r($sheshou);
echo "</pre>";

...... Then handle the rest yourself. Results:

Rank  Team  Matches  Points
01  Guangzhou Evergrande  20  45
02  Beijing Guoan

/*
array (
    [1] => Array ( ... [2] => 03 ... [7] => 08 ... )
    [2] => Array ( [0] => Guangzhou Evergrande [1] => Beijing Guoan [2] => Guangzhou R&F [3] => Shanghai East Asia [4] => Guizhou Maotai [5] => Shandong Luneng [6] => Tianjin Teda [7] => Jiangsu Sainty [8] => Shanghai Greenland [9] => Changchun Yatai [10] => Hangzhou Greentown [11] => Dalian Aerbin [12] => Shanghai Shenxin [13] => Henan Jianye [14] => Liaoning Hongyun [15] => Harbin Yiteng )
    [3] => Array ( ... [1] => 19 [2] => 19 ... [7] => 18 ... [12] => 19 ... )
    [4] => Array ( ... [4] => 30 ... [9] => 21 ... )
)
array (
    [1] => Array ( ... [7] => 08 ... )
    [2] => Array ( [0] => Elksen [1] => Hammed [2] => Haisen [3] => Damrey [4] => ... [5] => Lowe [6] => ... [7] => Dejan [8] => Bathala [9] => Bruno [10] => Ricardo [11] => Zhao Dongquan [12] => Eno [13] => Yuri [14] => Renault [15] => Rene )
    [3] => Array ( [0] => ... [1] => ... [2] => 13 [3] => 9 [4] => 9 [5] => 9 [6] => 9 [7] => 8 [8] => 7 [9] => 7 [10] => 7 [11] => 7 [12] => 7 [13] => 7 [14] => 6 [15] => 6 )
    [4] => Array ( [0] => Guangzhou Evergrande [1] => Guangzhou R&F [2] => Shanghai East Asia [3] => Guangzhou R&F [4] => Harbin Yiteng [5] => Shandong Luneng [6] => Hangzhou Greentown [7] => Beijing Guoan [8] => Beijing Guoan [9] => Dalian Aerbin [10] => Harbin Yiteng [11] => Shanghai East Asia [12] => Changchun Yatai [13] => Guizhou Maotai [14] => Shanghai Greenland [15] => Guangzhou Evergrande )
)
*/
I'll handle the rest myself.
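One possible way to do that remaining step is sketched below: the column-ordered arrays that preg_match_all returns by default are transposed into one array per table row and then split into the two lists. The `$match` data, the `columnsToRows` helper, and the `$standingsCount` value are placeholders invented for the example; on the real page the first 16 rows would be the standings, as in the `$k1 <= 15` check above.

```php
<?php
// Hypothetical preg_match_all result in the default PREG_PATTERN_ORDER:
// index 0 holds the full matches, indexes 1..4 hold one column each.
$match = [
    ['<tr>..</tr>', '<tr>..</tr>', '<tr>..</tr>'],
    ['01', '02', '01'],
    ['Team A', 'Team B', 'Player X'],
    ['20', '19', '17'],
    ['45', '40', 'Team A'],
];
$standingsCount = 2; // assumption for this sample; 16 on the page discussed here

/** Turns column-ordered capture groups into a list of rows. */
function columnsToRows(array $match): array
{
    $columns = array_slice($match, 1);   // drop the full-match column
    return array_map(null, ...$columns); // transpose columns into rows
}

$rows = columnsToRows($match);
$standings = array_slice($rows, 0, $standingsCount); // rank, team, matches, points
$scorers = array_slice($rows, $standingsCount);      // rank, player, goals, team

print_r($standings);
print_r($scorers);
```

Passing PREG_SET_ORDER to preg_match_all would produce row-ordered matches directly and make the transposition unnecessary.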



"...... Then handle the rest yourself."
Why do I get an empty array?

The Sohu page is encoded in GB2312, so it must be converted to UTF-8 after fetching; otherwise the text comes out garbled.

echo '<pre>';
$url = 'http://sports.sohu.com/zhongchao.shtml';
$s = file_get_contents($url);
$s = iconv('GBK', 'UTF-8', $s); // convert GB2312 to UTF-8
preg_match_all('/(?<= ... )\s ... /isu', $s, $m);

// Get the standings
preg_match_all('/ ... (.+?)<\/td>\s* ... (.+?)<\/td>\s* ... (\d+)<\/td>\s* ... (.+?)<\/td>\s*<\/tr>/i', $m[0][2], $scores);
$scoreboard = array();
for ($i = 0, $len = count($scores[1]); $i < $len; $i++) {
    $tmp = array($scores[1][$i], strip_tags($scores[2][$i]), $scores[3][$i], $scores[4][$i]);
    array_push($scoreboard, $tmp);
}
print_r($scoreboard);

// Top scorers list
preg_match_all('/ ... (.+?)<\/td>\s* ... (.+?)<\/td>\s* ... (\d+)<\/td>\s* ... (.+?)<\/td>\s*<\/tr>/i', $m[0][3], $shooters);
$shooterboard = array();
for ($i = 0, $len = count($shooters[1]); $i < $len; $i++) {
    $tmp = array($shooters[1][$i], strip_tags($shooters[2][$i]), $shooters[3][$i], $shooters[4][$i]);
    array_push($shooterboard, $tmp);
}
print_r($shooterboard);

Standings
Array
(
    [0] => Array ( [0] => ... [1] => Guangzhou Evergrande [2] => 20 [3] => ... )
    [1] => Array ( [0] => ... [1] => Beijing Guoan [2] => 19 [3] => ... )
    [2] => Array ( [0] => ... [1] => Guangzhou R&F [2] => ... [3] => ... )
    [3] => Array ( [0] => ... [1] => Shanghai East Asia [2] => ... [3] => ... )
    [4] => Array ( [0] => 05 [1] => Guizhou Maotai [2] => ... [3] => ... )
    [5] => Array ( [0] => 06 [1] => Shandong Luneng [2] => ... [3] => ... )
    [6] => Array ( [0] => ... [1] => Tianjin Teda [2] => ... [3] => ... )
    [7] => Array ( [0] => ... [1] => Jiangsu Sainty [2] => ... [3] => ... )
    [8] => Array ( [0] => ... [1] => Shanghai Greenland [2] => ... [3] => ... )
    [9] => Array ( [0] => ... [1] => Changchun Yatai [2] => ... [3] => 21 )
    [10] => Array ( [0] => ... [1] => Hangzhou Greentown [2] => ... [3] => 21 )
    [11] => Array ( [0] => ... [1] => Dalian Aerbin [2] => 19 [3] => ... )
    [12] => Array ( [0] => ... [1] => Shanghai Shenxin [2] => 19 [3] => ... )
    [13] => Array ( [0] => ... [1] => Henan Jianye [2] => ... [3] => ... )
    [14] => Array ( [0] => ... [1] => Liaoning Hongyun [2] => ... [3] => ... )
    [15] => Array ( [0] => ... [1] => Harbin Yiteng [2] => ... [3] => 12 )
)


Top scorers list
Array
(
    [0] => Array ( [0] => ... [1] => Elksen [2] => 17 [3] => Guangzhou Evergrande )
    [1] => Array ( [0] => ... [1] => Hammed [2] => 16 [3] => Guangzhou R&F )
    [2] => Array ( [0] => ... [1] => Haisen [2] => ... [3] => Shanghai East Asia )
    [3] => Array ( [0] => ... [1] => Damrey [2] => 9 [3] => Guangzhou R&F )
    [4] => Array ( [0] => 04 [1] => ... [2] => 9 [3] => Harbin Yiteng )
    [5] => Array ( [0] => ... [1] => Lowe [2] => 9 [3] => Shandong Luneng )
    [6] => Array ( [0] => ... [1] => ... [2] => 9 [3] => Hangzhou Greentown )
    [7] => Array ( [0] => ... [1] => Dejan [2] => 8 [3] => Beijing Guoan )
    [8] => Array ( [0] => ... [1] => Bathala [2] => 7 [3] => Beijing Guoan )
    [9] => Array ( [0] => ... [1] => Bruno [2] => 7 [3] => Dalian Aerbin )
    [10] => Array ( [0] => ... [1] => Ricardo [2] => 7 [3] => Harbin Yiteng )
    [11] => Array ( [0] => ... [1] => Zhao Dongquan [2] => 7 [3] => Shanghai East Asia )
    [12] => Array ( [0] => ... [1] => Eno [2] => 7 [3] => Changchun Yatai )
    [13] => Array ( [0] => ... [1] => Yuri [2] => 7 [3] => Guizhou Maotai )
    [14] => Array ( [0] => ... [1] => Connaught [2] => 6 [3] => ... )
    [15] => Array ( [0] => ... [1] => ... [2] => 6 [3] => Guangzhou Evergrande )
)
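The encoding point raised in this thread can be checked in isolation. The sketch below builds a small GBK-encoded string as a stand-in for the fetched page (an assumption made to avoid a network call; it also assumes this file itself is saved as UTF-8). It illustrates that a pattern with the `u` modifier fails outright on GBK input, so nothing is matched, and that the same pattern works once `iconv` has converted the text to UTF-8.

```php
<?php
// Stand-in for file_get_contents($url): a GBK-encoded fragment (sample data, not the real page).
$gbk = iconv('UTF-8', 'GBK', '<td>广州</td>');

// With the u modifier, PCRE requires the subject to be valid UTF-8; GBK bytes are not.
$before = preg_match_all('/<td>(.+?)<\/td>/isu', $gbk, $m1);
$error = preg_last_error();
var_dump($before);                        // bool(false)
var_dump($error === PREG_BAD_UTF8_ERROR); // bool(true)

// Convert first, then match.
$utf8 = iconv('GBK', 'UTF-8', $gbk);
$after = preg_match_all('/<td>(.+?)<\/td>/isu', $utf8, $m2);
var_dump($after);                         // int(1)
echo $m2[1][0], "\n";
```

Checking the return value of preg_match_all against false, or calling preg_last_error(), is a quick way to tell an encoding failure apart from a pattern that simply found no matches.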