When writing a crawler in PHP, the first thing that comes to mind is probably regular expressions. Personally I can never remember the regex rules and find them hard to get started with. Today I needed to crawl some store information from a site.
By chance I came across a good library online called simple_html_dom.
GitHub: https://github.com/samacs/simple_html_dom
The most important step: you need to understand the structure of the target site and know which tag holds the data you want.
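To make "know which tag holds the data" concrete, here is a minimal sketch. The markup is hypothetical (the real site's HTML is not shown in this post), and it uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs without any download. With simple_html_dom the same selection would be `$html->find('li[data-counts=0]')`, reading attributes as `$e->storeid`, `$e->mapname`, and so on.

```php
<?php
// Hypothetical sample markup: each store is an <li> whose data lives in attributes.
$sample = <<<HTML
<ul>
  <li data-counts="0" storeid="101" level="4" mapname="Store A" mapx="116.40" mapy="39.90"></li>
  <li data-counts="0" storeid="102" level="5" mapname="Store B" mapx="121.47" mapy="31.23"></li>
  <li data-counts="3" storeid="999" mapname="Not a store row"></li>
</ul>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$stores = [];
// Select only the <li> elements whose data-counts attribute is "0"
foreach ($xpath->query('//li[@data-counts="0"]') as $li) {
    $stores[] = [
        'storeid'   => (int) $li->getAttribute('storeid'),
        'star'      => $li->getAttribute('level') . '.0',
        'nickname'  => $li->getAttribute('mapname'),
        'longitude' => $li->getAttribute('mapx'),
        'latitude'  => $li->getAttribute('mapy'),
    ];
}
print_r($stores);
```

Once the right selector and attribute names are known, the rest of the crawler is mostly a loop over the matches.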
Below is a demonstration of the process.
I implemented it in three steps.
1. First insert each store's latitude and longitude, name and other important information into a local table
- // Requires PHP 5: the mysql_* extension used here was removed in PHP 7.0.
- set_time_limit(0);
- $host = '127.0.0.1';
- $user = 'root';
- $user_pwd = '';
- $database = 'dataname';
- $conn = mysql_connect($host, $user, $user_pwd) or die('sss');
- mysql_select_db($database, $conn) or die('dddd');
- mysql_query('set names utf8');
- include('./simple_html_dom-master/simple_html_dom.php');
- $url = 'site URL to crawl';
- $html = file_get_html($url);
- $n = 1;
- // Each matching <li> carries the store data in its attributes
- foreach ($html->find('li[data-counts=0]') as $e) {
-     $storeid = (int) $e->storeid;
-     // Escape scraped strings so quotes in names or addresses cannot break the query
-     $star = mysql_real_escape_string($e->level . '.0');
-     $work_time = mysql_real_escape_string($e->time);
-     $mapx = mysql_real_escape_string($e->mapx);
-     $mapy = mysql_real_escape_string($e->mapy);
-     $nickname = mysql_real_escape_string($e->mapname);
-     $mapadd = mysql_real_escape_string($e->mapadd);
-     $maptel = mysql_real_escape_string($e->maptel);
-     $time = date('Y-m-d H:i:s');
-     $query = "INSERT INTO `store` (`storeid`, `star`, `work_time`, `longitude`, `latitude`, `create_time`, `nickname`, `address`, `tel`)
-         VALUES ($storeid, '" . $star . "', '" . $work_time . "', '" . $mapx . "', '" . $mapy . "', '" . $time . "', '" . $nickname . "', '" . $mapadd . "', '" . $maptel . "')";
-     $res = mysql_query($query);
-     //echo $query; exit();
-     if ($res) {
-         echo 'Successfully imported store #' . $n . '<br>';
-         $n++;
-     } else {
-         die('failure <br>');
-     }
- }
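The loop above builds the INSERT statement by string concatenation. As an alternative, here is a minimal sketch of the same insert using PDO prepared statements, which also work on PHP 7 and later. It is not the code I actually ran: the in-memory SQLite database and the sample row are assumptions made so the example runs on its own (it needs the pdo_sqlite extension), and the table layout is only inferred from the column names in the query above.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the local MySQL table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE store (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    storeid INTEGER, star TEXT, work_time TEXT,
    longitude TEXT, latitude TEXT, create_time TEXT,
    nickname TEXT, address TEXT, tel TEXT, img_url TEXT
)');

// Hypothetical row, shaped like the values read from each <li> in the loop.
$rows = [
    [
        'storeid' => 101, 'star' => '4.0', 'work_time' => '09:00-21:00',
        'longitude' => '116.40', 'latitude' => '39.90',
        'nickname' => "Tom's Store", 'address' => '1 Example Rd', 'tel' => '000-0000',
    ],
];

// Placeholders keep scraped text (including quotes) out of the SQL itself.
$stmt = $pdo->prepare(
    'INSERT INTO store (storeid, star, work_time, longitude, latitude, create_time, nickname, address, tel)
     VALUES (:storeid, :star, :work_time, :longitude, :latitude, :create_time, :nickname, :address, :tel)'
);

$n = 0;
foreach ($rows as $row) {
    $row['create_time'] = date('Y-m-d H:i:s');
    $stmt->execute($row);
    $n++;
}
echo "Imported $n stores\n";
```

For MySQL the only change should be the connection line, for example `new PDO('mysql:host=127.0.0.1;dbname=dataname;charset=utf8', 'root', '')`, using the same connection settings as in step 1.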
2. Visit each store's own page on the site to get the store logo image
- $query = "SELECT storeid FROM store ORDER BY id DESC";
- $row = mysql_query($query);
- while ($rows = mysql_fetch_array($row)) {
-     // Each store has its own page at <domain>/<storeid>.jhtml
-     $url = 'http://<other site domain>/' . $rows['storeid'] . '.jhtml';
-     $html = file_get_html($url);
-     foreach ($html->find('div.onlyonepic') as $e) {
-         // Get the src attribute of the img
-         $img = $e->firstChild()->src;
-         // Save the remote picture locally
-         $content = file_get_contents($img);
-         file_put_contents('./store/' . $rows['storeid'] . '.jpeg', $content);
-     }
- }
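The download step above writes whatever `file_get_contents` returns, so a failed request would leave an empty file behind. Below is a small sketch of a more defensive version. The helper name `save_store_logo` is made up for illustration, and the usage part reads from a local temporary file in place of the remote image URL so that it runs without network access.

```php
<?php
/**
 * Hypothetical helper: copy an image from $src into $dir as "<storeid>.jpeg".
 * Returns the saved path, or null if the source could not be read or written.
 */
function save_store_logo($src, $dir, $storeid)
{
    $content = @file_get_contents($src);
    if ($content === false || $content === '') {
        return null; // skip this store instead of writing an empty file
    }
    // Create the target directory on first use
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        return null;
    }
    $path = rtrim($dir, '/') . '/' . $storeid . '.jpeg';
    return file_put_contents($path, $content) === false ? null : $path;
}

// Usage: a local file stands in for the remote image URL.
$tmp = sys_get_temp_dir() . '/logo_demo_' . uniqid();
mkdir($tmp);
file_put_contents($tmp . '/source.jpeg', 'fake image bytes');

$saved   = save_store_logo($tmp . '/source.jpeg', $tmp . '/store', 101);
$missing = save_store_logo($tmp . '/does_not_exist.jpeg', $tmp . '/store', 102);
var_dump($saved, $missing);
```

In the crawler loop, `$src` would be the `src` value read from the `div.onlyonepic` element and `$dir` would be `./store`.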
3. Update the store logo field in the table
- $query = "SELECT storeid FROM store ORDER BY id DESC";
- $row = mysql_query($query);
- $n = 1;
- while ($rows = mysql_fetch_array($row)) {
-     // Public URL of the image saved in step 2
-     $img = "https://<my own site domain>/" . $rows['storeid'] . ".jpeg";
-     $sql = "UPDATE store SET img_url = '" . $img . "' WHERE storeid = " . (int) $rows['storeid'];
-     $res = mysql_query($sql);
-     if ($res) {
-         echo 'Successfully updated store #' . $n . '<br>';
-         $n++;
-     } else {
-         echo 'failure';
-     }
- }
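The update loop above sets an image URL for every store, whether or not its logo was actually downloaded in step 2. One possible refinement, sketched below, is to build the URL only for stores whose file exists on disk. The function name `logo_urls`, the temporary directory and the `https://example.com/` base URL are all placeholders for illustration.

```php
<?php
/**
 * Hypothetical helper: map each store ID to its public image URL,
 * keeping only stores whose "<storeid>.jpeg" file exists in $dir.
 */
function logo_urls(array $storeIds, $dir, $baseUrl)
{
    $urls = [];
    foreach ($storeIds as $id) {
        if (is_file(rtrim($dir, '/') . '/' . $id . '.jpeg')) {
            $urls[$id] = rtrim($baseUrl, '/') . '/' . $id . '.jpeg';
        }
    }
    return $urls;
}

// Usage: only store 101 has a downloaded logo in this demo directory.
$dir = sys_get_temp_dir() . '/store_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/101.jpeg', 'x');

$urls = logo_urls([101, 102], $dir, 'https://example.com/');
print_r($urls);
```

Each entry of `$urls` can then be written with the same UPDATE statement as in step 3, and stores without a logo keep an empty `img_url`.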
OK, the feature works. I have not yet looked more deeply into the other features of this library; this post is just a record for when I need it again later.
Using PHP's simple_html_dom class to fetch page content and act as a crawler