My image crawler is finished! Below is the code I use against the Baidu image search interface. It's very simple~ Take it, no need to thank me... ha ha! The part I struggled with was the decryption: I worked out the mapping one letter at a time, so it may not be very efficient. I don't know if there is a faster way, so I'm asking here: if you know a better approach, please tell me!
/**
 * Search for image links through Baidu, then download them.
 * (I'm testing this on my local machine first instead of the server, because the
 * server's disk is too small to hold all the images.)
 * 1. Fetch the vehicle IDs and model names
 * 2. Call the Baidu image search interface to get the image JSON
 * 3. Parse the JSON and iterate over its data entries to get each objURL
 * 4. Decrypt the objURL
 * 5. Check that the decrypted objURL is reachable
 * 6. Download the reachable images to local disk
 */
public function getImageFromBaidu()
{
    // This is my own business method; adapt it to your needs, but note the following:
    // keep running even if the browser is closed, and remove the time limit
    ignore_user_abort(true);
    set_time_limit(0);
    $carList = M('car')->field('id,modelname')->select();
    foreach ($carList as $car) {
        \think\Log::write('Fetching image data for ' . $car['modelname'], 'WARN');
        // Request the interface and get the JSON data
        $jsonResult = $this->useBaiduSearch($car['modelname']);
        if (empty($jsonResult)) {
            \think\Log::write('Failed to fetch JSON for ' . $car['modelname'], 'WARN');
            continue;
        }
        // Remember to replace the single quotes in the returned JSON with double quotes!
        // json_decode() is strict about JSON syntax.
        $jsonResult = str_replace("'", '"', $jsonResult);
        // Remove trailing commas before a closing ] or }
        $jsonResult = preg_replace('/,\s*([\]}])/m', '$1', $jsonResult);
        $jsonArray = json_decode($jsonResult, true);
        if (!is_array($jsonArray) || empty($jsonArray['data'])) {
            \think\Log::write('Failed to parse JSON for ' . $car['modelname'], 'WARN');
            continue;
        }
        // Loop over the "data" section (json_decode(..., true) returns associative arrays)
        foreach ($jsonArray['data'] as $data) {
            if (empty($data['objURL'])) {
                \think\Log::write('objURL not available', 'WARN');
                continue;
            }
            // Decrypt the objURL
            $objUrl = $this->decodeObjUrl($data['objURL']);
            // Skip the URL if it leads to an error page
            if ($this->http_status($objUrl)) {
                continue;
            }
            // If it is reachable, download it (getImage() is a download helper not shown here)
            $result = getImage($objUrl, '', './uploads/' . $car['modelname'] . '/');
            if (!$result) {
                \think\Log::write($objUrl . ' download failed', 'WARN');
            }
        }
        \think\Log::write('Finished fetching image data for ' . $car['modelname'], 'WARN');
        sleep(1);
    }
}

/**
 * Use the Baidu image search interface to get the image JSON data
 * @param string $word search keyword
 * @return string|false|null JSON text, false if the request fails, null for an empty keyword
 */
public function useBaiduSearch($word = '')
{
    // This is the actual JSON request; port it to your own language as needed.
    // The interface address is $baseUrl, and $word is the keyword.
    if ($word == '') return null;
    // This is the URL I use. If it stops working, see my earlier article
    // "Looking for the Baidu Image Search Interface: One, Two, Three" to find the current one.
    $baseUrl = 'http://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&fp=result&ie=utf-8&oe=utf-8&word='
        . urlencode($word) . '&pn=30&rn=30&gsm=700001e&1457697756442=';
    return file_get_contents($baseUrl);
}
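For reference, here is a standalone sketch of steps 2 and 3 outside ThinkPHP. It builds the same acjson URL with http_build_query(), which URL-encodes the keyword for you, and applies the same single-quote and trailing-comma clean-up before json_decode(). The helper names buildSearchUrl() and extractObjUrls() are hypothetical, and the "data"/"objURL" key names are an assumption about Baidu's response format, which can change at any time.

```php
<?php
// Standalone sketch of steps 2-3. buildSearchUrl() and extractObjUrls() are
// hypothetical helper names; the query parameters are copied from the URL above.

function buildSearchUrl(string $word): string
{
    $params = [
        'tn'   => 'resultjson_com',
        'ipn'  => 'rj',
        'ct'   => '201326592',
        'fp'   => 'result',
        'ie'   => 'utf-8',
        'oe'   => 'utf-8',
        'word' => $word,            // URL-encoded by http_build_query()
        'pn'   => '30',
        'rn'   => '30',
        'gsm'  => '700001e',
        '1457697756442' => '',      // kept exactly as in the original URL
    ];
    return 'http://image.baidu.com/search/acjson?' . http_build_query($params);
}

/**
 * Clean up Baidu's loosely formatted JSON and return the objURL values.
 * Assumes a "data" array whose items carry an "objURL" key.
 */
function extractObjUrls(string $raw): array
{
    // Same clean-up as the article: single quotes -> double quotes,
    // then drop trailing commas before ] or }
    $json = str_replace("'", '"', $raw);
    $json = preg_replace('/,\s*([\]}])/m', '$1', $json) ?? '';

    $decoded = json_decode($json, true);
    if (!is_array($decoded) || !isset($decoded['data']) || !is_array($decoded['data'])) {
        return []; // not valid JSON, or no "data" section
    }

    $urls = [];
    foreach ($decoded['data'] as $item) {
        // Skip entries without an objURL (e.g. empty {} items)
        if (is_array($item) && !empty($item['objURL'])) {
            $urls[] = $item['objURL'];
        }
    }
    return $urls;
}
```

Like the article's version, the trailing-comma regex would also rewrite a ",]" or ",}" that happens to appear inside a string value; for image titles and URLs that is usually acceptable.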
/**
 * Decrypt an objURL
 * @param string $objUrl the objURL to decrypt
 * @return string the decrypted objURL
 *
 * The decryption table comes from Baidu's JS file. If it changes, please refer to my
 * previous article and find the new one yourself.
 */
public function decodeObjUrl($objUrl = '')
{
    if ($objUrl == '') return '';
    // The mapping here follows Baidu's JS
    \think\Log::write('About to decrypt an objURL: ' . $objUrl, 'WARN');
    $map = array(
        'w' => 'a', 'k' => 'b', 'v' => 'c', '1' => 'd', 'j' => 'e', 'u' => 'f', '2' => 'g',
        'i' => 'h', 't' => 'i', '3' => 'j', 'h' => 'k', 's' => 'l', '4' => 'm', 'g' => 'n',
        '5' => 'o', 'r' => 'p', 'q' => 'q', '6' => 'r', 'f' => 's', 'p' => 't', '7' => 'u',
        'e' => 'v', 'o' => 'w', '8' => '1', 'd' => '2', 'n' => '3', '9' => '4', 'c' => '5',
        'm' => '6', '0' => '7', 'b' => '8', 'l' => '9', 'a' => '0',
        '_z2C$q' => ':', '_z&e3B' => '.', 'AzdH3F' => '/'
    );
    $objUrl = str_replace('_z2C$q', $map['_z2C$q'], $objUrl);
    $objUrl = str_replace('_z&e3B', $map['_z&e3B'], $objUrl);
    $objUrl = str_replace('AzdH3F', $map['AzdH3F'], $objUrl);
    for ($i = 0; $i
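On the question at the top about a faster way than mapping letter by letter: one option (my suggestion, not the author's code) is PHP's strtr() with an array argument. According to the PHP manual, strtr() tries the longest keys first and does not search text it has already replaced, so the three multi-character tokens and the single-character table can live in one map and be applied in a single pass. The table is the one from the article; decodeObjUrlFast() is a hypothetical name.

```php
<?php
// Sketch: decode an objURL in one pass with strtr().
// decodeObjUrlFast() is a hypothetical name; the table is the article's.
function decodeObjUrlFast(string $objUrl): string
{
    static $map = [
        // Multi-character tokens
        '_z2C$q' => ':', '_z&e3B' => '.', 'AzdH3F' => '/',
        // Single characters (anything not listed is left unchanged)
        'w' => 'a', 'k' => 'b', 'v' => 'c', '1' => 'd', 'j' => 'e', 'u' => 'f',
        '2' => 'g', 'i' => 'h', 't' => 'i', '3' => 'j', 'h' => 'k', 's' => 'l',
        '4' => 'm', 'g' => 'n', '5' => 'o', 'r' => 'p', 'q' => 'q', '6' => 'r',
        'f' => 's', 'p' => 't', '7' => 'u', 'e' => 'v', 'o' => 'w', '8' => '1',
        'd' => '2', 'n' => '3', '9' => '4', 'c' => '5', 'm' => '6', '0' => '7',
        'b' => '8', 'l' => '9', 'a' => '0',
    ];
    // strtr() prefers the longest matching key at each position and never
    // rescans replaced text, so tokens win over single characters and a
    // decoded character is never decoded a second time.
    return strtr($objUrl, $map);
}
```

Every key is either a single character or a token starting with "_" or "A" (neither of which is a key on its own), so this should give the same result as the article's approach of replacing the three tokens first and then mapping each remaining character.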
By the way, here is some code to detect 403, 404 and 500 pages:
/**
 * Determine whether the page is a 404, 500 or 403 error page
 * @param string $url the page about to be requested
 * @return bool true if the request failed or was redirected to an error page
 */
public function http_status($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $status = curl_exec($ch);
    curl_close($ch);
    // Treat a failed request as unavailable too
    if ($status === false) {
        return true;
    }
    $have404 = strpos($status, 'Location:/error/404.html');
    $have500 = strpos($status, 'Location:/error/500.html');
    $have403 = strpos($status, 'Location:/error/403.html');
    if ($have404 !== false || $have500 !== false || $have403 !== false) {
        \think\Log::write('ERROR: ' . $have403 . ': ' . $have404 . ': ' . $have500, 'WARN');
        return true;
    }
    return false;
}
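The check above relies on the target site redirecting to /error/404.html and similar pages. A more general sketch (my suggestion, not the author's code) reads the final HTTP status code with curl_getinfo() and CURLINFO_HTTP_CODE, following redirects, and treats "no response" or any 4xx/5xx code as unavailable. fetchStatusCode() and isUnavailableStatus() are hypothetical names.

```php
<?php
// Sketch: judge availability by the HTTP status code instead of searching
// the headers for site-specific /error/*.html redirects.
// fetchStatusCode() and isUnavailableStatus() are hypothetical names.

function fetchStatusCode(string $url, int $timeout = 5): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // headers only, no body download
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,  // report the code of the final page
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    $ok = curl_exec($ch);
    // 0 means no response was received (DNS failure, timeout, ...)
    $code = ($ok === false) ? 0 : (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code;
}

function isUnavailableStatus(int $code): bool
{
    // No response, or any 4xx/5xx status (403, 404, 500, ...)
    return $code === 0 || $code >= 400;
}
```

In getImageFromBaidu() the http_status() call could then become isUnavailableStatus(fetchStatusCode($objUrl)). Two caveats: some servers answer header-only (CURLOPT_NOBODY) requests with 405 even when a normal GET would succeed, and a site that redirects to an error page served with status 200 will look available, which is exactly the case the original Location check catches, so the two checks can be combined.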
OK, done! Finally, here are the pictures I grabbed: