Baidu AI Open Platform: Calling the API in Practice
First, Preface
Project requirements
Two users each upload a paragraph of text, and I need to compute how similar the two paragraphs are. Each text is matched against the records in the database; my initial estimate is 60 to 100 records at most, probably fewer. The final requirement is to pick out the items with the highest similarity from these matching results.
Writing the algorithm yourself is a big project: it involves natural language processing, which is fairly complicated. So I searched online and found that the natural language processing APIs on the Baidu open platform can be called for free, with a quota of 100,000 calls per day, which is plenty for my small project. Reading further down, however, I found Baidu's note that concurrency is not guaranteed. In other words, my calls can easily come back with an error result, and that case needs appropriate handling.
Since this is a hands-on walkthrough, let's start from the very beginning.
Second, Preparation
Where to start? With creating an application on the Baidu open platform. Once the app is created (my development environment is PHP, so I chose HTML when asked for the application type), you get an application ID, a key, and a secret. The latter two will be needed next.
I am using the short text similarity API; the other APIs work in basically the same way (no arguments accepted). First look at the development documentation at http://ai.baidu.com/docs#/NLP-API/top. It offers two invocation methods; here we use the one that sends a POST request to the API service address. According to the description, the request needs a parameter called access_token, and the documentation also explains how to obtain it. Taking the access_token for short text similarity as an example:
https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=mzrn******txgske3qrf5yj69&client_secret=a30CAbc*****bDuuGLdHLeyRaZk1tq5&
Three parameters are involved. The first, grant_type, is fixed at client_credentials; do not change it.
The second and third, client_id and client_secret, are the key and secret obtained earlier.
Paste this address into the browser's address bar and press Enter. It returns a JSON-formatted string; find access_token in it and save the value to a file for later use.
Note: when you copy an address, spaces may creep into the middle of it, so be sure to delete them. In particular, the code copied from the development documentation contains spaces and returns no result until they are removed. The address in this section can be copied without a problem.
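The same token request can be assembled in PHP rather than pasted into the browser. What follows is a minimal sketch, not Baidu's official client: the function names build_token_url and extract_access_token are hypothetical, and the credentials are placeholders. Stray spaces from copy-paste are trimmed before http_build_query() encodes the parameters.

```php
<?php
// Hypothetical helper: build the token URL from the application's key and secret.
function build_token_url(string $apiKey, string $secretKey): string
{
    $params = array(
        'grant_type'    => 'client_credentials', // fixed value, do not change
        'client_id'     => trim($apiKey),        // trim stray spaces from copy-paste
        'client_secret' => trim($secretKey),
    );
    return 'https://aip.baidubce.com/oauth/2.0/token?' . http_build_query($params);
}

// Hypothetical helper: pull access_token out of the JSON string the token URL returns.
// Returns null when the response is not JSON or has no access_token field.
function extract_access_token(string $responseJson): ?string
{
    $data = json_decode($responseJson, true);
    if (!is_array($data) || !isset($data['access_token'])) {
        return null;
    }
    return (string) $data['access_token'];
}

// Placeholder credentials; substitute the values from your own application.
echo build_token_url(' YOUR_KEY ', 'YOUR_SECRET'), "\n";
```

The string returned by the token URL can be passed to extract_access_token(), and the result written to a file so it does not have to be requested on every run.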
Third, Hands-on
Now let's start writing PHP code. All we need is a single file.
PHP can use curl to send requests to a URL. The development documentation explains that access_token is passed as a URL parameter and that the request text is passed in JSON format (encoded as GBK). Here's the code:
$access_token = "24.a810b4be2b5683a4d6af2f47b420877f.2592000.1507883636.282335-10044457";
$url = "https://aip.baidubce.com/rpc/2.0/nlp/v2/simnet?access_token=" . $access_token;
$body = array(
    "text_1" => "I saw a quilt at the door of the second house B, some classmate must have forgotten to take it away, remember to come and get it.",
    "text_2" => "Found a bicycle at the door, yellow, unlocked, would the owner please claim it."
);
$json_data = json_encode($body);
This code builds the request URL and the JSON request body. When $body is converted to JSON, json_encode() writes Chinese characters as \uXXXX escape sequences by default, so the resulting string is plain ASCII and no extra UTF-8 to GBK conversion is needed.
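To see what json_encode() actually produces for Chinese text, here is a small check. The sample strings are my own placeholders, not the project's data, and the file is assumed to be saved as UTF-8. By default json_encode() expects UTF-8 input and emits non-ASCII characters as \uXXXX escapes, so the request body contains only ASCII bytes.

```php
<?php
// Placeholder sample text (not from the original project).
$body = array(
    "text_1" => "你好",
    "text_2" => "hello",
);
$json_data = json_encode($body);

// Non-ASCII characters come out as \uXXXX escapes.
echo $json_data, "\n";

// True when every byte of the encoded body is 7-bit ASCII.
$isAscii = (preg_match('/^[\x00-\x7F]*$/', $json_data) === 1);
var_dump($isAscii);
```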
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $json_data);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // Disable SSL certificate verification
The last option, which disables SSL certificate verification, was necessary in my environment; without it the request fails with an error!
$result = curl_exec($curl);
// var_dump($result);
$json = iconv("gb2312", "UTF-8", $result); // The response is GBK-encoded Chinese and needs to be converted to UTF-8
Print $json to see the returned result.
The process above only computes the similarity for a single pair of texts. How do we handle many? Use a loop? No, no, no! I tested what happens if you request 10 URLs one after another. The results come back correctly, but very slowly, because the 10 requests are executed sequentially rather than in parallel. So the next problem to solve is parallelism.
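Once the response is in hand, the similarity value still has to be read out of it. The sketch below is an illustration under assumptions: the helper name extract_score is hypothetical, and I am assuming that a successful response carries a numeric "score" field while a failed one carries "error_msg"; check the field names against the current API documentation. The sample responses are made up.

```php
<?php
// Hypothetical helper: return the similarity score from a response string,
// or null if the response is not JSON, is an error, or has no score.
// Assumption: a successful response contains a numeric "score" field.
function extract_score(string $json): ?float
{
    $data = json_decode($json, true);
    if (!is_array($data) || isset($data['error_msg']) || !isset($data['score'])) {
        return null;
    }
    return (float) $data['score'];
}

// Made-up sample responses, for illustration only.
$ok  = '{"log_id": 1, "score": 0.75}';
$err = '{"error_msg": "Open api qps request limit reached"}';

var_dump(extract_score($ok));
var_dump(extract_score($err));
```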
Fourth, Solving the parallelism and QPS limit problems
The bad news, as everyone knows, is that PHP itself does not support multithreading. Doesn't that make you want to give up?
The good news is that curl can process multiple URL requests in parallel, which simulates multithreading. Very nice! When 30 URL requests are sent simultaneously, the total time depends on the slowest request, and the results are great too.
Here is the full code first (I generated the test data by repeating one record; the text is identical in every record, which does not matter here):
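As a compact illustration of the parallel mechanism, here is a minimal sketch using PHP's curl_multi functions. The function name post_json_parallel is hypothetical. It waits with curl_multi_select() rather than a fixed sleep, leaves SSL verification at curl's default, and does no error handling or retrying.

```php
<?php
// Hypothetical helper: POST each JSON body to $url in parallel and return
// the raw responses under the same keys as $bodies.
function post_json_parallel(string $url, array $bodies): array
{
    $multi = curl_multi_init();
    $handles = array();
    foreach ($bodies as $key => $body) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // select can fail immediately on some systems; avoid a busy loop
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the response bodies and release the handles.
    $responses = array();
    foreach ($handles as $key => $ch) {
        $responses[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $responses;
}
```

A call would look like $responses = post_json_parallel($url, $dataArray); the total wait is roughly that of the slowest request, not the sum of all of them.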
<?php
header('Content-Type: text/html; charset=utf-8');

$localtime = date('Y-m-d H:i:s', time());
echo "Start time: " . $localtime;

$access_token = "24.a810b4be2b5*******************507883636.282335-10044457";
$url = "https://aip.baidubce.com/rpc/2.0/nlp/v2/simnet?access_token=" . $access_token;
$body = array(
    "text_1" => "I saw a quilt at the door of the second house B, some classmate must have forgotten to take it away, remember to come and get it.",
    "text_2" => "Found a bicycle at the door, yellow, unlocked, would the owner please claim it."
);
$json_data = json_encode($body);

// Build the test data: the same request body repeated 160 times.
$dataArray = array();
for ($i = 0; $i < 160; $i++) {
    array_push($dataArray, $json_data);
}

$jsonResultArray = array(); // Stores the returned JSON strings
Mfunction($url, $dataArray, $jsonResultArray);

// Sends every entry of $dataArray in parallel, appends the successful responses
// to $jsonResultArray, and calls itself again with the entries that failed.
function Mfunction($url, $dataArray, &$jsonResultArray)
{
    $multicurl = curl_multi_init();
    $curls = array(); // Stores all the curl handles
    for ($i = 0; $i < count($dataArray); $i++) {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_POST, true);
        curl_setopt($curl, CURLOPT_POSTFIELDS, $dataArray[$i]);
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // Disable SSL certificate verification
        curl_multi_add_handle($multicurl, $curl);
        array_push($curls, $curl);
    }

    /* Alternative wait loop using curl_multi_select():
    $running = null;
    do {
        $mrc = curl_multi_exec($multicurl, $running);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    while ($running && $mrc == CURLM_OK) {
        // curl_multi_select() != -1 means a handle in the batch is ready,
        // so the program leaves the blocking state.
        if (curl_multi_select($multicurl) != -1) {
            do { // Continue executing the handles that need processing.
                $mrc = curl_multi_exec($multicurl, $running);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }
    */

    // Execute the batch of handles, polling every 10 ms until all have finished.
    $running = null;
    do {
        usleep(10000);
        curl_multi_exec($multicurl, $running);
    } while ($running > 0);

    $failArray = array();
    for ($i = 0; $i < count($dataArray); $i++) {
        // Get the returned JSON string, converted from GBK to UTF-8.
        $temp = iconv("gb2312", "UTF-8", curl_multi_getcontent($curls[$i]));
        $resultArray = json_decode($temp, true); // true: decode into an associative array
        if (!is_array($resultArray) || array_key_exists("error_msg", $resultArray)) {
            // The request failed: queue its data to be resent.
            array_push($failArray, $dataArray[$i]);
        } else {
            array_push($jsonResultArray, $temp);
        }
        curl_multi_remove_handle($multicurl, $curls[$i]);
    }
    curl_multi_close($multicurl);

    // If $failArray is not empty, call Mfunction() again with the failed requests;
    // otherwise the function simply returns.
    if (!empty($failArray)) {
        Mfunction($url, $failArray, $jsonResultArray);
    }
}

for ($i = 0; $i < count($jsonResultArray); $i++) {
    var_dump($jsonResultArray[$i]);
}
$localtime = date('Y-m-d H:i:s', time());
echo "End time: " . $localtime;
Yes, that is all the code, posted in full with comments. Together, it solves the problems associated with concurrency.
I won't say much about how to use curl_multi_init(); examples are everywhere online, and you can look at the novice tutorial site. Let's talk about handling the QPS limit.
Because the parallel requests are submitted so quickly, the server easily hits its QPS limit and returns an error:
'{"error_msg":"Open api qps request limit reached","error_code":18}' (length=66)
There is no good way around this if you want to use the service for free. My solution is to check each returned result for error_msg; if it is present, the request has to be resent. If the returned result is normal, it is stored in the results array. To resend, the function calls itself recursively with the failed requests until a call finishes with no error messages. In the end, all the correct results are in the array.
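Resending until nothing fails could run for a long time if the service stays over its limit. As one possible variation (not the approach in the listing), the sketch below caps the number of rounds and pauses briefly between them. The names needs_resend and send_with_retry, the round limit, and the pause length are all my own assumptions; $sendBatch stands for any function that sends a batch and returns the raw responses under the same keys. A fake sender is used here so the example runs without network access.

```php
<?php
// Hypothetical helper: true when a raw response must be resent, i.e. it is
// not valid JSON or it carries an error_msg field.
function needs_resend($raw): bool
{
    $data = is_string($raw) ? json_decode($raw, true) : null;
    return !is_array($data) || array_key_exists('error_msg', $data);
}

// Send all payloads, then resend only the failed ones, for at most $maxRounds rounds.
function send_with_retry(array $payloads, callable $sendBatch, int $maxRounds = 5, int $pauseMicros = 200000): array
{
    $results = array();
    $pending = $payloads;
    for ($round = 0; $round < $maxRounds && !empty($pending); $round++) {
        if ($round > 0) {
            usleep($pauseMicros); // brief pause before resending
        }
        $responses = $sendBatch($pending);
        foreach ($pending as $key => $payload) {
            $raw = $responses[$key] ?? null;
            if (!needs_resend($raw)) {
                $results[$key] = $raw;
                unset($pending[$key]);
            }
        }
    }
    return array('results' => $results, 'failed' => $pending);
}

// Fake sender for demonstration: rejects every payload on the first call only.
$calls = 0;
$fakeSender = function (array $batch) use (&$calls) {
    $calls++;
    $out = array();
    foreach ($batch as $key => $payload) {
        $out[$key] = ($calls === 1)
            ? '{"error_msg":"Open api qps request limit reached"}'
            : '{"score":0.5}';
    }
    return $out;
};

$outcome = send_with_retry(array('a', 'b'), $fakeSender, 3, 0);
var_dump(count($outcome['results']), count($outcome['failed']));
```

Anything still listed under 'failed' after the last round can be logged or retried later, instead of blocking the page indefinitely.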
Fifth, Test Results
Test run:
Judging from the results, the response time for the number of records tested is acceptable.
I wrote this article late last night and did not expect the school to suddenly cut the power and the network. The power cut itself is not so terrible; the real problem is that once the electricity was gone, the mobile network disappeared too... Gone... Lost...
So I got up and re-posted it in the morning.