PHP: optimizing regular-expression crawling of webpage content
I want to capture the links to HD videos on Youku and send them to a mobile client. However, the capture time is not ideal: for about 50 videos, it takes more than 6 seconds to capture and display them on a webpage on the computer, and more than 30 seconds to deliver them to the phone. What optimization methods are there?
Reply to discussion (solution)
Switch to a fiber connection!
The original poster wants to capture the links, not the video content. Use cURL.
I am using regular expressions to capture the tags, and it takes too long. Do you mean that cURL can be used to capture the tags?
Do you have a specific approach in mind?
Post your code so we can see whether anything can be optimized.
$reg1 = "/…/i"; // get the video link
$reg2 = "/…]*)\s*class=\"imgdetail\"\s*src=('|\")([^'\"]+)('|\")/i";
$reg3 = "";
$reg4 = "/….*?<\/p>/i"; // obtain the video title
$content = file_get_contents($url);
preg_match_all($reg1, $content, $matches);
$video = $matches[0]; // links to the home-page videos
$resultArray = array(); // array holding all data
// $subArray = array(); // sub-array
foreach ($video as $key) {
    // process the URL and get the video click URL
    $position = strpos($key, "href");
    $substring = substr($key, $position + 11);
    $pos = strpos($substring, ">");
    $link = substr($substring, 0, $pos - 1);
    $nextUrl = $url . $link;
    $nextContent = file_get_contents($nextUrl);
    // get the video image
    preg_match_all($reg2, $nextContent, $img);
    $img_arr = $img[0];
    foreach ($img_arr as $arr) {
        $position = strpos($arr, "src");
        $sub = substr($arr, $position + 5);
        $last = substr($sub, 0, $pos);
    }
    // obtain the VOD address
    preg_match_all($reg3, $nextContent, $vids);
    $video_arr = $vids[0];
    $vid = $video_arr[0];
    $position = strpos($vid, "href");
    $v_string = substr($vid, $position + 11);
    $pos = strpos($v_string, "\"");
    $add = substr($v_string, 0, $pos);
    $video_url = $url . $add;
    // obtain the video title
    preg_match_all($reg4, $nextContent, $match);
    $title = $match[0];
    $r = serialize($title);
    $position = mb_strpos($r, "…");
    $sub = substr($r, 0, $position);
    $pos = mb_strrpos($sub, ">");
    $til = substr($sub, $pos + 1);
    $subArray = array('image' => $last, 'video' => $video_url, 'title' => $til);
    array_push($resultArray, $subArray);
}
$resultJson = json_encode($resultArray);
file_put_contents('web.txt', print_r($resultJson, true));
That is all of the code.
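One thing that might simplify the extraction is letting capture groups return the attribute values directly, so the strpos/substr post-processing with fixed offsets is not needed. This is only a minimal sketch: it assumes the links are ordinary `<a href="...">` tags, the function name `extractLinks` and the sample markup are hypothetical, and the pattern is not the one from the post (those patterns did not survive).

```php
<?php
// Hypothetical helper: returns every href value found in $html.
// Assumes plain <a ... href="..."> or <a ... href='...'> markup.
function extractLinks(string $html): array
{
    // Group 1 captures the quote character, group 2 the URL itself.
    $pattern = '/<a\b[^>]*\bhref\s*=\s*(["\'])(.*?)\1/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

// Made-up sample markup, for illustration only.
$html = '<a class="v" href="/v_show/id_1.html">A</a> <a href=\'/v_show/id_2.html\'>B</a>';
print_r(extractLinks($html));
```

Reading `$matches[2]` avoids offsets such as `$position + 11`, which silently break when the markup changes slightly.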
foreach ($video as $key)
{
$nextContent = file_get_contents($nextUrl);
...
You had better switch to fiber. For file_get_contents called in a loop, 6 seconds is already a bargain.
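To check whether the time really goes into the downloads rather than the regular expressions, each step can be timed separately. A minimal sketch, assuming a hypothetical helper named `timed`; the workload below is a stand-in that sleeps instead of making a real request.

```php
<?php
// Hypothetical helper: runs $task and returns [result, elapsed seconds].
function timed(callable $task): array
{
    $start = microtime(true);
    $result = $task();
    return [$result, microtime(true) - $start];
}

// Stand-in workload. In the real script this would wrap
// file_get_contents($nextUrl) and, separately, the preg_match_all() calls.
[$value, $seconds] = timed(function () {
    usleep(20000); // pretend to wait 20 ms on the network
    return 'page body';
});
printf("fetched %d bytes in %.3f s\n", strlen($value), $seconds);
```

If the download time dominates, tuning the patterns will not help much, and the requests themselves are what need to change.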
The fiber connection cannot be changed.
Use curl_multi_exec() to fetch the pages concurrently.
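A rough sketch of what concurrent fetching with the curl_multi functions could look like. The helper name `fetchAll`, the timeout value, and the choice of options are assumptions for illustration, not something taken from the posted code.

```php
<?php
// Hypothetical helper: downloads all $urls concurrently and returns
// the response bodies under the same keys as the input array.
function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body in memory
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $ch) {
        // A transfer that produced no data yields an empty string.
        $bodies[$key] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }

    return $bodies;
}
```

In the posted loop, this would mean collecting every `$nextUrl` first, calling `fetchAll()` once, and then running the `preg_match_all()` steps over the returned bodies. The total wait should then be closer to the slowest single page than to the sum of all pages, although the actual gain depends on the network and the server.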
I have not tried it yet, but thank you all for giving me ideas and a direction to study!