PHP simultaneous bulk collection: a very hard problem, novices please stay out, thank you

Source: Internet
Author: User
1. The program is similar to myip.cn and wanben.net. myip.cn can collect all the information about wanben.net in about 3 seconds.

2. I can also collect the same information about wanben.net with PHP, but it is too slow. The steps are:
Capture the website title
Collect Alexa information
Collect domain name information
Collect server information

My PHP program executes all of this code in sequence, so it takes a long time: about 15 seconds to finish collecting.

myip.cn collects the same information in about 3 seconds.


Experts, please answer: is this done with PHP + Ajax, or with PHP collecting everything at the same time? Please explain the principle.

I wrote all of the collection code for my site wanben.net myself. Now I need it to collect a lot of data quickly and return the values.




Reply to discussion (solution)

For data collection, I recommend using cURL.


Features of PHP's cURL library: fetching web pages, POSTing data, and more


This article introduces several ways to use the cURL library in PHP. With it you can fetch pages easily and efficiently: run a script, parse the page you fetched, and extract the data you want in your program. Whether you want part of the data behind a link, an XML file to import into a database, or simply the content of a web page, cURL is a powerful PHP library. This article mainly describes how to use it.

Enabling cURL

First, we have to make sure that PHP has the library enabled. You can check this with the phpinfo() function.

<?php
phpinfo();
?>




If the output on the web page contains a cURL section, the cURL library is enabled.

If you do not see it, you need to configure PHP to enable the library. On Windows this is simple: open your php.ini file, find the line containing php_curl.dll, and remove the semicolon that comments it out, as shown below:

; Uncomment the line below
extension=php_curl.dll



On Linux, you need to recompile PHP. When compiling, add the --with-curl option to the configure command.
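As a quicker check than reading the whole phpinfo() page, a script can ask PHP directly whether the extension is loaded. This is a minimal sketch; the helper name curl_is_available() is made up for this example.

```php
<?php
// Returns true when the cURL extension is loaded in this PHP build.
// curl_is_available() is a hypothetical helper name, not part of PHP.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    // curl_version() reports details of the underlying libcurl.
    $info = curl_version();
    echo 'cURL is enabled, libcurl version ' . $info['version'] . PHP_EOL;
} else {
    echo 'cURL is not enabled; check php.ini or the build options.' . PHP_EOL;
}
```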

A small example

If everything is ready, here is a small example program:

<?php
// Initialize a cURL handle
$curl = curl_init();

// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://cocre.com');

// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);

// Return the result as a string instead of printing it to the screen
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

// Run cURL and request the page
$data = curl_exec($curl);

// Close the cURL handle
curl_close($curl);

// Show the data obtained
var_dump($data);
?>


How to post data

The code above fetches a web page; the code below POSTs data to a page. Suppose we have a URL, http://www.example.com/sendSMS.php, that handles a form with two fields: a phone number and a message.

<?php
$phoneNumber = '13912345678';
$message = 'This message is generated by curl and php';
$curlPost = 'pnumber=' . urlencode($phoneNumber) . '&message=' . urlencode($message) . '&submit=send';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/sendSMS.php');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Use POST instead of GET and attach the form data
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
$data = curl_exec($ch);
curl_close($ch);
?>



From the program above we can see that CURLOPT_POST makes the request use the HTTP POST method instead of GET, and CURLOPT_POSTFIELDS sets the POST data.
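The POST string does not have to be concatenated by hand. As a sketch, PHP's built-in http_build_query() produces the same kind of URL-encoded string from an array; the field names are the ones from the example form, and the values are placeholders.

```php
<?php
// Field names follow the example form; the values are placeholders.
$fields = [
    'pnumber' => '13912345678',
    'message' => 'This message is generated by curl and php',
    'submit'  => 'send',
];

// http_build_query() URL-encodes each value and joins the pairs with "&".
$curlPost = http_build_query($fields);

echo $curlPost . PHP_EOL;
```

The resulting string can be passed to CURLOPT_POSTFIELDS exactly like the hand-built one.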

About proxy servers

The following example shows how to use a proxy server. Note the proxy-related options (CURLOPT_HTTPPROXYTUNNEL, CURLOPT_PROXY and CURLOPT_PROXYUSERPWD); the code is very simple, so I will not say more.

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Proxy settings
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, 'fakeproxy.com:1080');
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
$data = curl_exec($ch);
curl_close($ch);
?>


About SSL and cookies

SSL here means the HTTPS protocol: you only need to change http:// to https:// in the CURLOPT_URL value. There is also an option called CURLOPT_SSL_VERIFYHOST, which controls whether the host name in the site's certificate is verified.
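A minimal sketch of an HTTPS request with certificate checks left on might look like this. The helper name https_options() is made up for this example; CURLOPT_SSL_VERIFYPEER is an additional real cURL option not mentioned in the article.

```php
<?php
// Build options for an HTTPS request with verification enabled.
// https_options() is a hypothetical helper name for this sketch.
function https_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => true, // verify the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // check that the certificate matches the host name
    ];
}

$ch = curl_init();
curl_setopt_array($ch, https_options('https://www.example.com/'));
// $data = curl_exec($ch);  // uncomment to actually send the request
```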

For cookies, you need to understand the following three options:

CURLOPT_COOKIE: sets a cookie to send with the current request

CURLOPT_COOKIEJAR: the file in which cookies are saved when the session ends

CURLOPT_COOKIEFILE: the file from which cookies are read.
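The three options can be combined as in the sketch below, which reads and writes cookies through one file so that a later request reuses them. The helper name cookie_options(), the jar file name and the cookie value are assumptions made for illustration.

```php
<?php
// Build cURL options that keep cookies across requests.
// cookie_options() and $jarPath are hypothetical names for this sketch.
function cookie_options(string $url, string $jarPath): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => $jarPath, // read cookies saved earlier
        CURLOPT_COOKIEJAR      => $jarPath, // write cookies back when the handle is released
    ];
}

$jar = sys_get_temp_dir() . '/example_cookies.txt';
$ch = curl_init();
curl_setopt_array($ch, cookie_options('http://www.example.com/', $jar));
// A single cookie can also be sent directly with the request.
curl_setopt($ch, CURLOPT_COOKIE, 'lang=en');
// $data = curl_exec($ch);  // uncomment to actually send the request
```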

HTTP Server Authentication

Finally, let's look at HTTP server authentication.

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Use HTTP Basic authentication with the given credentials
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, '[username]:[password]');

$data = curl_exec($ch);
curl_close($ch);
?>


For additional information, please refer to the cURL manual, in particular the official documentation for curl_setopt.

A question for the original poster: if the network is poor, or the site has a lot of data, or the PHP program itself takes a long time to run, won't the PHP script time out?
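One way to keep a slow site from stalling the whole script is to give each request its own time limits. The sketch below uses the real options CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT; the function name fetch_with_timeout() and the default limits are assumptions. PHP's own max_execution_time setting is separate and may also need attention.

```php
<?php
// Fetch a URL but give up after a fixed time instead of hanging the script.
// fetch_with_timeout() is a hypothetical helper; the defaults are arbitrary.
function fetch_with_timeout(string $url, int $connectSeconds = 3, int $totalSeconds = 8): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectSeconds); // limit for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalSeconds);          // limit for the whole transfer

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes what went wrong (timeout, DNS failure, ...).
        return ['ok' => false, 'body' => null, 'error' => curl_error($ch)];
    }
    return ['ok' => true, 'body' => $body, 'error' => null];
}
```

A caller can then check the 'ok' flag and skip or retry the step that failed, rather than letting the whole page wait.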

Quoting the original poster, mywaster: (original question, as above)

I forgot to mention that I am already using PHP cURL to write the collector. Please answer according to my question.


Capture the website title
Collect Alexa information
Collect domain name information
Collect server information

Do you want to do all of these at the same time?

If you have the ability, write your own extension: multithreading, cloud computing.

Quoting skyaspnet's reply on floor 1, which quotes the original question by mywaster.


Capture the website title
This is simple: the title can be extracted directly once the page data has been fetched.

Collect Alexa information
This requires sending a query to Alexa to get the data. I suggest doing it separately from fetching the title.

Collect domain name information
This step also needs to send a query, and I suggest doing it separately as well.

Collect server information
Only a small part of the server information can be obtained, such as some header fields; most of it is not available.
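For the first step, extracting the title from HTML that has already been fetched, a minimal sketch could look like this. The function name extract_title() and the sample HTML are made up for illustration; a regular expression is used here for brevity, though an HTML parser would be more robust.

```php
<?php
// Pull the <title> text out of HTML that has already been downloaded.
// extract_title() is a hypothetical helper name for this sketch.
function extract_title(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m) === 1) {
        // Decode entities such as &amp; and collapse runs of whitespace.
        $title = html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        return trim(preg_replace('/\s+/', ' ', $title));
    }
    return null; // no <title> element found
}

$html = "<html><head><title>\n  Example &amp; Co </title></head><body></body></html>";
var_dump(extract_title($html));
```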


To speed things up, I suggest linking the series of operations together in some way and then executing them in one step; the speed will improve noticeably.
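One possible reading of this advice, not confirmed by the poster, is to send the independent requests concurrently with PHP's curl_multi_* functions, so the total time is close to the slowest request rather than the sum of all of them. In the sketch below the function name fetch_all() and the array layout are assumptions.

```php
<?php
// Fetch several URLs concurrently with the curl_multi_* functions.
// Returns an array keyed like $urls; a failed request yields null.
// fetch_all() is a hypothetical helper name for this sketch.
function fetch_all(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            // Wait for network activity instead of spinning the CPU.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Note which transfers ended with an error.
    $failed = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = $info['handle'];
        }
    }

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = in_array($ch, $failed, true) ? null : curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

Usage would pass one entry per lookup, for example fetch_all(['page' => $pageUrl, 'alexa' => $alexaUrl]), where the variable names and keys are placeholders for whatever query URLs the collector already uses.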

I tested it.
Looking up wanben.com on myip.cn for the first time took 16 seconds (of course, my computer is a bit slow).
Looking it up on myip.cn again took just two seconds.

So you can be sure that myip.cn uses a cache: the results you see afterwards are served from the cache, which is why they come back so quickly.
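How myip.cn caches its results is not known; the sketch below only illustrates the general idea with a simple file cache. The function name cached(), the file naming scheme and the one-hour lifetime are all assumptions.

```php
<?php
// Return a cached value if it is fresh enough; otherwise compute and store it.
// $producer is any callable that does the slow work (for example the cURL requests).
// cached() is a hypothetical helper name for this sketch.
function cached(string $key, int $ttlSeconds, callable $producer, ?string $dir = null)
{
    $dir = $dir ?? sys_get_temp_dir();
    $file = $dir . '/cache_' . md5($key) . '.json';

    // Serve from the cache file while it is younger than the lifetime.
    if (is_file($file) && time() - filemtime($file) < $ttlSeconds) {
        return json_decode(file_get_contents($file), true);
    }

    $value = $producer();
    file_put_contents($file, json_encode($value), LOCK_EX);
    return $value;
}

// Usage: a repeated lookup within an hour is read from the cache file.
$lookup = function () {
    // Placeholder for the slow collection step.
    return ['title' => 'Example Domain'];
};
$first  = cached('example.com', 3600, $lookup);
$second = cached('example.com', 3600, $lookup);
```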

I solved it myself.

Thank you all for your ideas. See my blog:
http://hi.baidu.com/dalianufo/blog/item/c70ef1d9a1a92a3f10df9b0a.html

http://www.chaiba.com is fast, and similar in principle.

http://www.chayiba.com, this one.

What if there are child links under the URL? For example, for http://news.ifeng.com/mil/, how do you handle the sub-columns and sub-links below it?
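One common approach, sketched here under assumptions, is to parse the fetched page, collect its links, and then fetch each link in turn. The function name extract_links() is made up, and relative links are resolved only naively against the base URL.

```php
<?php
// Collect absolute http(s) links from a page's HTML.
// extract_links() is a hypothetical helper; URL resolution here is deliberately simple.
function extract_links(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Suppress warnings caused by real-world, non-well-formed HTML.
    @$doc->loadHTML($html);

    $parts  = parse_url($baseUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // skip empty links and in-page anchors
        }
        if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
            // Has a scheme: keep only http and https (drops javascript:, mailto:, ...).
            if (!preg_match('#^https?://#i', $href)) {
                continue;
            }
        } elseif (substr($href, 0, 2) === '//') {
            $href = $parts['scheme'] . ':' . $href;      // protocol-relative link
        } elseif ($href[0] === '/') {
            $href = $origin . $href;                     // site-relative link
        } else {
            $href = rtrim($baseUrl, '/') . '/' . $href;  // relative to the base directory
        }
        $links[$href] = true; // array keys remove duplicates
    }
    return array_keys($links);
}
```

The returned list can then be fed back into whatever fetch step the collector already uses, with a depth limit so the crawl does not run forever.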
