PHP: data lost among millions of curl requests per hour forwarded between servers

Source: Internet
Author: User

Here is the scenario in detail:

Server A forwards data to server B by issuing curl requests, on the order of millions per hour.
Each curl request fetches a 0-byte file on server B; the only goal is for the request to appear in server B's nginx access log.
Comparing the counts on both sides, a few thousand requests go missing per day, roughly one to two hundred per hour.
Server A already checks curl_errno(), and even uses curl_getinfo() to verify that the HTTP status is 200; none of the missing requests are recorded as curl errors.
For this portion of missing curl requests, does anyone have good analysis ideas or experience to share?

Supplement: DNS is resolved locally on the server, so DNS resolution timeouts and failures can be ruled out.

Hoping the experts can offer some guidance.
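The sender-side check described in the question can be sketched roughly like this (only `curl_errno()` and `curl_getinfo()` come from the original; the URL, timeouts, and helper names are assumptions):

```php
<?php
// Classify a finished transfer: success only when curl reports no error
// (errno 0 == CURLE_OK) AND the HTTP status returned by B is 200.
function isRequestSuccessful(int $errno, int $httpCode): bool
{
    return $errno === 0 && $httpCode === 200;
}

// Fire one "ping" request at server B's 0-byte file (URL is an assumption).
function sendPing(string $url): bool
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 1,  // assumed timeouts; tune for your network
        CURLOPT_TIMEOUT        => 2,
    ]);
    curl_exec($ch);
    $errno    = curl_errno($ch);
    $httpCode = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);

    return isRequestSuccessful($errno, $httpCode);
}
```

Note that this only catches failures curl itself can see; requests that die before `curl_exec()` runs never reach this check.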

Replies:


Merging requests is a good idea. With this much concurrency, it is not only server B that suffers; server A cannot be expected to cope either.

Alternatively, use a queue for processing and throttle how many requests per second are sent to server B.

    • First, without code nobody can do more than "guess" at an answer; a brief textual description can never substitute for the actual code logic. Programmers live on the difference between 1 and 0, and that cannot be recovered from a description.

    • For any request where accuracy matters (such as your simple counting), it is best not to fire curl blindly. Networks are inherently unreliable: if you send data to another host, shouldn't the sender at least confirm that the receiver got it? From a design standpoint, how is requesting an empty file without confirming success any different from inserting a row into a database without checking the result?

    • I don't know your PHP source, so I can only guess with limited information. Millions of requests per hour is roughly 280 QPS. You only say server A is a forwarder, not whether it is multi-threaded or single-threaded, whether PHP runs as an Apache module or via CGI, or what the concurrency level is. What I can say is that at any concurrency level PHP errors occur with some probability, and as concurrency rises, even a bug-free program will see 3xx, 4xx, and 5xx responses; that is a web server's normal reaction to overload. My intuition is that you have no accurate statistics on server A and are wrongly assuming that every request on A executed successfully. The missing requests are not curl errors but PHP-level failures, which is why the curl functions cannot catch them.

If you want to keep the current approach, it is recommended that machine B's interface return explicit success and failure codes, and that machine A log according to the return code.
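On machine A, that suggestion might look like this (the endpoint, the literal "OK" acknowledgement body, and the fail-log path are all assumptions):

```php
<?php
// An acknowledgement counts only when B answers HTTP 200 with an explicit body.
function isAckOk(int $httpCode, $body): bool
{
    return $httpCode === 200 && $body === 'OK';
}

// Forward one payload; on anything short of an explicit ack, append the
// payload to a local file so a later pass can replay it.
function forwardWithFailLog(string $url, string $payload, string $failLog): bool
{
    $ch = curl_init($url . '?d=' . urlencode($payload));
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 2,
    ]);
    $body = curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);

    if (isAckOk($code, $body)) {
        return true;
    }
    file_put_contents($failLog, $payload . "\n", FILE_APPEND | LOCK_EX);
    return false;
}
```

The local fail log turns silent loss into something A can count and replay.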

If I'm wrong about that, you still need to reduce the number of HTTP requests (I can't prove why, and intuition is unreliable, but even replacing machine B with a pure MySQL MEMORY table would do better than this setup). Two ideas: (1) use curl's multi interface (though I don't recommend curl here at all), or simply add a multi-threading PHP extension and run PHP as a daemon, eliminating the web server's concurrency bottleneck; (2) send the data in blocks. This matters: logs like these have low real-time requirements, so why not accumulate, say, 10,000 records on machine A and send them in one shot? I have shipped scraped Twitter data from Washington to Beijing this way.
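Idea 2 (sending in blocks) could be sketched like this; the class name, the flush threshold, and the sender callback are assumptions:

```php
<?php
// Buffer records on server A and ship them in blocks instead of one
// HTTP request per record.
final class LogBatcher
{
    private array $buffer = [];

    public function __construct(
        private int $flushSize,
        private $sender  // callable(string $block): bool — ships one block
    ) {}

    // Returns the number of records flushed (0 while the buffer is filling).
    public function add(string $record): int
    {
        $this->buffer[] = $record;
        if (count($this->buffer) < $this->flushSize) {
            return 0;
        }
        return $this->flush();
    }

    public function flush(): int
    {
        if ($this->buffer === []) {
            return 0;
        }
        $block = implode("\n", $this->buffer);
        if (($this->sender)($block)) {
            $n = count($this->buffer);
            $this->buffer = [];
            return $n;
        }
        return 0; // sender failed: keep the buffer for the next attempt
    }
}
```

With a threshold of 10,000 this turns ~280 QPS of pings into a few dozen bulk transfers per hour, and a failed transfer simply stays buffered for retry.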

In my earlier experiments, around 50 concurrent requests was fairly stable; beyond that, requests started getting lost, and I never found out why.

Why not have server A first write the log lines, in the format server B requires, to A's own disk, and then have server B pull A's records for the past minute via curl once a minute and append them to B's own log store?

That is only 60 requests per hour, and both sides can implement retransmission on failure, greatly reducing the packet loss seen under high concurrency. The pull interval can be tuned to the amount of data per transfer, e.g. cap each curl transfer at 1 MB of content to make sure every transfer completes normally.

Writing such a script takes some time; adapt it as appropriate. This is only one idea.
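A minimal sketch of server A's side of this pull model (directory layout and names are assumptions; B would fetch the previous minute's file over curl and retry until it succeeds):

```php
<?php
// Each record lands in a file named after the minute it arrived, e.g.
// records-202401011230.log, so B can safely fetch minutes that are closed.
function minuteLogPath(string $dir, int $timestamp): string
{
    return $dir . '/records-' . date('YmdHi', $timestamp) . '.log';
}

function appendRecord(string $dir, string $record, ?int $now = null): void
{
    $path = minuteLogPath($dir, $now ?? time());
    // LOCK_EX keeps concurrent PHP workers from interleaving lines.
    file_put_contents($path, $record . "\n", FILE_APPEND | LOCK_EX);
}
```

Because file names are derived from the minute, B only ever pulls files A has stopped writing, so a failed pull can simply be retried.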

If you are simply logging requests, I recommend using UDP for the transfer.
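A fire-and-forget UDP sender might look like this (host, port, and message format are assumptions; note UDP gives no delivery guarantee, so B's count becomes "at most once" rather than exact):

```php
<?php
// Send one log datagram over UDP; cheap enough for millions per hour,
// but the sender gets no acknowledgement of delivery.
function sendUdpLog(string $host, int $port, string $message): bool
{
    $sock = @fsockopen('udp://' . $host, $port, $errno, $errstr, 1);
    if ($sock === false) {
        return false;
    }
    $ok = fwrite($sock, $message) !== false;
    fclose($sock);
    return $ok;
}
```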

That is only about 300 requests per second.
Nginx handling 10,000 requests per second should not be a problem.

So the problem is almost certainly in curl. Curl is famous, but it doesn't actually do that much good here.

I recommend assembling the HTTP request yourself.
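Hand-assembling the request could look like this (header set and helper names are a minimal assumption): build a raw HTTP/1.1 GET and push it through a plain TCP socket, skipping curl entirely.

```php
<?php
// Build a minimal raw HTTP/1.1 GET request by hand.
function buildHttpGet(string $host, string $path): string
{
    return "GET {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Send it over a plain TCP socket; returns the raw response or null on failure.
function rawHttpGet(string $host, int $port, string $path, float $timeout = 2.0): ?string
{
    $sock = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($sock === false) {
        return null;
    }
    fwrite($sock, buildHttpGet($host, $path));
    $response = stream_get_contents($sock);
    fclose($sock);
    return $response;
}
```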

Why use curl at all, if the goal is only to leave a log entry? Open a location on A's nginx that proxies to B, and point the page's script src at that location on A.
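On server A's nginx, that idea might look roughly like this (the location path, upstream address, and log path are assumptions):

```nginx
# On server A: expose a local path that proxies to B, so the page's
# <script src> hits A and nginx relays the request to B.
location /relay/ping {
    proxy_pass http://b.internal:80/ping;   # assumed address of server B
    proxy_connect_timeout 1s;
    proxy_read_timeout 2s;
    access_log /var/log/nginx/relay.log;    # A keeps its own copy for comparison
}
```

A side benefit: both A and B now log the same requests, so discrepancies can be counted directly.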

If I may question the OP: how is the storage behind your after-the-fact log analysis implemented? Also, I think some data loss is bound to happen; losing nothing out of millions is closer to scientific research than engineering. I need this same kind of hit logging written to MySQL. Could the OP describe your current architecture?
