PHP sends millions of curl requests per hour to a server, and some data is lost

Source: Internet
Author: User
Tags: php, error
Let me describe the scenario directly:

Server A forwards data to server B via curl, at a rate of millions of entries per hour.
Each curl request fetches a 0-byte file on server B; the only requirement is that the request gets recorded in server B's nginx log.
Comparing the data on both sides, we find thousands of entries lost per day, roughly one or two hundred per hour.
On server A, curl_errno() reports no error, and even curl_getinfo() shows the returned HTTP code is 200.
For these lost curl requests, do you have any good analysis ideas or experience to share?

Supplement: intranet DNS resolution is used between the servers, which rules out DNS resolution timeouts.

Thank you!

Reply content:

Merging requests is a good idea. With too many concurrent requests, it is not just server B that suffers; server A itself cannot handle the requests as expected either.

Alternatively, you can push the data through a queue and throttle the number of requests per second to server B down by an order of magnitude.
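
As a rough illustration of the queue idea, here is a minimal PHP sketch, assuming the entries sit in a Redis list (the queue backend, the "forward_queue" key, and the server-b.internal URL are all assumptions, not from the original post):

    <?php
    // Drain a Redis list at a bounded rate instead of firing curl calls inline.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $perSecond = 100; // cap on requests per second to server B

    while (true) {
        $start = microtime(true);
        for ($i = 0; $i < $perSecond; $i++) {
            $entry = $redis->lPop('forward_queue');
            if ($entry === false) {
                break; // queue drained for now
            }
            $ch = curl_init('http://server-b.internal/log?data=' . urlencode($entry));
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 5);
            $body = curl_exec($ch);
            if ($body === false || curl_getinfo($ch, CURLINFO_HTTP_CODE) !== 200) {
                $redis->rPush('forward_queue', $entry); // requeue on failure
            }
            curl_close($ch);
        }
        // sleep out the remainder of the second to hold the rate steady
        $elapsed = microtime(true) - $start;
        if ($elapsed < 1) {
            usleep((int)((1 - $elapsed) * 1000000));
        }
    }

With the web-facing PHP only pushing onto the queue, a single daemon like this controls the exact request rate that server B sees.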

  • First of all, you can't "guess" the answer without looking at the code; a brief description can never substitute for the code's actual logic. A slip as small as a 1 typed for a 0 by whoever wrote the code cannot be found by guessing from a description.

  • For any request path that demands precision (such as your simple counting), it is best not to use curl fire-and-forget. The unreliability of the network is one factor; beyond that, if you send data to another host, the sender should at least confirm that the receiver actually got the message, right? From a program-design standpoint, what is the difference between requesting an empty file with no confirmation of success and inserting a row into a database without checking the result? (See the sender sketch after this list.)

  • I don't know your PHP source code, so with limited information I can only estimate: millions of requests per hour is roughly 280 qps. You only said that server A is a forwarding server, not whether it is multi-process or single-process, whether PHP runs as an Apache module or in CGI mode, or how many concurrent workers there are. All I can tell you is that at any concurrency level there is some probability of PHP-side failures, and as concurrency rises, 3xx, 4xx, and 5xx responses will appear even when the program has no bugs; that is the normal behavior of an overloaded web server. My intuition is that you did not gather accurate statistics on server A and mistakenly assumed that every request on server A executed successfully. These lost requests are neither curl errors nor PHP errors, so the curl functions cannot catch them.
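
To make the acknowledgment point above concrete, here is a minimal sketch of a sender that only counts an entry as delivered when B explicitly confirms it (the URL, function name, and the 'OK' token are assumptions; a matching B-side sketch follows further down):

    <?php
    // Count an entry as delivered only when server B both answers 200 and
    // echoes the agreed confirmation token; "no curl error" alone never counts.
    function forward_entry(string $entry): bool
    {
        $ch = curl_init('http://server-b.internal/log?data=' . urlencode($entry));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        $body = curl_exec($ch);
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        return $code === 200 && $body === 'OK';
    }

    $entry = 'example-payload'; // hypothetical record
    if (!forward_entry($entry)) {
        error_log('delivery failed, requeue: ' . $entry);
    }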

If you need to keep the current approach, we recommend having the interface on B return an explicit success or failure code, and having machine A log entries according to that return code.
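
A matching sketch of what such an interface on B might look like, assuming B appends the entry to a local file (the file path and parameter name are illustrative only):

    <?php
    // Receiving script on server B: write first, then return an explicit
    // success/failure token that A can log and retry on.
    $data = $_GET['data'] ?? '';
    if ($data === '') {
        http_response_code(400);
        exit('ERR'); // A should treat this as a failed delivery
    }
    $written = file_put_contents('/data/received.log', $data . "\n", FILE_APPEND | LOCK_EX);
    if ($written === false) {
        http_response_code(500);
        exit('ERR');
    }
    echo 'OK'; // A counts the entry as delivered only on this token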

If that is not it, you still need to cut down the number of requests and the HTTP overhead (I can't prove why; intuition is unreliable, but even switching B to a bare MySQL MEMORY table would be more dependable than the current method). There are two other ideas: 1. Enable curl's multi interface (plain one-at-a-time curl is not recommended), or simply add a multithreading extension to PHP and run PHP as a daemon, eliminating the web server's concurrency bottleneck; a curl_multi sketch follows below. 2. Segment the data. This is important: log-like data is rarely that real-time-critical, so why not accumulate, say, 10,000 entries on machine A and send them in one shot? I once captured Twitter data and shipped it from Washington to Beijing packaged exactly this way.
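
For idea 1, PHP's built-in curl_multi interface already gives concurrent transfers without threads. A minimal sketch, where the batch of URLs is a placeholder:

    <?php
    // Issue a batch of requests concurrently with curl_multi and collect
    // per-handle results, so failures can be retried individually.
    $urls = [
        'http://server-b.internal/log?data=a',
        'http://server-b.internal/log?data=b',
    ];
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // block until any transfer has activity
        }
    } while ($running && $status === CURLM_OK);

    foreach ($handles as $url => $ch) {
        if (curl_getinfo($ch, CURLINFO_HTTP_CODE) !== 200) {
            error_log('retry needed: ' . $url);
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);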

I ran some experiments on this before. Around 50 concurrent requests at a time was relatively stable for me; beyond that, requests simply get lost, and I don't know why either.

Why not have server A write the logs that server B needs into some corner of server A's disk, and then curl server B once a minute so that server B fetches that minute's worth of server A's records and writes them into server B's nginx log set?

This way only 60 requests are sent per hour, and the two sides can even retransmit on failure, which greatly reduces network packet loss under high concurrency. The exact interval can be chosen according to the amount of data per transfer, for example so that each curl carries only about 1 MB, to keep the communication reliable.
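
A minimal cron-driven sketch of this once-a-minute batching, written here in the push direction (A posts each finished minute-file to B); the paths, URL, and 'OK' token are assumptions:

    <?php
    // Run once a minute from cron on server A. Entries are assumed to be
    // appended elsewhere to files named /data/logs/batch-YYYYmmddHHii.log.
    $current = '/data/logs/batch-' . date('YmdHi') . '.log';
    foreach (glob('/data/logs/batch-*.log') as $file) {
        if ($file === $current) {
            continue; // skip the file still being written this minute
        }
        $ch = curl_init('http://server-b.internal/import');
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, file_get_contents($file));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        $body = curl_exec($ch);
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        if ($code === 200 && $body === 'OK') {
            rename($file, $file . '.sent'); // delivered; keep for auditing
        }
        // on failure the file stays in place and the next run retries it
    }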

This is only an idea; writing those scripts takes time.

If you only need to record the requests in a log, we recommend transmitting them over UDP.
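
A fire-and-forget UDP sender in PHP is only a few lines, for example (the host and port are assumptions; UDP gives no delivery guarantee, which is acceptable for pure logging):

    <?php
    // Send one log line over UDP; no handshake, no blocking on B's load.
    $sock = @fsockopen('udp://server-b.internal', 9999, $errno, $errstr, 1);
    if ($sock !== false) {
        fwrite($sock, 'example-log-line' . "\n");
        fclose($sock);
    }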

That is only about 300 requests per second. There is no way nginx cannot keep up with that rate.

So the problem is most likely in curl itself. Although curl is well known, it is really not that easy to use well.

We recommend assembling the HTTP request yourself.
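
A sketch of a hand-built HTTP request over a raw socket, bypassing libcurl entirely (the host, path, and payload are placeholders):

    <?php
    // Write the HTTP request by hand and read the status line back,
    // so delivery is judged on the response rather than on curl's errno.
    $entry = 'example-payload';
    $fp = fsockopen('server-b.internal', 80, $errno, $errstr, 2);
    if ($fp === false) {
        error_log("connect failed: $errno $errstr");
        exit(1);
    }
    $request = 'GET /log?data=' . urlencode($entry) . " HTTP/1.1\r\n"
             . "Host: server-b.internal\r\n"
             . "Connection: close\r\n\r\n";
    fwrite($fp, $request);
    $response = stream_get_contents($fp);
    fclose($fp);
    // only a 200 status line counts as delivered
    $delivered = strpos($response, 'HTTP/1.1 200') === 0;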

Why use curl at all, if you only want to leave a log entry? Set up a location in A's nginx that proxies to B, and point the script src on the page at that location on A.
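
For reference, the nginx side of that idea could look roughly like this (the server name and path are assumptions):

    # On server A's nginx: any hit on /log is proxied straight to B,
    # so the browser's script src pointing at /log lands in B's access
    # log with no PHP or curl involved.
    location /log {
        proxy_pass http://server-b.internal;
    }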

May I ask the OP: with this kind of hit-logging between your servers, how do you implement the log analysis and the database import afterwards? Also, I think some data loss is bound to happen; losing nothing at all out of millions of entries is research, not engineering. Like the OP, I need this kind of hit-logging plus writing the data into MySQL. Could you share your current architecture?
