Small tragedy caused by big data (1)

Last Update:2018-12-05 Source: Internet

Author: User


A few days ago, MonitorServer was reported to have stopped working at a customer's site, so I followed up immediately.

The job of this feature is to receive configuration parameters from the upper layer (a BS system) and forward them to the designated lower-layer systems. Depending on how the system is running, some parameters go to embedded devices and some go to other programs.

Local testing had gone fine beforehand. Why did it fail at the customer's site?

So we did two tests:

(1) Retest with local data: the result is normal.

(2) Import the customer's on-site data and retest: the failure reproduces.

Tracing showed that the problem occurs when MonitorServer forwards parameters to the lower layer (another program, monitord): it never receives the response from monitord, so the forwarding fails.

So I checked monitord and found that it had crashed. Of course it could not send a response to MonitorServer.

The question is, why does monitord crash? All previous tests passed.

After analyzing the data, we found that the data forwarded at the customer's site is far larger than the data used in local testing.

The code that MonitorServer uses to send the message and the code that monitord uses to receive it are as follows:

    // MonitorServer sending code

    esmonitor_assist_t m_monitord_prms;

    ...

    tmp = sock.createSock(pConf->m_monitordServerList[i].c_str(), ES_MONITOR_PORT, IPPROTO_TCP, CLIENT);

    ...

    if (sock.sendData((__int8 *)&m_monitord_prms, sizeof(m_monitord_prms)) < 0)
    {
        printf("setAlarmPrmTask::Send2Monitord(). sendData failed.\n");
        sock.closeHandle();
        nRet = -1;
        continue;
    }

    esmonitor_assist_resp_t resp;
    if (sock.receiveData((__int8 *)&resp, sizeof(resp)) < 0)
    {
        printf("setAlarmPrmTask::Send2Monitord(). receiveData failed.\n");
        sock.closeHandle();
        nRet = -1;
        continue;
    }


    // The following is the definition of the struct being sent

    #define MAX_CFG_NUM 1024
    typedef struct _esmonitor_assist_t
    {
        _esmonitor_assist_t()
        {
            memset(&header, 0, sizeof(header_t));
            i_cfg_num = 0;
            memset(&threshold, 0, sizeof(threshold_t) * MAX_CFG_NUM);

            header.i_sync = htonl(0x12345678);
            header.i_vession = htonl(0x1);
            header.i_type = htonl(bytes);
        }

        header_t header;
        uint32_t i_cfg_num;
        threshold_t threshold[MAX_CFG_NUM];
    } esmonitor_assist_t;

The monitord receiving code:

    static uint8_t p_recv_buf[1500];
    while (1)
    {
        sock_accept = (SOCKET)accept(sock, (SOCKADDR*)&addr_from, &i_len);
        if (sock_accept > 0)
        {
            i_recv_size = recv(sock_accept, &p_recv_buf, sizeof(p_recv_buf), 0);
            if (i_recv_size > 0)
            {
                p_header = p_recv_buf;
                p_header->i_type = ntohl(p_header->i_type);

                ...
            }
        }
    }

Have you spotted the problem?

The receive buffer of monitord is only 1500 bytes, while the struct sent by MonitorServer is far larger: sizeof(m_monitord_prms) exceeds 6000 bytes!

Then why is there no crash when the data volume is small, but a crash when it is large?

Let's analyze it.

First, in the structure esmonitor_assist_t defined by the sender, the first few fields have a fixed size and are followed by an array of 1024 elements (each element stores one set of configuration parameters). The i_cfg_num field specifies how many elements are actually valid. As a result, every send transmits sizeof(m_monitord_prms) bytes, that is, about 6000 bytes (assume 6000 for this discussion), no matter how many elements are valid.

Next, the receive buffer defined by the receiver is uint8_t p_recv_buf[1500], that is, 1500 bytes.

So each time, the receiver can take in at most the first 1500 of the 6000 bytes that were sent.

After monitord receives those 1500 bytes, it processes them as follows:

    for (i = 0; i < p_cfg->i_cfg_num; i++)
    {
        p_threshold = p_cfg->threshold + i;
        p_threshold->i_alarm_delay = ntohl(p_threshold->i_alarm_delay);
        p_threshold->i_alarm_id = ntohl(p_threshold->i_alarm_id);
        ...
    }

When the i_cfg_num specified by the sender is relatively small, the receiver still gets only part of the data, but monitord never touches the part that was lost.

Once the elements covered by i_cfg_num extend beyond the received 1500 bytes, p_threshold runs past the end of the buffer. The result is an out-of-bounds access through a dangerous "wild pointer", and the program crashes.
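The out-of-bounds condition above can be checked before the loop runs. The following is a minimal sketch, not the code from monitord: the field layouts of header_t and threshold_t are hypothetical (only the field names that appear in the article are used), MAX_CFG_NUM is an assumed macro name, and cfg_count_fits is a helper invented for illustration. It answers one question: do the bytes actually received cover the number of elements that i_cfg_num claims?

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CFG_NUM 1024 /* assumed macro name */

/* Hypothetical layouts: the real definitions are not shown in the article. */
typedef struct { uint32_t i_sync, i_vession, i_type; } header_t;
typedef struct { uint32_t i_alarm_id, i_alarm_delay; } threshold_t;

typedef struct {
    header_t    header;
    uint32_t    i_cfg_num;
    threshold_t threshold[MAX_CFG_NUM];
} esmonitor_assist_t;

/*
 * Returns 1 if `received` bytes are enough to hold the fixed fields plus
 * `cfg_num` array elements (cfg_num already converted to host byte order),
 * and 0 otherwise. Hypothetical helper, not part of the original program.
 */
int cfg_count_fits(size_t received, uint32_t cfg_num)
{
    size_t fixed = offsetof(esmonitor_assist_t, threshold);

    if (received < fixed)
        return 0; /* not even the header and the count arrived */
    if (cfg_num > MAX_CFG_NUM)
        return 0; /* the count is larger than the array itself */
    return cfg_num <= (received - fixed) / sizeof(threshold_t);
}
```

With this assumed layout, a 1500-byte buffer covers the first 185 elements only, so a message whose i_cfg_num is larger than that would be rejected instead of being processed. The real limit depends on the real size of threshold_t.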

With the cause found, the fix is straightforward: enlarge the receive buffer of monitord so that it is at least as large as the structure being sent.
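One way to keep the buffer from ever being smaller than the structure is to size it from the structure type itself, and to keep reading until the whole message has arrived, because a single recv on a TCP stream may return fewer bytes than requested. The sketch below assumes POSIX sockets (the original code uses Winsock, where recv takes a char pointer and an int length), hypothetical layouts for header_t and threshold_t, an assumed MAX_CFG_NUM macro name, and two helper names, recv_all and recv_message, that are invented for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

#define MAX_CFG_NUM 1024 /* assumed macro name */

/* Hypothetical layouts: the real definitions are not shown in the article. */
typedef struct { uint32_t i_sync, i_vession, i_type; } header_t;
typedef struct { uint32_t i_alarm_id, i_alarm_delay; } threshold_t;

typedef struct {
    header_t    header;
    uint32_t    i_cfg_num;
    threshold_t threshold[MAX_CFG_NUM];
} esmonitor_assist_t;

/*
 * Read exactly `len` bytes from a stream socket.
 * Returns 0 on success, -1 on error or if the peer closes early.
 */
int recv_all(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0 && errno == EINTR)
            continue;  /* interrupted by a signal: retry */
        if (n <= 0)
            return -1; /* error, or connection closed before a full message */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Receive one full message directly into a structure object, so the
 * destination is by construction exactly as large as what the sender sends.
 */
int recv_message(int fd, esmonitor_assist_t *msg)
{
    return recv_all(fd, msg, sizeof *msg);
}
```

Because the destination size comes from sizeof on the shared type, changing MAX_CFG_NUM or threshold_t on one side without the other can no longer silently truncate the message, provided both programs are built from the same header.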

----------------------------------------------------------------------------------

P.S. MonitorServer and monitord are maintained by different people, who had never coordinated with each other on this interface.

I couldn't help but cry out: what a trap...

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
