Small tragedy caused by big data (1)

Source: Internet
Author: User

A few days ago, MonitorServer was reported to be failing at a customer's site, so I followed up immediately.

The job of this component is to receive configuration parameters from the upper layer (the BS system) and forward them to the specified lower-layer systems. Depending on the system's running conditions, some are sent to embedded devices and some to other programs.

 

Local testing had gone fine beforehand. Why did it fail at the customer's site?

So we ran two tests:

(1) Retest with local data: the result was normal.

(2) Import the customer's on-site data and test again: the forwarding failed.

 

Tracing showed that the failure occurs when MonitorServer forwards parameters to the lower layer (another program, monitord): it never receives a response from monitord, so the forwarding fails.

So I checked monitord and found that it had crashed. Of course it could not send a response to MonitorServer.

The question is, why did monitord crash? All previous tests had passed.

 

After analyzing the data, we found that the data forwarded at the customer's site is far larger than the data used in local testing.

The code MonitorServer uses to send the message, and the code monitord uses to receive it, are as follows:

    // MonitorServer sending code

    esmonitor_assist_t m_monitord_prms;

    ...

    tmp = sock.createSock(pConf->m_monitordServerList[i].c_str(), ES_MONITOR_PORT, IPPROTO_TCP, CLIENT);

    ...

    if (sock.sendData((__int8 *)&m_monitord_prms, sizeof(m_monitord_prms)) < 0)
    {
        printf("setAlarmPrmTask: Send2Monitord().sendData failed.\n");
        sock.closeHandle();
        nRet = -1;
        continue;
    }

    esmonitor_assist_resp_t resp;
    if (sock.receiveData((__int8 *)&resp, sizeof(resp)) < 0)
    {
        printf("setAlarmPrmTask: Send2Monitord().receiveData failed.\n");
        sock.closeHandle();
        nRet = -1;
        continue;
    }


    // The following is the definition of the sent struct

    #define max_assist_num 1024
    typedef struct _esmonitor_assist_t
    {
        _esmonitor_assist_t()
        {
            memset(&header, 0, sizeof(header_t));
            i_assist_num = 0;
            memset(&threshold, 0, sizeof(threshold_t) * max_assist_num);

            header.i_sync = htonl(0x12345678);
            header.i_vession = htonl(0x1);
            header.i_type = htonl(bytes);
        }

        header_t header;
        uint32_t i_assist_num;
        threshold_t threshold[max_assist_num];
    } esmonitor_assist_t;

 

The monitord receiving code:

    static uint8_t p_recv_buf[1500];
    while (1)
    {
        sock_accept = (SOCKET)accept(sock, (SOCKADDR*)&addr_from, &i_len);
        if (sock_accept > 0)
        {
            i_recv_size = recv(sock_accept, &p_recv_buf, sizeof(p_recv_buf), 0);
            if (i_recv_size > 0)
            {
                p_header = p_recv_buf;
                p_header->i_type = ntohl(p_header->i_type);

                ...
            }
        }
    }

 

Spot the problem?

monitord's receive buffer is only 1500 bytes, while the struct sent by MonitorServer is far larger: sizeof(m_monitord_prms) exceeds 6000 bytes!

Why, then, is there no crash when the data volume is small, but a crash when it is large?

 

Let's analyze it.

First, in the structure esmonitor_assist_t defined by the sender, the first few fields have a fixed size. They are followed by an array of 1024 elements (each element stores one set of configuration parameters), and the count field i_assist_num specifies how many of those elements are actually valid. As a result, every send transmits sizeof(m_monitord_prms) bytes, that is, about 6000 bytes (assume 6000), no matter how many elements are in use.

The receiver, in turn, defines its buffer as uint8_t p_recv_buf[1500], that is, 1500 bytes.

In this way, a single recv() call on the receiving end can hold at most the first 1500 of those 6000 bytes.

After monitord receives those 1500 bytes, it processes them as follows:
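The truncation is easy to reproduce in isolation. The sketch below is a hypothetical demo, not code from either program: it assumes a POSIX system and uses socketpair() in place of the real TCP connection, and the helper name first_recv_size and the 6000-byte message size are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MSG_SIZE 6000   /* assumed size of the sender's struct */
#define BUF_SIZE 1500   /* size of monitord's p_recv_buf */

/* Sends MSG_SIZE bytes over a local stream socket and returns how many
 * bytes one recv() into a BUF_SIZE buffer hands back, or -1 on error. */
static ssize_t first_recv_size(void)
{
    static unsigned char msg[MSG_SIZE];
    unsigned char buf[BUF_SIZE];
    int fds[2];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    memset(msg, 0xAB, sizeof(msg));
    if (send(fds[0], msg, sizeof(msg), 0) != (ssize_t)sizeof(msg)) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* recv() can never return more than the buffer size it was given. */
    n = recv(fds[1], buf, sizeof(buf), 0);
    assert(n <= (ssize_t)sizeof(buf));

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Whatever the sender wrote, the value returned here is bounded by BUF_SIZE; the remaining bytes stay queued in the socket until someone reads them.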

 

    for (i = 0; i < p_cfg->i_cfg_num; i++)
    {
        p_threshold = p_cfg->threshold + i;
        p_threshold->i_alarm_delay = ntohl(p_threshold->i_alarm_delay);
        p_threshold->i_alarm_id = ntohl(p_threshold->i_alarm_id);
        ...
    }

 

When the count specified by the sender is relatively small, monitord receives only part of the data, but it never touches the part that was lost, so nothing goes wrong.

Once the count refers to entries that lie beyond the received 1500 bytes, p_threshold runs past the end of the buffer. It becomes a dangerous "wild pointer", and the program crashes.
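The crash condition can be turned into a check that runs before the loop. The sketch below is only an illustration under stated assumptions: the layouts of header_t and threshold_t contain just the fields named in the article (the real structs are larger), and cfg_msg_t, MAX_CFG_NUM, entries_received and count_is_safe are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layouts: only the fields named in the article are included,
 * so the real structs are larger than these. */
typedef struct { uint32_t i_sync, i_vession, i_type; } header_t;
typedef struct { uint32_t i_alarm_delay, i_alarm_id; } threshold_t;

#define MAX_CFG_NUM 1024

typedef struct {
    header_t    header;
    uint32_t    i_cfg_num;
    threshold_t threshold[MAX_CFG_NUM];
} cfg_msg_t;

/* Even with these reduced layouts the message cannot fit in 1500 bytes. */
static_assert(sizeof(cfg_msg_t) > 1500, "message exceeds a 1500-byte buffer");

/* Number of complete threshold entries contained in `received` bytes. */
static size_t entries_received(size_t received)
{
    size_t fixed = offsetof(cfg_msg_t, threshold);
    if (received < fixed)
        return 0;
    return (received - fixed) / sizeof(threshold_t);
}

/* Returns 1 if every entry the sender claims is actually in the buffer. */
static int count_is_safe(uint32_t i_cfg_num, size_t received)
{
    return i_cfg_num <= MAX_CFG_NUM && i_cfg_num <= entries_received(received);
}
```

With these assumed 8-byte entries, a 1500-byte read holds 185 complete entries, so a count of 186 or more would be rejected instead of being dereferenced. The real threshold depends on the actual struct sizes.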

 

With the cause identified, the fix is straightforward: enlarge monitord's receive buffer so that it is at least as large as the structure being sent.
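One caveat worth keeping in mind alongside the larger buffer: TCP is a byte stream, so a single recv() may still return fewer bytes than the full structure. A minimal sketch of a read-until-complete loop follows. It assumes POSIX sockets (the original code uses Winsock types), and recv_all is a hypothetical helper, not a function from monitord.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Reads exactly `len` bytes from a stream socket into `buf`.
 * Returns 0 on success, -1 on error or if the peer closes early. */
static int recv_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n > 0)
            got += (size_t)n;
        else if (n == 0)
            return -1;      /* connection closed before `len` bytes arrived */
        else if (errno != EINTR)
            return -1;      /* real error; EINTR is simply retried */
    }
    return 0;
}
```

The receiver would call recv_all(sock_accept, buffer, sizeof(message)) and only start parsing once it returns 0.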

 

----------------------------------------------------------------------------------

PS: MonitorServer and monitord are maintained by different people, and the two had never coordinated on this message format before the problem appeared.

I couldn't help but exclaim: what a trap...

 

 

 

 

 
