The symptom resembled the previous one: with about an hour's worth of data everything worked, but with a slightly larger volume monitord stopped responding.
Tracing the request step by step showed the following:
(1) MonitorServer sends the request to monitord without any problem, but never receives a response;
(2) monitord receives and parses the request normally;
(3) monitord executes the request normally;
(4) in the last step, monitord updates the configuration file, and that update times out.
Maddening...
Can writing a file really time out? How much data could there possibly be?
Enough talk. Time to log in to the server and read the code:
static int set_threshold_cfg()
{
    ...

    int i=0;
    for(;i<num; i++)
    {
        ConfigSetKey();   /* called once per key; arguments elided in the original */
    }
    ...
}

int ConfigSetKey(void *CFG_file, void *section, void *key, void *buf)
{
    FILE *fp1, *fp2;
    char buf1[MAX_CFG_BUF + 1];
    int line_no, line_no1, n, ret, ret2;
    char *tmpfname;

    /* Look the key up first; it can return CFG_ERR_OPEN_FILE, so it opens the file too. */
    ret = ConfigGetKey(CFG_file, section, key, buf1);
    if(ret <= CFG_ERR && ret != CFG_ERR_OPEN_FILE) return ret;
    if(ret == CFG_ERR_OPEN_FILE || ret == CFG_SECTION_NOT_FOUND)
    {
        /* No file or no such section: append the section header and the key. */
        if((fp1 = fopen((char *)CFG_file, "a")) == NULL)
            return CFG_ERR_CREATE_FILE;

        if(fprintf(fp1, "%c%s%c\n", CFG_ssl, section, CFG_ssr) == EOF)
        {
            fclose(fp1);
            return CFG_ERR_WRITE_FILE;
        }
        if(fprintf(fp1, "%s=%s\n", key, buf) == EOF)
        {
            fclose(fp1);
            return CFG_ERR_WRITE_FILE;
        }
        fclose(fp1);
        return CFG_OK;
    }

    /* Otherwise rewrite the whole file through a temporary copy. */
    if((tmpfname = tmpnam(NULL)) == NULL)
    {
        return CFG_ERR_CREATE_FILE;
    }
    if((fp2 = fopen(tmpfname, "w")) == NULL)
        return CFG_ERR_CREATE_FILE;
    ret2 = CFG_ERR_OPEN_FILE;

    if((fp1 = fopen((char *)CFG_file, "rb")) == NULL) goto w_cfg_end;

    if(ret == CFG_KEY_NOT_FOUND)
        line_no1 = CFG_section_line_no;
    else /* ret = CFG_OK */
        line_no1 = CFG_key_line_no - 1;

    /* Copy every line that precedes the insertion point. */
    for(line_no = 0; line_no < line_no1; line_no++)
    {
        ret2 = CFG_ERR_READ_FILE;
        n = FileGetLine(fp1, buf1, MAX_CFG_BUF);
        if(n < 0) goto w_cfg_end;
        ret2 = CFG_ERR_WRITE_FILE;
        if(fprintf(fp2, "%s\n", buf1) == EOF) goto w_cfg_end;
    }

    /* Skip the lines of the old key, if there was one. */
    if(ret != CFG_KEY_NOT_FOUND)
        for( ; line_no < line_no1+CFG_key_lines; line_no++)
        {
            ret2 = CFG_ERR_READ_FILE;
            n = FileGetLine(fp1, buf1, MAX_CFG_BUF);
            if(n < 0) goto w_cfg_end;
        }

    /* Write the new key, then copy the rest of the file. */
    ret2 = CFG_ERR_WRITE_FILE;
    if(fprintf(fp2, "%s=%s\n", key, buf) == EOF) goto w_cfg_end;
    while(1)
    {
        ret2 = CFG_ERR_READ_FILE;
        n = FileGetLine(fp1, buf1, MAX_CFG_BUF);
        if(n < -1) goto w_cfg_end;
        if(n < 0) break;
        ret2 = CFG_ERR_WRITE_FILE;
        if(fprintf(fp2, "%s\n", buf1) == EOF) goto w_cfg_end;
    }
    ret2 = CFG_OK;

w_cfg_end:
    if(fp1 != NULL) fclose(fp1);
    if(fp2 != NULL) fclose(fp2);
    if(ret2 == CFG_OK)
    {
        /* Copy the temporary file back over the configuration file. */
        ret = FileCopy(tmpfname, CFG_file);
        if(ret != 0) return CFG_ERR_CREATE_FILE;
    }
    remove(tmpfname);
    return ret2;
}
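Every call to ConfigSetKey opens and closes files, and when the section already exists it does more than that: it copies the whole configuration file line by line into a temporary file and then copies the temporary file back. set_threshold_cfg makes one such call per key, so a single request triggers tens of thousands of open/close operations, each with a full rewrite of a file that keeps growing. No wonder the operation times out.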
With the root cause found, the fix is straightforward: rewrite the function that writes the configuration file so that the file is opened and closed only once per request. The details are omitted here.
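Since the actual fix is not shown, here is a minimal sketch of what a single-pass batch update could look like. It is not the author's code: cfg_kv, cfg_set_keys, flush_pending and LINE_MAX_LEN are hypothetical names, and it assumes that sections are delimited by '[' and ']', that each key occupies one line shorter than LINE_MAX_LEN, and that rename() replaces an existing file (true on POSIX systems).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_MAX_LEN 1024   /* assumed upper bound on one config line */

/* Hypothetical key/value pair; not part of the original library. */
typedef struct { const char *key; const char *value; } cfg_kv;

/* Write every pair that has not been written yet and mark it done. */
static int flush_pending(FILE *out, const cfg_kv *kv, int *done, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (done[i]) continue;
        if (fprintf(out, "%s=%s\n", kv[i].key, kv[i].value) < 0) return -1;
        done[i] = 1;
    }
    return 0;
}

/* Set n keys in [section] of the file at path in one pass over the file.
 * Returns 0 on success, -1 on error. */
int cfg_set_keys(const char *path, const char *section,
                 const cfg_kv *kv, size_t n)
{
    assert(path != NULL && section != NULL && (kv != NULL || n == 0));

    char tmp[4096], line[LINE_MAX_LEN], header[LINE_MAX_LEN];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp) return -1;
    if (snprintf(header, sizeof header, "[%s]", section) >= (int)sizeof header)
        return -1;

    int *done = calloc(n ? n : 1, sizeof *done);
    if (done == NULL) return -1;

    FILE *in = fopen(path, "r");   /* may be NULL: file does not exist yet */
    FILE *out = fopen(tmp, "w");
    int rc = (out == NULL) ? -1 : 0;
    int in_section = 0, seen_section = 0;

    while (rc == 0 && in != NULL && fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';          /* strip the line ending */
        if (line[0] == '[') {
            /* Leaving the target section: add the keys it did not contain. */
            if (in_section) rc = flush_pending(out, kv, done, n);
            in_section = (strcmp(line, header) == 0);
            seen_section |= in_section;
        } else if (in_section) {
            const char *eq = strchr(line, '=');
            size_t klen = eq ? (size_t)(eq - line) : 0;
            for (size_t i = 0; eq != NULL && i < n; i++) {
                if (strlen(kv[i].key) == klen &&
                    strncmp(line, kv[i].key, klen) == 0) {
                    /* Replace the old value with the new one. */
                    snprintf(line, sizeof line, "%s=%s", kv[i].key, kv[i].value);
                    done[i] = 1;
                    break;
                }
            }
        }
        if (rc == 0 && fprintf(out, "%s\n", line) < 0) rc = -1;
    }
    if (rc == 0 && in != NULL && ferror(in)) rc = -1;

    /* Section missing entirely: create it at the end of the file. */
    if (rc == 0 && !seen_section && n > 0 && fprintf(out, "%s\n", header) < 0)
        rc = -1;
    /* Target section was the last one (or was just created): add the rest. */
    if (rc == 0) rc = flush_pending(out, kv, done, n);

    if (in != NULL) fclose(in);
    if (out != NULL && fclose(out) != 0) rc = -1;
    free(done);

    /* One replacement of the real file per request instead of one per key. */
    if (rc == 0 && rename(tmp, path) != 0) rc = -1;
    if (rc != 0) remove(tmp);
    return rc;
}
```

With something like this, set_threshold_cfg would collect all num thresholds into one cfg_kv array and call cfg_set_keys once, so the configuration file is read once and replaced once no matter how many keys the request carries.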