Process crashes due to suspected CPU or memory failure

Source: Internet
Author: User

We have a service running on every host in the Microsoft cloud. Recently we found a machine on which the service process crashes continuously.
The crash is caused by access through an invalid pointer. The corresponding code is as follows:


serviceListIniBuffer.AppendF("serverlist=%s\r\n",
    m_newServiceList.config->GetStringParameter("Manifest", "ServerList", "").c_str());

The dump and a live repro both show that the invalid pointer is the first parameter of AppendF, the format string.
Strangely, that argument is a hard-coded string literal. How can this happen?

Look at the assembly code:

00007ff6`36961907 4889442420      mov     qword ptr [rsp+20h],rax
00007ff6`3696190c 4c8d0d45270b00  lea     r9,[GetAndRunServices!`string'+0xfd8 (00007ff6`36a14058)]
00007ff6`36961913 4c8d054e270b00  lea     r8,[GetAndRunServices!`string'+0xfe8 (00007ff6`36a14068)]
00007ff6`3696191a 488d5500        lea     rdx,[rbp]
00007ff6`3696191e 488b8f98030000  mov     rcx,qword ptr [rdi+398h]
00007ff6`36961925 e8f649f3ff      call    GetAndRunServices!APSDK::Configuration::IConfiguration::GetStringParameter (00007ff6`36896320)
00007ff6`3696192a 90              nop
00007ff6`3696192b 4883781810      cmp     qword ptr [rax+18h],10h
00007ff6`36961930 7203            jb      GetAndRunServices!GetAndRunServices2::GenerateOMNewServiceListFile+0x9e5 (00007ff6`36961935)
00007ff6`36961932 488b00          mov     rax,qword ptr [rax]
00007ff6`36961935 4c8bc0          mov     r8,rax
00007ff6`36961938 488d1541270b01  lea     rdx,[00007ff6`37a14080]
00007ff6`3696193f 488d4d20        lea     rcx,[rbp+20h]
>>> 00007ff6`36961943 e8981ef3ff      call    GetAndRunServices!APSDK::DynStringT<127>::AppendF (00007ff6`368937e0)

The crash itself occurs inside AppendF. However, it is caused by the invalid pointer and has nothing to do with AppendF's own logic, so the disassembly of AppendF is omitted here to save space.

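At the call site, RCX should hold the this pointer of serviceListIniBuffer. Under the Windows x64 calling convention the first four integer or pointer arguments are passed in RCX, RDX, R8 and R9, in that order, so RDX holds the first explicit parameter (the format string) and R8 holds the second (the server list string).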

0:000> da r8
00000001`0a3a48c0  "25.66.164.187:25.66.164.4:25.66."
00000001`0a3a48e0  "164.251:25.66.165.4"

0:000> da rdx
00007ff6`37a14080  "????????????????????????????????"

Sure enough, RDX points to invalid memory.

Look more closely at the preceding assembly, for example where string literals are passed to GetStringParameter. These two instructions load them:
00007ff6`3696190c 4c8d0d45270b00  lea     r9,[GetAndRunServices!`string'+0xfd8 (00007ff6`36a14058)]
00007ff6`36961913 4c8d054e270b00  lea     r8,[GetAndRunServices!`string'+0xfe8 (00007ff6`36a14068)]

As you can see, these literals are read from the executable's string table, GetAndRunServices!`string'. Both addresses fall squarely within the address range of the executable:
0:000> lmf
start             end                 module name
00007ff6`36880000 00007ff6`36bda000   GetAndRunServices C:\App\getandrunservices.ap_10_09_10_8_5003_2510\GetAndRunServices.exe

The problem address, 00007ff6`37a14080, is indeed outside the executable's address range.
But look closely: compared with an address inside the valid range, it differs in only one bit:
00007ff6`3[7]a14080 00007ff6`3[6]a14080
In binary, 7 (0111) and 6 (0110) differ only in the lowest bit. If I manually change the 7 to a 6 and dump that address:

0:000> da 00007ff6`36a14080
00007ff6`36a14080  "serverlist=%s ..."

Sure enough, it matches the source code.
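
Both observations, that the bad address lies outside the module and that it differs from the good one by a single bit, can be checked mechanically. The following is a minimal sketch that uses only the addresses printed by the debugger; the function and constant names are made up for illustration.

```cpp
#include <cstdint>

// Module range reported by lmf for GetAndRunServices.exe.
constexpr std::uint64_t kModuleStart = 0x00007ff636880000ULL;
constexpr std::uint64_t kModuleEnd   = 0x00007ff636bda000ULL;

// Address actually loaded into RDX, and the address that holds the literal.
constexpr std::uint64_t kBadAddress  = 0x00007ff637a14080ULL;
constexpr std::uint64_t kGoodAddress = 0x00007ff636a14080ULL;

// True when addr lies inside the half-open range [start, end).
constexpr bool inModule(std::uint64_t addr) {
    return addr >= kModuleStart && addr < kModuleEnd;
}

// Number of bit positions in which a and b differ.
constexpr int differingBits(std::uint64_t a, std::uint64_t b) {
    std::uint64_t diff = a ^ b;
    int count = 0;
    while (diff != 0) {
        count += static_cast<int>(diff & 1U);
        diff >>= 1;
    }
    return count;
}

static_assert(!inModule(kBadAddress), "bad address is outside the module");
static_assert(inModule(kGoodAddress), "good address is inside the module");
static_assert(differingBits(kBadAddress, kGoodAddress) == 1,
              "the two addresses differ in exactly one bit");
static_assert((kBadAddress ^ kGoodAddress) == (1ULL << 24),
              "the differing bit is bit 24 of the address");
```

The XOR of the two addresses is 0x01000000, so the single differing bit is bit 24, the lowest bit of the hex digit that reads 7 instead of 6.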

How can this be explained? The first possibility is that the executable file is corrupted. That is unlikely: executables are protected by integrity checks and signatures by default, and the file is exactly the same as the copies on other machines that show no problem.
Or maybe a virus? But the host machines in the cloud are strictly isolated, and live debugging of the process showed no sign of anything injected from user mode.
The remaining explanation is a hardware problem: the memory or the CPU's read path corrupts data when it encounters a particular bit pattern.
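
One further observation, offered as a sketch rather than as part of the original analysis: the lea rdx instruction uses RIP-relative addressing, so its target is the address of the next instruction plus a 32-bit displacement stored in the instruction bytes. Assuming the bytes shown in the disassembly (488d1541270b01) are what was actually in memory, the displacement itself already carries the odd bit: its top byte is 01, where the neighbouring lea instructions have 00. The names below are hypothetical.

```cpp
#include <cstdint>

// Target of a RIP-relative operand: address of the next instruction
// (instruction address + instruction length) plus the sign-extended
// 32-bit displacement.
constexpr std::uint64_t ripRelativeTarget(std::uint64_t instructionAddress,
                                          unsigned instructionLength,
                                          std::int32_t displacement) {
    return instructionAddress + instructionLength +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(displacement));
}

// All three lea instructions in the listing are 7 bytes long.
constexpr unsigned kLeaLength = 7;

// lea r9 at 00007ff6`3696190c, bytes 4c 8d 0d 45 27 0b 00 (healthy neighbour).
static_assert(ripRelativeTarget(0x00007ff63696190cULL, kLeaLength, 0x000b2745)
                  == 0x00007ff636a14058ULL,
              "neighbouring lea resolves inside the module");

// lea rdx at 00007ff6`36961938, bytes 48 8d 15 41 27 0b 01 (as observed).
constexpr std::uint64_t kLeaRdxAddress = 0x00007ff636961938ULL;
constexpr std::int32_t kObservedDisp = 0x010b2741;  // little-endian 41 27 0b 01
// Assumption: the intended displacement has 00 as its top byte.
constexpr std::int32_t kAssumedDisp  = 0x000b2741;

static_assert(ripRelativeTarget(kLeaRdxAddress, kLeaLength, kObservedDisp)
                  == 0x00007ff637a14080ULL,
              "observed bytes produce the invalid address");
static_assert(ripRelativeTarget(kLeaRdxAddress, kLeaLength, kAssumedDisp)
                  == 0x00007ff636a14080ULL,
              "clearing one bit produces the address of the literal");
```

If this reading is right, the flipped bit sits in the in-memory copy of the instruction rather than in a register or in the string data, which would be consistent with the file on disk being intact.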

This is my current inference. Next week I will talk to an experienced engineer to see whether this analysis is right. I have run into crashes suspected to be caused by hardware before, but today is the first time I have caught one in the act!
If you have had a similar experience, please share your advice.
