CPU Execution Branch Tracing Tool Based on the Branch Trace Store (BTS) Mechanism -- CpuWhere [Driver-Only Fix]


[Preface]

In his book Software Debugging (Section 5.3), Mr. Zhang Yinkui explains in detail the memory-based branch recording mechanism, BTS (Branch Trace Store), and provides the sample tool CpuWhere together with its source code. However, when I actually ran it (VMware, XP SP3, single core), it did not produce the expected results: no branch records could be read. I checked the source code and found no problems; it matches what the book describes. If the software itself is fine, could the problem be that it runs in a virtual machine?

I then dug out an old machine that had sat idle for years (Pentium Dual + XP SP3), added /numproc=1 to its boot configuration to force single-core startup, and tested again. The result was the same. Several Internet searches turned up nothing, since this is a niche topic. I found only one forum offering a version modified for multi-core systems, but I had no account there, so I could not download and test it.

Finally, I read the Intel manual and found the cause of the problem: when the DS mechanism uses DTES64 mode (the 64-bit DS save area format), every field of the DS area and of each branch record is widened to 64 bits, even on a 32-bit system.

Testing confirmed that the processor in the Pentium Dual + XP SP3 machine uses DTES64 mode, so this is where the problem lies. I do not know why nothing works under VMware; I only observed that reads and writes of the DS- and BTS-related MSRs had no effect. Perhaps the virtual machine was not configured for it, or perhaps VMware does not virtualize the DS and BTS mechanisms at all. Either way, it is best not to test or use tools like this in a virtual machine.
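To see why a DTES64-unaware reader gets nonsense, the following portable C sketch (not driver code) fills a buffer with records in the 64-bit layout and reads one back with the 32-bit layout that the book's code assumes. BRANCH_RECORD32, BRANCH_RECORD64, and read_as_32bit are hypothetical names, and a little-endian host such as x86 is assumed. The same mismatch affects the DS area itself: reading 8-byte fields from a DS area laid out with 4-byte fields, the processor would combine the 32-bit btsBase and btsIndex into a single 64-bit address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Record layout the original (non-DTES64) code expects: 12 bytes. */
typedef struct { uint32_t from, to, flags; } BRANCH_RECORD32;

/* Record layout a DTES64 processor actually writes: 24 bytes. */
typedef struct { uint64_t from, to, flags; } BRANCH_RECORD64;

/* Interprets record i of a buffer written in the 64-bit format using the
   32-bit layout, as a reader unaware of DTES64 would. The caller must make
   sure (i + 1) * sizeof(BRANCH_RECORD32) bytes are available in buf. */
static BRANCH_RECORD32 read_as_32bit(const BRANCH_RECORD64 *buf, size_t i)
{
    BRANCH_RECORD32 r;
    memcpy(&r, (const unsigned char *)buf + i * sizeof(BRANCH_RECORD32), sizeof r);
    return r;
}
```

With from = 0x80501000 and to = 0x80502000 in the first 64-bit record, the 32-bit view gets from right by luck, but to comes out as 0 (the upper half of the real from) and flags holds the real to address.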

 

 

[Detailed steps for correct use of the BTS mechanism]

I. Use the CPUID instruction to determine whether DS and the RDMSR/WRMSR instructions are supported, then read the IA32_MISC_ENABLE register to determine whether the BTS mechanism is available.

1. When CPUID is executed with EAX = 1, EDX bit 21 (DS) indicates Debug Store support and EDX bit 5 (MSR) indicates RDMSR/WRMSR support.

 

2. In the IA32_MISC_ENABLE register, bit 11 (Branch Trace Storage Unavailable) reads as 1 when BTS is not supported and 0 when it is.

 

3. The code is as follows:

```c
BOOLEAN IsSupported()
{
    DWORD _edx = 0;
    DWORD _eax = 0;

    _asm
    {
        mov eax,1
        cpuid
        mov _edx,edx
    }

    if ((_edx & (1 << BIT_DS_SUPPORTED)) == 0)
    {
        DBGOUT(("Debug store is not supported."));
        return FALSE;
    }

    if ((_edx & (1 << BIT_RWMSR_SUPPORTED)) == 0)
    {
        DBGOUT(("RDMSR/WRMSR is not supported."));
        return FALSE;
    }

    ReadMSR(IA32_MISC_ENABLE, &_edx, &_eax);
    if ((_eax & (1 << BIT_BTS_UNAVAILABLE)) != 0)
    {
        DBGOUT(("Branch trace store is not supported."));
        return FALSE;
    }

    return TRUE;
}
```
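The code relies on bit-position macros and MSR numbers defined elsewhere in the project. The values below are taken from the Intel SDM; the macro names follow the code above, but their definitions, and the helper bts_supported, are assumptions for illustration. The helper makes the same decision as IsSupported(), but takes the register values as parameters so the logic can be checked without executing CPUID or RDMSR.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions; bit positions and MSR numbers are from the Intel SDM. */
#define BIT_RWMSR_SUPPORTED  5    /* CPUID.01H:EDX[5]  MSR: RDMSR/WRMSR         */
#define BIT_DS_SUPPORTED     21   /* CPUID.01H:EDX[21] DS: Debug Store           */
#define BIT_BTS_UNAVAILABLE  11   /* IA32_MISC_ENABLE[11]: 1 = BTS not supported */

#define IA32_MISC_ENABLE 0x1A0
#define IA32_DEBUGCTL    0x1D9
#define IA32_DS_AREA     0x600

/* Hypothetical helper: the decision made by IsSupported(), given
   CPUID.01H:EDX and the low 32 bits of IA32_MISC_ENABLE. */
static int bts_supported(uint32_t cpuid1_edx, uint32_t misc_enable_lo)
{
    if ((cpuid1_edx & (1u << BIT_DS_SUPPORTED)) == 0)
        return 0;
    if ((cpuid1_edx & (1u << BIT_RWMSR_SUPPORTED)) == 0)
        return 0;
    if ((misc_enable_lo & (1u << BIT_BTS_UNAVAILABLE)) != 0)
        return 0;
    return 1;
}
```

For example, bts_supported((1u << 21) | (1u << 5), 0) returns 1, while setting bit 11 in the second argument makes it return 0.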

 

 

II. Use the CPUID instruction to determine whether the DS mechanism uses DTES64 mode. If it does, the BTS base address, index, absolute maximum, and interrupt threshold in the DS area are 64 bits wide, and so are the source address, destination address, and flags of each BranchRecord. The same applies to PEBS.

1. When CPUID is executed with EAX = 1, ECX bit 2 (DTES64) indicates whether the 64-bit DS area format is used.

 

2. The code is as follows:

```c
BOOLEAN IsDTES64()
{
    DWORD _ecx = 0;

    _asm
    {
        mov eax,1
        cpuid
        mov _ecx,ecx
    }

    return ((_ecx & (1 << BIT_DTES64)) != 0) ? TRUE : FALSE;
}
```
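In practical terms, the DTES64 flag changes only the width of each field, which shifts every offset. The sketch below is a hypothetical, portable helper set (not the original code) that derives the DS field offsets and the branch record size from the CPUID.01H:ECX value, assuming a 32-bit operating system as in this article; the BIT_DTES64 definition is likewise assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT_DTES64 2   /* CPUID.01H:ECX[2] DTES64: 64-bit DS area (assumed macro) */

/* Order of the BTS fields at the start of the DS area (same in both formats). */
enum { DS_BTS_BASE, DS_BTS_INDEX, DS_BTS_ABS_MAX, DS_BTS_INT_THRESHOLD };

static int is_dtes64(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & (1u << BIT_DTES64)) != 0;
}

/* Width of each DS-area and BranchRecord field. Assumes a 32-bit OS:
   in IA-32e mode the processor uses the 64-bit format as well. */
static size_t field_size(uint32_t cpuid1_ecx)
{
    return is_dtes64(cpuid1_ecx) ? 8 : 4;
}

/* Byte offset of a BTS field inside the DS area. */
static size_t ds_field_offset(int field, uint32_t cpuid1_ecx)
{
    return (size_t)field * field_size(cpuid1_ecx);
}

/* One record holds from, to, and flags. */
static size_t branch_record_size(uint32_t cpuid1_ecx)
{
    return 3 * field_size(cpuid1_ecx);
}
```

A reader built for the 4-byte layout therefore looks for btsIndex at offset 4 and steps through records 12 bytes at a time, while a DTES64 processor uses offset 8 and 24-byte records.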

 

 

III. Set up the DS area and the BTS buffer according to the result of step II (only the DTES64-mode code is given here; for the non-DTES64 mode, refer to the original book).

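1. Layout of DS and BranchRecord in DTES64 mode: every field of the DS area is 8 bytes wide, and each BranchRecord is 24 bytes (from, to, and flags, 8 bytes each).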

 

2. In DTES64 mode, DS and BranchRecord are declared as follows:

```c
typedef struct _DEBUG_STORE
{
    ULONG64    btsBase;
    ULONG64    btsIndex;
    ULONG64    btsAbsolute;
    ULONG64    btsInterruptThreshold;
    ULONG64    pebsBase;
    ULONG64    pebsIndex;
    ULONG64    pebsAbsolute;
    ULONG64    pebsInterruptThreshold;
    ULONG64    pebsCounterReset;
    ULONG64    reserved;
} DEBUG_STORE, *PDEBUG_STORE;

typedef struct _BRANCH_RECORD
{
    ULONG64    from;
    ULONG64    to;
    ULONG64    flags;
} BRANCH_RECORD, *PBRANCH_RECORD;
```

 

3. The DS area and the BTS buffer must be allocated from non-paged memory.

 

4. The code is as follows:

```c
BOOLEAN InitDebugStore()
{
    // Note: (ULONG)"SD__" passes the address of the string literal as the tag;
    // a pool tag is normally written as a four-character constant such as 'SD__'.
    g_pDebugStore = ExAllocatePoolWithTag(NonPagedPool, sizeof(DEBUG_STORE), (ULONG)"SD__");
    if (g_pDebugStore == NULL)
    {
        DBGOUT(("Failed to allocate memory for debug store."));
        return FALSE;
    }
    memset(g_pDebugStore, 0, sizeof(DEBUG_STORE));

    return TRUE;
}

BOOLEAN InitBranchTraceStore()
{
    // MAX_RECORD records of 24 bytes each (DTES64 layout)
    g_pBranchTraceStore = ExAllocatePoolWithTag(NonPagedPool, sizeof(BRANCH_RECORD) * MAX_RECORD, (ULONG)"STB_");
    if (g_pBranchTraceStore == NULL)
    {
        DBGOUT(("Failed to allocate memory for branch trace store."));
        return FALSE;
    }
    memset(g_pBranchTraceStore, 0, sizeof(BRANCH_RECORD) * MAX_RECORD);

    return TRUE;
}
```

 

5. The code for setting DS is as follows:

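```c
VOID SetDebugStore()
{
    // Recording starts at the beginning of the BTS buffer
    g_pDebugStore->btsBase = (ULONG64)g_pBranchTraceStore;
    g_pDebugStore->btsIndex = (ULONG64)g_pBranchTraceStore;
    // Next byte past the end of the BTS buffer
    g_pDebugStore->btsAbsolute = (ULONG64)g_pBranchTraceStore + sizeof(BRANCH_RECORD) * MAX_RECORD;
    // Beyond btsAbsolute, so it is never reached (BTINT is cleared anyway)
    g_pDebugStore->btsInterruptThreshold = (ULONG64)g_pBranchTraceStore + sizeof(BRANCH_RECORD) * (MAX_RECORD + 1);
    // Point IA32_DS_AREA at the DS area (EDX = high 32 bits, EAX = low 32 bits)
    WriteMSR(IA32_DS_AREA, HIDWORD(g_pDebugStore), LODWORD(g_pDebugStore));
}
```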
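A layout or arithmetic mistake here fails silently at run time, so it can help to check both off-line. This portable sketch mirrors the DTES64 declarations with uint64_t in place of ULONG64 (DEBUG_STORE64, BRANCH_RECORD64, and fill_bts_fields are hypothetical names), pins the BTS field offsets with C11 _Static_assert (in a WDK build, C_ASSERT can serve the same purpose), and repeats the arithmetic of SetDebugStore() on a 64-bit address value, with max_record standing in for MAX_RECORD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable mirrors of the DTES64 declarations (uint64_t stands in for ULONG64). */
typedef struct {
    uint64_t btsBase, btsIndex, btsAbsolute, btsInterruptThreshold;
    uint64_t pebsBase, pebsIndex, pebsAbsolute, pebsInterruptThreshold;
    uint64_t pebsCounterReset, reserved;
} DEBUG_STORE64;

typedef struct { uint64_t from, to, flags; } BRANCH_RECORD64;

/* Compile-time layout checks: 8-byte BTS fields and 24-byte records. */
_Static_assert(offsetof(DEBUG_STORE64, btsIndex) == 0x08, "btsIndex offset");
_Static_assert(offsetof(DEBUG_STORE64, btsAbsolute) == 0x10, "btsAbsolute offset");
_Static_assert(offsetof(DEBUG_STORE64, btsInterruptThreshold) == 0x18, "threshold offset");
_Static_assert(sizeof(BRANCH_RECORD64) == 24, "record size");

/* Same arithmetic as SetDebugStore(); buffer is the BTS buffer address
   already widened to 64 bits. */
static void fill_bts_fields(DEBUG_STORE64 *ds, uint64_t buffer, uint64_t max_record)
{
    ds->btsBase = buffer;
    ds->btsIndex = buffer;   /* the first record is written at the base */
    ds->btsAbsolute = buffer + sizeof(BRANCH_RECORD64) * max_record;            /* next byte past the end */
    ds->btsInterruptThreshold = buffer + sizeof(BRANCH_RECORD64) * (max_record + 1); /* beyond the end */
}
```

With a base of 0xFFFFFFFF81000000 and 100 records, btsAbsolute lies 2400 bytes (100 * 24) past the base, and the threshold one 24-byte record beyond that.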

PS: A lesson I learned about casts in C (from a smaller type to a larger one).

During two-machine kernel debugging, I inspected memory after g_pDebugStore had been cast to ULONG64 and found that its upper 32 bits were 0xFFFFFFFF. I had always assumed that converting a smaller type to a larger one fills the high bits with 0, so this puzzled me. Checking the disassembly revealed why:

A 32-bit program can use registers at most 32 bits wide, so a 64-bit value is held in a register pair written as Reg:Reg, for example EDX:EAX. When a 32-bit value is widened to 64 bits here, the compiler emits the CDQ instruction, which copies the sign bit of EAX into every bit of EDX, so that EDX:EAX holds the 64-bit result.

Back to the code: because this is a driver running in Ring 0, the virtual addresses the system allocates for it are above 0x7FFFFFFF. Interpreted as signed 32-bit values, such addresses are negative, so the sign bit is 1 and CDQ fills EDX with 1s, giving 0xFFFFFFFF. This keeps 0xFFFFFFFF:XXXXXXXX equal to the original signed value. Filling EDX with 0 would instead turn it into a positive number, which is not the same value.
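The behavior is easy to reproduce in portable C. Converting a pointer to a wider integer is implementation-defined, so this sketch uses explicit integer types instead: widening through int32_t copies the sign bit as CDQ does, while widening through uint32_t fills the upper half with 0. The address 0x81234560 is a hypothetical kernel-mode address, and the conversion of an out-of-range value to int32_t is assumed to wrap in two's complement, as it does on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Widening through a signed 32-bit type: the sign bit is copied into the
   upper half, which is what CDQ does to EDX. */
static uint64_t widen_signed(uint32_t addr)
{
    return (uint64_t)(int64_t)(int32_t)addr;
}

/* Widening through an unsigned 32-bit type: the upper half is filled with 0. */
static uint64_t widen_unsigned(uint32_t addr)
{
    return (uint64_t)addr;
}
```

For 0x81234560, widen_signed gives 0xFFFFFFFF81234560 (the value seen in the debugger), while widen_unsigned gives 0x0000000081234560; for a user-mode address below 0x80000000 both agree.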

Once again, this taught me not to take anything for granted.

 

 

IV. Enable the BTS mechanism

1. In the IA32_DEBUGCTL register, the flags that control branch recording and whether an interrupt is raised when the buffer fills are TR (bit 6), BTS (bit 7), and BTINT (bit 8).

 

2. Set the TR and BTS bits to 1 to enable the BTS mechanism, and clear the BTINT bit to 0 so that the buffer is used as a ring buffer. The code is as follows:

```c
VOID EnableBranchTraceStore()
{
    DWORD _edx = 0;
    DWORD _eax = 0;

    ReadMSR(IA32_DEBUGCTL, &_edx, &_eax);
    _eax |= 1 << BIT_TR;
    _eax |= 1 << BIT_BTS;
    _eax &= ~(1 << BIT_BTINT);
    WriteMSR(IA32_DEBUGCTL, _edx, _eax);
}
```
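The bit manipulation can be checked in isolation. This hypothetical sketch applies the same changes to the low 32 bits of IA32_DEBUGCTL (bit positions from the Intel SDM; the macro definitions are assumed) and also shows the reverse operation used before reading the buffer in step VI. Bits the driver does not own are preserved in both directions.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_DEBUGCTL bit positions (Intel SDM); macro names follow the driver code. */
#define BIT_TR    6   /* enable branch trace messages                        */
#define BIT_BTS   7   /* with TR set, store the messages in the BTS buffer   */
#define BIT_BTINT 8   /* 1 = interrupt when the buffer fills, 0 = ring buffer */

/* Same changes as EnableBranchTraceStore(), applied to the EAX half. */
static uint32_t debugctl_enable_bts(uint32_t eax)
{
    eax |= 1u << BIT_TR;
    eax |= 1u << BIT_BTS;
    eax &= ~(1u << BIT_BTINT);
    return eax;
}

/* Same changes as DisableBranchTraceStore(). */
static uint32_t debugctl_disable_bts(uint32_t eax)
{
    eax &= ~(1u << BIT_TR);
    eax &= ~(1u << BIT_BTS);
    return eax;
}
```

Starting from 0, enabling yields 0xC0 (bits 6 and 7); a previously set BTINT bit is cleared, and unrelated bits such as bit 0 survive a later disable.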

 

 

V. Put the steps above into the DriverEntry routine

1. Calling the functions above in sequence enables the BTS mechanism in DTES64 mode:

```c
NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING pRegisterPath)
{
    UNREFERENCED_PARAMETER(pRegisterPath);

    DBGOUT(("DriverEntry()"));

    pDriverObject->DriverUnload = MyCpuWhereUnload;

    if (!IsSupported())
    {
        return STATUS_FAILED_DRIVER_ENTRY;
    }

    if (IsDTES64())
    {
        DBGOUT(("Running on DTES64 mode."));
    }
    else
    {
        // The structures above describe the 64-bit layout only, so stop here
        DBGOUT(("Not running on DTES64 mode."));
        return STATUS_NOT_SUPPORTED;
    }

    if (!InitDebugStore())
    {
        return STATUS_FAILED_DRIVER_ENTRY;
    }

    if (!InitBranchTraceStore())
    {
        // DriverUnload is not called when DriverEntry fails, so free the DS area here
        ExFreePool(g_pDebugStore);
        return STATUS_FAILED_DRIVER_ENTRY;
    }

    SetDebugStore();

    EnableBranchTraceStore();

    return STATUS_SUCCESS;
}
```

 

VI. To keep the code simple (it is for testing only), disabling the BTS mechanism, reading the branch records, and freeing the DS and BTS memory are all done in the DriverUnload routine.

1. Disable the BTS mechanism before reading the BTS buffer (the same steps as enabling, with the flags cleared instead of set):

```c
VOID DisableBranchTraceStore()
{
    DWORD _edx = 0;
    DWORD _eax = 0;

    ReadMSR(IA32_DEBUGCTL, &_edx, &_eax);
    _eax &= ~(1 << BIT_TR);
    _eax &= ~(1 << BIT_BTS);
    WriteMSR(IA32_DEBUGCTL, _edx, _eax);
}
```

 

2. Read the records in a loop, following the BranchRecord structure of the BTS buffer.

 

3. Free the non-paged memory allocated earlier for the DS area and the BTS buffer.

 

4. The code is as follows:

```c
VOID MyCpuWhereUnload(PDRIVER_OBJECT pDriverObject)
{
    PBRANCH_RECORD pRecord = NULL;
    DWORD count = 0;

    UNREFERENCED_PARAMETER(pDriverObject);

    DBGOUT(("DriverUnload()"));

    // Stop recording before the buffer is read
    DisableBranchTraceStore();

    // Walk the buffer from its base until an unused (zeroed) record or the end
    pRecord = (PBRANCH_RECORD)LODWORD(g_pDebugStore->btsBase);
    for (; pRecord < (PBRANCH_RECORD)LODWORD(g_pDebugStore->btsAbsolute) && count < MAX_RECORD; ++pRecord, ++count)
    {
        if (pRecord->from == 0)
        {
            break;
        }
        DBGOUT(("%d: From: 0x%08X\n%d: To:   0x%08X", count + 1, (DWORD)pRecord->from, count + 1, (DWORD)pRecord->to));
    }

    // Detach the DS area from the processor before its memory is freed
    WriteMSR(IA32_DS_AREA, 0, 0);

    ExFreePoolWithTag(g_pBranchTraceStore, (ULONG)"STB_");
    ExFreePoolWithTag(g_pDebugStore, (ULONG)"SD__");
}
```
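Because BTINT is cleared, the buffer is a ring: once it fills, the processor continues from btsBase and overwrites the oldest records, so reading from the base is no longer chronological. The following portable sketch is a hypothetical variation, not part of the original tool: given the slot derived from btsIndex, it lists the valid slots oldest first, using the same from == 0 test as MyCpuWhereUnload to detect unused slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t from, to, flags; } BRANCH_RECORD;   /* DTES64 layout */

/* Hypothetical helper. buf: BTS buffer (btsBase); n: capacity in records;
   next: slot the CPU will write next, i.e. (btsIndex - btsBase) / sizeof(BRANCH_RECORD).
   Writes slot numbers into order[] oldest first and returns how many are valid. */
static size_t bts_chronological(const BRANCH_RECORD *buf, size_t n, size_t next, size_t *order)
{
    /* If the next slot already holds a record, the ring has wrapped and that
       slot is the oldest one; otherwise recording started at the base. */
    size_t start = (next < n && buf[next].from != 0) ? next : 0;
    size_t count = 0;

    for (size_t k = 0; k < n; ++k) {
        size_t slot = (start + k) % n;
        if (buf[slot].from == 0)
            break;               /* unused slot: no further records */
        order[count++] = slot;
    }
    return count;
}
```

In the driver, next would be computed from g_pDebugStore->btsIndex after DisableBranchTraceStore(), and n would be MAX_RECORD.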

 

 

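VII. This completes the entire process of enabling the BTS mechanism in DTES64 mode. Multi-core systems are not supported (IA32_DEBUGCTL and IA32_DS_AREA are per-processor MSRs, and they are set only on the processor that happens to run DriverEntry), so unexpected results may occur. Use it with caution.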

Running result:

 

 

 

[Summary]

This code is for learning only. It has no GUI, no communication routines between R3 and R0, and no support for the non-DTES64 mode. However, all of these can be found in the source code of the original CpuWhere written by Mr. Zhang Yinkui.

 

Mr. Zhang Yinkui's original CpuWhere (binaries and source)

The license for downloading and using this tool requires that the user has purchased Software Debugging.

Download: Http://advdbg.org/books/swdbg/t_cpuwhere.aspx

CpuWhere modified for DTES64 mode (source, VS2013 + WDK 8.1)

Download: Http://files.cnblogs.com/files/Chameleon/MyCpuWhere.zip

 
