AMD64 Architecture Learning Notes

Source: Internet
Author: User
Tags: exception handling

About Address-size

In 64-bit mode, the default address size is 64 bits. The 67h instruction prefix overrides it to 32 bits. 64-bit mode does not support 16-bit addressing.

About Operand-size

In 64-bit mode, the default operand size is 32 bits. A REX prefix (with its W bit set) overrides it to 64 bits, and the 66h prefix overrides it to 16 bits. Near branches and instructions that implicitly reference RSP (CALL, Jcc, JMP, LOOP, LOOPcc, PUSH, POP) use a 64-bit operand size by default and need no REX prefix.
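The effect of these prefixes is visible directly in the instruction bytes. The sketch below uses Python purely as a byte calculator. The helper names `encode_mov_rr` and `encode_load_eax` are made up for illustration, and each handles only one fixed register combination: a register-to-register MOV (opcode 89h, ModRM D8h, destination rAX, source rBX) and a load through rBX (opcode 8Bh, ModRM 03h).

```python
REX_W = 0x48            # REX prefix with only the W bit set (0100 1000b)
OPSIZE_PREFIX = 0x66    # operand-size override prefix
ADDRSIZE_PREFIX = 0x67  # address-size override prefix


def encode_mov_rr(operand_bits):
    """Encode MOV rAX, rBX in 64-bit mode for a 16-, 32- or 64-bit operand size.

    Hypothetical helper: it only knows this one register pair.
    """
    body = [0x89, 0xD8]  # 89 /r = MOV r/m, r ; D8 = mod 11, reg rBX, r/m rAX
    if operand_bits == 32:
        return bytes(body)                    # default operand size, no prefix
    if operand_bits == 64:
        return bytes([REX_W] + body)          # REX.W selects 64-bit operands
    if operand_bits == 16:
        return bytes([OPSIZE_PREFIX] + body)  # 66h selects 16-bit operands
    raise ValueError("operand size must be 16, 32 or 64")


def encode_load_eax(address_bits):
    """Encode MOV EAX, [rBX] in 64-bit mode with 64- or 32-bit addressing."""
    body = [0x8B, 0x03]  # 8B /r = MOV r, r/m ; 03 = mod 00, reg EAX, r/m rBX
    if address_bits == 64:
        return bytes(body)                      # default address size
    if address_bits == 32:
        return bytes([ADDRSIZE_PREFIX] + body)  # 67h selects 32-bit addressing
    raise ValueError("64-bit mode supports only 64- and 32-bit addressing")


for bits in (32, 64, 16):
    print(f"{bits}-bit operand:", encode_mov_rr(bits).hex(" "))
for bits in (64, 32):
    print(f"{bits}-bit address:", encode_load_eax(bits).hex(" "))
```

The only difference between the three operand sizes is the single prefix byte in front of the same opcode and ModRM bytes.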

About Call:
1. Near call (within the current code segment): RIP is pushed onto the current stack.
2. Far call between segments at the same privilege level: CS and RIP are pushed onto the current stack.
3. Far call that switches privilege level (into a more privileged level): a new stack pointer is obtained from the TSS, and the old SS and RSP are saved on the new stack. In legacy mode, the number of parameters specified by the call gate is then copied from the old stack to the new stack. In long mode there is no such copy; I personally think this is because long mode has no real concept of code segments. Finally, CS and RIP are pushed onto the new stack.

About task switching:
1. In legacy mode, the state of the current task (the value of each register, the page-table address, and so on) is saved to the current task's TSS, and the state of the new task is loaded into the processor from that task's TSS.
2. Long mode does not support hardware task switching.

About SYSCALL & SYSRET:
With the traditional CALL and RET, the processor has to do more work because they are built on segment-based protection and privilege checking. The cost is not just loading the segment descriptor when a segment register is reloaded; checking the protection, type, and privilege restrictions also takes time.
These instructions eliminate segment-based privilege checking by using predetermined target and return code segments and stack segments. The operating system sets up and maintains the predetermined segments using special registers within the processor, so the segment descriptors do not need to be fetched from memory when the instructions are used. The simplifications made to privilege checking allow SYSCALL and SYSRET to complete in far fewer processor clock cycles than CALL and RET.

This paragraph is easier to understand than to explain, haha. Basically, the operating system uses special registers in the CPU to set up and maintain the segments that will be used.
The SYSRET instruction can only be executed by code running at CPL=0; it is not available to applications running at CPL>0.

SYSENTER & SYSEXIT are not supported in long mode.

About Exception Handling:

Exceptions are divided into three kinds:

1. Fault: reported on the boundary before the instruction that caused it. After the exception handler finishes, execution returns to the interrupted instruction and restarts it. A page fault is this kind of exception.

2. Trap: reported on the boundary after the instruction that caused it, so that instruction completes before the exception handler runs. Software interrupts and debug breakpoint exceptions are of this type.

3. Abort: an imprecise exception that is not reported on a reliable instruction boundary; the interrupted program cannot be resumed.
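The practical difference between the three kinds is which return address the handler finds saved on its stack. Here is a minimal sketch of that rule. The function name `saved_rip` and the example addresses are invented for illustration, and `None` is just this sketch's way of saying "no usable return address".

```python
def saved_rip(kind, instr_addr, instr_len):
    """Return the instruction pointer saved for the handler, or None for an abort.

    kind       -- "fault", "trap" or "abort"
    instr_addr -- address of the instruction that raised the exception
    instr_len  -- length of that instruction in bytes
    """
    if kind == "fault":
        # Reported before the instruction: returning re-executes it.
        return instr_addr
    if kind == "trap":
        # Reported after the instruction: returning continues with the next one.
        return instr_addr + instr_len
    if kind == "abort":
        # Imprecise: the interrupted program cannot be resumed.
        return None
    raise ValueError(f"unknown exception kind: {kind}")


# A page fault raised by a 3-byte instruction at 0x401000.
print(hex(saved_rip("fault", 0x401000, 3)))
# A breakpoint (INT3, one byte) at the same address.
print(hex(saved_rip("trap", 0x401000, 1)))
```

This is why a page-fault handler can map the missing page and simply return, while a debugger that planted a breakpoint has to step the saved address back by one byte to re-run the original instruction.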

In long mode, an interrupt or exception always saves the old SS:RSP on the handler's stack. When the privilege level changes, the new RSP is loaded from the TSS.

Interrupt return: the IRET, IRETD, and IRETQ instructions make the CPU return from the interrupt handler. If the interrupt or exception pushed an error code onto the stack, the interrupt handler must pop it before executing IRET.
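To make the stack layout concrete, here is a small model of the long-mode interrupt stack frame: the stack pointer is aligned to 16 bytes, then SS, RSP, RFLAGS, CS and RIP are pushed as 8-byte values, followed by the error code for exceptions that have one. The function names and the stack address are hypothetical.

```python
PUSH_ORDER = ["SS", "RSP", "RFLAGS", "CS", "RIP"]  # first pushed comes first


def build_frame(stack_top, has_error_code):
    """Return (rsp_at_handler_entry, {field: address}) for a long-mode frame."""
    rsp = stack_top & ~0xF  # long mode aligns the stack to 16 bytes first
    fields = PUSH_ORDER + (["ERROR_CODE"] if has_error_code else [])
    frame = {}
    for name in fields:
        rsp -= 8            # every slot is 8 bytes wide in long mode
        frame[name] = rsp
    return rsp, frame


def rsp_for_iretq(rsp_at_entry, has_error_code):
    """RSP must point at the saved RIP when IRETQ runs, so skip the error code."""
    return rsp_at_entry + 8 if has_error_code else rsp_at_entry


entry_rsp, frame = build_frame(0x8008, has_error_code=True)
for name, addr in frame.items():
    print(f"{addr:#06x}  {name}")
print("RSP before IRETQ:", hex(rsp_for_iretq(entry_rsp, True)))
```

The last line shows why the handler has to discard the error code: IRETQ expects the saved RIP on top of the stack, not the error code.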

About Cache:
The cache is managed as a number of fixed-size blocks called cache lines. Cache fills and evictions are both done in units of one cache line.
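Because the cache line is the unit of transfer, what matters for an access is which lines it touches, not just how many bytes it reads. The sketch below assumes a 64-byte line, a common size that these notes do not state. The helper names are made up for illustration.

```python
LINE_SIZE = 64  # assumption: 64-byte cache lines (must be a power of two)


def line_base(addr):
    """Address of the first byte of the cache line containing addr."""
    return addr & ~(LINE_SIZE - 1)


def lines_touched(addr, size):
    """Number of cache lines covered by an access of `size` bytes at `addr`."""
    first = line_base(addr)
    last = line_base(addr + size - 1)
    return (last - first) // LINE_SIZE + 1


print(hex(line_base(0x103C)))    # line that holds byte 0x103C
print(lines_touched(0x1038, 8))  # 8 bytes ending exactly at a line boundary
print(lines_touched(0x103C, 8))  # 8 bytes straddling two lines
```

An 8-byte access that straddles a line boundary needs two lines filled, which is one reason aligned data is cheaper to access.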
Cache pollution:
1. Temporal locality: the processor assumes that recently accessed memory will be accessed again within a short time; otherwise the cache is considered polluted.
2. Spatial locality: the processor assumes that adjacent memory will be accessed within a short time; otherwise the cache is considered polluted.
3. Stale data: the processor assumes that memory which has not been accessed for a long time will not be accessed in the near future either; otherwise the cache is considered polluted.

Cache control instructions:
1. PREFETCHlevel: if this instruction is executed with an invalid memory address as its operand, no exception is generated and the instruction has no effect. If the memory address specified by the operand is uncacheable or write-combining memory, the instruction also has no effect.
2. PREFETCH: loads data into the cache for reading. A later write to this cache line takes additional time to change the line's state to modified.
3. PREFETCHW: loads data into the cache for writing. The load sets the modified flag of the cache line, so a later write executes faster.

About floating point numbers:

                  Sign     Exponent     Fraction     Bias
Single Precision  1 [31]   8 [30-23]    23 [22-00]   127
Double Precision  1 [63]   11 [62-52]   52 [51-00]   1023

The exponent stored in a floating-point number is the actual exponent plus the bias.
So the actual exponent = the stored exponent - bias
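The field layout and the bias rule above can be checked on real values. The sketch below uses Python's standard `struct` module to get at the raw bits. The function names are my own, and the bit positions are the ones from the table.

```python
import struct

SINGLE_BIAS = 127
DOUBLE_BIAS = 1023


def decode_single(x):
    """Split a value, rounded to single precision, into (sign, stored exponent, fraction)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31                # bit 31
    stored_exp = (bits >> 23) & 0xFF  # bits 30-23
    fraction = bits & 0x7FFFFF        # bits 22-0
    return sign, stored_exp, fraction


def decode_double(x):
    """Split a double-precision value into (sign, stored exponent, fraction)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63                  # bit 63
    stored_exp = (bits >> 52) & 0x7FF   # bits 62-52
    fraction = bits & ((1 << 52) - 1)   # bits 51-0
    return sign, stored_exp, fraction


for value in (1.0, -6.5):
    sign, stored, frac = decode_single(value)
    print(f"{value}: sign={sign} stored exponent={stored} "
          f"actual exponent={stored - SINGLE_BIAS} fraction={frac:#08x}")
```

For example, -6.5 is -1.625 x 2^2, so its stored single-precision exponent is 2 + 127 = 129.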

The ranges that floating-point numbers can represent:

Single Precision
  Denormalized:         ±2^-149 to (1 - 2^-23) x 2^-126
  Normalized:           ±2^-126 to (2 - 2^-23) x 2^127
  Approximate decimal:  ±~10^-44.85 to ~10^38.53
Double Precision
  Denormalized:         ±2^-1074 to (1 - 2^-52) x 2^-1022
  Normalized:           ±2^-1022 to (2 - 2^-52) x 2^1023
  Approximate decimal:  ±~10^-323.3 to ~10^308.3

From the ranges above and the way the exponent is computed, the stored exponent of a normalized, non-zero single-precision number is between 1 and 254.
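The single-precision limits can be cross-checked against the bit patterns they correspond to. This is only a sanity check using Python's `struct` module. The helper name and the labels in the dictionary are invented here.

```python
import struct


def single_from_bits(bits):
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# label: (bit pattern, value predicted by the range formulas)
SINGLE_LIMITS = {
    "smallest denormalized": (0x00000001, 2.0 ** -149),
    "largest denormalized": (0x007FFFFF, (1 - 2.0 ** -23) * 2.0 ** -126),
    "smallest normalized": (0x00800000, 2.0 ** -126),   # stored exponent 1
    "largest normalized": (0x7F7FFFFF, (2 - 2.0 ** -23) * 2.0 ** 127),  # stored exponent 254
}

for label, (bits, predicted) in SINGLE_LIMITS.items():
    value = single_from_bits(bits)
    print(f"{label:22} bits={bits:#010x} value={value!r} "
          f"matches formula: {value == predicted}")
```

The smallest and largest normalized patterns have stored exponents 1 and 254, which is where the 1 to 254 range comes from; 0 and 255 are reserved for the special cases.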

Floating-point numbers (single precision) are divided into:
1. Normalized numbers: the exponent field is 1 to 254
2. Denormalized numbers: the exponent field is 0 and the fraction is not 0
3. Positive and negative zero: the exponent field is 0 and the fraction is 0
4. Positive and negative infinity: the exponent field is 255 and the fraction is 0
5. NaN: not a valid floating-point number.
QNaN: the exponent field is 255 and the most significant bit of the fraction is 1
SNaN: the exponent field is 255, the most significant bit of the fraction is 0, and the fraction is not 0
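The classification is mechanical once the exponent and fraction fields are extracted. This is a minimal sketch for single precision that works on raw 32-bit patterns rather than Python floats, so that NaN payloads are not altered in transit. The function name is hypothetical.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision bit pattern."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # The most significant fraction bit (bit 22) separates QNaN from SNaN.
        return "QNaN" if fraction & 0x400000 else "SNaN"
    return "normalized"  # exponent field 1 to 254


for pattern in (0x00000000, 0x00000001, 0x3F800000,
                0x7F800000, 0x7F800001, 0x7FC00000):
    print(f"{pattern:#010x}  {classify_single(pattern)}")
```

The sign bit plays no part in the classification; it only selects between the positive and negative member of each class.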

Sign  Exponent (e)       Fraction (f)       Value
0     00..00             00..00             +0
0     00..00             00..01 to 11..11   Positive denormalized real, 0.f x 2^(-b+1)
0     00..01 to 11..10   xx..xx             Positive normalized real, 1.f x 2^(e-b)
0     11..11             00..00             +Infinity
0     11..11             00..01 to 01..11   SNaN
0     11..11             10..00 to 11..11   QNaN
1     00..00             00..00             -0
1     00..00             00..01 to 11..11   Negative denormalized real, -0.f x 2^(-b+1)
1     00..01 to 11..10   xx..xx             Negative normalized real, -1.f x 2^(e-b)
1     11..11             00..00             -Infinity
1     11..11             00..01 to 01..11   SNaN
1     11..11             10..00 to 11..11   QNaN

128-bit multimedia instructions:

1. Move instructions

MOVD: transfers 32 or 64 bits of data between memory, general-purpose registers, and XMM registers. Transferring 64 bits requires the REX instruction prefix. When the destination is an XMM register, the remaining high bits are zero-extended.

MOVQ: transfers 64 bits of data between memory, general-purpose registers, and XMM registers. When the destination is an XMM register, the remaining high bits are zero-extended.

MOVDQA: transfers 128 bits of data between memory and an XMM register, or between two XMM registers. When memory is involved, the memory address must be aligned to 16 bytes.

MOVDQU: transfers 128 bits of data between memory and an XMM register, or between two XMM registers. When memory is involved, the memory address does not need to be aligned.

MOVDQ2Q: moves the low 64 bits of an XMM register to an MMX register.

MOVQ2DQ: moves the data in an MMX register to the low 64 bits of an XMM register, zero-extended to 128 bits.
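The zero-extension and alignment rules for these moves can be modelled with plain integers. The sketch treats an XMM register as a 128-bit Python integer. It models only the register-destination cases described above, and every function name is made up for illustration.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def movd_to_xmm(value):
    """MOVD into an XMM register: low 32 bits kept, upper 96 bits become zero."""
    return value & MASK32


def movq_to_xmm(value):
    """MOVQ into an XMM register: low 64 bits kept, upper 64 bits become zero."""
    return value & MASK64


def movdq2q(xmm):
    """MOVDQ2Q: the MMX register receives the low 64 bits of the XMM register."""
    return xmm & MASK64


def movdqa_address_ok(addr):
    """MOVDQA requires its memory operand to be 16-byte aligned."""
    return addr % 16 == 0


xmm = MASK128                          # pretend the register held all ones
xmm = movd_to_xmm(0x1_2345_6789)       # only the low 32 bits survive
print(f"{xmm:#034x}")
print(movdqa_address_ok(0x1000), movdqa_address_ok(0x1008))
```

The old register contents never leak through: a move into an XMM register with MOVD or MOVQ always clears the bits it does not write.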

2. Non-temporal data transfers reduce cache pollution. These instructions move data directly to memory without going through the cache:

MOVNTDQ, MASKMOVDQU

PMOVMSKB: copies the most significant bit of each byte in the XMM register into the low 16 bits of a 32-bit or 64-bit general-purpose register, and zero-extends the rest.
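PMOVMSKB is easiest to understand from a small emulation. According to the instruction's definition it collects the most significant bit of each of the 16 bytes into a 16-bit mask, with byte 0 mapping to bit 0, and the rest of the destination register is zero. The sketch below models that on a sequence of byte values; the function name mirrors the mnemonic but is otherwise my own.

```python
def pmovmskb(xmm_bytes):
    """Emulate PMOVMSKB on 16 byte values (index 0 = least significant byte)."""
    if len(xmm_bytes) != 16:
        raise ValueError("an XMM register holds exactly 16 bytes")
    mask = 0
    for index, byte in enumerate(xmm_bytes):
        mask |= ((byte >> 7) & 1) << index  # take bit 7 of each byte
    return mask  # upper bits of the destination register are zero


# Bytes 0 and 15 have their top bit set, everything else is clear.
example = bytes([0x80] + [0x7F] * 14 + [0xFF])
print(f"{pmovmskb(example):#06x}")
```

A typical use is after a byte-wise compare, where each result byte is all ones or all zeros: the mask then has one bit per matching byte.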

Data conversion: integers in XMM registers or memory to floating-point numbers

CVTDQ2PS: converts 4 32-bit signed integers into 4 single-precision floating-point numbers

CVTDQ2PD: converts 2 32-bit signed integers into 2 double-precision floating-point numbers

Converting integers in MMX registers or memory to floating-point numbers:

CVTPI2PS: converts 2 doubleword integers into 2 single-precision floating-point numbers in the low 64 bits of the XMM register; the high 64 bits of the XMM register are left unchanged

CVTPI2PD: converts 2 doubleword integers into 2 double-precision floating-point numbers.
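One consequence of these conversions is worth seeing with numbers: a single-precision value has only 24 significant bits, so converting a large 32-bit integer to single precision may round it, while converting to double precision is always exact. The sketch below models the element-wise conversions in Python, assuming the default round-to-nearest mode. The function names echo the mnemonics but are only an illustration, not the instructions themselves.

```python
import struct


def int32_to_single(n):
    """Round a signed 32-bit integer to the nearest single-precision value."""
    if not -2**31 <= n < 2**31:
        raise ValueError("not a signed 32-bit integer")
    # Packing as 'f' rounds the double to single precision.
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]


def cvtdq2ps(ints):
    """Model: 4 signed 32-bit integers -> 4 single-precision values."""
    if len(ints) != 4:
        raise ValueError("expected 4 integers")
    return [int32_to_single(n) for n in ints]


def cvtdq2pd(ints):
    """Model: 2 signed 32-bit integers -> 2 double-precision values (always exact)."""
    if len(ints) != 2:
        raise ValueError("expected 2 integers")
    return [float(n) for n in ints]


print(cvtdq2ps([1, -1, 16777216, 16777217]))  # 2^24 + 1 does not fit in 24 bits
print(cvtdq2pd([16777217, 2147483647]))
```

If exact integer values matter, converting to double precision avoids the rounding at the cost of processing two elements per instruction instead of four.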