CLR via C# Reading Notes 2-3: Cache Lines and False Sharing


My level is limited, so I'm not sure how best to express "cache line" and "false sharing" in Chinese (for now I've rendered them as "high-speed buffer" and "error sharing"; if that's wrong, please correct me).

 

Modern CPUs generally have multiple cores, and the CPU contains one or more levels of cache (typically L1/L2, often L3 as well).

These caches sit on the CPU chip itself, and they are much faster than the main memory on the motherboard.

The CPU generally loads data from memory into the cache to get better performance (especially for frequently used data).

By default, the cache is divided into lines of 64 bytes each (this number may differ on different platforms and can be queried with the Win32 API function GetLogicalProcessorInformation).

At any point in time, only one core is allowed to write within a given cache line.

That is to say, you cannot have multiple cores writing to the same cache line at the same time (and CPUs are multi-core now...).

This matters because a 64-byte line can hold several pieces of data at once, for example sixteen Int32 values.

For example (this code is only a demo; please don't dwell on its naming and design):

private class Data
{
    public Int32 field1;
    public Int32 field2;
}

It is entirely possible that while one thread is operating on field1, another thread running on a different core that wants to operate on field2 must wait until the first thread finishes before it can gain access to that cache line.

This causes a large performance loss when such operations are intensive.

Refer to the following code

 

Code

using System;
using System.Diagnostics; // Stopwatch
using System.Threading;   // ThreadPool, Interlocked

internal static class FalseSharing
{
    private class Data
    {
        public Int32 field1;
        public Int32 field2;
    }

    private const Int32 iterations = 100000000; // 100 million
    private static Int32 s_operations = 2;
    private static Int64 s_startTime;

    public static void Main()
    {
        // Allocate an object and record the start time
        Data data = new Data();
        s_startTime = Stopwatch.GetTimestamp();

        // Have 2 threads access their own fields within the structure
        ThreadPool.QueueUserWorkItem(o => AccessData(data, 0));
        ThreadPool.QueueUserWorkItem(o => AccessData(data, 1));

        // For testing, block the main thread
        Console.ReadLine();
    }

    private static void AccessData(Data data, Int32 field)
    {
        // The threads in here each access their own field within the Data object
        for (Int32 x = 0; x < iterations; x++)
            if (field == 0) data.field1++; else data.field2++;

        // Whichever thread finishes last shows the time it took
        if (Interlocked.Decrement(ref s_operations) == 0)
            Console.WriteLine("Access time: {0:N0}", Stopwatch.GetTimestamp() - s_startTime);
    }
}

On my machine this code takes 2,471,930,060 ticks (Stopwatch timestamp units), which says my machine is really slow... If anyone is interested, try measuring the speed on your own machine.

If you change the Data class to the following definition:

Code

[StructLayout(LayoutKind.Explicit)]  // specify the memory layout explicitly
private class Data
{
    // These two fields are now separated and no longer share a cache line
    [FieldOffset(0)]   // memory offset 0
    public Int32 field1;
    [FieldOffset(64)]  // memory offset 64
    public Int32 field2;
}

The running time on my machine drops to 1,258,994,700 ticks.

Offsetting field2 by 64 bytes means the object requires more space (it now spans two cache lines), but the program runs faster.

 

The most practical inference:

In C#, all one-dimensional arrays derive from System.Array, and the CLR gives them some special processing, such as bounds checking.

Bounds check: whenever you access any array element, the CLR must verify that the index is within the valid length of the array (index < length).

This means that no matter which element of the array is accessed, the CLR must first read the length.

The length is an Int32 value located just before the array elements, indicating the number of elements in the array.

The data structure of the array in memory is roughly as follows:

int[] vals = new int[] { 1, 2, 3, 4, 5 };
// length | 1st | 2nd | 3rd | 4th | 5th
//      5 |   1 |   2 |   3 |   4 |   5

Assuming the default cache line size is 64 bytes, the array's length and its first elements must end up in the same cache line.

Therefore, suppose some thread A is constantly writing to the first few elements of the array, which share a cache line with the length.

Then any other thread that wants to read data from other parts of the array must first read the length for the bounds check, and so must wait for thread A's writes to complete before it can read those other parts.

 

PS: the following is only my personal inference and has not been verified.

A cache line presumably allows concurrent reads but not concurrent writes, and a write blocks all other operations on that line (both reads and other writes). (This matches how cache-coherence protocols such as MESI behave: many cores may hold a line in a shared state for reading, but a writer must first gain exclusive ownership of the line, invalidating the other cores' copies.)

 

 

 
