.NET CoreCLR Developer's Guide (Part 1)

Last Update:2016-10-01 Source: Internet

Author: User


1. Why every CLR developer needs to read this article

Like any other large code base, the CLR code base has many internal invariants and an extensive debug infrastructure for detecting bugs. It is important that developers working on the CLR understand the rules and idioms involved.

The goal of this article is to let every CLR developer learn the most critical things about working in the CLR without wading through less important detail. Along the way it covers some of the history of the CLR, the problems that were addressed at different stages, and the conveniences that solving them has brought to developers.

1.1 Code Specification

This is the most important chapter. Treat its section headings as a checklist while you write your code. The chapter is divided into two parts, one for managed code and one for unmanaged code, because the two face quite different problems.

Rules come either from invariants or from team policy.

Invariants are semantic rules imposed by the architecture. For example, in a GC-safe system, unmanaged code cannot hold a reference to a managed object without letting the GC know about it. Violating an invariant usually produces an obvious bug.

Team policy rules are rules we have established as good practice, for example the rule that every function must declare a contract. If your code breaks such a rule, expect other team members to object unless you can explain convincingly why the code needs an exemption.

Team policy rules are not necessarily less important than invariants. For example, the rule to use safemath.h instead of hand-coding an integer overflow check is a policy rule, but because it concerns security we usually give it high priority.

One kind of rule you will not find in this article is pure code formatting, such as brace placement. Such rules are not semantic, and we do not make them mandatory. The rules in this article are included because breaking them would do one of the following:

Introduce an actual bug.
Greatly increase the risk of a serious bug slipping through.
Frustrate our automated bug-detection infrastructure.

1.2 How do I (insert common task here)?

This section can be read as a FAQ. If you have a specific need, look here for best-practice guidance. And if you are thinking of adding yet another hash table implementation to the CLR, check here first: there is a good chance that existing code can be used as is or adapted to your needs.

2. Code specification (unmanaged)

2.1 Is your code GC-safe?

2.1.1 How GC holes are created

The term "GC hole" refers to a classic kind of GC bug. A GC hole is a pernicious bug because it is easy to introduce by accident, it rarely reproduces, and it is tedious and time-consuming to debug. A single GC hole can cost developers and testers weeks of troubleshooting.

One of the major features of the CLR is garbage collection. Objects allocated by a managed application are never freed explicitly by the programmer. Instead, the CLR periodically runs the garbage collector (GC), which discards objects that are no longer in use. The GC also compacts the heap to avoid wasted gaps in memory. As a result, an object on the managed heap does not have a fixed address: objects move around whenever the GC sees fit.

To do its job, the GC must be told about every reference to every GC object. It must know about every stack location, every register, and every non-GC data structure that holds a pointer to a GC object. These external pointers are called root references.

Armed with this information, the GC can find every object that is referenced directly from outside the GC heap. Those objects may in turn reference other objects. By following these references outward, the GC finds all "live" objects; any object it cannot reach is dead and is discarded. The GC then moves the live objects to reduce memory fragmentation and updates every reference to each object it moved.
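The mark, compact, and fix-up steps described above can be sketched with a toy heap. This is only an illustration under simplifying assumptions, not CLR code: objects live in a vector, a "reference" is an index, each object has at most one outgoing reference, and the names ToyHeap, ToyObject, and demo_collect are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not CLR code: a reference is an index into
// ToyHeap::objects, and kNull means "no reference".
constexpr std::size_t kNull = static_cast<std::size_t>(-1);

struct ToyObject {
    int payload;
    std::size_t next;  // single outgoing reference to another object
};

struct ToyHeap {
    std::vector<ToyObject> objects;

    std::size_t allocate(int payload, std::size_t next = kNull) {
        objects.push_back({payload, next});
        return objects.size() - 1;
    }

    // Marks everything reachable from the roots, slides the survivors
    // together, then patches every root and every object-to-object reference.
    void collect(const std::vector<std::size_t*>& roots) {
        std::vector<bool> live(objects.size(), false);
        for (std::size_t* root : roots) {
            for (std::size_t i = *root; i != kNull && !live[i]; i = objects[i].next) {
                live[i] = true;
            }
        }

        std::vector<std::size_t> moved_to(objects.size(), kNull);
        std::vector<ToyObject> compacted;
        for (std::size_t i = 0; i < objects.size(); ++i) {
            if (live[i]) {
                moved_to[i] = compacted.size();
                compacted.push_back(objects[i]);
            }
        }

        for (ToyObject& obj : compacted) {
            if (obj.next != kNull) obj.next = moved_to[obj.next];
        }
        for (std::size_t* root : roots) {
            if (*root != kNull) *root = moved_to[*root];
        }
        objects = compacted;
    }
};

// Builds two garbage objects and a two-object chain, then collects with a
// single root. Returns head payload * 100 + tail payload, read via the root.
int demo_collect() {
    ToyHeap heap;
    heap.allocate(111);                        // garbage: nothing refers to it
    std::size_t tail = heap.allocate(7);
    std::size_t head = heap.allocate(42, tail);
    heap.allocate(222);                        // more garbage

    std::size_t root = head;
    heap.collect({&root});

    assert(heap.objects.size() == 2);          // only the chain survived
    const ToyObject& h = heap.objects[root];
    return h.payload * 100 + heap.objects[h.next].payload;
}
```

After the collection the root holds a different index than before, yet it still reaches the same two payloads. That is the property every root reference relies on.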

A GC can be triggered any time an object is allocated. A GC can also be requested explicitly by calling the GarbageCollect function. GCs do not happen asynchronously outside of these events, but because other running threads can trigger a GC, your thread must behave as if GCs were asynchronous unless you take explicit steps to synchronize with the GC. This is described in more detail later.

A GC hole is created when code inside the CLR creates a reference to a GC object, does not tell the GC about that reference, does something that directly or indirectly triggers a GC, and then tries to use the original reference. At that point the reference points to garbage memory, and the CLR either reads a wrong value or corrupts whatever the reference now points to.

2.1.2 Your first GC hole

The following code shows a GC hole in its simplest form.

    // OBJECTREF is a typedef for a pointer to an object.
    {
        MethodTable *pMT = g_pObjectClass->GetMethodTable();

        OBJECTREF a = AllocateObject(pMT);
        OBJECTREF b = AllocateObject(pMT);

        // WRONG: "a" may point to garbage memory if the second
        // AllocateObject triggered a GC.
        DoSomething(a, b);
    }

All this code does is allocate two managed objects, a and b, and then call DoSomething on them to run some logic.

The code compiles, and in simple test runs it will probably appear to work, but it will eventually crash. Why? If the second call to AllocateObject triggers a GC, that GC discards the object instance you just assigned to a. This code, like all C++ code in CoreCLR, is compiled by an unmanaged compiler, so the GC has no way of knowing that the variable a holds a root reference to an object that must be kept alive.

This point is worth repeating: the GC has no built-in knowledge of root references stored in local variables or in non-GC data structures maintained by the CLR. You must tell the GC about them explicitly.

2.1.3 Use GCPROTECT_BEGIN to keep your references up to date

The following code shows how to fix the GC hole in the code above:

    #include "frames.h"
    {
        MethodTable *pMT = g_pObjectClass->GetMethodTable();

        // RIGHT
        OBJECTREF a = AllocateObject(pMT);

        GCPROTECT_BEGIN(a);
        OBJECTREF b = AllocateObject(pMT);

        DoSomething(a, b);

        GCPROTECT_END();
    }

Notice the added GCPROTECT_BEGIN line. GCPROTECT_BEGIN is a macro whose argument is any storage location of type OBJECTREF; it must be an expression to which the address-of operator (&) can legally be applied. GCPROTECT_BEGIN tells the GC two things:

The GC must not discard any object referred to by the reference stored in the variable a.
If the GC moves the object referred to by a, it must update a to point to the object's new location.

Now, if the second AllocateObject() call triggers a GC, the object referred to by a will still be around afterwards, and the local variable a will still point to it. The address stored in a may no longer be the same as before, but it still refers to the same object, so DoSomething() receives correct data.

Note that we did not protect b in the same way, because the caller does not use b after DoSomething() returns. Furthermore, there is no point in keeping b up to date, because DoSomething() receives a copy of the reference (not to be confused with a copy of the object), not the reference itself. If DoSomething() itself triggers a GC, it is DoSomething()'s responsibility to protect its own copies of a and b.

That said, no one should complain if you play it safe and GCPROTECT b as well. You never know when someone will later add code that makes the protection necessary.

Every GCPROTECT_BEGIN must have a matching GCPROTECT_END, which ends the protection of a. As an additional safeguard, GCPROTECT_END overwrites a with garbage, so any attempt to use a afterwards will fail. GCPROTECT_BEGIN opens a new C++ scope level and GCPROTECT_END closes it, so if the two are not paired you will get build errors.
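To make the effect of protection concrete, here is a minimal sketch on a toy moving heap. It is not CLR code and all names (ToyHeap, ScopedProtect, demo_protect) are hypothetical. For brevity the toy uses a C++ scope guard, whereas the real macros are an explicit begin/end pair, and it assumes each protected local refers to a distinct object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not CLR code. A "reference" is an index into
// ToyHeap::slots, and every collection moves the surviving objects.
using ToyRef = std::size_t;

struct ToyHeap {
    std::vector<int> slots;                 // object payloads
    std::vector<ToyRef*> protected_locals;  // stand-in for the frame chain

    ToyRef allocate(int payload) {
        slots.push_back(payload);
        return slots.size() - 1;
    }

    // Keeps only objects referenced from protected locals, moves them to
    // new positions, and rewrites those locals. References that were never
    // reported are left untouched, which is what makes them dangerous.
    // Toy assumption: each protected local refers to a distinct object.
    void collect() {
        std::vector<int> survivors;
        for (ToyRef* local : protected_locals) {
            survivors.push_back(slots[*local]);
            *local = survivors.size() - 1;
        }
        slots = survivors;
    }
};

// Scope guard playing the role of a GCPROTECT_BEGIN/GCPROTECT_END pair.
class ScopedProtect {
public:
    ScopedProtect(ToyHeap& heap, ToyRef& local) : heap_(heap) {
        heap_.protected_locals.push_back(&local);
    }
    ~ScopedProtect() { heap_.protected_locals.pop_back(); }
    ScopedProtect(const ScopedProtect&) = delete;
    ScopedProtect& operator=(const ScopedProtect&) = delete;

private:
    ToyHeap& heap_;
};

struct DemoResult {
    bool unprotected_is_stale;
    int protected_payload;
};

DemoResult demo_protect() {
    ToyHeap heap;
    heap.allocate(1);               // garbage that will be reclaimed
    ToyRef a = heap.allocate(42);   // a == 1
    ToyRef stale_copy = a;          // never reported to the collector

    ScopedProtect guard(heap, a);
    heap.collect();                 // object 42 moves from index 1 to index 0

    bool stale = stale_copy >= heap.slots.size();  // index 1 no longer exists
    return {stale, heap.slots[a]};
}
```

The protected local follows the object to its new position, while the unreported copy keeps the old index. In this toy that index is simply out of range; in a real runtime it would point at garbage memory.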

2.1.4 Do not do non-local returns from within GCPROTECT blocks

Never use return, goto, or any other non-local exit between GCPROTECT_BEGIN and GCPROTECT_END. Doing so leaves the thread's frame chain corrupted.

There is one exception: it is legal to leave a GCPROTECT block by throwing a managed exception (usually through the COMPlusThrow() function). The exception subsystem knows about GCPROTECT and correctly repairs the frame chain as it unwinds.

Why is GCPROTECT not implemented with a C++ smart pointer? Because the GCPROTECT macro originates from .NET Framework 1.0. At that time all error handling was done explicitly, without any use of C++ exception handling or stack-allocated holders.
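A rough sketch of why an early return is harmful with an explicit begin/end pair follows. It is a toy model with hypothetical names (TOY_PROTECT_BEGIN, TOY_PROTECT_END, g_frame_chain), not the real macros: the chain here is a stack of labels, whereas the real entries are stack-allocated frames, so a stale entry would refer to a stack frame that no longer exists.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model, not CLR code: the "frame chain" is a stack of
// labels instead of a linked list of stack-allocated frames.
std::vector<std::string> g_frame_chain;

// Explicit begin/end pair: BEGIN pushes an entry and opens a scope,
// END pops the entry and closes the scope.
#define TOY_PROTECT_BEGIN(label) \
    {                            \
        g_frame_chain.push_back(label);

#define TOY_PROTECT_END()         \
        g_frame_chain.pop_back(); \
    }

int lookup(bool bail_out_early) {
    int result = 0;
    TOY_PROTECT_BEGIN("lookup");
    if (bail_out_early) {
        return -1;  // WRONG: skips TOY_PROTECT_END, the entry stays on the chain
    }
    result = 42;
    TOY_PROTECT_END();
    return result;
}

// Number of stale entries left on the chain after one call to lookup().
std::size_t stale_entries_after(bool bail_out_early) {
    g_frame_chain.clear();
    lookup(bail_out_early);
    return g_frame_chain.size();
}
```

With the normal exit the chain is empty again after the call; with the early return one entry is left behind, and nothing in plain C++ control flow will ever remove it.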

2.1.5 Do not GCPROTECT the same location twice

The following code is illegal and will cause some kind of crash:

    OBJECTREF a = AllocateObject(...);
    GCPROTECT_BEGIN(a);
    GCPROTECT_BEGIN(a);

It would be nice if the GC were robust enough to ignore the second, unnecessary GCPROTECT, but in practice this is not possible.

Do not confuse a reference with a copy of that reference. It is legal to protect the same reference twice; what is illegal is protecting the same copy of the reference twice. The following code is therefore legal:

    OBJECTREF a = AllocateObject(...);
    GCPROTECT_BEGIN(a);
    DoSomething(a);
    GCPROTECT_END();

    void DoSomething(OBJECTREF a)
    {
        GCPROTECT_BEGIN(a);
        GCPROTECT_END();
    }

2.1.6 Protecting multiple OBJECTREFs

You can use GCPROTECT to protect multiple OBJECTREF locations, but its protection is limited to a C++ nesting scope. What if you need to store a root reference in a non-GC data structure that lives for an arbitrary length of time?

The solution is the OBJECTHANDLE. An OBJECTHANDLE allocates a location from special blocks of memory that the GC knows about. Any root reference stored in that location is kept alive for as long as the handle exists, and it is updated whenever the object moves. You retrieve the current reference by indirecting through the handle.

Handles are implemented through several layers of abstraction. The official public interface is the one exposed by objecthandle.h; do not confuse it with handletable.h, which contains the internals. The CreateHandle() API allocates a new location, ObjectFromHandle() dereferences the handle and returns an up-to-date reference, and DestroyHandle() frees the location.

The following code fragment shows how handles are used. In practice, people use GCPROTECT rather than handles for a case this simple.

    {
        MethodTable *pMT = g_pObjectClass->GetMethodTable();

        // Another way is to use handles. Handles use more memory than
        // a case this simple warrants, but they are useful if you need
        // to protect something for a long time.
        OBJECTHANDLE ah;
        OBJECTHANDLE bh;

        ah = CreateHandle(AllocateObject(pMT));
        bh = CreateHandle(AllocateObject(pMT));

        DoSomething(ObjectFromHandle(ah),
                    ObjectFromHandle(bh));

        DestroyHandle(bh);
        DestroyHandle(ah);
    }
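The create / dereference / destroy life cycle can be sketched with a toy handle table. This is an illustration only and does not reflect how objecthandle.h is implemented: ToyHeap, create_handle, object_from_handle, and destroy_handle are hypothetical names, and a handle is just an index into a table that the toy collector scans and updates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not the objecthandle.h API.
using ToyRef = std::size_t;     // index into ToyHeap::slots
using ToyHandle = std::size_t;  // index into ToyHeap::handle_table

struct ToyHeap {
    std::vector<int> slots;            // object payloads
    std::vector<ToyRef> handle_table;  // locations the collector knows about
    std::vector<bool> handle_in_use;

    ToyRef allocate(int payload) {
        slots.push_back(payload);
        return slots.size() - 1;
    }

    ToyHandle create_handle(ToyRef ref) {
        handle_table.push_back(ref);
        handle_in_use.push_back(true);
        return handle_table.size() - 1;
    }

    // Always indirect through the handle to get the current reference.
    ToyRef object_from_handle(ToyHandle h) const {
        assert(handle_in_use[h]);
        return handle_table[h];
    }

    void destroy_handle(ToyHandle h) { handle_in_use[h] = false; }

    // Keeps objects referenced by live handles, moves them, and updates
    // the handle table so that each handle follows its object.
    void collect() {
        std::vector<int> survivors;
        for (std::size_t h = 0; h < handle_table.size(); ++h) {
            if (!handle_in_use[h]) continue;
            survivors.push_back(slots[handle_table[h]]);
            handle_table[h] = survivors.size() - 1;
        }
        slots = survivors;
    }
};

// Returns the payload reached through a handle after a collection.
int demo_handles() {
    ToyHeap heap;
    heap.allocate(1);  // garbage
    ToyHandle ah = heap.create_handle(heap.allocate(10));
    ToyHandle bh = heap.create_handle(heap.allocate(20));

    heap.destroy_handle(bh);  // object 20 is no longer rooted
    heap.collect();           // object 10 moves from index 1 to index 0

    assert(heap.slots.size() == 1);
    return heap.slots[heap.object_from_handle(ah)];
}
```

Unlike a scoped protection, the handle stays valid until it is explicitly destroyed, which is why it suits references that must outlive a C++ scope.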

There are several kinds of handles. The most common ones are listed here; see objecthandle.h for the complete list.

HNDTYPE_STRONG: The default. It behaves like an ordinary reference. Created with CreateHandle(OBJECTREF).
HNDTYPE_WEAK_LONG: Tracks an object for as long as a strong reference to it exists, but does not itself prevent the object from being collected. Created with CreateWeakHandle(OBJECTREF).
HNDTYPE_PINNED: A strong handle with the added property that it prevents the object from moving during a garbage collection cycle. This is useful when a pointer to the object's internals is passed out of the runtime while a GC may occur.

Note: Pinning an object is expensive, because it prevents the GC from packing objects optimally during short-term (ephemeral) collections. Use this kind of handle sparingly.
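The difference between a strong and a weak handle has a loose analogy in standard C++: std::shared_ptr keeps an object alive, while std::weak_ptr only observes it. The sketch below is an analogy under that assumption, not a model of the CLR handle table; reference counting is not a tracing GC, and nothing in the standard library corresponds to a pinned handle.

```cpp
#include <cassert>
#include <memory>

// Returns true if the weak observer resolves while a strong reference
// exists and stops resolving once the last strong reference is dropped.
bool demo_weak_handle() {
    auto strong = std::make_shared<int>(42);  // plays the role of a strong handle
    std::weak_ptr<int> weak = strong;         // observes without keeping alive

    bool alive_while_strong = !weak.expired() && *weak.lock() == 42;

    strong.reset();                           // drop the only strong reference
    bool dead_after_release = weak.expired();

    return alive_while_strong && dead_after_release;
}
```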

2.1.8 Use the right GC mode: preemptive vs. cooperative

Earlier we implied that a GC does not happen spontaneously. For a single thread, that is true. But the CLR is multithreaded: even if your thread does everything right, it has no control over what the other threads in the process do.

Imagine two possible ways of scheduling a GC:

Preemptive: Any thread that needs a GC can start one without regard for the state of the other threads. The other threads keep running concurrently with the GC.
Cooperative: A thread can start a GC only once all other threads have agreed to allow it. The thread requesting the GC is blocked until the other threads reach that state of agreement.

Each mode has its strengths and drawbacks. Preemptive mode looks attractive and efficient except for one thing: it completely breaks the GC protection mechanism discussed earlier. Consider the following code:

    OBJECTREF a = AllocateObject(...);
    GCPROTECT_BEGIN(a);
    DoSomething(a);

Now look at pseudo-assembly that the compiler is likely to generate for it:

    call    AllocateObject
    mov     [A],eax      ;; store the result in "a"
    ... code for GCPROTECT_BEGIN omitted ...
    push    [A]          ; push the argument to DoSomething
    call    DoSomething

According to the semantics of GCPROTECT, this code should run correctly in every case. But suppose that just after the push instruction another thread gets the time slice, starts a GC, and moves object A. The local variable A is updated correctly, but the copy of A that was just pushed as the argument to DoSomething() is not. DoSomething() therefore receives a pointer to the old location and the program crashes. Clearly, preemptive GC alone is not sufficient for the CLR.

What about the alternative, cooperative GC? With cooperative GC the problem above does not occur and GCPROTECT works as intended. Unfortunately, the CLR also has to interoperate with legacy unmanaged code. Suppose a managed application calls the Win32 MessageBox API, which waits for the user to click a button before returning. Until the user clicks, every managed thread in the same process is blocked from performing a GC, which obviously hurts the program.

Because neither mode alone meets the needs of the CLR, the CLR supports both, and you as a developer are responsible for switching threads between them as needed. Note that the GC scheduling mode is a property of an individual thread, not a global system property.

To be precise: as long as a thread is in cooperative mode, a GC is guaranteed to occur only when that thread triggers an object allocation, calls out to interruptible managed code, or explicitly requests a GC. All other threads are blocked from performing a GC. As long as a thread is in preemptive mode, you must assume that a GC can be started by another thread at any time and can run concurrently with your thread.
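The per-thread nature of the mode can be sketched as a table of thread modes plus a check for when a collection may proceed. This is a deliberately simplified, single-threaded model with hypothetical names (GcMode, gc_may_proceed, demo_modes): it treats any other thread in cooperative mode as blocking the collection and ignores the points at which a cooperative thread itself allows a GC.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class GcMode { Cooperative, Preemptive };

// Toy rule: the thread at index `initiator` may run a collection only when
// no other thread is currently in cooperative mode.
bool gc_may_proceed(const std::vector<GcMode>& thread_modes, std::size_t initiator) {
    for (std::size_t i = 0; i < thread_modes.size(); ++i) {
        if (i != initiator && thread_modes[i] == GcMode::Cooperative) {
            return false;
        }
    }
    return true;
}

bool demo_modes() {
    std::vector<GcMode> modes = {GcMode::Cooperative,   // thread 0 wants a GC
                                 GcMode::Cooperative,   // thread 1 holds references
                                 GcMode::Preemptive};   // thread 2 is in unmanaged code

    bool blocked = !gc_may_proceed(modes, 0);  // thread 1 is still cooperative

    modes[1] = GcMode::Preemptive;             // e.g. thread 1 calls out to a blocking API
    bool allowed = gc_may_proceed(modes, 0);

    return blocked && allowed;
}
```

This is why a thread should switch to preemptive mode before a long wait: once it does, it no longer holds up collections requested by other threads.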

A good rule of thumb: a CLR thread runs in cooperative mode whenever it is running managed code or needs to manipulate object references in any way. An execution engine (EE) thread that is running in preemptive mode is usually running unmanaged code, that is, it has left the managed world. Threads in the process that have never entered the CLR are effectively running in preemptive mode. Much of the code inside the CLR runs in cooperative mode.

While a thread is running in preemptive mode, OBJECTREFs are strictly hands-off: their values are completely unreliable. In fact, the checked build asserts if you so much as touch an OBJECTREF in preemptive mode. In cooperative mode you are blocking other threads from performing a GC, so avoid long or blocking operations. Also pay attention to any critical sections or semaphores you wait on.

Setting the GC mode: The preferred way to set the GC mode is with the GCX_COOP and GCX_PREEMP macros. These macros act as holders: you declare one at the start of the block of code that should run in a given mode, and on any local or non-local exit from that scope a destructor automatically restores the original mode.

    {   // always open a new C++ scope to switch modes
        GCX_COOP();
        // code you want to run in cooperative mode
    }   // leaving the scope restores the mode that was in effect before

It is perfectly legal to invoke GCX_COOP() on a thread that is already in cooperative mode; in that case GCX_COOP is a NOP. The same applies to GCX_PREEMP.

GCX_COOP and GCX_PREEMP never throw an exception and return no error status.
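The holder behaviour can be approximated with an ordinary RAII class. The sketch below is an assumption about the general shape, not the actual GCX implementation: ScopedGcMode and t_mode are hypothetical, and the "mode" is just a thread-local variable. It shows the mode being restored on a normal exit and when an exception leaves the scope.

```cpp
#include <cassert>
#include <stdexcept>

enum class GcMode { Cooperative, Preemptive };

// Hypothetical stand-in for the current thread's GC mode; not CLR state.
thread_local GcMode t_mode = GcMode::Preemptive;

// Switches the mode on construction and restores the saved mode on
// destruction, however the scope is left.
class ScopedGcMode {
public:
    explicit ScopedGcMode(GcMode wanted) : saved_(t_mode) { t_mode = wanted; }
    ~ScopedGcMode() { t_mode = saved_; }
    ScopedGcMode(const ScopedGcMode&) = delete;
    ScopedGcMode& operator=(const ScopedGcMode&) = delete;

private:
    GcMode saved_;
};

void work_in_cooperative_mode(bool fail) {
    ScopedGcMode coop(GcMode::Cooperative);
    assert(t_mode == GcMode::Cooperative);
    if (fail) throw std::runtime_error("toy failure");
}

bool demo_mode_restore() {
    t_mode = GcMode::Preemptive;

    work_in_cooperative_mode(false);
    bool restored_on_return = (t_mode == GcMode::Preemptive);

    try {
        work_in_cooperative_mode(true);
    } catch (const std::runtime_error&) {
        // the holder's destructor has already run during unwinding
    }
    bool restored_on_throw = (t_mode == GcMode::Preemptive);

    return restored_on_return && restored_on_throw;
}
```

Requesting the mode the thread is already in simply saves and restores the same value, which matches the NOP behaviour described above.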

There is one special case for purely unmanaged threads (threads that have no Thread structure). Such threads are considered to be permanently in preemptive mode, so GCX_COOP asserts if it is invoked on such a thread, while GCX_PREEMP succeeds as a NOP.

There are a couple of variants for special situations:

GCX_MAYBE_*(BOOL): Performs the switch only if the argument is TRUE. Note that the mode is still restored at the end of the scope whether or not the argument was TRUE. (This matters only if the mode was changed some other way inside the scope, which normally should not happen.)
GCX_*_THREAD_EXISTS(Thread*): If you are concerned about the repeated GetThread() calls and null Thread checks inside the holder, use this more efficient version, which lets you cache the Thread pointer and pass it to all the GCX_* calls. You cannot use it to change the mode of another thread, and you cannot pass NULL.

To switch modes several times in one function, you must introduce a new scope for each switch. You can also call GCX_POP(), which restores the mode before the end of the scope. (The mode is restored again when the scope ends, but because the restore is idempotent this does not matter.) Never do the following:

    {
        GCX_COOP();
        ...
        GCX_PREEMP();  // WRONG!
    }

You will get a compile error, because a variable is declared twice in the same scope.
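The redeclaration error is easy to reproduce with toy macros that expand to a holder variable with a fixed name. The macro names, the variable name toy_gc_holder, and the ScopedGcMode class below are assumptions made for illustration; the real GCX macros may expand differently. The failing line is left commented out so the sketch compiles.

```cpp
#include <cassert>

enum class GcMode { Cooperative, Preemptive };

// Hypothetical stand-in for the current thread's GC mode; not CLR state.
thread_local GcMode t_mode = GcMode::Preemptive;

class ScopedGcMode {
public:
    explicit ScopedGcMode(GcMode wanted) : saved_(t_mode) { t_mode = wanted; }
    ~ScopedGcMode() { t_mode = saved_; }
    ScopedGcMode(const ScopedGcMode&) = delete;
    ScopedGcMode& operator=(const ScopedGcMode&) = delete;

private:
    GcMode saved_;
};

// Both toy macros declare a variable with the same fixed name, so using
// two of them in one scope is a redeclaration.
#define TOY_GCX_COOP()   ScopedGcMode toy_gc_holder(GcMode::Cooperative)
#define TOY_GCX_PREEMP() ScopedGcMode toy_gc_holder(GcMode::Preemptive)

bool demo_nested_switch() {
    t_mode = GcMode::Preemptive;
    bool ok = false;
    {
        TOY_GCX_COOP();
        // TOY_GCX_PREEMP();   // would not compile: toy_gc_holder redeclared
        {
            TOY_GCX_PREEMP();  // fine: a new scope gets its own holder
            ok = (t_mode == GcMode::Preemptive);
        }
        ok = ok && (t_mode == GcMode::Cooperative);  // inner holder restored
    }
    return ok && t_mode == GcMode::Preemptive;       // outer holder restored
}
```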

The holder-based macros are the preferred way to switch modes, but sometimes you need a mode change to outlive the scope. For those cases you can use the raw, unscoped functions:

    GetThread()->DisablePreemptiveGC();  // switch to cooperative mode
    GetThread()->EnablePreemptiveGC();   // switch to preemptive mode

These functions have no automatic mode restore, so it is your responsibility to manage the lifetime of the mode. In addition, mode changes cannot be nested: you will get an assert if you try to switch to the mode the thread is already in. The thread object you call these functions on must be the currently executing thread; you cannot use them to change the mode of another thread.
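The no-nesting rule for the raw functions can be illustrated with a toy thread object. ToyThread and is_cooperative are hypothetical, and the toy throws std::logic_error where the text above describes an assert, purely so that the violation can be observed without aborting the program.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical toy model of the unscoped mode switches; not the CLR Thread class.
class ToyThread {
public:
    // Switch to cooperative mode; illegal if already cooperative.
    void DisablePreemptiveGC() {
        require(preemptive_);
        preemptive_ = false;
    }

    // Switch to preemptive mode; illegal if already preemptive.
    void EnablePreemptiveGC() {
        require(!preemptive_);
        preemptive_ = true;
    }

    bool is_cooperative() const { return !preemptive_; }

private:
    static void require(bool condition) {
        if (!condition) throw std::logic_error("mode changes do not nest");
    }

    bool preemptive_ = true;
};

bool demo_raw_switch() {
    ToyThread thread;
    thread.DisablePreemptiveGC();      // now cooperative

    bool nested_rejected = false;
    try {
        thread.DisablePreemptiveGC();  // already cooperative: rejected
    } catch (const std::logic_error&) {
        nested_rejected = true;
    }

    thread.EnablePreemptiveGC();       // the caller must restore the mode itself
    return nested_rejected && !thread.is_cooperative();
}
```

Nothing restores the mode automatically here; forgetting the final EnablePreemptiveGC() call would leave the toy thread in cooperative mode indefinitely.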

Key point: Whenever possible, use GCX_COOP/GCX_PREEMP rather than unscoped calls to DisablePreemptiveGC() and EnablePreemptiveGC().

Testing and asserting the GC mode: you can assert in the contract that code must run in a particular mode by using one of the following:

    CONTRACTL
    {
        MODE_COOPERATIVE
    }
    CONTRACTL_END

    CONTRACTL
    {
        MODE_PREEMPTIVE
    }
    CONTRACTL_END

There are also standalone versions:

    {
        GCX_ASSERT_COOP();
    }

    {
        GCX_ASSERT_PREEMP();
    }

You will notice that the standalone versions are holders rather than simple statements. The intention was that these holders would assert again on scope exit, to make sure that any backout code restores the mode correctly. However, that exit check was disabled at first, to be enabled once all the backout code had been cleaned up, and that has not happened yet. As long as you use the GCX holders to manage mode changes, this should not be a problem.

Reference: https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-code-guide.md#1

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.