Solaris 2.4 Multithreaded Programming Guide, Chapter 7: Programming Guide

Source: Internet
Author: User
Sender: McCartney (coolcat), email zone: Unix
 
Mailing site: BBS shuimu Tsinghua station (Sun May 17 16:34:37 1998)
 
7 Programming Guide
 
This chapter gives some pointers on programming with threads. Special emphasis is placed on the differences between single-threaded and multithreaded programming. It covers the following topics:
· Rethinking global variables
· Providing for static local variables
· Synchronizing threads
· Avoiding deadlock
· Following some basic guidelines
· Working with multiprocessors
 
Historically, most code has been written for single-threaded programs. This is especially true of the library functions called from C programs. Single-threaded code implicitly assumes the following:
· When you write to a global variable and read it back a moment later, you read exactly what you wrote.
· The same holds for non-global, static storage.
· No synchronization mechanism is required, because there is nothing to synchronize with.
The following examples discuss the problems that these assumptions cause in multithreaded programs, and how to deal with them.
 
7.1 Rethinking Global Variables

Traditional single-threaded C and UNIX have a convention for handling errors from system calls. A system call can return anything as its functional value (for example, write() returns the number of bytes transferred). However, the value -1 is reserved to indicate that something went wrong. So when a system call returns -1, you know that it failed.
 
Code Example 7-1 Global Variables and errno

extern int errno;
...
if (write(file_desc, buffer, size) == -1) {
    /* The system call failed */
    fprintf(stderr, "something went wrong, error code = %d\n", errno);
    exit(1);
}
...
Rather than return the actual error code (which could be confused with a normal return value), the system call places the error code in the global variable errno. When a system call fails, you can look in errno to find out what went wrong.
Now consider what happens in a multithreaded program when two threads fail at about the same time, but with different errors. Both expect to find their error code in errno, but a single copy of errno cannot hold both values. This global variable approach simply does not work for multithreaded programs.
The Solaris threads package solves this problem through a conceptually new storage class: thread-specific data. This storage is similar to global storage in that it can be accessed from any procedure in which a thread might be running. However, it is private to the thread: when two threads refer to the thread-specific data location of the same name, they are referring to two different areas of storage.
So, when you use threads, each reference to errno is thread-specific, because each thread has a private copy of errno.
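The idea of one name referring to per-thread storage can be sketched as follows. This is a minimal illustration that assumes POSIX threads (pthread_key_create, pthread_setspecific, pthread_getspecific) as a stand-in for the Solaris threads interfaces; the names err_key, worker, and tsd_demo are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_key_t err_key;                 /* one name shared by all threads */
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    pthread_key_create(&err_key, NULL);       /* no destructor: values are plain integers */
}

/* Store this thread's value under the shared key, then read it back. */
static void *worker(void *arg) {
    pthread_once(&key_once, make_key);        /* the key is created exactly once */
    pthread_setspecific(err_key, arg);
    return pthread_getspecific(err_key);
}

/* Returns 1 when each thread read back its own value, 0 otherwise. */
int tsd_demo(void) {
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;

    if (pthread_create(&t1, NULL, worker, (void *)(intptr_t)11) != 0)
        return 0;
    if (pthread_create(&t2, NULL, worker, (void *)(intptr_t)22) != 0) {
        pthread_join(t1, NULL);
        return 0;
    }
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);
    return (intptr_t)r1 == 11 && (intptr_t)r2 == 22;
}
```

Both threads use the same key, yet neither can overwrite the other's value, whatever order they run in.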
 
7.2 Providing for Static Local Variables

Code Example 7-2 shows a problem similar to the errno problem, but involving static storage instead of global storage. The function gethostbyname(3N) is called with the computer name as its argument. The return value is a pointer to a structure containing the information required to contact the computer through the network.
 
Code Example 7-2 The gethostbyname() Problem

struct hostent *gethostbyname(char *name) {
    static struct hostent result;
    /* Lookup name in hosts database */
    /* Put answer in result */
    return (&result);
}
Returning a pointer to an automatic local variable is generally not a good idea, although it works in this case because the variable is static. However, when two threads call this function at the same time with different computer names, their use of the static storage conflicts.
Thread-specific data could replace the static storage, as in the errno problem, but this involves dynamic allocation of storage and adds to the expense of the call.
A better way to handle this kind of problem is to make the caller of gethostbyname() supply the storage for the result. This is done by adding an output argument to the function, which requires a new interface to gethostbyname().
This technique is used in Solaris to fix many of these problems. In most cases, the name of the new interface is the old name with "_r" appended, as in gethostbyname_r(3N).
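The caller-supplied storage pattern can be seen in a small sketch. Because the argument list of gethostbyname_r differs between systems, this example uses gmtime_r, another "_r" interface, purely as an illustration; the wrapper name utc_year is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* The caller owns the result storage, so concurrent callers never share it. */
int utc_year(time_t when) {
    struct tm result;                     /* output buffer supplied by the caller */

    if (gmtime_r(&when, &result) == NULL)
        return -1;                        /* conversion failed */
    return result.tm_year + 1900;         /* tm_year counts years since 1900 */
}
```

Each call works on its own struct tm, so two threads calling utc_year() at once cannot disturb each other, which is exactly what the static buffer in Code Example 7-2 cannot guarantee.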
 
7.3 Synchronizing Threads

The threads in an application must cooperate and synchronize when sharing data and the resources of the process.
A problem arises when multiple threads call something that manipulates an object. In a single-threaded world, synchronizing access to such objects is not a problem, but as Code Example 7-3 illustrates, it is a concern in multithreaded code. (Note that the Solaris printf(3S) is safe to call from a multithreaded program; this example illustrates what could happen if printf() were not safe.)
 
Code Example 7-3 The printf() Problem

/* Thread 1 */
printf("go to statement reached");

/* Thread 2 */
printf("hello world");

Printed on display:
go to hello
 
7.3.1 Single-Threaded Strategy

One strategy is to have a single, application-wide mutex lock that a thread must hold whenever it calls something like printf(). Because only one thread can be accessing shared data at any one time, each thread has a consistent view of memory.
Because this is effectively a single-threaded program, very little is gained by this strategy.
 
7.3.2 Reentrant Functions

A better approach is to take advantage of the principles of modularity and data encapsulation. A reentrant function is one that behaves correctly when it is called simultaneously by several threads. Writing a reentrant function is a matter of understanding just what "behaves correctly" means for that particular function.
Functions that are callable by several threads must be made reentrant. This might require changes to the function interface or to the implementation.
Functions that access global state, such as memory or files, have reentrance problems. These functions need to protect their use of global state with the appropriate synchronization mechanisms provided by Solaris threads.
The two basic strategies for making functions reentrant are code locking and data locking.
 
7.3.2.1 Code Locking

Code locking is done at the function call level and guarantees that a function executes entirely under the protection of a lock. The assumption is that all access to data is done through functions. Functions that share data should execute under the same lock.
Some parallel programming languages provide a construct called a monitor, which implicitly protects the code of the functions defined within its scope. A monitor can also be implemented with a mutex lock.
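A monitor built from a mutex lock might look like the following sketch. It assumes POSIX threads (pthread_mutex_lock and pthread_mutex_unlock) in place of the Solaris mutex_lock() and mutex_unlock() calls, and the account module with its three functions is a hypothetical example.

```c
#include <pthread.h>

/* Module-private state: reachable only through the functions below. */
static pthread_mutex_t module_lock = PTHREAD_MUTEX_INITIALIZER;
static long balance = 0;

/* Every entry point runs entirely under the same lock (a code lock). */
void account_deposit(long amount) {
    pthread_mutex_lock(&module_lock);
    balance += amount;
    pthread_mutex_unlock(&module_lock);
}

/* Returns 1 and debits the account if funds suffice, otherwise returns 0. */
int account_withdraw(long amount) {
    int ok;

    pthread_mutex_lock(&module_lock);
    ok = (amount <= balance);
    if (ok)
        balance -= amount;
    pthread_mutex_unlock(&module_lock);
    return ok;
}

/* Returns a snapshot of the balance taken under the lock. */
long account_balance(void) {
    long snapshot;

    pthread_mutex_lock(&module_lock);
    snapshot = balance;
    pthread_mutex_unlock(&module_lock);
    return snapshot;
}
```

Note that none of these functions calls another function of the same module while holding the lock; doing so with a plain mutex would block the caller forever.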
 
7.3.2.2 Data Locking

Data locking guarantees that access to a collection of data is maintained consistently. For data locking, the concept of locking code is still there, but the code that is locked is only the code that references shared data. For a mutual exclusion locking protocol, only one thread at a time can operate on each collection of data.
Alternatively, in a multiple readers, single writer protocol, each collection of data allows either several readers or one writer. Multiple threads can execute in a single module at the same time when they operate on different data collections, or when they operate on the same collection without violating the multiple readers, single writer protocol. So, data locking typically allows more concurrency than code locking does.
What strategy should you use when using locks (whether implemented with mutexes, condition variables, or semaphores) in a program? Should you try to achieve maximum concurrency by locking only when necessary and unlocking as soon as possible (fine-grained locking)? Or should you hold locks for long periods to minimize the overhead of taking and releasing them (coarse-grained locking)?
The granularity of a lock depends on the amount of data it protects. A very coarse-grained lock might be a single lock that protects all data. Dividing the data among an appropriate number of locks is very important. Too fine a grain of locking can degrade performance, because the overhead of acquiring and releasing many locks becomes significant.
The common wisdom is to start with a coarse-grained approach, identify the bottlenecks, and add finer-grained locking where necessary to relieve them. This is reasonably sound advice, but use your own judgment about how far to take it.
 
7.3.2.3 Invariants

For both code locking and data locking, invariants are important for controlling locking complexity. An invariant is a condition or relation that is always true.
The definition is modified somewhat for concurrent execution: an invariant is a condition or relation that is true whenever the associated lock is not held. Once the lock is acquired, the invariant can be false. However, the code holding the lock must reestablish the invariant before releasing the lock.
Condition variables can be thought of as having an invariant, which is the condition itself.
 
Code Example 7-4 Testing the Invariant With assert(3X)

mutex_lock(&lock);
while (!condition)
    cond_wait(&cv, &lock);
assert(condition);
.
.
.
mutex_unlock(&lock);
 
The assert() statement tests the invariant. The cond_wait() function does not preserve the invariant, which is why the invariant must be reevaluated when the thread returns.
Another example is a module that manages a doubly linked list of elements. For each item on the list, a good invariant is that the forward pointer of the previous item and the backward pointer of the next item both point to the item itself.
Assume that this module uses code-based locking and is therefore protected by a single global mutex lock. When an item is deleted or added, the mutex lock is acquired, the pointers are manipulated correctly, and the mutex lock is released. Obviously, at some point during the manipulation of the pointers the invariant is false, but it is reestablished before the mutex lock is released.
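The linked-list invariant can be made concrete with a short sketch. It assumes POSIX threads for the single module lock, and a circular list with a sentinel node; the names list_insert, list_remove, list_length, and invariant_holds are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct node {
    struct node *prev, *next;
    int value;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node head = { &head, &head, 0 };   /* circular list, sentinel node */

/* Invariant: for every node n on the list, n->next->prev == n. */
static int invariant_holds(void) {
    struct node *n = &head;
    do {
        if (n->next->prev != n)
            return 0;
        n = n->next;
    } while (n != &head);
    return 1;
}

void list_insert(struct node *n) {
    pthread_mutex_lock(&list_lock);
    n->next = head.next;
    n->prev = &head;
    head.next->prev = n;        /* invariant is false after this store ... */
    head.next = n;              /* ... and true again after this one */
    assert(invariant_holds());  /* reestablished before the lock is released */
    pthread_mutex_unlock(&list_lock);
}

void list_remove(struct node *n) {
    pthread_mutex_lock(&list_lock);
    n->prev->next = n->next;    /* invariant is false between these two stores */
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
    assert(invariant_holds());
    pthread_mutex_unlock(&list_lock);
}

int list_length(void) {
    int count = 0;

    pthread_mutex_lock(&list_lock);
    for (struct node *n = head.next; n != &head; n = n->next)
        count++;
    pthread_mutex_unlock(&list_lock);
    return count;
}
```

No other thread can observe the half-updated pointers, because every function that walks or changes the list takes the same lock first.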
 
7.4 Avoiding Deadlock

Deadlock is a permanent blocking of a set of threads that are competing for a set of resources. The fact that some threads can still make progress does not mean that no deadlock exists elsewhere.
The most common error that causes deadlock is self deadlock, or recursive deadlock: a thread tries to acquire a lock it already holds. Recursive deadlock is very easy to program by mistake.
For example, if a code monitor has every module function grab the mutex lock for the duration of the call, then any call between functions within the module that are protected by the mutex lock immediately deadlocks. If a function calls some code outside the module which, through some circuitous path, calls back into a function protected by the same mutex lock, it deadlocks too.
The solution for this kind of deadlock is to avoid calling functions outside the module when you do not know whether they will call back into the module, unless you first reestablish the invariants and drop all module locks before making the call. Of course, after the call completes and the locks are reacquired, the state must be verified to be sure that the intended operation is still valid.
Another kind of deadlock occurs when two threads, thread 1 and thread 2, acquire mutex locks A and B respectively. Thread 1 then tries to acquire mutex lock B while thread 2 tries to acquire mutex lock A. Thread 1 blocks waiting for B and thread 2 blocks waiting for A, so neither can proceed: a deadlock.
This kind of deadlock is avoided by establishing an order in which locks are acquired (a lock hierarchy). When all threads always acquire locks in the specified order, this deadlock cannot occur.
Adhering to a strict order of lock acquisition is not always optimal. When thread 2 has made many assumptions about the state of the module while holding mutex lock B, giving up mutex lock B to acquire mutex lock A and then reacquiring mutex lock B in order would force it to discard those assumptions and reevaluate the state of the module.
The blocking synchronization primitives usually have variants that attempt to get a lock and fail if they cannot, such as mutex_trylock(). This allows threads to violate the lock hierarchy when there is no contention. When there is contention, the held locks must usually be discarded and the locks reacquired in order.
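The back-off described above can be sketched as follows, assuming POSIX pthread_mutex_trylock() in place of the Solaris mutex_trylock(). The two locks, their ranking, and the function name acquire_a_while_holding_b are hypothetical.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;  /* rank 1: taken first */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;  /* rank 2: taken second */

/*
 * Called with lock_b held. On a return of 0 or 1, both locks are held.
 * Returns 0 if lock_a was free and was taken out of order.
 * Returns 1 if lock_b had to be dropped, so the caller must recheck
 * any state that lock_b protects.
 * Returns -1 on an unexpected error, with only lock_b still held.
 */
int acquire_a_while_holding_b(void) {
    int rc = pthread_mutex_trylock(&lock_a);

    if (rc == 0)
        return 0;                       /* no contention: hierarchy bypassed safely */
    if (rc != EBUSY)
        return -1;
    pthread_mutex_unlock(&lock_b);      /* contention: back off ... */
    pthread_mutex_lock(&lock_a);        /* ... and reacquire in hierarchy order */
    pthread_mutex_lock(&lock_b);
    return 1;
}
```

The return value matters: whenever lock_b was released, another thread may have changed the module state in the meantime.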
 
7.4.1 Scheduling Deadlocks

Because there is no guaranteed order in which locks are acquired, a common problem in threaded programs is that a particular thread never acquires a lock (usually a condition variable), even though it seems that it should.
This usually happens when the thread that holds the lock releases it, lets a small amount of time pass, and then reacquires it. Because the lock was released, it might seem that another thread should acquire it. But because nothing blocks the thread that held the lock, it keeps running from the time it releases the lock until it reacquires the lock, so no other thread gets to run.
You can usually solve this type of problem by calling thr_yield(3T) just before the call to reacquire the lock. This allows other threads to run and to acquire the lock.
Because the time-slice requirements of applications are so variable, the threads library does not impose any. Use calls to thr_yield() to make threads share time as you require.
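A release, yield, reacquire loop might be sketched like this. The example assumes POSIX threads and uses sched_yield() where Solaris code would call thr_yield(); fair_worker and run_fair_workers are hypothetical names.

```c
#include <pthread.h>
#include <sched.h>

static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;
static long units_done = 0;

/* Performs `rounds` short locked updates, yielding between them. */
static void *fair_worker(void *arg) {
    long rounds = *(const long *)arg;

    for (long i = 0; i < rounds; i++) {
        pthread_mutex_lock(&work_lock);
        units_done++;
        pthread_mutex_unlock(&work_lock);
        sched_yield();      /* give a waiting thread the chance to take the lock */
    }
    return NULL;
}

/* Runs two such workers and returns the total work recorded, or -1 on error. */
long run_fair_workers(long rounds) {
    pthread_t a, b;

    units_done = 0;
    if (pthread_create(&a, NULL, fair_worker, &rounds) != 0)
        return -1;
    if (pthread_create(&b, NULL, fair_worker, &rounds) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return units_done;
}
```

The yield does not affect correctness, only how evenly the two workers share the lock; how much it helps depends on the scheduler.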
 
7.4.2 Locking Guidelines

Here are some simple guidelines for locking.
· Try not to hold locks across long operations such as I/O, where performance can be adversely affected.
· Do not hold locks when calling a function that might reenter the module.
· Do not try for excessive processor concurrency. Without intervening system calls or I/O operations, locks are usually held for short amounts of time and contention is rare. Fix only those locks that have measured contention.
· When using multiple locks, avoid deadlocks by making sure that all threads acquire the locks in the same order.
 
7.5 Following Some Basic Guidelines

· Know what you are importing and whether it is safe.
A threaded program cannot arbitrarily enter nonthreaded code.
· Threaded code can safely refer to unsafe code only from the initial thread.
This ensures that the static storage associated with the initial thread is used only by that thread.
· Sun-supplied libraries are defined to be safe unless explicitly documented as unsafe.
If a manual page does not say whether a function is MT-safe, the function is safe. All MT-unsafe functions are identified explicitly in their manual pages.
· Use compilation flags to manage binary-incompatible source changes.
Either specify -D_REENTRANT when compiling, or be sure that _REENTRANT is defined before any header file is included.
· When making a library safe for multithreaded use, do not thread global process operations.
Do not change global operations (or actions with global side effects) to behave in a threaded manner. For example, if file I/O is changed to a per-thread operation, threads cannot cooperate in accessing files.
For thread-specific behavior, or thread-cognizant behavior, use thread facilities. For example, when the termination of main() should terminate only the thread that is exiting main(), the end of main() should be:
thr_exit(NULL);
/* NOTREACHED */
 
7.5.1 Creating Threads

The Solaris threads package caches the threads data structure, stacks, and LWPs so that the repetitive creation of unbound threads can be inexpensive.
Unbound thread creation is very inexpensive compared to process creation or even to bound thread creation. In fact, the cost is comparable to that of switching from one thread to another.
So, creating and destroying threads as they are required is usually better than attempting to manage a pool of threads that wait for independent work.
A good example of this is an RPC server that creates a thread for each request and destroys it when the reply is delivered, instead of trying to maintain a pool of threads to service requests.
While thread creation is relatively inexpensive compared to process creation, it is not inexpensive compared to the cost of a few instructions. Create a thread only for processing that lasts at least a couple of thousand machine instructions.
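The thread-per-request style can be sketched as below. This assumes POSIX pthread_create() and pthread_join() rather than thr_create() and thr_join(); struct request, serve, serve_all, and the limit of 16 requests are hypothetical, and the squaring merely stands in for real service work.

```c
#include <pthread.h>

#define MAX_REQUESTS 16

struct request {
    int input;
    int output;
};

/* Service routine: runs in its own short-lived thread. */
static void *serve(void *arg) {
    struct request *r = arg;

    r->output = r->input * r->input;    /* stand-in for real work */
    return NULL;
}

/*
 * Creates one thread per request and waits for all of them.
 * Returns the sum of the results, or -1 if n is too large or a
 * thread could not be created.
 */
long serve_all(struct request *reqs, int n) {
    pthread_t tid[MAX_REQUESTS];
    long sum = 0;
    int started = 0;

    if (n > MAX_REQUESTS)
        return -1;
    for (; started < n; started++)
        if (pthread_create(&tid[started], NULL, serve, &reqs[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);     /* thread is destroyed once its reply is in */
    if (started < n)
        return -1;
    for (int i = 0; i < n; i++)
        sum += reqs[i].output;
    return sum;
}
```

In a real server each request would involve far more than one multiplication; work this small does not repay the cost of creating a thread.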
 
7.5.2 Thread Concurrency

By default, Solaris threads attempts to adjust the system execution resources (LWPs) used to run unbound threads to match the real number of active threads. While the Solaris threads package cannot make perfect decisions, it at least ensures that the process continues to make progress.
When you have some idea of the number of unbound threads that should be simultaneously active (executing code or system calls), tell the library through thr_setconcurrency(3T).

For example:
· A database server that has a thread for each user should tell Solaris threads the expected number of simultaneously active users.
· A window server that has one thread for each client should tell Solaris threads the expected number of simultaneously active clients.
· A file copy program that has one reader thread and one writer thread should tell Solaris threads that the desired concurrency level is two.
Alternatively, the concurrency level can be incremented by one through the THR_NEW_LWP flag as each thread is created.
When you compute thread concurrency, count as active the threads that are blocked on interprocess synchronization variables.
 
7.5.3 Efficiency

A new thread is created with thr_create(3T) in less time than an existing thread can be restarted. This means that it is more efficient to create a new thread when one is needed and have it call thr_exit(3T) when it has completed its task than to stockpile idle threads and switch among them.
 
7.5.4 Bound Threads

Bound threads are more expensive than unbound threads. Because bound threads can change the attributes of the underlying LWP, the LWPs are not cached when the bound threads exit. Instead, the operating system provides a new LWP when a bound thread is created.
Use bound threads only when a thread needs resources that are available only through the underlying LWP (such as a virtual time interval timer or an alternate stack), or when the thread must be visible to the kernel so that it can be scheduled against all other threads, as in realtime scheduling.
Use unbound threads even when you expect all threads to be active simultaneously. This allows Solaris threads to allocate system resources efficiently.
 
7.5.5 Thread Creation Guidelines

Here are some simple guidelines for using threads.
· Use threads for independent activities that must do a meaningful amount of work.
· Use threads to take advantage of CPU concurrency.
· Use bound threads only when absolutely necessary, that is, when some facility of the underlying LWP is required.
· Use thr_setconcurrency(3T) to tell Solaris threads how many threads you expect to be simultaneously active.
 
7.6 Working With Multiprocessors

The Solaris threads package lets you take advantage of multiprocessors. In many cases, however, programmers must be concerned with whether the multithreaded application runs on a uniprocessor or a multiprocessor.
One such case involves the memory model of the multiprocessor. You cannot always assume that changes made to memory by one processor are immediately visible to the other processors.
Another multiprocessor issue is efficient synchronization when threads must wait until all of them have reached a common point in their execution.
--------------------------------------
Note: The issues discussed here are not important when the threads synchronization primitives are always used to access shared memory locations.
--------------------------------------
 
7.6.1 The Underlying Architecture

When threads synchronize access to shared storage locations using the Solaris threads synchronization routines, the effect of running a program on a shared-memory multiprocessor is identical to the effect of running it on a uniprocessor.
However, in many situations a programmer might be tempted to take advantage of the multiprocessor and use "tricks" to avoid the synchronization routines. As Code Examples 7-5 and 7-6 show, such tricks can be dangerous.
Understanding the memory models supported by common multiprocessor architectures helps to understand the dangers.
The major multiprocessor components are:

· The processors themselves
· Store buffers, which connect the processors to their caches
· Caches, which hold the contents of recently accessed or modified storage locations
· Memory, which is the primary storage and is shared by all processors

In the simple traditional model, the multiprocessor behaves as if the processors were connected directly to memory: when one processor stores into a location and another immediately loads from the same location, the second processor loads what was stored by the first. Caches can be used to speed the average memory access, and the desired semantics are achieved as long as the caches are kept consistent with one another.
A problem with this simple approach is that the processor must often be delayed to make certain that the desired semantics are achieved. Many modern multiprocessors use various techniques to prevent such delays, which, unfortunately, change the semantics of the memory model. Two of these techniques and their effects are explained in the next two examples.
 
7.6.1.1 "Shared-Memory" Multiprocessors

Consider the purported solution to the producer/consumer problem shown in Code Example 7-5. Although this program works on the systems currently in use, it assumes that all multiprocessors have strongly ordered memory, and it is therefore not portable.
 
Code Example 7-5 The Producer/Consumer Problem: Shared-Memory Multiprocessors

/* Deliberately unsynchronized: see the discussion that follows */
char buffer[BSIZE];
unsigned int in = 0;
unsigned int out = 0;

void producer(char item) {
    do
        ; /* nothing */
    while (in - out == BSIZE);
    buffer[in % BSIZE] = item;
    in++;
}

char consumer(void) {
    char item;
    do
        ; /* nothing */
    while (in - out == 0);
    item = buffer[out % BSIZE];
    out++;
    return item;
}

When this program has exactly one producer and exactly one consumer and is run on a shared-memory multiprocessor, it appears to be correct. The difference between in and out is the number of items in the buffer. The producer waits until there is room for a new item, and the consumer waits until there is an item in the buffer.
For memory that is strongly ordered (for instance, a modification to memory on one processor is immediately available to the other processors), this solution is correct. It remains correct even when in and out eventually overflow, as long as BSIZE is less than the largest integer that can be represented in a word (strictly, the slot index in % BSIZE also stays continuous across the wraparound only when BSIZE is a power of two).
Shared-memory multiprocessors do not necessarily have strongly ordered memory. A change to memory by one processor is not necessarily available immediately to the other processors. When one processor makes two changes to different memory locations, the other processors do not necessarily see the changes in the order in which they were made, because changes to memory do not happen immediately.
First the changes are stored in store buffers that are not visible to the cache. The processor consults its own store buffers to ensure that a program has a consistent view of memory, but because store buffers are not visible to other processors, a write by one processor does not become visible to the others until it is written to cache.
The Solaris synchronization primitives (see Chapter 3) use special instructions that flush the store buffers to cache. So, using locks around your shared data ensures memory consistency.
When memory ordering is very relaxed, Code Example 7-5 has a problem: the consumer might see that in has been incremented by the producer before it sees the change to the corresponding buffer slot. This is called weak ordering, because stores made by one processor can appear to happen out of order to another processor (memory, however, is always consistent from the same processor). To fix this, the code should use mutex locks to flush the store buffers to cache.
The trend is toward relaxing memory order. Because of this, programmers must be increasingly careful to use locks around all global or shared data. As Code Examples 7-5 and 7-6 demonstrate, locking is essential.
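One way to repair Code Example 7-5 with locks is sketched below. It assumes POSIX threads (a mutex plus two condition variables) in place of the Solaris primitives. BSIZE is given an arbitrary value of 8, and the helper names produce_all and transfer are hypothetical.

```c
#include <pthread.h>

#define BSIZE 8                          /* arbitrary size for this sketch */

static char buffer[BSIZE];
static unsigned int in = 0, out = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void producer(char item) {
    pthread_mutex_lock(&lock);
    while (in - out == BSIZE)            /* buffer full: sleep instead of spinning */
        pthread_cond_wait(&not_full, &lock);
    buffer[in % BSIZE] = item;
    in++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

char consumer(void) {
    char item;

    pthread_mutex_lock(&lock);
    while (in - out == 0)                /* buffer empty */
        pthread_cond_wait(&not_empty, &lock);
    item = buffer[out % BSIZE];
    out++;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return item;
}

/* Producer thread body: sends every character of a string, then a terminator. */
static void *produce_all(void *arg) {
    for (const char *s = arg; *s != '\0'; s++)
        producer(*s);
    producer('\0');
    return NULL;
}

/* Passes msg through the buffer; returns the number of characters stored in dst. */
int transfer(const char *msg, char *dst, int cap) {
    pthread_t t;
    int n = 0;
    char c;

    if (pthread_create(&t, NULL, produce_all, (void *)msg) != 0)
        return -1;
    while ((c = consumer()) != '\0')
        if (n < cap - 1)
            dst[n++] = c;
    dst[n] = '\0';
    pthread_join(t, NULL);
    return n;
}
```

Because in, out, and the buffer slots are only touched while the mutex is held, the consumer can never observe the incremented in before the stored item, regardless of the memory model.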
 
7.6.1.2 Peterson's Algorithm

Code Example 7-6 is an implementation of Peterson's algorithm, which handles mutual exclusion between two threads. This code tries to guarantee that there is never more than one thread in the critical section and that, when a thread calls mut_excl(), it enters the critical section sometime "soon."

An assumption here is that a thread exits fairly quickly after entering the critical section.
 
Code Example 7-6 Mutual Exclusion for Two Threads?

void mut_excl(int me /* 0 or 1 */) {
    static int loser;
    static int interested[2] = {0, 0};
    int other; /* local variable */

    other = 1 - me;
    interested[me] = 1;
    loser = me;
    while (loser == me && interested[other])
        ;
    /* critical section */
    interested[me] = 0;
}
This algorithm works some of the time when it is assumed that the multiprocessor has strongly ordered memory.
Some multiprocessors, including some SPARC-based multiprocessors, have store buffers. When a thread issues a store instruction, the data is put into a store buffer. The buffer contents are eventually sent to the cache, but not necessarily right away. (Note that the caches on each of the processors maintain a consistent view of memory, but modified data does not reach the cache right away.)
When multiple memory locations are stored into, the changes reach the cache (and memory) in the correct order, but possibly after a delay. Multiprocessors with this property are said to have total store order (TSO).
When one processor stores into location A and then loads from location B, and another processor stores into location B and then loads from location A, the expectation is that either the first processor fetches the newly modified value of B, or the second processor fetches the newly modified value of A, or both, but that the case in which both processors load the old values simply cannot happen.
However, with the delays caused by store buffers, the "impossible case" can happen.
What could happen with Peterson's algorithm is that two threads running on separate processors each store into their own slot of the interested array and then load from the other slot. They both see the old values (0), assume that the other party is not present, and both enter the critical section. (Note that this is the sort of problem that might not show up when you test a program, but only much later.)
This problem is avoided when you use the threads synchronization primitives, whose implementations issue special instructions to force the writing of the store buffers to the cache.
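Replacing the hand-written protocol with a synchronization primitive might look like the sketch below. It assumes a POSIX mutex rather than the Solaris mutex_lock() interface. The counters inside and max_inside are hypothetical instrumentation that records whether two threads were ever in the critical section together.

```c
#include <pthread.h>

static pthread_mutex_t excl = PTHREAD_MUTEX_INITIALIZER;
static int inside = 0;        /* threads currently in the critical section */
static int max_inside = 0;    /* highest value of `inside` ever observed */

/* Same contract as mut_excl(): at most one thread in the critical section. */
static void mut_excl_locked(void) {
    pthread_mutex_lock(&excl);        /* also makes earlier stores visible */
    inside++;                         /* critical section begins */
    if (inside > max_inside)
        max_inside = inside;
    inside--;                         /* critical section ends */
    pthread_mutex_unlock(&excl);
}

static void *contender(void *arg) {
    int entries = *(const int *)arg;

    for (int i = 0; i < entries; i++)
        mut_excl_locked();
    return NULL;
}

/* Returns the largest number of threads ever seen inside at once, or -1 on error. */
int max_threads_inside(int entries) {
    pthread_t a, b;

    inside = 0;
    max_inside = 0;
    if (pthread_create(&a, NULL, contender, &entries) != 0)
        return -1;
    if (pthread_create(&b, NULL, contender, &entries) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return max_inside;
}
```

Unlike Code Example 7-6, this version does not depend on the memory ordering of the machine, and a waiting thread sleeps instead of spinning.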
 
7.6.1.3 Parallelizing a Loop on a Shared-Memory Parallel Computer

In many applications, and especially numerical applications, part of the algorithm can be parallelized while other parts are inherently sequential (as shown in Code Example 7-7).
 
Code Example 7-7 Multithreaded Cooperation (Barrier Synchronization)

while (a_great_many_iterations) {
    sequential_computation
    parallel_computation
}
 
For example, you might produce a set of matrices with a strictly linear computation, then perform operations on the matrices using a parallel algorithm, then use the results of these operations to produce another set of matrices, then operate on them in parallel, and so on.
The nature of the parallel algorithms for such a computation is that little synchronization is required during the computation, but synchronization of all the threads employed is required at the end to ensure that all of them have finished.
When the time spent executing the parallel algorithm is large compared to the time required to create and synchronize the threads, the cost of thread creation and synchronization is no problem. But if the time required for the computation is not so large, the thread creation and synchronization times become very important.
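The loop of Code Example 7-7 can be sketched with thread creation and joining acting as the barrier at the end of each parallel step. This assumes POSIX threads. The array size, the two worker threads, and the arithmetic (add one sequentially, then square in parallel) are arbitrary choices made for illustration.

```c
#include <pthread.h>

#define N 8
#define NTHREADS 2

static long data[N];

struct slice {
    int lo, hi;                         /* half-open range [lo, hi) of data[] */
};

/* Parallel step: each thread squares its own slice; no locking is needed. */
static void *square_slice(void *arg) {
    struct slice *s = arg;

    for (int i = s->lo; i < s->hi; i++)
        data[i] *= data[i];
    return NULL;
}

/* Runs `iters` rounds and returns the sum of data[], or -1 on error. */
long run_rounds(int iters) {
    pthread_t tid[NTHREADS];
    struct slice part[NTHREADS];
    long sum = 0;

    for (int i = 0; i < N; i++)
        data[i] = 0;
    for (int it = 0; it < iters; it++) {
        int started = 0;

        for (int i = 0; i < N; i++)     /* sequential_computation */
            data[i] += 1;
        for (; started < NTHREADS; started++) {   /* parallel_computation */
            part[started].lo = started * N / NTHREADS;
            part[started].hi = (started + 1) * N / NTHREADS;
            if (pthread_create(&tid[started], NULL, square_slice,
                               &part[started]) != 0)
                break;
        }
        for (int t = 0; t < started; t++)
            pthread_join(tid[t], NULL); /* barrier: wait until all have finished */
        if (started < NTHREADS)
            return -1;
    }
    for (int i = 0; i < N; i++)
        sum += data[i];
    return sum;
}
```

With work this small, the cost of creating and joining the threads dominates the run time, which is the situation the paragraph above warns about.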
 
7.7 Summary

This programming guide has covered basic threads programming issues. See Appendix A, "Sample Application," for examples of many of the features and styles of programming discussed here.
 
Recommended books:
Algorithms for Mutual Exclusion by Michel Raynal (MIT Press, 1986)
Concurrent Programming by Alan Burns & Geoff Davies (Addison-Wesley, 1993)
Distributed Algorithms and Protocols by Michel Raynal (Wiley, 1988)
Operating System Concepts by Silberschatz, Peterson, & Galvin (Addison-Wesley, 1991)
Principles of Concurrent Programming by M. Ben-Ari (Prentice-Hall, 1982)
 
--
※Source: · bbs.net.tsinghua.edu.cn · [from: sys11.cic. Tsing]
