In-Depth Understanding of Computer Systems: Concurrent Programming


Concurrent programming

If logical control flows overlap in time, they are concurrent. This general phenomenon, known as concurrency, appears at many different levels of a computer system.

Application-level concurrency is useful in many situations:

    • Access slow I/O devices.
    • Interact with people.
    • Reduce latency by postponing work.
    • Service multiple network clients.
    • Parallel computing on multicore machines.

Applications that use application-level concurrency are called concurrent programs. Modern operating systems provide three basic approaches for building concurrent programs:

    • Processes. With this approach, each logical control flow is a process that is scheduled and maintained by the kernel. Because processes have separate virtual address spaces, flows that want to communicate with each other must use some explicit interprocess communication (IPC) mechanism.
    • I/O multiplexing. With this form of concurrent programming, applications explicitly schedule their own logical flows in the context of a single process. Logical flows are modeled as state machines that the main program explicitly transitions from state to state as data arrives on file descriptors. Because the program is a single process, all flows share the same address space.
    • Threads. A thread is a logical flow that runs in the context of a single process and is scheduled by the kernel. You can think of threads as a hybrid of the other two approaches: they are scheduled by the kernel like process flows, and they share the same virtual address space like I/O-multiplexed flows.
Process-based concurrent programming

The simplest way to build a concurrent program is with processes, using familiar functions such as fork, exec, and waitpid.

Steps:

1) The server listens for a connection request on a listener descriptor.

2) The server accepts a connection request from client 1 and returns a connected descriptor.

3) After accepting the connection request, the server forks a child process that gets a complete copy of the server's descriptor table. The child process closes listening descriptor 3 in its copy, and the parent process closes its copy of connected descriptor 4, because these descriptors are no longer needed.

4) The child process is busy servicing the client, and the parent process continues to listen for new requests.

Note: It is important for the child to close the listening descriptor and for the parent to close the connected descriptor, because parent and child share the same file table entries and each entry's reference count has been incremented. A file table entry is actually closed only when its reference count drops to 0. If the parent and child do not close their unused descriptors, those descriptors are never released; the resulting leak of descriptors (and the memory associated with them) eventually exhausts available resources and can crash the system.
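A minimal, self-contained sketch of this pattern (the port number and the echo routine are illustrative stand-ins; error handling and the SIGCHLD reaping discussed below are omitted):

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Trivial per-client service routine used as a stand-in for the real work. */
static void echo(int connfd)
{
    char buf[1024];
    ssize_t n;
    while ((n = read(connfd, buf, sizeof(buf))) > 0)
        write(connfd, buf, n);
}

int main(void)
{
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);            /* arbitrary port for the example */
    bind(listenfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listenfd, 16);

    while (1) {
        int connfd = accept(listenfd, NULL, NULL);  /* step 2: connected descriptor */
        if (connfd < 0)
            continue;
        if (fork() == 0) {                  /* step 3: child gets descriptor copies */
            close(listenfd);                /* child closes its listening descriptor */
            echo(connfd);                   /* step 4: child services the client */
            close(connfd);
            exit(0);
        }
        close(connfd);                      /* parent closes its connected descriptor */
    }
}
```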

Issues to be aware of with process-based concurrent programming:

1) First, servers typically run for a long time, so we must include a SIGCHLD handler that reaps the resources of terminated (zombie) child processes. Because SIGCHLD is blocked while the SIGCHLD handler runs, and because Unix signals are not queued, the handler must be prepared to reap multiple dead children per invocation.
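A sketch of such a handler, reaping in a loop with waitpid and WNOHANG (the function names used here are illustrative):

```c
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>

/* Reap as many zombie children as possible without blocking. */
static void sigchld_handler(int sig)
{
    int olderrno = errno;                  /* preserve errno for the interrupted code */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;                                  /* one signal may cover several dead children */
    errno = olderrno;
}

/* Install the handler once, e.g. near the top of main(). */
static void install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;              /* restart slow system calls such as accept */
    sigaction(SIGCHLD, &sa, NULL);
}
```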

2) Second, the parent and the child must close their respective copies of connfd. As already noted, this is especially important for the parent, which must close its connected descriptor to avoid a memory leak.

3) Finally, because of the reference count in the socket's file table entry, the connection to the client is not terminated until both the parent's and the child's copies of connfd have been closed.

Disadvantages:

For sharing state information between parent and child, processes have a very clean model: file tables are shared, but user address spaces are not. Having separate address spaces is both an advantage and a disadvantage. On the plus side, one process cannot accidentally overwrite the virtual memory of another, which eliminates a lot of confusing failures.

On the other hand, separate address spaces make it more difficult for processes to share state information. To share information, they must use an explicit IPC (interprocess communication) mechanism. Another disadvantage of process-based designs is that they tend to be slower, because the overhead of process control and IPC is high.

Concurrent programming based on I/O multiplexing

1. The dilemma: suppose a server must respond to two independent I/O events: 1) a network client initiating a connection request, and 2) a user typing a command on the keyboard. I/O multiplexing solves this dilemma. The basic idea is to use the select function to ask the kernel to suspend the process, returning control to the application only after one or more I/O events have occurred.

I/O multiplexing can be implemented with select, poll, or epoll.
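A minimal sketch of a select-based event loop for the dilemma above, assuming listenfd is a listening socket created elsewhere; the two handler bodies are placeholder stand-ins:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Stand-in handlers for the two independent I/O events. */
static void handle_command(void)
{
    char buf[128];
    if (fgets(buf, sizeof(buf), stdin))
        printf("command: %s", buf);
}

static void handle_client(int connfd)
{
    printf("new connection on descriptor %d\n", connfd);
    close(connfd);
}

void event_loop(int listenfd)
{
    while (1) {
        fd_set read_set;
        FD_ZERO(&read_set);
        FD_SET(STDIN_FILENO, &read_set);   /* watch the keyboard         */
        FD_SET(listenfd, &read_set);       /* watch the listening socket */

        /* Suspend the process until at least one descriptor is readable. */
        if (select(listenfd + 1, &read_set, NULL, NULL, NULL) < 0)
            continue;

        if (FD_ISSET(STDIN_FILENO, &read_set))
            handle_command();
        if (FD_ISSET(listenfd, &read_set)) {
            int connfd = accept(listenfd, NULL, NULL);
            if (connfd >= 0)
                handle_client(connfd);
        }
    }
}
```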

The pros and cons of I/O multiplexing technology:

1) It uses event-driven programming, which gives the programmer more control over program behavior than a process-based design does.

2) An event-driven server based on I/O multiplexing runs in the context of a single process, so every logical flow has access to the entire address space of the process. This makes it easy to share data between flows. A related advantage of running as a single process is that you can debug your concurrent server with familiar tools such as GDB, just as you would a sequential program. Finally, event-driven designs are often significantly more efficient than process-based designs because they do not require process context switches to schedule new flows.

Disadvantages:

A significant disadvantage of event-driven designs is coding complexity. Our event-driven concurrent server requires roughly three times as much code as the process-based one. Unfortunately, the complexity increases as the granularity of the concurrency decreases, where granularity means the number of instructions each logical flow executes per time slice.

Another major drawback of event-driven designs is that they cannot take full advantage of multi-core processors.

Thread-based concurrent programming

With process-based concurrent programming, we use a separate process for each flow. Each process is scheduled automatically by the kernel, and each has its own private address space, which makes it difficult for flows to share data. With concurrent programming based on I/O multiplexing, we create our own logical flows and use I/O multiplexing to schedule them explicitly. Because there is only one process, all flows share the entire address space. The thread-based approach is a hybrid of these two.

A thread is a logical flow that runs in the context of a process. Threads are scheduled automatically by the kernel. Each thread has its own thread context, including a unique integer thread ID, a stack, a stack pointer, a program counter, general-purpose registers, and condition codes. All threads running in a process share the entire virtual address space of that process.

Thread-based logical flows combine characteristics of process-based and I/O multiplexing-based flows. Like processes, threads are scheduled automatically by the kernel and are identified by an integer ID. Like I/O multiplexing-based flows, multiple threads run in the context of a single process and therefore share the entire contents of the process's virtual address space, including its code, data, heap, shared libraries, and open files.

Threading Execution Model

The multithreaded execution model is similar in some ways to the multiprocess execution model. Each process begins its life as a single thread, the main thread. At some point, the main thread creates a peer thread, and from that point on the two threads run concurrently. Eventually, because the main thread executes a slow system call such as read or sleep, or because it is interrupted by the system's interval timer, control passes to the peer thread via a context switch. The peer thread executes for a while, control then passes back to the main thread, and so on.

In some important ways, thread execution differs from process execution. Because a thread's context is much smaller than a process's context, a thread context switch is much faster than a process context switch. Another difference is that threads, unlike processes, are not organized in a strict parent-child hierarchy. The threads associated with a process form a pool of peers, independent of which thread created which. The main thread is distinguished from other threads only in that it is always the first thread to run in the process. The main impact of this peer pool is that a thread can kill any of its peers or wait for any of its peers to terminate. Further, each peer can read and write the same shared data.

Posix Threads

Creating Threads

A thread creates additional threads by calling the pthread_create function:

The pthread_create function creates a new thread and runs the thread routine f in the context of the new thread with the input argument arg. The attr argument can be used to change the default attributes of the newly created thread.

When pthread_create returns, the argument tid contains the ID of the newly created thread. The new thread can determine its own thread ID by calling the pthread_self function.
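A minimal, self-contained example of creating, identifying, and reaping a thread (the routine name and message are illustrative):

```c
#include <pthread.h>
#include <stdio.h>

/* Thread routine: receives arg from pthread_create, returns a value to pthread_join. */
static void *thread(void *arg)
{
    printf("Hello from thread %lu, arg = %s\n",
           (unsigned long)pthread_self(), (const char *)arg);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    /* attr == NULL keeps the default attributes for the new thread. */
    pthread_create(&tid, NULL, thread, "world");
    pthread_join(tid, NULL);          /* wait for the peer thread to terminate */
    return 0;
}
```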

Terminating a thread

A thread is terminated in one of the following ways:

    • The thread terminates implicitly when its top-level thread routine returns.
    • The thread terminates explicitly by calling the pthread_exit function. If the main thread calls pthread_exit, it waits for all other peer threads to terminate and then terminates the main thread and the entire process, with return value thread_return.
    • Some peer thread calls the exit function, which terminates the process and all threads associated with the process.
    • Another peer thread calls pthread_cancel with the ID of the current thread, terminating the current thread.

Reclaim Resources for terminated threads

The pthread_join function blocks until thread tid terminates, assigns the generic (void *) pointer returned by the thread routine to the location pointed to by thread_return, and then reaps the memory resources held by the terminated thread. Unlike the Unix wait function, pthread_join can only wait for the specific thread tid to terminate; it cannot wait for an arbitrary thread.

Detach thread

At any point in time, a thread is either joinable or detached. A joinable thread can be reaped or killed by other threads; its memory resources (such as its stack) are not freed until it has been reaped by another thread. In contrast, a detached thread cannot be reaped or killed by other threads; its memory resources are freed automatically by the system when it terminates.

By default, threads are created joinable. To avoid memory leaks, every joinable thread should either be reaped by another thread or be detached by a call to the pthread_detach function.

The pthread_detach function detaches the joinable thread tid. A thread can detach itself by calling pthread_detach with pthread_self() as the argument.

Initializing threads: the pthread_once function can be used to initialize global variables that are shared by multiple threads.
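A sketch of one-time initialization with pthread_once; the guarantee is that the init routine runs exactly once no matter how many threads race to call it (the shared state here is illustrative):

```c
#include <pthread.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;
static int shared_counter;                 /* global state shared by all threads */
static pthread_mutex_t counter_mutex;

/* Runs exactly once, the first time any thread calls pthread_once(&once, init). */
static void init(void)
{
    shared_counter = 0;
    pthread_mutex_init(&counter_mutex, NULL);
}

static void use_shared_state(void)
{
    pthread_once(&once, init);             /* safe to call from every thread */
    pthread_mutex_lock(&counter_mutex);
    shared_counter++;
    pthread_mutex_unlock(&counter_mutex);
}
```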

Shared variables in multithreaded programs

From a programmer's point of view, one of the attractive aspects of threads is the ease with which multiple threads can share the same program variables. However, this sharing can also be tricky. To write correct threaded programs, we must have a clear understanding of what sharing means and how it works.

To understand whether a variable is shared, a few basic questions must be answered: 1) What is the underlying memory model for threads? 2) Given this model, how are instances of the variable mapped to memory? 3) Finally, how many threads reference those instances? A variable is shared if and only if multiple threads reference some instance of it.

Thread memory Model:

A set of concurrent threads runs in the context of a process. Each thread has its own thread context, including thread ID, stack, stack pointer, program counter, condition code, and general purpose register value. Each thread shares the remainder of the process context with other threads. This includes the entire user virtual address space, which consists of read-only text (code), read/write data, heaps, and all shared library code and data regions. Threads also share the same set of open files.

From a practical point of view, it is impossible for one thread to read or write the register values of another thread. On the other hand, any thread can access any location in the shared virtual memory. If some thread modifies a memory location, then every other thread will eventually see the change if it reads that location. Thus, registers are never shared, whereas virtual memory is always shared.

The memory model for the separate thread stacks is not as clean. These stacks are contained in the stack area of the virtual address space and are usually accessed independently by their respective threads. We say usually rather than always, because different thread stacks are not protected from other threads. So if a thread somehow obtains a pointer to another thread's stack, it can read and write any part of that stack.

Mapping variables to memory:

In threaded C programs, variables are mapped to virtual memory according to their memory type:

    • Global variables. A global variable is a variable that is defined outside of a function. At run time, the read/write region of the virtual memory contains only one instance of each global variable that any thread can reference.
    • Local automatic variables. Local automatic variables are variables that are defined inside a function but do not have a static property. At run time, each thread's stack contains an instance of all its own local automatic variables. This is true even when multiple threads are executing the same thread routine.
    • Local static variables. A local static variable is a variable that is defined inside a function and has the static attribute. As with global variables, the read/write region of virtual memory contains exactly one instance of each local static variable declared in the program.

Shared variables
We say that a variable v is shared if and only if one of its instances is referenced by more than one thread.
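A small illustrative example of these rules, with variable names of our own choosing: the global msg and the local static count each have a single instance shared by both peer threads, while the automatic variable myid has a separate instance on each thread's stack:

```c
#include <pthread.h>
#include <stdio.h>

char *msg = "hello";                /* global: one instance, shared by all threads */

static void *thread(void *vargp)
{
    long myid = (long)vargp;        /* local automatic: one instance per thread stack */
    static int count = 0;           /* local static: one instance, shared by all threads */

    printf("[%ld]: %s (count = %d)\n", myid, msg, ++count);  /* unsynchronized update: a race */
    return NULL;
}

int main(void)
{
    pthread_t tid[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, thread, (void *)i);
    for (long i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```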

Synchronization and mutual exclusion of shared variables

1) Synchronizing Threads with semaphores

        Shared variables introduce the possibility of synchronization errors.

        (Figures from the original: a progress graph, an example trajectory, and the critical section / unsafe region.)

Semaphore: a mechanism for solving synchronization problems. A semaphore s is a global variable with a nonnegative integer value that can be manipulated only by two special operations, P and V:

P(s): if s is nonzero, P decrements s and returns immediately. If s is zero, the thread is suspended until s becomes nonzero (at which point the P operation decrements s and returns).

V(s): the V operation increments s by 1.

Using semaphores for mutual exclusion:
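A sketch of semaphore-based mutual exclusion with POSIX semaphores, where sem_wait plays the role of P and sem_post the role of V; the shared counter and iteration count are illustrative:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define NITERS 100000

static volatile long cnt = 0;     /* shared counter */
static sem_t mutex;               /* binary semaphore protecting cnt */

static void *count_thread(void *vargp)
{
    for (int i = 0; i < NITERS; i++) {
        sem_wait(&mutex);         /* P(mutex): enter the critical section */
        cnt++;
        sem_post(&mutex);         /* V(mutex): leave the critical section */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);       /* initial value 1 gives mutual exclusion */
    pthread_create(&t1, NULL, count_thread, NULL);
    pthread_create(&t2, NULL, count_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("cnt = %ld (expected %d)\n", cnt, 2 * NITERS);
    return 0;
}
```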

Using semaphores to schedule shared resources: in this scenario, a thread uses a semaphore operation to notify another thread that some condition in the program state has become true. Two classic applications:

a) The producer-consumer problem

Requirements: accesses to the buffer must be mutually exclusive, and they must also be scheduled: if the buffer is full (there are no empty slots), the producer must wait until a slot becomes free; if the buffer is empty (there is no item to take), the consumer must wait until an item becomes available.

Note (referring to the original code listing, which is not reproduced on this page): lines 5~13 initialize the buffer and define the related operations on the buffer structure; lines 16~19 free the buffer's storage; lines 22~29 produce (wait for an empty slot, then insert an item into it); lines 32~4 consume (remove the contents of a slot, leaving the slot empty).
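Since the listing is not reproduced here, the following is a hedged sketch of such a bounded buffer (so the line numbers in the note do not apply to it), built around the three semaphores the requirements call for: mutex for exclusive access, slots counting empty slots, and items counting available items:

```c
#include <semaphore.h>
#include <stdlib.h>

typedef struct {
    int *buf;        /* buffer array                        */
    int n;           /* capacity                            */
    int front, rear; /* buf[(front+1)%n] is the first item  */
    sem_t mutex;     /* protects accesses to buf            */
    sem_t slots;     /* counts available (empty) slots      */
    sem_t items;     /* counts available items              */
} sbuf_t;

void sbuf_init(sbuf_t *sp, int n)
{
    sp->buf = calloc(n, sizeof(int));
    sp->n = n;
    sp->front = sp->rear = 0;
    sem_init(&sp->mutex, 0, 1);   /* binary semaphore for mutual exclusion */
    sem_init(&sp->slots, 0, n);   /* initially n empty slots               */
    sem_init(&sp->items, 0, 0);   /* initially no items                    */
}

void sbuf_insert(sbuf_t *sp, int item)      /* producer */
{
    sem_wait(&sp->slots);                   /* wait for an empty slot      */
    sem_wait(&sp->mutex);                   /* lock the buffer             */
    sp->buf[(++sp->rear) % sp->n] = item;   /* insert the item             */
    sem_post(&sp->mutex);                   /* unlock the buffer           */
    sem_post(&sp->items);                   /* announce the new item       */
}

int sbuf_remove(sbuf_t *sp)                 /* consumer */
{
    sem_wait(&sp->items);                   /* wait for an available item  */
    sem_wait(&sp->mutex);
    int item = sp->buf[(++sp->front) % sp->n];
    sem_post(&sp->mutex);
    sem_post(&sp->slots);                   /* announce the freed slot     */
    return item;
}
```

A producer calls sbuf_insert and blocks on slots when the buffer is full; a consumer calls sbuf_remove and blocks on items when the buffer is empty.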

b) The readers-writers problem

A thread that modifies the object is called a writer; a thread that only reads the object is called a reader. Writers must have exclusive access to the object, while readers may share the object with an unlimited number of other readers. The readers-writers problem comes in two basic flavors. In the first, readers are favored: a reader is never made to wait unless a writer has already been granted permission to use the object; in other words, a reader does not wait simply because a writer is waiting. In the second, writers are favored: once a writer is ready to write, it performs its write as soon as possible, so a reader arriving after a writer must wait even if that writer is itself still waiting. The following program gives a solution to the first readers-writers problem:

Note: The semaphore w controls access to the critical section that manipulates the shared object. The semaphore mutex protects access to the shared variable readcnt, which counts the number of readers currently in the critical section. Whenever a writer enters the critical section, it locks w, and when it leaves, it unlocks w; this guarantees that there is at most one writer in the critical section at any time. On the other hand, only the first reader to enter the critical section locks w, and only the last reader to leave the critical section unlocks w.
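Since the program itself is not reproduced here, the following is a sketch of the favored-readers solution the note describes; the reader and writer critical-section bodies are placeholders, and both semaphores are assumed to be initialized to 1:

```c
#include <semaphore.h>

static int readcnt = 0;   /* number of readers currently in the critical section */
static sem_t mutex;       /* protects readcnt; initialize to 1                   */
static sem_t w;           /* protects access to the shared object; initialize to 1 */

void reader(void)
{
    sem_wait(&mutex);
    readcnt++;
    if (readcnt == 1)          /* first reader in locks out writers */
        sem_wait(&w);
    sem_post(&mutex);

    /* ... critical section: read the shared object ... */

    sem_wait(&mutex);
    readcnt--;
    if (readcnt == 0)          /* last reader out lets writers back in */
        sem_post(&w);
    sem_post(&mutex);
}

void writer(void)
{
    sem_wait(&w);              /* exclusive access for the writer */

    /* ... critical section: write the shared object ... */

    sem_post(&w);
}
```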

Putting it together: a prethreaded concurrent server. A simple thread-based concurrent server creates a new thread for each client, which incurs a nontrivial cost. A prethreaded server reduces this overhead by using the producer-consumer model described above. The server consists of a main thread and a set of worker threads. The main thread repeatedly accepts connection requests from clients and places the resulting connected descriptors in a bounded buffer. Each worker thread repeatedly removes a descriptor from the shared buffer, services the client, and then waits for the next descriptor.

An example program is described below (the listing itself is not reproduced on this page):

Note (referring to that listing): lines 26~27 create the worker threads; lines 29~32 accept connection requests from clients and place the resulting descriptors in the buffer; lines 35~43 are the work done by each worker thread; line 19 initializes the global variables shared by the threads. There are two ways to perform this initialization: one is to have the main thread call an initialization function; the other is to use the pthread_once function so that the initialization function is invoked the first time some thread calls the echo_cnt function.
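A hedged sketch of the prethreaded structure just described, reusing the bounded-buffer sketch from the producer-consumer section (assumed to be saved as sbuf.h); NTHREADS, SBUFSIZE, and echo_cnt are illustrative names:

```c
#include <pthread.h>
#include <unistd.h>
#include <sys/socket.h>
#include "sbuf.h"   /* the bounded-buffer sketch above, assumed saved as sbuf.h */

#define NTHREADS 4
#define SBUFSIZE 16

static sbuf_t sbuf;                 /* buffer of connected descriptors, shared by all threads */

void echo_cnt(int connfd);          /* per-client service routine (assumed to exist) */

static void *worker(void *vargp)    /* consumer: serve one client at a time */
{
    pthread_detach(pthread_self()); /* no one joins the workers */
    while (1) {
        int connfd = sbuf_remove(&sbuf);   /* take a descriptor from the buffer */
        echo_cnt(connfd);
        close(connfd);
    }
    return NULL;
}

void serve(int listenfd)            /* producer: the main thread */
{
    pthread_t tid;
    sbuf_init(&sbuf, SBUFSIZE);
    for (int i = 0; i < NTHREADS; i++)      /* create the worker pool once */
        pthread_create(&tid, NULL, worker, NULL);

    while (1) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd >= 0)
            sbuf_insert(&sbuf, connfd);     /* hand the descriptor to a worker */
    }
}
```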

Other concurrency issues

1) Thread Safety

When writing threaded programs, we must be careful to write functions that have a property called thread safety. A function is said to be thread-safe if and only if it always produces correct results when called repeatedly from multiple concurrent threads. If a function is not thread-safe, we say it is thread-unsafe.

We can identify four (not mutually exclusive) classes of thread-unsafe functions:

Class 1: functions that do not protect shared variables.

Class 2: functions that keep state across multiple invocations. A pseudorandom number generator is a simple example of this class of thread-unsafe function. The rand function is thread-unsafe because the result of the current call depends on an intermediate result from the previous call. After calling srand to set a seed for rand, we can only expect a repeatable sequence of random numbers if rand is called repeatedly from a single thread.

Class 3: functions that return a pointer to a static variable. Some functions, such as ctime and gethostbyname, compute a result in a static variable and then return a pointer to that variable. If we call such functions from concurrent threads, disaster can strike, because the result being used by one thread can be silently overwritten by another thread.

There are two ways to deal with this class of thread-unsafe function. One option is to rewrite the function so that the caller passes the address of the variable in which to store the result. This eliminates all shared data, but it requires the programmer to be able to modify the function's source code.

If the thread-unsafe function is difficult or impossible to modify, another option is to use the lock-and-copy technique. The basic idea is to associate a mutex with the thread-unsafe function: at each call site, lock the mutex, call the thread-unsafe function, copy the result returned by the function into a private memory location, and then unlock the mutex. To minimize changes to callers, you should define a thread-safe wrapper function that performs the lock-and-copy, and then replace all calls to the thread-unsafe function with calls to this wrapper.
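A sketch of such a wrapper for the class 3 function ctime (the wrapper name ctime_ts and the use of a mutex rather than a semaphore are our own choices):

```c
#include <pthread.h>
#include <string.h>
#include <time.h>

static pthread_mutex_t ctime_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Thread-safe wrapper: lock, call the unsafe function, copy its result
   into caller-private storage, unlock. ctime's result fits in 26 bytes. */
char *ctime_ts(const time_t *timep, char *privatep)
{
    pthread_mutex_lock(&ctime_mutex);
    char *sharedp = ctime(timep);       /* points into ctime's static buffer          */
    strcpy(privatep, sharedp);          /* copy before another thread can overwrite it */
    pthread_mutex_unlock(&ctime_mutex);
    return privatep;
}
```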

Class 4: functions that call thread-unsafe functions. If a function f calls a thread-unsafe function g, is f necessarily thread-unsafe? Not necessarily. If g is a class 2 function that relies on state across multiple invocations, then f is also thread-unsafe, and there is no recourse short of rewriting g. However, if g is a class 1 or class 3 function, then f can still be thread-safe, as long as the call site and any resulting shared data are protected with a mutex.

2) Reentrancy

There is an important class of thread-safe functions, known as reentrant functions, characterized by the property that they do not reference any shared data when called by multiple threads. Although the terms thread-safe and reentrant are sometimes used as synonyms, there is a clear technical distinction between them: the set of reentrant functions is a proper subset of the set of thread-safe functions, and thread-unsafe functions lie outside both.

Reentrant functions are typically more efficient than non-reentrant thread-safe functions because they require no synchronization operations.

If all of a function's arguments are passed by value (no pointers) and all data references are to local automatic stack variables (no references to static or global variables), then the function is explicitly reentrant: we can assert its reentrancy regardless of how it is called.

If we relax this condition and allow some arguments of an otherwise explicitly reentrant function to be passed by reference (as pointers), we get an implicitly reentrant function: it is reentrant only if the calling threads are careful to pass pointers to nonshared data. rand_r is an example.
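A sketch of an implicitly reentrant pseudorandom generator in the style of rand_r (the function name and constants are our own): all state lives behind a pointer supplied by the caller, so the function is reentrant as long as each thread passes a pointer to its own, unshared seed:

```c
/* Implicitly reentrant: the only state is *nextp, supplied by the caller. */
int my_rand_r(unsigned int *nextp)
{
    *nextp = *nextp * 1103515245 + 12345;       /* classic linear congruential step */
    return (int)((*nextp / 65536) % 32768);
}

/* Each thread keeps its own seed on its own stack, so calls do not interfere. */
void *thread_routine(void *vargp)
{
    unsigned int seed = 1;
    for (int i = 0; i < 3; i++)
        (void)my_rand_r(&seed);
    return (void *)0;
}
```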

Reentrancy is therefore a property of both the caller and the callee, not just the callee alone.

3) Races

A race occurs when the correctness of a program depends on one thread reaching point x in its control flow before another thread reaches point y.
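A classic sketch of such a race, with names of our own choosing: the peer threads race the main thread for the loop variable i, so a peer may read i only after main has already incremented it:

```c
#include <pthread.h>
#include <stdio.h>

#define N 4

/* Buggy on purpose: whether each thread prints its intended id depends on
   whether it dereferences vargp before the main loop increments i again. */
static void *thread(void *vargp)
{
    int myid = *((int *)vargp);          /* race: reads main's loop variable */
    printf("Hello from thread %d\n", myid);
    return NULL;
}

int main(void)
{
    pthread_t tid[N];
    int i;
    for (i = 0; i < N; i++)
        pthread_create(&tid[i], NULL, thread, &i);   /* passes a pointer to i */
    for (i = 0; i < N; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```

The fix is to give each thread its own private copy of the id, for example by passing a pointer to a separately allocated block instead of a pointer to the shared loop variable.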

4) Deadlock

Semaphores introduce a potentially nasty kind of runtime error called deadlock, in which a collection of threads is blocked, each waiting for a condition that will never become true.

Deadlock is difficult to avoid in general. However, when binary semaphores (mutexes) are used for mutual exclusion, the following rule can be applied:

If, for each pair of mutexes (s, t) in the program, every thread that holds both s and t locks them in the same order, then the program is deadlock-free.
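A sketch of the rule: both threads acquire the two mutexes in the same global order (s before t), so neither can end up holding one lock while waiting forever for the other:

```c
#include <pthread.h>

static pthread_mutex_t s = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t t = PTHREAD_MUTEX_INITIALIZER;

/* Both threads obey the same lock ordering: s first, then t. */
static void *thread_a(void *arg)
{
    pthread_mutex_lock(&s);
    pthread_mutex_lock(&t);
    /* ... use the resources protected by s and t ... */
    pthread_mutex_unlock(&t);
    pthread_mutex_unlock(&s);
    return NULL;
}

static void *thread_b(void *arg)
{
    pthread_mutex_lock(&s);   /* same order as thread_a; locking t first could deadlock */
    pthread_mutex_lock(&t);
    /* ... */
    pthread_mutex_unlock(&t);
    pthread_mutex_unlock(&s);
    return NULL;
}
```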
