Chapter 11 Network Programming
- All network applications are based on the same basic programming model, have similar overall logical structures, and rely on the same programming interface.
- Network applications rely on many concepts already covered in our study of systems, such as processes, signals, byte ordering, memory mapping, and dynamic storage allocation, all of which play important roles.
- We need to understand the basic client-server programming model and how to write client-server programs that use the services provided by the Internet.
- We will combine all of these concepts to develop a small but fully functional Web server that can serve both static and dynamic text and graphics content to real-world Web browsers.
11.1 Client-Server programming model
1. Every network application is based on the client-server model. With this model, an application consists of a server process and one or more client processes. The server manages a resource and provides some service to its clients by manipulating that resource.
Eg: a Web server manages a set of disk files that it stores and retrieves for clients. An FTP server likewise manages a set of disk files. Similarly, an e-mail server manages a spool file that it reads and updates for clients.
2. The basic operation in the client-server model is the transaction.
3. A client-server transaction consists of four steps:
1) When a client needs service, it initiates a transaction by sending a request to the server. For example, when a Web browser needs a file, it sends a request to a Web server.
2) The server receives the request, interprets it, and manipulates its resources in the appropriate way. For example, when the Web server receives the browser's request, it reads a disk file.
3) The server sends a response to the client and waits for the next request. For example, the Web server sends the file back to the client.
4) The client receives the response and processes it. For example, after a Web browser receives a page from the server, it displays it on the screen.
11.2 Network
- Clients and servers typically run on different hosts and communicate using the hardware and software resources of a computer network.
- To a host, the network is just another I/O device, serving as a source and sink of data.
- An adapter plugged into an I/O bus expansion slot provides the physical interface to the network. Data received from the network is copied from the adapter across the I/O and memory buses into memory, typically by DMA (direct memory access). Similarly, data can also be copied from memory to the network.
1. An Ethernet segment consists of cables and a hub.
Each cable has the same maximum bit bandwidth.
The hub slavishly copies each bit that it receives on one port to every other port, without discrimination.
Therefore, every host sees every bit.
2. Each Ethernet adapter has a globally unique 48-bit address, stored in non-volatile memory on the adapter. A host can send a frame to any other host on the segment; every host adapter sees the frame, but only the destination host actually reads it.
3. Bridging Ethernet
It consists of cables and bridges that connect multiple Ethernet segments into a larger local area network. The cables connecting the bridges can run at different rates (e.g., 1 Gb/s between bridge and bridge, 100 Mb/s between bridge and hub).
4. Bridge function: Connect different network segments.
When A sends a frame to B on the same segment, the frame arrives at the bridge's input port and the bridge discards it rather than forwarding it. When A sends a frame to C on a different segment, the bridge copies the frame only to the port connected to C's segment. The bridge thereby conserves the bandwidth of the other segments.
5. Basic capabilities of the Protocol software:
The naming scheme assigns at least one internet address to each host, uniquely identifying it and thereby smoothing over the differences in address formats among different hosts.
The delivery mechanism defines a uniform way to bundle data bits into discrete chunks (packets), smoothing over the different encapsulation formats used by different networks.
11.3 Global IP Internet
The global IP Internet is the most famous and most successful internet implementation. It has existed in one form or another since 1969. Although the internal architecture of the Internet is complex and constantly changing, the organization of client-server applications has remained remarkably stable since the early 1980s. Each Internet host runs software that implements the TCP/IP protocol, which is supported by almost every modern computer system. Internet clients and servers communicate using a mix of socket interface functions and Unix I/O functions. The socket functions are typically implemented as system calls that trap into the kernel and call various kernel-mode TCP/IP functions.
11.3.1 IP Address
An IP address is a 32-bit unsigned integer. Network programs store IP addresses in the IP address structure shown below.
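For reference, a sketch of the structure and of the byte-order conversion functions (the structure mirrors the definition in `<netinet/in.h>`; in real code, include that header rather than redefining it):

```c
#include <stdint.h>

/* The IP address structure, as it appears in <netinet/in.h> */
struct in_addr {
    uint32_t s_addr;  /* 32-bit IP address in network byte order (big-endian) */
};

/* Conversion between host and network byte order uses, e.g.:
   uint32_t htonl(uint32_t hostlong);  -- host -> network byte order
   uint32_t ntohl(uint32_t netlong);   -- network -> host byte order
   (declared in <arpa/inet.h>)                                        */
```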
11.3.2 Internet domain name
1. Internet clients and servers use IP addresses when they communicate with each other. For ease of memory, the Internet also defines a more human-friendly set of domain names, together with a mechanism that maps domain names to IP addresses. A domain name is a sequence of words (letters, numbers, and dashes) separated by periods.
2. The set of domain names forms a hierarchy in which each domain name encodes its position in the hierarchy. An example makes this easy to understand. The hierarchy can be represented as a tree: each node of the tree represents a domain name, formed by the path back to the root, and a subtree is called a subdomain. The first level in the hierarchy is an unnamed root node. The next level is a collection of first-level domain names defined by a non-profit organization, ICANN (the Internet Corporation for Assigned Names and Numbers). Common first-level domain names include com, edu, gov, org, and net. Second-level domain names are assigned on a first-come first-served basis by various authorized agents of ICANN. Once an organization has received a second-level domain name, it is free to create any new domain name within its subdomain.
11.3.3 Internet connection
Internet clients and servers communicate by sending and receiving streams of bytes over connections. A connection is point-to-point in the sense that it connects a pair of processes. It is full-duplex in that data can flow in both directions at the same time. And it is reliable in the sense that, barring some catastrophic failure (such as a careless backhoe operator cutting a cable), the stream of bytes sent by the source process is eventually received by the destination process in the same order it was sent.
11.4 Socket Interface
11.4.1 Socket Address Structure
From the Unix kernel's point of view, a socket is an endpoint of communication. From the point of view of a Unix program, a socket is an open file with a corresponding descriptor.
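For reference, the two address structures involved, shown roughly as they appear in the system headers (in real code, `#include <netinet/in.h>` rather than redefining them):

```c
#include <stdint.h>

/* IP address structure, repeated from 11.3.1 */
struct in_addr {
    uint32_t s_addr;        /* 32-bit address in network byte order */
};

/* Generic socket address structure (used by connect, bind, and accept) */
struct sockaddr {
    uint16_t sa_family;     /* protocol family */
    char     sa_data[14];   /* address data */
};

/* Internet-style socket address structure */
struct sockaddr_in {
    uint16_t       sin_family;   /* protocol family (always AF_INET) */
    uint16_t       sin_port;     /* port number in network byte order */
    struct in_addr sin_addr;     /* IP address in network byte order */
    unsigned char  sin_zero[8];  /* pad to sizeof(struct sockaddr) */
};
```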
11.4.2 Socket function
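The socket function creates a socket descriptor. A minimal sketch of the standard call (AF_INET and SOCK_STREAM select an Internet stream socket; make_socket is a hypothetical wrapper name):

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Create an Internet stream socket; returns a descriptor or -1 on error. */
int make_socket(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    /* fd is only partially opened: a client must connect() it, and a
       server must bind()/listen()/accept() before doing I/O with it. */
    return fd;
}
```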
11.4.3 Connect function
A client establishes a connection to a server by calling the connect function.
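A minimal sketch combining socket and connect (connect_simple is a hypothetical helper; addr is assumed to be a server address prepared by the caller):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Open a stream socket and connect it to the server at addr.
   Returns a connected descriptor, or -1 on error. */
int connect_simple(const struct sockaddr *addr, socklen_t addrlen) {
    int clientfd = socket(AF_INET, SOCK_STREAM, 0);
    if (clientfd < 0)
        return -1;
    if (connect(clientfd, addr, addrlen) < 0) { /* blocks until established */
        close(clientfd);
        return -1;
    }
    return clientfd;  /* now ready for Unix I/O: read/write */
}
```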
11.4.4 open_clientfd function
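The book wraps the socket and connect steps in a helper. A sketch along those lines, using the protocol-independent getaddrinfo (the exact version depends on the edition of the book):

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Establish a connection with a server running on `hostname` and
   listening on `port`. Returns a connected descriptor or -1 on error. */
int open_clientfd(char *hostname, char *port) {
    int clientfd;
    struct addrinfo hints, *listp, *p;

    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_socktype = SOCK_STREAM;                /* connections only */
    hints.ai_flags = AI_NUMERICSERV | AI_ADDRCONFIG;
    if (getaddrinfo(hostname, port, &hints, &listp) != 0)
        return -1;

    /* Walk the list until one address connects successfully */
    for (p = listp; p; p = p->ai_next) {
        if ((clientfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
            continue;                               /* try the next address */
        if (connect(clientfd, p->ai_addr, p->ai_addrlen) != -1)
            break;                                  /* success */
        close(clientfd);                            /* connect failed; try next */
    }
    freeaddrinfo(listp);
    return p ? clientfd : -1;
}
```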
11.4.5 bind function
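The bind function asks the kernel to associate a socket descriptor with a server address. A minimal sketch, wrapped in a hypothetical helper that binds to a given port on any local interface:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Associate sockfd with `port` on any local IP address.
   Returns 0 if OK, -1 on error. */
int bind_to_port(int sockfd, unsigned short port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* any local interface */
    addr.sin_port = htons(port);              /* port in network byte order */
    return bind(sockfd, (struct sockaddr *)&addr, sizeof(addr));
}
```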
11.4.6 Listen function
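The listen function converts an active socket (the default, used by clients) into a listening socket that can accept connection requests. A minimal sketch (the backlog value 1024 is just a typical hint for how many outstanding requests the kernel should queue):

```c
#include <sys/socket.h>

/* Turn sockfd into a listening socket; returns 0 if OK, -1 on error. */
int make_listening(int sockfd) {
    return listen(sockfd, 1024);
}
```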
11.4.7 open_listenfd function
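A sketch of the corresponding server-side helper, which combines the socket, bind, and listen steps (again modeled on the book's helper; details vary by edition):

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a listening descriptor ready to receive connection requests
   on `port`. Returns the descriptor or -1 on error. */
int open_listenfd(char *port) {
    struct addrinfo hints, *listp, *p;
    int listenfd, optval = 1;

    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG | AI_NUMERICSERV;
    if (getaddrinfo(NULL, port, &hints, &listp) != 0)
        return -1;

    for (p = listp; p; p = p->ai_next) {
        if ((listenfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
            continue;
        /* Lets a restarted server reuse the port immediately */
        setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
                   (const void *)&optval, sizeof(int));
        if (bind(listenfd, p->ai_addr, p->ai_addrlen) == 0)
            break;                              /* success */
        close(listenfd);
    }
    freeaddrinfo(listp);
    if (!p)
        return -1;
    if (listen(listenfd, 1024) < 0) {           /* make it a listening socket */
        close(listenfd);
        return -1;
    }
    return listenfd;
}
```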
11.4.8 Accept function
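The accept function waits for a connection request on the listening descriptor, then returns a connected descriptor that the server uses to talk to this one client with Unix I/O. A minimal sketch (accept_one is a hypothetical wrapper):

```c
#include <sys/socket.h>

/* Block until a connection request arrives on listenfd; fill in the
   client's address and return a connected descriptor (or -1 on error).
   Note the distinction: the listening descriptor exists once and keeps
   accepting, while a new connected descriptor is created per client. */
int accept_one(int listenfd, struct sockaddr *clientaddr, socklen_t *lenp) {
    return accept(listenfd, clientaddr, lenp);
}
```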
11.4.9 echo client and server example
11.5 Web server
11.5.1 Web basics
Web clients and servers interact using a text-based application-level protocol called HTTP.
HTTP is a simple protocol. A Web client (i.e., a browser) opens an Internet connection to a server and requests some content. The server responds with the requested content and then closes the connection. The browser reads the content and displays it on the screen.
The main difference between Web services and conventional file retrieval services is that Web content can be written in HTML. An HTML program (page) contains instructions (tags) that tell the browser how to display the various text and graphical objects on the page.
11.5.2 Web content
The Web server provides content to clients in two different ways:
1. Fetch a disk file and return its contents to the client (serving static content).
2. Run an executable file and return its output to the client (serving dynamic content).
11.5.3 HTTP Transactions
HTTP request
HTTP response
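As a sketch, a minimal HTTP/1.1 transaction might look like the following (the host name and sizes are made up; the annotations on the right are not part of the protocol):

```
GET /index.html HTTP/1.1        request line: <method> <URI> <version>
Host: www.example.com           request header
                                empty line (CRLF) ends the request headers

HTTP/1.1 200 OK                 response line: <version> <status code> <status message>
Content-type: text/html         response headers describe the body
Content-length: 120
                                empty line, then 120 bytes of HTML follow
```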
11.5.4 Service Dynamic Content
1. How does the client pass program arguments to the server?
2. How does the server pass arguments to the child process?
3. How does the server pass other information to the child process?
4. Where does the child process send its output?
(See the CGI sketch below.)
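A sketch of how the pieces fit together, written as a minimal CGI program; the comments summarize the standard CGI answers to the four questions (QUERY_STRING is the standard CGI environment variable):

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of a CGI program (the child process run by the server).
   1. The client passes arguments in the URI after '?', e.g. /cgi-bin/adder?15000&213
   2. The server passes them to the child in the QUERY_STRING environment variable.
   3. Other per-request information travels in further environment variables
      (REMOTE_ADDR, CONTENT_LENGTH, and so on).
   4. Before execve, the server redirects the child's stdout to the connected
      descriptor (dup2), so everything the child prints goes to the client. */
int main(void) {
    char *buf = getenv("QUERY_STRING");   /* e.g. "15000&213" */
    printf("Content-type: text/html\r\n\r\n");
    printf("<html>arguments: %s</html>\r\n", buf ? buf : "(none)");
    fflush(stdout);
    return 0;
}
```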
11.6 Synthesis: Tiny Web server
Tiny's main program
doit function
clienterror function
read_requesthdrs function
parse_uri function
serve_static function
serve_dynamic function
Chapter 12 Concurrent Programming
Three basic ways to construct concurrent programs:
1. Processes
Each logical control flow is a process that is scheduled by the kernel; each process has its own private virtual address space.
2. I/O multiplexing
Logical flows are modeled as state machines, and all flows share the same address space.
3. Threads
Logical flows are threads running in the context of a single process, scheduled by the kernel and sharing the same virtual address space.
12.1 Process-based concurrent programming
12.1.1 Process-based concurrent servers
Use a SIGCHLD handler to reap the resources of zombie child processes.
The parent and child must each close their own copy of connfd (the connected descriptor) to avoid memory leaks.
The connection to the client is not terminated until the reference count in the socket's file table entry reaches zero, that is, until both the parent's and the child's copies of connfd have been closed. A sketch of such a server appears below.
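A sketch of such a server's accept loop under these rules (echo is assumed to be the usual text-line echo routine, defined elsewhere):

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

void sigchld_handler(int sig) {
    while (waitpid(-1, 0, WNOHANG) > 0)  /* reap all available zombies */
        ;
}

void echo(int connfd);  /* assumed text-line echo routine */

void serve_forever(int listenfd) {
    signal(SIGCHLD, sigchld_handler);
    while (1) {
        int connfd = accept(listenfd, NULL, NULL);
        if (fork() == 0) {     /* child */
            close(listenfd);   /* child does not need the listening socket */
            echo(connfd);      /* service this client */
            close(connfd);
            exit(0);
        }
        close(connfd);         /* parent must close its copy of connfd */
    }
}
```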
12.1.2 Pros and cons of processes
1. Advantages: a process cannot accidentally overwrite the virtual memory of another process, since address spaces are separate.
2. Disadvantages: separate address spaces make it harder to share state; processes must share information via explicit IPC mechanisms, which tend to have high overhead.
12.2 Concurrent programming based on I/O multiplexing
The select function asks the kernel to suspend the process, returning control to the application only after one or more I/O events have occurred.
int select(int n, fd_set *fdset, NULL, NULL, NULL); // Returns: nonzero count of ready descriptors, -1 on error
The select function manipulates sets of type fd_set, called descriptor sets, which are modeled as bit vectors of size n:
b_{n-1}, ..., b_1, b_0
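A minimal sketch of how the descriptor-set macros and select fit together (listenfd is a hypothetical listening descriptor created elsewhere; wait_for_input is a made-up helper name):

```c
#include <sys/select.h>
#include <unistd.h>

/* Block until stdin or listenfd is readable; return whichever is ready. */
int wait_for_input(int listenfd) {
    fd_set read_set;
    FD_ZERO(&read_set);                /* clear all bits */
    FD_SET(STDIN_FILENO, &read_set);   /* add stdin (descriptor 0) */
    FD_SET(listenfd, &read_set);       /* add the listening descriptor */

    /* First argument is one more than the largest descriptor in the set.
       select overwrites read_set with the ready set before returning.  */
    if (select(listenfd + 1, &read_set, NULL, NULL, NULL) < 0)
        return -1;
    return FD_ISSET(listenfd, &read_set) ? listenfd : STDIN_FILENO;
}
```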
12.2.1 Concurrent event-driven servers based on I/O multiplexing
I/O multiplexing can be used as the basis for concurrent event-driven programs.
State machine: a collection of states, input events, output events, and transitions.
Self-loop: a transition from a state back to itself.
Note the helper routines:
init_pool: initialize the pool of clients
add_client: add a new client to the pool of active clients
check_clients: echo a text line from each ready connected descriptor
12.2.2 Advantages and disadvantages of I/O multiplexing
1. Advantages
It gives the programmer more control over program behavior than process-based designs do.
It runs in the context of a single process, so every logical flow can access the entire address space of the process, and sharing data between flows is easy.
It can be debugged with GDB, like any sequential program.
It is efficient, because no process context switches are needed to schedule new flows.
2. Disadvantages
Coding complexity: event-driven designs require significantly more code than process-based designs, and the complexity grows as the granularity of concurrency decreases. They also cannot fully utilize multi-core processors.
12.3 Thread-based concurrency programming
Each thread has its own thread context, including a thread ID, stack, stack pointer, program counter, general-purpose registers, and condition codes. All threads running in a process share that process's entire virtual address space, including its code, data, heap, shared libraries, and open files.
1. Thread execution model
Each process begins life as a single thread (the main thread). At some point the main thread creates a peer thread, and from then on the two threads run concurrently. Eventually, control passes to the peer thread via a context switch, because the main thread executes a slow system call or is interrupted by the system's interval timer.
2. Posix threads
POSIX threads (Pthreads) is a standard interface for manipulating threads from C programs. It allows programs to create, kill, and reap threads, and to share data safely with peer threads.
The code and local data for a thread are encapsulated in a thread routine.
3. Create a thread
Threads create other threads by calling the pthread_create function.
int pthread_create(pthread_t *tid, pthread_attr_t *attr, func *f, void *arg); // Returns: 0 if OK, nonzero on error
When pthread_create returns, argument tid contains the ID of the newly created thread. The new thread can determine its own thread ID by calling the pthread_self function.
pthread_t pthread_self(void); // Returns: thread ID of caller
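A minimal create-and-join example in the spirit of the book's hello program:

```c
#include <pthread.h>
#include <stdio.h>

/* Thread routine: its code and local data live here */
void *thread(void *vargp) {
    printf("Hello, world!\n");
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, thread, NULL);  /* create peer thread */
    pthread_join(tid, NULL);                   /* wait for it to terminate */
    return 0;
}
```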
4. Terminating a thread
A thread terminates in one of the following ways:
The thread terminates implicitly when its top-level thread routine returns.
The thread terminates explicitly by calling the pthread_exit function.
void pthread_exit(void *thread_return);
5. Reaping terminated threads
A thread waits for another thread to terminate by calling the pthread_join function.
int pthread_join(pthread_t tid, void **thread_return); // Returns: 0 if OK, nonzero on error
6. Detaching threads
At any point in time, a thread is either joinable or detached. A joinable thread can be reaped and killed by other threads; its memory resources are not freed until it is reaped by another thread. A detached thread is the opposite: its resources are freed automatically by the system when it terminates.
int pthread_detach(pthread_t tid); // Returns: 0 if OK, nonzero on error
7. Initializing threads
The pthread_once function allows you to initialize the state associated with a thread routine.
pthread_once_t once_control = PTHREAD_ONCE_INIT;
int pthread_once(pthread_once_t *once_control, void (*init_routine)(void)); // Always returns 0
12.4 Shared variables in multi-threaded programs
A variable is shared if and only if multiple threads reference some instance of the variable.
12.4.1 Thread Memory model
- Each thread has its own separate thread context, including a unique integer thread ID, stack, stack pointer, program counter, general-purpose registers, and condition codes.
- Registers are never shared, whereas virtual memory is always shared.
- The separate thread stacks are stored in the stack area of the virtual address space and are usually accessed independently by their respective threads (though they are not protected from other threads).
12.4.2 Mapping variables to memory
Global variables: variables declared outside of a function. Each has exactly one runtime instance, which any thread can reference.
Local automatic variables: variables declared inside a function without the static attribute. Each thread's stack contains its own instance.
Local static variables: variables declared inside a function with the static attribute. Like globals, each has exactly one runtime instance.
12.4.3 Shared variables
A variable v is shared if and only if one of its instances is referenced by more than one thread. For example, in the sample program the variable cnt is shared because it has exactly one runtime instance and that instance is referenced by both peer threads. On the other hand, myid is not shared, because each of its two instances is referenced by exactly one thread. However, it is important to realize that local automatic variables such as msgs can also be shared.
12.5 Synchronizing Threads with semaphores
Shared variables introduce the possibility of synchronization errors: in general, there is no way to predict whether the operating system will choose a correct ordering for the threads' instructions.
12.5.1 Progress graphs
A progress graph models the execution of n concurrent threads as a trajectory through an n-dimensional Cartesian space, where the origin corresponds to the initial state in which no thread has completed an instruction.
When n = 2, the state space is the familiar two-dimensional coordinate plane: each axis corresponds to one thread, and a transition is represented as a directed edge.
Transition rules:
A legal transition moves to the right or up; that is, one instruction in one thread completes.
Two instructions cannot complete at the same time; that is, diagonal edges are not allowed.
A program can never run backward, so transitions never point down or to the left.
The execution history of a program is modeled as a trajectory through this state space.
12.5.2 Semaphores
- P(s): If s is nonzero, P decrements s and returns immediately. If s is zero, the thread is suspended until s becomes nonzero and the thread is restarted by a V operation; after restarting, P decrements s and returns.
- V(s): V increments s by 1. If there are any threads blocked in a P operation waiting for s to become nonzero, then V restarts exactly one of them, which then decrements s and completes its P operation.
Semaphore invariant: a properly initialized semaphore can never have a negative value.
Semaphore operation functions:
int sem_init(sem_t *sem, 0, unsigned int value); // initialize semaphore to value
int sem_wait(sem_t *s); // P(s)
int sem_post(sem_t *s); // V(s)
12.5.3 Using semaphores for mutual exclusion
Semaphores provide a convenient way to ensure mutually exclusive access to shared variables. The basic idea is to associate a semaphore, initially 1, with each shared variable (or related set of shared variables). A semaphore that protects a shared variable in this way is called a binary semaphore, because its value is always 0 or 1. A binary semaphore whose purpose is to provide mutual exclusion is often called a mutex. Performing a P operation on a mutex is called locking the mutex; performing a V operation is called unlocking it. A thread that has locked but not yet unlocked a mutex is said to hold the mutex. A semaphore used as a counter for a set of available resources is called a counting semaphore.

The key idea is that the combination of P and V operations creates a set of states, called the forbidden region, that the semaphore invariant makes unreachable: no feasible trajectory can include a state in the forbidden region. And since the forbidden region completely encloses the unsafe region, no feasible trajectory can touch any part of the unsafe region. As a result, every feasible trajectory is safe, and the program correctly increments the counter regardless of the runtime ordering of the instructions.
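A minimal sketch of a semaphore-protected counter in this style (NITERS and the routine name count_thread are made up for illustration):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define NITERS 100000

volatile long cnt = 0;   /* shared counter */
sem_t mutex;             /* binary semaphore that protects cnt */

void *count_thread(void *vargp) {
    for (long i = 0; i < NITERS; i++) {
        sem_wait(&mutex);   /* P: lock the mutex */
        cnt++;              /* critical section */
        sem_post(&mutex);   /* V: unlock the mutex */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);   /* binary semaphore, initially 1 */
    pthread_create(&t1, NULL, count_thread, NULL);
    pthread_create(&t2, NULL, count_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("cnt = %ld (expected %d)\n", cnt, 2 * NITERS);
    return 0;
}
```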
12.5.4 Using semaphores to schedule shared resources
Semaphores serve two roles: ensuring mutually exclusive access to shared variables, and scheduling access to shared resources (as in the producer-consumer and readers-writers problems).
12.5.5 Synthesis: a prethreaded concurrent server
In the thread-based concurrent server, we create a new thread for each new client. The disadvantage of this approach is that thread creation for every client carries a nontrivial cost. A prethreaded server reduces this overhead by using the producer-consumer model. The server consists of a main thread and a set of worker threads. The main thread repeatedly accepts connection requests from clients and places the resulting connected descriptors in a bounded buffer. Each worker thread repeatedly removes a descriptor from the buffer, services the client, and then waits for the next descriptor. A sketch of the shared buffer appears below.
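A sketch of the shared bounded buffer in the spirit of the book's sbuf package (error checking omitted for brevity):

```c
#include <semaphore.h>
#include <stdlib.h>

typedef struct {
    int *buf;     /* buffer array */
    int n;        /* maximum number of slots */
    int front;    /* buf[(front+1)%n] is the first item */
    int rear;     /* buf[rear%n] is the last item */
    sem_t mutex;  /* protects accesses to buf */
    sem_t slots;  /* counts available slots */
    sem_t items;  /* counts available items */
} sbuf_t;

void sbuf_init(sbuf_t *sp, int n) {
    sp->buf = calloc(n, sizeof(int));
    sp->n = n;
    sp->front = sp->rear = 0;
    sem_init(&sp->mutex, 0, 1);
    sem_init(&sp->slots, 0, n);  /* initially n empty slots */
    sem_init(&sp->items, 0, 0);  /* and no items */
}

/* Producer (main thread): insert a connected descriptor */
void sbuf_insert(sbuf_t *sp, int item) {
    sem_wait(&sp->slots);                   /* wait for a free slot */
    sem_wait(&sp->mutex);                   /* lock the buffer */
    sp->buf[(++sp->rear) % (sp->n)] = item;
    sem_post(&sp->mutex);
    sem_post(&sp->items);                   /* announce a new item */
}

/* Consumer (worker thread): remove a descriptor to service */
int sbuf_remove(sbuf_t *sp) {
    sem_wait(&sp->items);                   /* wait for an item */
    sem_wait(&sp->mutex);
    int item = sp->buf[(++sp->front) % (sp->n)];
    sem_post(&sp->mutex);
    sem_post(&sp->slots);                   /* announce a free slot */
    return item;
}
```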
12.6 Using threads for parallelism
So far in our study of concurrency, we have assumed that the concurrent threads execute on uniprocessor systems. However, many modern machines have multi-core processors, and concurrent programs usually run on such machines. Scheduling these concurrent threads in parallel on multiple cores, rather than sequentially on a single core, is critical to exploiting parallelism in applications such as busy Web servers, database servers, and large scientific codes.
12.7 Other concurrency issues
1. Thread safety
We define four (non-disjoint) classes of thread-unsafe functions (a sketch of class 2 follows the list):
Class 1: functions that do not protect shared variables.
Class 2: functions that keep state across multiple invocations.
Class 3: functions that return a pointer to a static variable.
Class 4: functions that call thread-unsafe functions.
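A sketch of a class-2 function modeled on the book's rand example, together with a reentrant variant (my_rand and my_rand_r are made-up names):

```c
/* Class 2: keeps state across invocations. If multiple threads call this,
   their updates to next_seed interleave and the pseudo-random sequence
   each thread sees is broken. */
static unsigned int next_seed = 1;

int my_rand(void) {                 /* thread-unsafe */
    next_seed = next_seed * 1103515245 + 12345;
    return (int)((next_seed / 65536) % 32768);
}

/* Reentrant variant: the caller owns the state and passes it in */
int my_rand_r(unsigned int *nextp) {
    *nextp = *nextp * 1103515245 + 12345;
    return (int)((*nextp / 65536) % 32768);
}
```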
2. Reentrancy
A function is reentrant if, when it is called by multiple threads, it references no shared data.
1. Explicitly reentrant:
All function arguments are passed by value (there are no pointers), and all data references are to local automatic stack variables, not to static or global variables.
2. Implicitly reentrant:
Some arguments may be pointers, but the calling threads carefully pass pointers to nonshared data.
3. Races
1. Why races occur:
A race occurs when the correctness of a program depends on one thread reaching point x in its control flow before another thread reaches point y. That is, the programmer assumes that the threads will take some particular trajectory through the execution state space, forgetting the rule that a threaded program must work correctly for any feasible trajectory.
2. How to eliminate races (a sketch follows):
Dynamically allocate a separate block for each integer ID, and pass the thread routine a pointer to that block.
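A minimal sketch of this fix (the thread routine and the message text are made up for illustration):

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N 4

void *thread(void *vargp) {
    int myid = *((int *)vargp);   /* each thread reads its own private block */
    free(vargp);
    printf("Hello from thread %d\n", myid);
    return NULL;
}

int main(void) {
    pthread_t tid[N];
    int i;
    for (i = 0; i < N; i++) {
        int *idp = malloc(sizeof(int));  /* a separate block per thread:   */
        *idp = i;                        /* no race on the loop variable i */
        pthread_create(&tid[i], NULL, thread, idp);
    }
    for (i = 0; i < N; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```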
4. Deadlock
1. A set of threads is blocked, each waiting for a condition that will never be true.
When programmers use P and V operations improperly, the forbidden regions of two semaphores can overlap.
Overlapping forbidden regions induce a set of states called the deadlock region.
Deadlock is an especially difficult problem because it is not always predictable.
2. Mutex lock ordering rule: given a total ordering of all mutexes, a program is deadlock-free if each thread acquires its mutexes in that order and releases them in reverse order.
Summary
This week I studied two chapters. I had already learned the material of Chapter 11 in Liu Nian's class, so I had some foundation, and this round of study went deeper. Chapter 12 connects to the operating systems course; I have also just finished an in-depth GDB exercise whose multi-thread and multi-process parts gave me some familiarity with this material.
Resources
1. "In-depth understanding of computer systems"
2.linux fork Function and child process parent process process successively http://blog.csdn.net/wu_zf/article/details/7640970
Information Security System Design Foundation: Week 13 study summary -- 20135308