Web Server and HTTP protocol

Last Update:2016-11-30 Source: Internet

Author: User

Tags sprintf


Https://toutiao.io/posts/xm2fr/preview

I have been looking for an internship lately and only jotting things down in Evernote, so I have had no time to update this blog. Job hunting is painful, especially since I picked hard mode: C++ backend development, mainly because I don't know anything else. Although I am applying for C++ positions, two of the projects on my CV are in PHP, because Lao Zhao's projects were websites built with PHP. That kind of resume has started to feel unconvincing, so I wanted to swap in a network-related, multi-threaded C++ project and pick up some network and multi-threading skills along the way. So I read tinyhttpd, LightCgiServer and Wu Yi's husky. My paekdusan is basically a copy of Wu Yi's husky, but a pure copy would be pointless, so I changed a few small things while keeping the overall framework. The main changes are:

The thread pool uses C++11 threads instead of pthread, to be cross-platform
The concurrent queue uses C++11 mutexes and locks instead of pthread's, to be cross-platform
The socket code uses preprocessor macros to be cross-platform
Receiving data is more robust when an HTTP header cannot be read in a single recv
An HTTP server with a simple keepalive policy is implemented
An HTTP server for static files is implemented

tinyhttpd and LightCgiServer

First, tinyhttpd. It is highly rated online because it lets you understand the essence of an HTTP server in only 500-600 lines of code. Here is a flowchart of tinyhttpd:

For more detail on tinyhttpd, just read the code; it really is easy to read and understand. The impression it leaves is that everything is written in whatever way is easiest to follow. For example, here is how the server replies with a 501 Method Not Implemented response. I was stunned when I saw it, which I can only blame on having read too little code. My first instinct would have been to sprintf everything into one long buffer and send it in one go, but this version is much easier to read and understand.

    void unimplemented(int client)
    {
        char buf[1024];

        sprintf(buf, "HTTP/1.0 501 Method Not Implemented\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, SERVER_STRING);
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "Content-Type: text/html\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "<HTML><HEAD><TITLE>Method Not Implemented\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "</TITLE></HEAD>\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "<BODY><P>HTTP request method not supported.\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "</BODY></HTML>\r\n");
        send(client, buf, strlen(buf), 0);
    }

It is also worth mentioning that tinyhttpd implements CGI server functionality, though in a fairly simple form; LightCgiServer's implementation is more complete. For more on CGI servers, see the CGI Server section below.

Husky and Paekdusan

As stated at the beginning, paekdusan's overall program structure is basically copied from husky, with only a few small changes. At a high level, the program follows the producer-consumer model.

First, a rough flowchart:

As the flowchart shows, the main thread is the producer and the threads in the thread pool are the consumers; they communicate through the task queue. After accept returns successfully, the main thread adds a task that handles the client to the task queue and then goes back to accept to wait for the next client. The thread pool threads, as consumers, keep taking tasks from the task queue and calling each task's run interface.

Note that the task queue is a BoundedBlockingQueue, that is, a queue with limited capacity and blocking operations. When a consumer tries to take a task and the queue is empty, the consumer blocks until a producer puts a task in and wakes it. Likewise, when a producer tries to put a task in and the queue is full, the producer blocks until a consumer takes a task out and wakes it.
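The blocking behavior described above can be sketched roughly as follows. This is a minimal illustration built on C++11 primitives, not the actual class from husky or paekdusan; the member names (push, pop, notFull_, notEmpty_) are assumptions made for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical sketch of a capacity-limited blocking queue.
template <typename T>
class BoundedBlockingQueue {
public:
    explicit BoundedBlockingQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Producer side: blocks while the queue is full.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mtx_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        notEmpty_.notify_one();  // wake one waiting consumer
    }

    // Consumer side: blocks while the queue is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(mtx_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        notFull_.notify_one();  // wake one waiting producer
        return item;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return items_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mtx_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
    std::queue<T> items_;
};
```

Two condition variables are used here so that producers and consumers wait on separate conditions; a single one with notify_all would also work, at the cost of extra wakeups.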

Next, a rough sequence diagram:

This requires every task to implement the run interface, and the task queue design can be seen as an application of the command pattern. There appear to be many classes, but each class has a single, narrow responsibility and the program merely composes them, so the coupling between classes is fairly low.
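To make the command pattern idea concrete, here is a small sketch of a task interface and a thread pool that only ever calls run(). The names ITask, CounterTask and ThreadPool are hypothetical, and the queue inside is unbounded for brevity, unlike the bounded queue that paekdusan uses.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Command interface: the pool only knows that a task can be run.
struct ITask {
    virtual ~ITask() = default;
    virtual void run() = 0;
};

// Example command: increments a shared counter when run.
class CounterTask : public ITask {
public:
    explicit CounterTask(std::atomic<int>& counter) : counter_(counter) {}
    void run() override { ++counter_; }
private:
    std::atomic<int>& counter_;
};

class ThreadPool {
public:
    explicit ThreadPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { workerLoop(); });
    }

    // Lets the workers drain the remaining tasks, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // Producer side: enqueue a task and wake one worker.
    void submit(std::unique_ptr<ITask> task) {
        assert(task != nullptr);
        {
            std::lock_guard<std::mutex> lock(mtx_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    // Consumer side: take a task, release the lock, then run it.
    void workerLoop() {
        for (;;) {
            std::unique_ptr<ITask> task;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task->run();
        }
    }

    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::unique_ptr<ITask>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

Because the pool depends only on ITask, a new kind of work (for example a keepalive worker) can be added without touching the pool itself.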

For the concrete implementation, see the paekdusan code: https://github.com/aholic/paekdusan

Issue record

Basic format of the HTTP protocol
The first part of an HTTP request is the request line, which consists of three space-separated parts: method, URI and version, in that order.
The second part of an HTTP request is the header section. The request line ends with \r\n, every header line ends with \r\n, and the section is closed by one more \r\n. So when there are no headers, the request line is followed by a single extra \r\n; when there are headers, the section ends with two consecutive \r\n. Each header line has the form key:value, where the value may be empty. Put simply, the headers are a map: key and value are separated by a colon, key-value pairs are separated by \r\n, and the map is terminated by an extra \r\n. Note that cookies live in the headers.
The third part of an HTTP request is the body. The protocol does not put any terminating characters after the body, so the end of the body cannot be found by searching for \r\n; it is given by Content-Length in the headers, which is the number of bytes in the body.
The first part of an HTTP response is the response line (Status-Line in the RFC), which consists of three space-separated parts: version, status code and reason phrase, in that order.
The second part of an HTTP response is the headers, in the same format as in a request.
The third part of an HTTP response is the body, in the same format as in a request.
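The request format above can be parsed roughly like this. It is a simplified sketch, not paekdusan's parser: HttpRequestHead and parseRequestHead are made-up names, header keys are stored exactly as received (real servers compare them case-insensitively), and a false return is used both for "need more data" and for a malformed request line.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct HttpRequestHead {
    std::string method;
    std::string uri;
    std::string version;
    std::map<std::string, std::string> headers;  // keys kept exactly as received
};

// Parses the request line and headers at the start of `raw`.
// On success, bodyOffset is the index of the first body byte.
bool parseRequestHead(const std::string& raw, HttpRequestHead& out,
                      std::size_t& bodyOffset) {
    const std::size_t npos = std::string::npos;
    const std::size_t headEnd = raw.find("\r\n\r\n");
    if (headEnd == npos) return false;  // header section not complete yet

    // Request line: method SP uri SP version
    const std::size_t lineEnd = raw.find("\r\n");
    assert(lineEnd != npos && lineEnd <= headEnd);
    const std::string requestLine = raw.substr(0, lineEnd);
    const std::size_t sp1 = requestLine.find(' ');
    const std::size_t sp2 = (sp1 == npos) ? npos : requestLine.find(' ', sp1 + 1);
    if (sp2 == npos) return false;  // fewer than three parts
    out.method = requestLine.substr(0, sp1);
    out.uri = requestLine.substr(sp1 + 1, sp2 - sp1 - 1);
    out.version = requestLine.substr(sp2 + 1);

    // Header lines: key ":" value, each terminated by \r\n
    std::size_t pos = lineEnd + 2;
    while (pos < headEnd + 2) {
        const std::size_t end = raw.find("\r\n", pos);
        const std::string line = raw.substr(pos, end - pos);
        const std::size_t colon = line.find(':');
        if (colon != npos) {
            const std::size_t valueStart = line.find_first_not_of(" \t", colon + 1);
            out.headers[line.substr(0, colon)] =
                (valueStart == npos) ? std::string() : line.substr(valueStart);
        }
        pos = end + 2;
    }
    bodyOffset = headEnd + 4;
    return true;
}
```

Returning false while the terminating blank line has not arrived is what lets a caller keep calling recv and retry, which is the "header cannot be read at once" case mentioned at the top.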
One more point: I don't know whether it can occur in a request, but it does appear in responses. If the headers specify Transfer-Encoding: chunked, the body is a series of chunks. RFC 2616 defines the Chunked-Body format as follows:
```
Chunked-Body   = *chunk
                 last-chunk
                 trailer
                 CRLF

chunk          = chunk-size [ chunk-extension ] CRLF
                 chunk-data CRLF
chunk-size     = 1*HEX
last-chunk     = 1*("0") [ chunk-extension ] CRLF

chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
chunk-ext-name = token
chunk-ext-val  = token | quoted-string
chunk-data     = chunk-size(OCTET)
trailer        = *(entity-header CRLF)
```
In other words, a chunked body has four parts: first, zero or more chunks (each consisting of chunk-size, an optional chunk-extension, \r\n, chunk-data and \r\n); next, the last-chunk (a special chunk whose chunk-size is 0 and which has no chunk-data); then the trailer (zero or more lines in the same format as headers); and finally a \r\n. If you ignore the optional chunk-extension, it is quite simple.
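As an illustration of the grammar, a decoder for a complete chunked body might look like the sketch below. The function name decodeChunkedBody is invented for this example; chunk extensions and trailer lines are skipped rather than interpreted, and std::strtoul is more lenient than the grammar (it tolerates leading whitespace or a 0x prefix), so treat this as a sketch rather than a strict validator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Decodes a complete chunked body from `in` into `out`.
// Returns false if the input is incomplete or malformed.
bool decodeChunkedBody(const std::string& in, std::string& out) {
    const std::size_t npos = std::string::npos;
    std::size_t pos = 0;
    out.clear();

    // *chunk followed by last-chunk
    for (;;) {
        const std::size_t lineEnd = in.find("\r\n", pos);
        if (lineEnd == npos) return false;
        // chunk-size is hexadecimal, optionally followed by ";extension"
        std::string sizeField = in.substr(pos, lineEnd - pos);
        sizeField = sizeField.substr(0, sizeField.find(';'));
        char* parseEnd = nullptr;
        const unsigned long size = std::strtoul(sizeField.c_str(), &parseEnd, 16);
        if (parseEnd == sizeField.c_str() || *parseEnd != '\0') return false;
        pos = lineEnd + 2;
        if (size == 0) break;  // last-chunk reached

        assert(pos <= in.size());
        const std::size_t remaining = in.size() - pos;
        if (remaining < 2 || size > remaining - 2) return false;  // data + CRLF missing
        out.append(in, pos, size);
        if (in.compare(pos + size, 2, "\r\n") != 0) return false;
        pos += size + 2;
    }

    // trailer: zero or more header-like lines, then the final CRLF
    for (;;) {
        const std::size_t lineEnd = in.find("\r\n", pos);
        if (lineEnd == npos) return false;
        if (lineEnd == pos) return true;  // empty line ends the body
        pos = lineEnd + 2;
    }
}
```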
Implementing keepalive
In the original husky code, the server closes the socket after sending the response, so if the client wants to send another HTTP request it has to establish a new TCP connection. Opening an ordinary web page usually involves many HTTP requests from the client to the server, so repeatedly setting up and tearing down TCP connections is inefficient. Keepalive avoids this: after sending the response, the server does not close the connection but keeps waiting for more data on it. Keepalive is enabled by default in HTTP/1.1; to turn it off, declare Connection: close in the headers.
But naive keepalive causes problems. If clients keep their connections open indefinitely, the server keeps holding them, and once there are many clients, new clients cannot get server resources. So some compromise is needed, for example disconnecting if no data arrives within the next 5 s, or if the server receives 100 client requests within the next 5 s.
The rule "disconnect when the server has received 100 client requests within 5 s" requires controlling a thread based on information from outside that thread, which makes the running thread depend too much on external state, so paekdusan does not implement it. Instead it disconnects after receiving 50 HTTP requests on the same connection. paekdusan also implements "disconnect when a connection has lasted more than 5 s": the recv timeout is 1 s, and after each recv it checks whether more than 5 s have passed since the first recv, and disconnects if so. See the following code:
```
// For simplicity, some code that handles incomplete HTTP requests is removed,
// and the handling of now and startTime is simplified.
// See https://github.com/aholic/paekdusan/blob/master/KeepAliveWorker.hpp
while ((now - startTime <= 5) && requestCount > 0) {
    recvLen = recv(sockfd, recvBuff, RECV_BUFFER_SIZE, 0);
    if (recvLen <= 0 && getLastErrorNo() == ERROR_TIMEOUT) {
        LogInfo("recv returns %d", recvLen);
        continue;
    }
    if (recvLen <= 0) {
        LogError("recv failed: %d", getLastErrorNo());
        break;
    }
    // do with recvBuff, get an HTTP request
    requestCount--;
}
```
However, because blocking recv is used, this implementation is not ideal. For example, suppose 4.99 s have elapsed since the first recv. Then (now - startTime <= 5) still holds, the loop is entered again and blocks on recv, whose timeout is 1 s, so the loop may not exit until 5.99 s after the first recv. I have no good solution for now: if recv is set to return immediately, the while loop spins too many times and efficiency suffers. A notification mechanism would be better.
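One possible way to limit the overshoot, short of a real notification mechanism, is to shrink the recv timeout to whatever time the connection has left. This is not what paekdusan does; the helper name nextRecvTimeout and its parameters are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Millis = std::chrono::milliseconds;

// Returns the timeout to use for the next recv so that the loop never waits
// past the connection limit. A zero result means the limit has been reached:
// the caller should close the connection instead of calling recv, because a
// zero SO_RCVTIMEO on Linux means "no timeout", not "return immediately".
Millis nextRecvTimeout(Millis elapsed, Millis connectionLimit, Millis perCallLimit) {
    assert(perCallLimit.count() > 0);
    if (elapsed >= connectionLimit) return Millis(0);
    return std::min(perCallLimit, connectionLimit - elapsed);
}
```

With the numbers from the example above (4.99 s elapsed, a 5 s limit and a 1 s per-call timeout) this yields 10 ms, so the loop would exit at roughly 5 s instead of 5.99 s, at the cost of one setsockopt call per iteration.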
CGI Server
A CGI server typically forks a process to execute the CGI script specified by the URI of the HTTP request, and passes the request information to the script through environment variables; see this article. But note that paekdusan is a multi-threaded server, so this mixes multi-threading with multi-processing, which is a painful combination. Mixing the two causes many problems, which this article covers in detail. You may find that it is blocked by the firewall, so I will briefly describe the problems here.
First, consider what happens when fork is called from a thread: the resulting child process contains only one thread, namely the one that called fork.
So here is the problem. Suppose another thread in the parent process holds a lock that protects data shared between threads, and the shared data is in a half-modified state. In the child process that other thread no longer exists, so who finishes modifying the shared data? The lock itself is also left in an undefined state. And even if your own code is thread-safe, you cannot guarantee that the libraries you use are safe in this situation.
So the only reasonable way to use multiple processes in a multi-threaded program is to call exec immediately after fork, so that the child process is replaced by a new program at once and all the data in the child no longer matters because it is discarded. A multi-threaded CGI server is therefore reasonable, but security needs attention: a child process inherits the parent's file descriptors by default, so the child can read and write the files the parent has open.
That is about all I know; for more detail, get past the firewall and read the original.
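The "fork, then exec immediately" rule and the file descriptor concern can be sketched as follows for POSIX systems. The helper names setCloseOnExec and runScript are hypothetical, /bin/sh stands in for a real CGI program, and only QUERY_STRING is passed as an example of an environment variable.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Marks fd close-on-exec so that an exec'd program does not inherit it.
bool setCloseOnExec(int fd) {
    const int flags = fcntl(fd, F_GETFD);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}

// Runs `/bin/sh -c script` in a child process with QUERY_STRING set and
// returns the child's exit status, or -1 on failure.
int runScript(const char* script, const std::string& queryString) {
    assert(script != nullptr);
    // Build the environment before fork: after fork in a multi-threaded
    // process, the child should do as little as possible before exec.
    const std::string queryVar = "QUERY_STRING=" + queryString;
    char* const envp[] = { const_cast<char*>(queryVar.c_str()), nullptr };

    const pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: replace the process image right away.
        execle("/bin/sh", "sh", "-c", script, static_cast<char*>(nullptr), envp);
        _exit(127);  // only reached if exec failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a server, setCloseOnExec would typically be applied to the listening socket and to other client sockets, so that a CGI child only sees the descriptors it is meant to use.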
C++11 thread
C++11 threads feel nicer to use than pthread on Linux or _beginthread on Windows. Here is a short piece of code showing the basics.
```
#include <functional>
#include <iostream>
#include <string>
#include <thread>
using namespace std;

void sayWord(const string& word) {
    for (int i = 0; i < 1000; i++) {
        cout << word << endl;
    }
}

void saySentence(const string& sentence) {
    for (int i = 0; i < 1000; i++) {
        cout << sentence << endl;
    }
}

int main() {
    string word = "hello";
    string sentence = "This is a example from cstdlib.com";
    thread t1(std::bind(sayWord, ref(word)));  // callable built with bind
    thread t2(saySentence, ref(sentence));     // function plus arguments
    t1.join();
    t2.join();
    return 0;
}
```
Running the code above, you will see "hello" and "This is a example from cstdlib.com" printed alternately. Note the odd-looking constructor arguments of t1 and t2: std::bind(sayWord, ref(word)) and saySentence, ref(sentence). The thread functions take reference parameters, while std::thread and std::bind copy their arguments by default, so ref is used to pass a reference through the templates; I find the template machinery hard to explain clearly. Also, when creating a thread from a non-static member function, you must pass this, or a ref to the object instance, as an argument, otherwise it cannot be called.
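The member function case might look like the sketch below. Counter and runMemberFunctionThreads are names made up for the example; each thread works on its own object, so no locking is needed here.

```cpp
#include <cassert>
#include <functional>
#include <thread>

class Counter {
public:
    void addMany(int times) {
        assert(times >= 0);
        for (int i = 0; i < times; ++i) ++value_;
    }
    int value() const { return value_; }
private:
    int value_ = 0;
};

// Starts two threads on member functions and returns the combined count.
int runMemberFunctionThreads() {
    Counter a;
    Counter b;
    // The object comes right after the member function pointer:
    std::thread t1(&Counter::addMany, &a, 1000);          // as a pointer (like `this`)
    std::thread t2(&Counter::addMany, std::ref(b), 500);  // or as a reference wrapper
    t1.join();
    t2.join();
    return a.value() + b.value();
}
```

Passing a or b by value instead would make the thread operate on a copy, and the counts in the original objects would stay at zero.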
C++11 mutex and lock
Notice that the thread code above actually has a problem: output from the two threads may be interleaved, so a lock is needed. C++11's mutex and lock classes are also very convenient.
```
#include <mutex>

mutex mtx;  // shared by both thread functions; main() is unchanged

void sayWord(const string& word) {
    for (int i = 0; i < 1000; i++) {
        lock_guard<mutex> lock(mtx);  // unlocked at the end of each iteration
        cout << word << endl;
    }
}

void saySentence(const string& sentence) {
    for (int i = 0; i < 1000; i++) {
        lock_guard<mutex> lock(mtx);
        cout << sentence << endl;
    }
}
```
The code uses lock_guard, which calls the mutex's lock method in its constructor and the mutex's unlock method in its destructor, which makes it convenient to use. unique_lock is similar to lock_guard but has more member functions and can work with other mutex types. On the use of mutexes and locks, [this blog](http://www.cnblogs.com/haippy/p/3237213.html) goes into more detail. The main difference I noticed from pthread locks is that with pthread, re-acquiring a lock I already held did not report an error, while in C++11 it did. Strictly speaking, locking a std::mutex that the calling thread already owns is undefined behavior; std::recursive_mutex is the type intended for re-entrant locking.
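A small sketch of the difference between the two lock types: lock_guard holds the mutex for its whole scope, while unique_lock can release it early. The globals and function names here are invented for the example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex counterMutex;
long sharedCounter = 0;

// lock_guard: locked on construction, unlocked when the scope ends.
void addWithLockGuard(int times) {
    for (int i = 0; i < times; ++i) {
        std::lock_guard<std::mutex> lock(counterMutex);
        ++sharedCounter;
    }
}

// unique_lock: same RAII behavior, but it can also be unlocked early.
void addWithUniqueLock(int times) {
    for (int i = 0; i < times; ++i) {
        std::unique_lock<std::mutex> lock(counterMutex);
        ++sharedCounter;
        lock.unlock();  // release before work that does not need the lock
    }
}

// Runs both variants concurrently and returns the final counter value.
long runCounterDemo() {
    assert(sharedCounter >= 0);
    sharedCounter = 0;
    std::thread a(addWithLockGuard, 1000);
    std::thread b(addWithUniqueLock, 1000);
    a.join();
    b.join();
    return sharedCounter;
}
```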
C++11 condition_variable
The task queue in paekdusan is a BoundedBlockingQueue, which involves blocking and waking, and therefore condition_variable. Its two main functions are wait(unique_lock<mutex>& lck) and notify_one(). When a thread blocks in wait, lck.unlock() is called to release the lock; when the thread is woken by notify_one, lck.lock() is called to reacquire the lock, restoring the state before the wait. On the use of condition_variable, this blog goes into more detail.
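The wait and notify_one handshake can be seen in isolation in the sketch below, where one thread publishes a value and another waits for it. The function name waitForValue and the ready flag are assumptions for the example. The flag matters: wait can return spuriously, and notify_one may run before the waiter starts waiting, so the condition is always re-checked under the lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Starts a producer thread, waits until it has published a value, returns it.
int waitForValue() {
    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;
    int value = 0;

    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lock(mtx);
            value = 42;
            ready = true;
        }
        cv.notify_one();  // wake the waiting thread
    });

    {
        std::unique_lock<std::mutex> lock(mtx);
        // wait() unlocks mtx while blocked and locks it again before returning.
        while (!ready) cv.wait(lock);
        assert(ready);
    }
    producer.join();
    return value;
}
```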
Differences between the socket API on Windows and Linux
1. On Windows you need to include winsock.h; on Linux you need cerrno, sys/socket.h, sys/types.h, arpa/inet.h and unistd.h
2. On Windows you must call WSAStartup before calling socket, and link ws2_32.lib, for example with #pragma comment(lib, "Ws2_32")
3. The function that closes a socket is closesocket on Windows and close on Linux
4. The error code is obtained with GetLastError() on Windows (WSAGetLastError() is the Winsock-specific call) and from the global variable errno on Linux; the error codes have different meanings
5. When setting the SO_RCVTIMEO and SO_SNDTIMEO options, Windows takes an integer number of milliseconds, while Linux takes a struct timeval (seconds and microseconds)
6. The prototype of accept on Windows is accept(SOCKET, struct sockaddr*, int*), while on Linux it is accept(int, struct sockaddr*, socklen_t*)
Those are the ones I have run into; there are probably many more that I have not met yet.
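The differences listed above can be hidden behind a few small wrappers selected by a preprocessor macro, roughly as sketched here. This is not paekdusan's actual header: the wrapper names are hypothetical (getLastErrorNo is borrowed from the keepalive snippet), and the Windows branch assumes winsock2.h rather than winsock.h.

```cpp
#include <cassert>
#ifdef _WIN32
#include <winsock2.h>
#pragma comment(lib, "Ws2_32")
typedef SOCKET SocketHandle;
typedef int SockLen;
#else
#include <cerrno>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <unistd.h>
typedef int SocketHandle;
typedef socklen_t SockLen;
#endif

// Must be called once before any socket call (a no-op outside Windows).
bool initSockets() {
#ifdef _WIN32
    WSADATA data;
    return WSAStartup(MAKEWORD(2, 2), &data) == 0;
#else
    return true;
#endif
}

int closeSocket(SocketHandle s) {
#ifdef _WIN32
    return closesocket(s);
#else
    return close(s);
#endif
}

int getLastErrorNo() {
#ifdef _WIN32
    return WSAGetLastError();
#else
    return errno;
#endif
}

// Hides the unit difference: milliseconds on Windows, struct timeval on Linux.
bool setRecvTimeoutSeconds(SocketHandle s, int seconds) {
    assert(seconds >= 0);
#ifdef _WIN32
    const DWORD ms = static_cast<DWORD>(seconds) * 1000;
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO,
                      reinterpret_cast<const char*>(&ms), sizeof(ms)) == 0;
#else
    timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
#endif
}
```

The SockLen typedef is there for the accept difference in item 6: code can declare `SockLen len` and pass `&len` on both platforms.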
