Web Server and HTTP Protocol (Repost)

Last Update:2016-11-07 Source: Internet

Author: User



Reposted from: http://www.kuqin.com/shuoit/20150809/347488.html

I have been looking for internships lately and jotting things down directly in Evernote, so I have had no time to update this blog. Finding an internship is really painful, and I have been going for the hard mode, C++ backend development, mainly because I do not know how to do anything else. Although I am looking for a C++ job, two of the projects on my CV are PHP, because Lao Zhao's project uses PHP to build websites. Lately this resume feels less and less convincing, so I want to swap in a network-related, multithreaded C++ project. So I recently set out to put a few skill points into networking and multithreading, and looked at tinyhttpd, Lightcgiserver, and Wu Yi's husky. Basically I copied Wu Yi's husky into a project called paekdusan, but a pure copy would not do, so I changed some small things while keeping the overall framework. The main changes are:

In the thread pool, the C++11 thread replaces pthread, for cross-platform support
In the concurrent queue, the C++11 mutex and lock replace the pthread mutex and lock, for cross-platform support
In the socket code, preprocessor macros are used for cross-platform support
The data-receiving code is more robust and handles the case where an HTTP header cannot be read in a single recv
An HTTP server with a simple keepalive policy is implemented
An HTTP server for static files is implemented

tinyhttpd and Lightcgiserver

First, tinyhttpd. It is very well regarded online, because it lets people understand the essence of an HTTP server from only 500 to 600 lines of code. Here is a flowchart of tinyhttpd:

[Figure: tinyhttpd flowchart]

For more detail on tinyhttpd, it is best to go straight to the code, because it really is easy to read and understand. The tinyhttpd code feels like it is written in whatever way is easiest to read. For example, this is how the server replies with a 501 Method Not Implemented response. I was stunned when I saw it (blame me for having read too little code): my first reaction would have been to sprintf everything into one long buffer and send it all at once, but this way is much easier to understand and read.

    /* Excerpt from tinyhttpd; SERVER_STRING is defined elsewhere in that file. */
    void unimplemented(int client)
    {
        char buf[1024];

        sprintf(buf, "HTTP/1.0 501 Method Not Implemented\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, SERVER_STRING);
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "Content-Type: text/html\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "<HTML><HEAD><TITLE>Method Not Implemented\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "</TITLE></HEAD>\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "<BODY><P>HTTP request method not supported.\r\n");
        send(client, buf, strlen(buf), 0);
        sprintf(buf, "</BODY></HTML>\r\n");
        send(client, buf, strlen(buf), 0);
    }
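For comparison, the "format everything into one buffer and send it once" alternative mentioned above might look like the sketch below. It is written in C++ rather than tinyhttpd's C. The function name `build501Response`, the `Server:` value, and the added `Content-Length` line are illustrative assumptions and do not come from tinyhttpd.

```cpp
#include <string>

// Hypothetical helper: builds the whole 501 response as one string so that it
// can be handed to send() in one piece instead of one call per line.
std::string build501Response() {
    const std::string body =
        "<HTML><HEAD><TITLE>Method Not Implemented</TITLE></HEAD>\r\n"
        "<BODY><P>HTTP request method not supported.</P></BODY></HTML>\r\n";

    std::string response;
    response += "HTTP/1.0 501 Method Not Implemented\r\n";
    response += "Server: example-httpd/0.1\r\n";  // assumed value, stands in for SERVER_STRING
    response += "Content-Type: text/html\r\n";
    response += "Content-Length: " + std::to_string(body.size()) + "\r\n";  // not sent by tinyhttpd
    response += "\r\n";  // the empty line separates the header from the body
    response += body;
    return response;
}
```

The result would be passed to `send(client, r.data(), r.size(), 0)`. Note that `send` may write fewer bytes than requested, so a real server would loop until everything is sent.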

It is also worth mentioning that tinyhttpd implements CGI server functionality, although its CGI implementation is fairly simple; Lightcgiserver's implementation is more complete. For more detail on CGI servers, see the CGI Server section.

Husky and Paekdusan

As stated at the beginning of this article, in its overall structure Paekdusan is basically a copy of husky with just a few small changes. Overall, the program can be seen as a producer-consumer model.

First, a rough flowchart:

[Figure: producer-consumer flowchart]

As you can see, the main thread is the producer and the threads in the thread pool are the consumers, and they communicate through the task queue. Acting as the producer, the main thread adds a task that handles the client to the task queue after accept returns successfully, and then goes back to accept to wait for the next client. Acting as consumers, the threads in the thread pool continuously pull tasks from the task queue and invoke each task's run interface.

It is worth noting that the task queue is a BoundedBlockingQueue, that is, a queue with limited capacity that blocks. When a consumer tries to take a task from the queue and the queue is empty, the consumer blocks until a producer puts a task into the queue and wakes it. Similarly, when a producer tries to put a task into the queue and the queue is full, the producer blocks until a consumer removes a task from the queue and wakes it.
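The blocking behavior described above can be sketched with a C++11 mutex and two condition variables. This is a minimal illustration under the assumption of a simple `push`/`pop` interface; it is not Paekdusan's actual class, and the member names are made up for the example.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Sketch of a capacity-limited blocking queue (interface is an assumption).
template <typename T>
class BoundedBlockingQueue {
public:
    explicit BoundedBlockingQueue(std::size_t capacity) : capacity_(capacity) {}

    // Producer side: blocks while the queue is full.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push(std::move(item));
        notEmpty_.notify_one();  // wake one waiting consumer
    }

    // Consumer side: blocks while the queue is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        notFull_.notify_one();  // wake one waiting producer
        return item;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
    std::queue<T> queue_;
};
```

Using two condition variables means a producer only ever wakes consumers and a consumer only ever wakes producers; a single condition variable with `notify_all` would also work but would wake threads that cannot make progress.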

Next, a rough sequence diagram:

[Figure: sequence diagram]

This requires every task to implement the run interface, so the design of the task queue can be regarded as an application of the Command pattern. There appear to be many classes, but each class has a single, narrow responsibility and the program simply composes these single-purpose classes, so the coupling between classes is actually fairly low.
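As a rough illustration of tasks that expose a run interface and workers that only know how to call it, here is a minimal sketch. `ITask` and `ThreadPool` are hypothetical names, and for brevity the queue here is unbounded, unlike the bounded queue the text describes.

```cpp
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// The command interface: workers only know that a task can be run.
class ITask {
public:
    virtual ~ITask() {}
    virtual void run() = 0;
};

// Minimal pool: worker threads pull tasks from a shared queue and call run().
class ThreadPool {
public:
    explicit ThreadPool(std::size_t workerCount) {
        for (std::size_t i = 0; i < workerCount; ++i) {
            workers_.emplace_back([this] { workerLoop(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        notEmpty_.notify_all();
        for (std::thread& worker : workers_) worker.join();
    }

    void add(std::unique_ptr<ITask> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        notEmpty_.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::unique_ptr<ITask> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                notEmpty_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task->run();  // run outside the lock so workers execute in parallel
        }
    }

    std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::queue<std::unique_ptr<ITask>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

In a server, the accept loop would wrap each accepted socket in a task object and call `add`; the pool never needs to know what kind of work the task performs.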

For the concrete implementation, see the Paekdusan code.

Issue Record

Basic format of HTTP protocol
The first part of an HTTP request is the request line. Its three fields, separated by spaces, are the method, the URI, and the version, in that order.
The second part of an HTTP request is the header. The header ends with \r\n, and every line in the header also ends with \r\n. That is, an empty header ends with a single \r\n, and a non-empty header ends with two consecutive \r\n. Each line in the header has the format key:value, where value may be empty. Put simply, the header is a map: key and value are separated by ":", key-value pairs are separated by \r\n, and the map is followed by one more \r\n. Note that cookies live in the header.
The third part of an HTTP request is the body. The protocol places no restriction on which characters the body may contain, so the end of the body cannot be found by searching for \r\n. Instead it is given by Content-Length in the header, which is the number of bytes in the body.
The first part of an HTTP response is the response line. Its three fields, separated by spaces, are the version, the status code, and the reason phrase.
The second part of an HTTP response is the header, in the same format as in a request.
The third part of an HTTP response is the body, in the same format as in a request.
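The request format described above can be sketched as a small parser. `HttpRequest` and `parseRequestHead` are hypothetical names for this example. The sketch keeps header names exactly as received (HTTP header names are case-insensitive, which a real server would normalize), and it returns false while the terminating blank line has not arrived yet, which is the situation where a header cannot be read in one recv.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for a parsed request line and header map.
struct HttpRequest {
    std::string method, uri, version;
    std::map<std::string, std::string> headers;
};

// Parses the request line and headers up to the blank line. Returns false if
// "\r\n\r\n" has not been received yet or the request line is malformed.
// On success, bodyOffset is the index of the first body byte in raw.
bool parseRequestHead(const std::string& raw, HttpRequest& req, std::size_t& bodyOffset) {
    const std::size_t headEnd = raw.find("\r\n\r\n");
    if (headEnd == std::string::npos) return false;  // need more data

    // Request line: method, URI, and version separated by spaces.
    std::size_t lineEnd = raw.find("\r\n");
    std::istringstream requestLine(raw.substr(0, lineEnd));
    if (!(requestLine >> req.method >> req.uri >> req.version)) return false;

    // Each following line is "key:value"; value may be empty.
    std::size_t lineStart = lineEnd + 2;
    while (lineStart < headEnd + 2) {
        lineEnd = raw.find("\r\n", lineStart);
        const std::string line = raw.substr(lineStart, lineEnd - lineStart);
        const std::size_t colon = line.find(':');
        if (colon != std::string::npos) {
            std::string value = line.substr(colon + 1);
            value.erase(0, value.find_first_not_of(" \t"));  // trim leading blanks
            req.headers[line.substr(0, colon)] = value;
        }
        lineStart = lineEnd + 2;
    }

    bodyOffset = headEnd + 4;
    return true;
}
```

After a successful parse, the caller would read `Content-Length` from `req.headers` and keep receiving until that many bytes are available starting at `bodyOffset`.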
There is one more point. I do not know whether it can occur in a request, but it certainly occurs in responses: if the header says Transfer-Encoding is chunked, the body is a sequence of chunks. In RFC 2616 of the HTTP protocol, the Chunked-Body format is defined like this:
```
Chunked-Body   = *chunk
                 last-chunk
                 trailer
                 CRLF

chunk          = chunk-size [ chunk-extension ] CRLF
                 chunk-data CRLF
chunk-size     = 1*HEX
last-chunk     = 1*("0") [ chunk-extension ] CRLF

chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
chunk-ext-name = token
chunk-ext-val  = token | quoted-string
chunk-data     = chunk-size(OCTET)
trailer        = *(entity-header CRLF)
```
That is, a Chunked-Body consists of four parts. First come any number of chunks (each chunk is made up of chunk-size, an optional chunk-extension, \r\n, chunk-data, and \r\n). Next is the last-chunk (a special chunk whose chunk-size is 0 and which has no chunk-data). Then comes the trailer (any number of lines in the same format as the header), and finally one \r\n. If you ignore the optional chunk-extension in the middle, it is actually fairly simple.
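A decoder for the grammar above might be sketched as follows. `decodeChunkedBody` is a hypothetical name. The sketch assumes the whole body is already in memory, skips chunk extensions, ignores trailer lines, and relies on `std::strtoul`, which is more lenient than the grammar (for example, it accepts leading whitespace).

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Decodes a complete chunked body into out. Returns false if the input is
// malformed or truncated.
bool decodeChunkedBody(const std::string& in, std::string& out) {
    std::size_t pos = 0;
    out.clear();
    for (;;) {
        const std::size_t lineEnd = in.find("\r\n", pos);
        if (lineEnd == std::string::npos) return false;

        // chunk-size is hexadecimal; anything after ';' is a chunk-extension.
        const std::string sizeLine = in.substr(pos, lineEnd - pos);
        char* parseEnd = nullptr;
        const unsigned long size = std::strtoul(sizeLine.c_str(), &parseEnd, 16);
        if (parseEnd == sizeLine.c_str()) return false;  // no hex digits at all
        if (*parseEnd != '\0' && *parseEnd != ';') return false;
        pos = lineEnd + 2;

        if (size == 0) break;  // last-chunk reached

        // chunk-data is exactly `size` bytes and must be followed by CRLF.
        const std::size_t remaining = in.size() - pos;
        if (size > remaining || remaining - size < 2) return false;
        if (in.compare(pos + size, 2, "\r\n") != 0) return false;
        out.append(in, pos, size);
        pos += size + 2;
    }

    // After last-chunk: zero or more trailer lines, then one empty line.
    for (;;) {
        const std::size_t lineEnd = in.find("\r\n", pos);
        if (lineEnd == std::string::npos) return false;
        if (lineEnd == pos) return true;  // the final CRLF
        pos = lineEnd + 2;                // skip a trailer line
    }
}
```

For example, the input `"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"` is two chunks of 4 and 5 bytes followed by the last-chunk and the closing \r\n.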
Implementing keepalive
In the original husky code, the server closes the socket after sending the response, so if the client needs to send another HTTP request, a new TCP connection has to be established. Opening an ordinary web page usually involves many HTTP requests from the client to the server, so repeatedly setting up and tearing down TCP connections is inefficient. Keepalive avoids establishing a new TCP connection for every request: after sending the response, the server does not close the connection but keeps waiting for data on it. Keepalive is enabled by default in HTTP/1.1; to turn it off, declare Connection: close in the header.
But naive keepalive causes problems of its own. If clients keep their connections open indefinitely, the server keeps holding those connections too, and once there are many clients, new clients cannot get any of the server's resources. So some compromise is needed, for example: disconnect if no data is received within the next 5 s, or disconnect once the server has received 100 client requests within 5 s.
"Disconnect once the server has received 100 client requests within 5 s" means controlling a thread based on information from outside that thread, which makes the running thread depend too much on external state, so Paekdusan does not do this. Instead, it disconnects after 50 HTTP requests have been received on the same connection. Paekdusan also implements "disconnect once a connection has lasted more than 5 s": the recv timeout is 1 s, and after each recv it checks whether more than 5 s have passed since the first recv and disconnects if so. See the following code:
```
// For simplicity, some code that handles incomplete HTTP requests has been removed,
// and the way now and startTime are set has been simplified.
// See https://github.com/aholic/paekdusan/blob/master/KeepAliveWorker.hpp for details.
while ((now - startTime <= 5) && requestCount > 0) {
    recvLen = recv(sockfd, recvBuff, RECV_BUFFER_SIZE, 0);
    if (recvLen <= 0 && getLastErrorNo() == ERROR_TIMEOUT) {
        LogInfo("recv returns %d", recvLen);
        continue;
    }

    if (recvLen <= 0) {
        LogError("recv failed: %d", getLastErrorNo());
        break;
    }

    // do with recvBuff, get a http request
    // ...
}
```
However, because a blocking recv is used, this implementation is not entirely sound. For example, suppose 4.99 seconds have passed since the first recv: (now - startTime <= 5) still holds, so the loop is entered again and blocks on recv, whose timeout is 1 second. The loop can therefore exit as late as 5.99 seconds after the first recv. I have no good solution for now: if recv is set to return immediately, the while loop spins too many times and is inefficient. Some kind of notification mechanism would be better.
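One way to shrink the overshoot described above, short of a real notification mechanism, would be to shorten the recv timeout as the deadline approaches. The sketch below only computes that timeout; `nextRecvTimeoutMs` is a hypothetical helper and is not something Paekdusan does.

```cpp
#include <algorithm>
#include <chrono>

typedef std::chrono::steady_clock Clock;

// Returns how long the next recv may block, in milliseconds: the normal
// per-call timeout, shortened so the wait never runs past the connection
// deadline. A result of 0 means the connection's time budget is used up.
long nextRecvTimeoutMs(Clock::time_point startTime, Clock::time_point now,
                       long connectionLimitMs, long perRecvTimeoutMs) {
    const long elapsedMs = static_cast<long>(
        std::chrono::duration_cast<std::chrono::milliseconds>(now - startTime).count());
    const long remainingMs = connectionLimitMs - elapsedMs;
    return std::max(0L, std::min(perRecvTimeoutMs, remainingMs));
}
```

With a 5 s limit and a 1 s per-call timeout, a recv that starts 4.99 s after the first one would be given 10 ms instead of 1 s. The cost is an extra socket option call before each recv to apply the new timeout.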
CGI Server
A CGI server typically forks a process to execute the CGI script named in the URI of the HTTP request, and passes the request information to the CGI script through environment variables; for how exactly to do that, see this article. Note, however, that Paekdusan is a multithreaded server, so this mixes multithreading with multiple processes, which is a very painful situation. Mixing threads and processes causes many problems. This article covers them in detail, but you may find that it is blocked by the firewall, so I will briefly describe the problems here.

The first thing to say is what happens when fork is called in a child thread: the resulting child process has only one thread, the thread that called fork.

So here is the problem. Suppose another thread in the parent process holds a lock and is in the middle of modifying data shared between threads, so the shared data is in a half-modified state. In the child process all the other threads are gone, so who finishes the modification of the shared data? The state of the lock is also undefined. And even if your own code is thread-safe, you cannot guarantee that the libraries you use are.

So the only reasonable way to use multiple processes in a multithreaded environment is to exec immediately after fork, that is, to replace the child process with a new program right away, so that all the data in the child process becomes irrelevant and is discarded. A multithreaded CGI server is therefore reasonable, but security needs attention: a child process inherits the parent's open file descriptors by default, which means the child can read and write the files the parent has open.
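The "exec immediately after fork" rule might be sketched like this for POSIX systems. `runCgi` is a hypothetical helper, only two CGI-style environment variables are set, and error handling is minimal. Everything the child needs is prepared before fork, so between fork and exec the child only makes simple system calls.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// POSIX-only sketch: runs `program` as a CGI-style child process, passes the
// request through environment variables, and collects the child's stdout.
bool runCgi(const std::string& program, const std::vector<std::string>& args,
            const std::string& queryString, std::string& output) {
    // Build argv and envp BEFORE fork so the child allocates nothing.
    std::vector<char*> argv;
    argv.push_back(const_cast<char*>(program.c_str()));
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    std::string method = "REQUEST_METHOD=GET";
    std::string query = "QUERY_STRING=" + queryString;
    char* envp[] = {&method[0], &query[0], nullptr};

    int fds[2];
    if (pipe(fds) != 0) return false;

    const pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        // Child: redirect stdout into the pipe, then exec right away.
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execve(program.c_str(), argv.data(), envp);
        _exit(127);  // only reached if execve failed
    }

    // Parent: read everything the child writes, then reap it.
    close(fds[1]);
    output.clear();
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        output.append(buf, static_cast<std::size_t>(n));
    }
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

On the file descriptor concern: descriptors marked close-on-exec (`FD_CLOEXEC`) are closed automatically when the child calls exec, so a server can mark its listening and client sockets that way to keep them out of CGI scripts.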

That is about all I know; for more detail, get past the firewall and read the original.
C++11 thread
The C++11 thread feels much nicer to use than pthread on Linux or _beginthread on Windows. A short piece of code shows the basic usage.

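    #include <functional>
    #include <iostream>
    #include <string>
    #include <thread>
    using namespace std;

    // The loop bound was lost from the source; 100 is an assumed value.
    const int kRepeat = 100;

    void sayWord(const string& word) {
        for (int i = 0; i < kRepeat; i++) {
            cout << word << endl;
        }
    }

    void saySentence(const string& sentence) {
        for (int i = 0; i < kRepeat; i++) {
            cout << sentence << endl;
        }
    }

    int main() {
        string word = "Hello";
        string sentence = "This is an example from cstdlib.com";
        thread t1(std::bind(sayWord, ref(word)));
        thread t2(saySentence, ref(sentence));
        t1.join();
        t2.join();
        return 0;
    }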

Run the code above and you will see "Hello" and "This is an example from cstdlib.com" printed alternately. Note that the constructor arguments of t1 and t2 look a little strange: "std::bind(sayWord, ref(word))" and "saySentence, ref(sentence)". This is mainly because the thread function's parameter is a reference: std::thread and std::bind copy their arguments by default, so std::ref is needed to pass a reference. There are templates involved in bind here that I find hard to explain clearly; the templates feel like a mouthful. In addition, when creating a thread from a non-static class member function, the arguments must include this or a ref to an object instance, otherwise the function cannot be called.
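The member-function case mentioned above can be sketched as follows. `Counter` and `addOnThreads` are made-up names for the example. The two threads run one after the other, so no locking is needed here.

```cpp
#include <functional>
#include <thread>

class Counter {
public:
    Counter() : total_(0) {}
    void add(int n) { total_ += n; }
    int total() const { return total_; }

private:
    int total_;
};

// Calls counter.add(n) twice, each time on its own thread, one after another.
void addOnThreads(Counter& counter, int n) {
    // Form 1: pass a pointer to the object (inside a member function this
    // argument would be `this`).
    std::thread viaPointer(&Counter::add, &counter, n);
    viaPointer.join();

    // Form 2: pass a std::ref wrapper around the object.
    std::thread viaRef(&Counter::add, std::ref(counter), n);
    viaRef.join();
}
```

Passing the object itself instead (without `&` or `std::ref`) would compile too, but the thread would then work on a copy and the caller's `counter` would stay unchanged.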
C++11 mutex and lock
Note that the thread code above actually has a problem: the output of the two threads may get interleaved, so a lock is needed. The C++11 mutex and lock are also very easy to use.
The code uses lock_guard, which calls the mutex's lock method in its constructor and the mutex's unlock method in its destructor, so it is convenient to use. unique_lock is similar to lock_guard but has more member functions, for use with the other mutex classes. This blog (http://www.cnblogs.com/haippy/p/3237213.html) covers the use of mutex and lock in more detail. The main difference I noticed from pthread locks is that with pthread, re-acquiring a lock that was already held did not report an error, while with C++11 it did. (Strictly speaking, locking a std::mutex again from the thread that already owns it is undefined behavior; std::recursive_mutex is the type that permits it.)
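A minimal sketch of lock_guard protecting shared state between two threads is shown below. To keep the result easy to check, the threads append to a shared vector rather than writing to cout, but the same pattern applies to guarding each output statement. The names `linesMutex`, `appendLines`, and `runTwoWriters` are illustrative.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex linesMutex;  // guards every access to the shared vector

// Appends `text` to `lines` `times` times. The lock_guard locks the mutex in
// its constructor and unlocks it in its destructor at the end of each iteration.
void appendLines(std::vector<std::string>& lines, const std::string& text, int times) {
    for (int i = 0; i < times; ++i) {
        std::lock_guard<std::mutex> guard(linesMutex);
        lines.push_back(text);
    }
}

// Runs two writers concurrently and returns everything they produced.
std::vector<std::string> runTwoWriters(int times) {
    std::vector<std::string> lines;
    std::thread t1(appendLines, std::ref(lines), std::string("Hello"), times);
    std::thread t2(appendLines, std::ref(lines), std::string("This is an example"), times);
    t1.join();
    t2.join();
    return lines;
}
```

Without the guard, two concurrent `push_back` calls on the same vector would be a data race. With it, the order of entries is still unspecified, but every entry arrives intact.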
C++11 condition_variable
The task queue in Paekdusan is a BoundedBlockingQueue, which has operations that block and wake threads, so condition_variable is involved. Two condition_variable functions are mainly used: wait(unique_lock<mutex>& lck) and notify_one(). When a thread blocks in wait, lck.unlock() is called to release the lock; when the thread is woken by notify_one, lck.lock() is called to re-acquire the lock, restoring the state from before the wait. This blog covers the use of condition_variable in more detail.
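The wait and notify_one pairing described above can be shown with a one-slot handoff between two threads. `Mailbox`, `deliver`, `receive`, and `sendThroughMailbox` are hypothetical names for this sketch.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

struct Mailbox {
    std::mutex mutex;
    std::condition_variable cv;
    bool ready;
    int value;
    Mailbox() : ready(false), value(0) {}
};

// Producer side: publish the value under the lock, then wake one waiter.
void deliver(Mailbox& box, int value) {
    {
        std::lock_guard<std::mutex> lck(box.mutex);
        box.value = value;
        box.ready = true;
    }
    box.cv.notify_one();
}

// Consumer side: wait() releases lck while blocked and re-acquires it before
// returning. The loop re-checks the flag because wakeups can be spurious.
int receive(Mailbox& box) {
    std::unique_lock<std::mutex> lck(box.mutex);
    while (!box.ready) {
        box.cv.wait(lck);
    }
    box.ready = false;
    return box.value;
}

// Sends one value from a helper thread to the calling thread.
int sendThroughMailbox(int value) {
    Mailbox box;
    std::thread sender([&box, value] { deliver(box, value); });
    const int received = receive(box);
    sender.join();
    return received;
}
```

The `ready` flag matters: if `deliver` runs before `receive` starts waiting, the notification itself is lost, and only the flag tells the consumer that the value is already there.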
Differences between the socket API on Windows and Linux
1. On Windows you need to include winsock.h; on Linux you need cerrno, sys/socket.h, sys/types.h, arpa/inet.h, and unistd.h
2. On Windows, call WSAStartup before calling socket, and link ws2_32.lib with #pragma comment(lib, "Ws2_32")
3. The function that closes a socket is closesocket on Windows and close on Linux
4. On Windows, get the error code with GetLastError() (WSAGetLastError() is the Winsock-specific call); on Linux, check the global variable errno. The error codes do not have the same meanings
5. When setting the SO_RCVTIMEO and SO_SNDTIMEO options, Windows takes a value in milliseconds; Linux takes a struct timeval (seconds and microseconds)
6. The prototype of accept is accept(SOCKET, struct sockaddr*, int*) on Windows and accept(int, struct sockaddr*, socklen_t*) on Linux
Those are all the ones I have run into; there are probably many more that I have not met yet.
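The differences listed above can be hidden behind a few small wrappers selected with preprocessor macros. This is only a sketch: `SocketHandle`, `SockLen`, `initSockets`, `closeSocket`, and `setRecvTimeoutMs` are hypothetical names (`getLastErrorNo` is the name used in the keepalive excerpt), and the Windows branch uses winsock2.h, which is an assumption rather than what the article includes.

```cpp
#ifdef _WIN32
#include <winsock2.h>
#pragma comment(lib, "Ws2_32")
typedef SOCKET SocketHandle;
typedef int SockLen;
#else
#include <cerrno>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
typedef int SocketHandle;
typedef socklen_t SockLen;
#endif

// Must be called once before any other socket call; a no-op outside Windows.
inline bool initSockets() {
#ifdef _WIN32
    WSADATA data;
    return WSAStartup(MAKEWORD(2, 2), &data) == 0;
#else
    return true;
#endif
}

inline int closeSocket(SocketHandle s) {
#ifdef _WIN32
    return closesocket(s);
#else
    return close(s);
#endif
}

inline int getLastErrorNo() {
#ifdef _WIN32
    return WSAGetLastError();
#else
    return errno;
#endif
}

// Sets the receive timeout from a millisecond value on both platforms:
// Windows takes a DWORD of milliseconds, POSIX takes a struct timeval.
inline bool setRecvTimeoutMs(SocketHandle s, long ms) {
#ifdef _WIN32
    DWORD timeout = static_cast<DWORD>(ms);
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO,
                      reinterpret_cast<const char*>(&timeout), sizeof(timeout)) == 0;
#else
    timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
#endif
}
```

The rest of the server can then call `accept` with a `SockLen*` length argument and compare `getLastErrorNo()` against a per-platform timeout constant, without any further `#ifdef` blocks.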
