OpenVPN mobility modification: identifying the client by a new session ID instead of IP/port

Source: Internet
Author: User

Challenges to device mobility
1. Devices frequently change IP addresses because of cell handovers or network-mode switches. Such address changes are normal behavior in mobile networks and should not be treated as faults or accidents. They should therefore be transparent to applications: applications should not be disturbed by them and should not be responsible for cleaning up after them.
2. When a mobile device has several 3G/4G/2.75G NICs, you may want them all to send and receive data at the same time. Because these NICs usually belong to different carriers with different network architectures, their packets generally must carry the IP address of that carrier's NIC as the source (typically because the carrier performs NAT at its core-network edge). Therefore, to load-balance across multiple carriers and NICs, the packets of a single application flow must be allowed to carry different source IP addresses. Unfortunately, even over UDP, most applications support only a single source (they call bind on the UDP socket) to keep the server simple.
3. Elevators, high-speed trains, blind spots in the mountains, the office restroom... the signal suddenly disappears and then suddenly comes back! The application, however, does not want to be thrown around like this. For OpenVPN, in my tests a reconnection takes about five seconds and is expensive: options must be pushed again, and so on. In fact, if ping-restart is small enough, OpenVPN quickly notices the loss of signal and restarts. The remedy is to enlarge ping-restart; however, you cannot know in advance how long the connection will be lost.
4. RRC-related extra latency. Sometimes, even in a spot with good coverage, opening a web page is very slow at first and then quickly becomes fast. This is inherent to mobile networks: to save power, the device is not always connected to the network. It runs a mechanism much like Linux NOHZ (tickless), releasing the radio connection after it has not sent or received data for a while. Unlike Linux NOHZ, where the time of leaving the tickless state is well defined (the earlier of the next timer expiry and any non-timer interrupt), under RRC the moment when data is sent depends entirely on the user. So when data needs to be sent, the device must first re-access the mobile network and renegotiate parameters, which inevitably takes time. This jitter cannot be solved by the user application, because it depends on the device manufacturer's implementation and on the mobile-network specifications. It is a purely network-level problem, so this article does not go into it much.
The session layer really matters. Because the TCP/IP stack evolved and spread first, its rival (the OSI model, which defines a session layer) lost its chance for good, so applications generally program directly against the transport-layer interface. That is simply a fact! For the application developer, sending and receiving data rests directly on an INET socket, and the "connection" of a socket is identified by a 5-tuple. So if any element of the 5-tuple changes, or any network event occurs, the corresponding socket is affected. The socket I/O manual pages specify return values and error codes precisely, and applications that call these interfaces directly must handle those errors. Network events therefore reach applications directly! Yet network events should not affect applications. For example, if the network drops, the application should not have to notice and reconnect; the event may be temporary, such as an IP address change. What an application needs to do is produce and send business data. It does not need to obtain and handle error codes from the socket interface directly; it only needs to know how much data has been sent. Even if a serious event forces a complete exit, the notification should not come from TCP/IP. A new layer is required, and for historical reasons I will call it the session layer.
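The session-layer idea can be sketched in C. This is a minimal illustration under my own assumptions, not OpenVPN code: session, session_send, session_flush and sim_link are hypothetical names, the set of errno values treated as transient is an assumed policy, and sim_link merely stands in for sendto() on a real UDP socket. The application calls session_send and only learns how many bytes were accepted; datagrams that hit a transient network error are queued and retried in order instead of being reported.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Transport hook with sendto()-like semantics: bytes sent, or -1 with errno set. */
typedef ssize_t (*transport_send_fn)(void *ctx, const void *buf, size_t len);

#define SESSION_QUEUE_SLOTS 8
#define SESSION_MAX_DATAGRAM 1500

struct session {
    transport_send_fn send;
    void *ctx;
    /* FIFO of datagrams held back by a transient network event. */
    unsigned char queue[SESSION_QUEUE_SLOTS][SESSION_MAX_DATAGRAM];
    size_t queue_len[SESSION_QUEUE_SLOTS];
    size_t head, count;
};

/* Assumed policy: errno values that describe a temporary network event. */
static bool session_error_is_transient(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK || err == ENETUNREACH ||
           err == EHOSTUNREACH || err == ENETDOWN || err == EADDRNOTAVAIL ||
           err == ENOBUFS;
}

/* Retry queued datagrams in order; stop quietly at the first transient failure.
   Returns -1 only for a non-transient error. */
int session_flush(struct session *s)
{
    while (s->count > 0) {
        size_t i = s->head;
        if (s->send(s->ctx, s->queue[i], s->queue_len[i]) < 0)
            return session_error_is_transient(errno) ? 0 : -1;
        s->head = (s->head + 1) % SESSION_QUEUE_SLOTS;
        s->count--;
    }
    return 0;
}

/* What the application calls. It learns how many bytes were accepted and never
   sees transient errno values; -1 means a real failure, an oversized datagram,
   or a full queue. */
ssize_t session_send(struct session *s, const void *buf, size_t len)
{
    if (len > SESSION_MAX_DATAGRAM || session_flush(s) < 0)
        return -1;
    if (s->count == 0) {               /* nothing pending: try the wire directly */
        if (s->send(s->ctx, buf, len) >= 0)
            return (ssize_t)len;
        if (!session_error_is_transient(errno))
            return -1;
    }
    if (s->count == SESSION_QUEUE_SLOTS)
        return -1;
    size_t tail = (s->head + s->count) % SESSION_QUEUE_SLOTS;
    memcpy(s->queue[tail], buf, len);  /* keep a copy until the link comes back */
    s->queue_len[tail] = len;
    s->count++;
    return (ssize_t)len;
}

/* Simulated link for illustration (stands in for sendto() on a UDP socket):
   while 'up' is false every send fails with ENETUNREACH, like a phone in an elevator. */
struct sim_link {
    bool up;
    int delivered;
    unsigned char last_first_byte;
};

ssize_t sim_link_send(void *ctx, const void *buf, size_t len)
{
    struct sim_link *l = ctx;
    if (!l->up) {
        errno = ENETUNREACH;
        return -1;
    }
    l->delivered++;
    l->last_first_byte = len ? ((const unsigned char *)buf)[0] : 0;
    return (ssize_t)len;
}
```

Queued datagrams are always flushed before a new one is tried, so ordering is preserved across an outage; the application never has to branch on ENETUNREACH or EHOSTUNREACH.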
In the case of OpenVPN, I want OpenVPN's processing to be completely independent of the network state. Even if the client's IP address changes, it should keep communicating with the server from the new address, so that the network state never disturbs the OpenVPN process. To work toward this step by step, let's first look at OpenVPN's current behavior. After the two ends connected, I changed the client's IP address, and the server reported the following errors:
Wed Jan 1 00:58:46 2014 us=439027 get inst by virt: 0e:fe:bc:a3:6f:fe -> zhaoya/192.168.1.197:33512 via 0e:fe:bc:a3:6f:fe
Wed Jan 1 00:58:46 2014 us=439981 zhaoya/192.168.42.197:33512 UDPv4 WRITE [133] to 192.168.1.197:33512: P_DATA_V1 kid=0 DATA len=132
Wed Jan 1 00:58:46 2014 us=822941 TLS State Error: No TLS state for client 192.168.1.199:33512, opcode=6
Wed Jan 1 00:58:46 2014 us=823912 get inst by real: 192.168.1.199:33512 [failed]
Wed Jan 1 00:58:47 2014 us=197871 MULTI: REAP range 128 -> 144
Wed Jan 1 00:58:47 2014 us=198861 TLS State Error: No TLS state for client 192.168.1.199:33512, opcode=6
Wed Jan 1 00:58:47 2014 us=198887 get inst by real: 192.168.1.199:33512 [failed]
...
Under the current logic, an unfamiliar IP/port is treated as a new TLS session, even though OpenVPN's TLS handshake has nothing to do with the network: TLS runs over a BIO, and OpenVPN's reliability layer guarantees reliable delivery. But let's forgive the code for this. Its original intent was probably to prevent DoS attacks rather than anything else, because a connection cannot be inserted normally without a successful TLS handshake; otherwise the packet is discarded. Now let's start on our own implementation. The idea is straightforward: add a session ID field to the OpenVPN protocol header, which the server uses to identify and distinguish clients, so that the client IP/port is no longer used for that purpose. This way, as long as the server receives an OpenVPN packet from the client, the parsed session ID maps to a multi_instance, and the packet is valid.
OpenVPN's data transfer is thus completely isolated from the underlying network state. The client only needs to build packets according to the OpenVPN protocol. If the network is poor, data cannot be sent, but as soon as the network recovers, sending resumes. Any packet the server receives can be parsed and mapped to a multi_instance. If the client's IP address changes, its data still reaches the server as long as the server's IP address remains reachable, and the server can then parse the session ID from the OpenVPN packet and map it to a multi_instance.
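A minimal sketch of such a header in C, under assumptions of my own rather than OpenVPN's actual wire format: byte 0 keeps OpenVPN's usual packing of the opcode (high 5 bits) and key_id (low 3 bits), and a hypothetical 32-bit session ID follows in bytes 1 to 4 in network byte order. The ID has to travel outside the encrypted payload, because the server needs it to pick the multi_instance (and therefore the keys) before it can decrypt anything. hdr_write, hdr_read_session_id and hdr_opcode are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing for the experiment:
   byte 0     : opcode (high 5 bits) | key_id (low 3 bits)
   bytes 1..4 : session ID, big-endian (network byte order) */
#define HDR_OPCODE_SHIFT 3
#define HDR_KEY_ID_MASK 0x07
#define HDR_LEN 5

/* Write the header into out (at least HDR_LEN bytes); returns bytes written. */
size_t hdr_write(uint8_t *out, unsigned opcode, unsigned key_id, uint32_t session_id)
{
    out[0] = (uint8_t)((opcode << HDR_OPCODE_SHIFT) | (key_id & HDR_KEY_ID_MASK));
    out[1] = (uint8_t)(session_id >> 24);
    out[2] = (uint8_t)(session_id >> 16);
    out[3] = (uint8_t)(session_id >> 8);
    out[4] = (uint8_t)session_id;
    return HDR_LEN;
}

/* Parse the session ID from a received datagram. The source IP/port is not
   consulted at all. Returns false if the datagram is too short. */
bool hdr_read_session_id(const uint8_t *buf, size_t len, uint32_t *session_id)
{
    if (len < HDR_LEN)
        return false;
    *session_id = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16) |
                  ((uint32_t)buf[3] << 8) | (uint32_t)buf[4];
    return true;
}

/* Extract the opcode from byte 0. */
unsigned hdr_opcode(const uint8_t *buf)
{
    return buf[0] >> HDR_OPCODE_SHIFT;
}
```

Reading the ID involves only the packet bytes; the source address returned by recvfrom() plays no part, which is exactly the decoupling the idea relies on.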
The idea is clear. So how do we implement it?
This article is not about how to use OpenVPN on mobile devices; its main purpose is to demonstrate an approach to solving a problem. How can I verify that the idea above is feasible? I did not dive into the code to implement the final solution (for example, by modifying the OpenVPN protocol directly). Instead, I first hard-coded values to reach a quick go/no-go conclusion, a process that requires only minimal code changes! To find where to modify, start from the error above. It is reported by the tls_pre_decrypt_lite function in ssl.c, which has no information about multi_instance. So I knew the program had already taken the exception path before calling tls_pre_decrypt_lite, and by searching for its caller I found the following in multi_get_create_instance_udp in mudp.c:

struct multi_instance *
multi_get_create_instance_udp (struct multi_context *m)
{
  ...
  if (mroute_extract_openvpn_sockaddr (&real, &m->top.c2.from.dest, true))
    {
      struct hash_element *he;
      const uint32_t hv = hash_value (hash, &real);
      struct hash_bucket *bucket = hash_bucket (hash, hv);

      hash_bucket_lock (bucket);
      he = hash_lookup_fast (hash, bucket, &real, hv);

      if (he)
        {
          mi = (struct multi_instance *) he->value;
        }
      else
        {
          /* exception path: no instance is keyed by this source address */
          if (!m->top.c2.tls_auth_standalone
              || tls_pre_decrypt_lite (m->top.c2.tls_auth_standalone, &m->top.c2.from, &m->top.c2.buf))
            {
              /* exception path */
            }
        }
      ...
    }
  ...
}

The crux is that no multi_instance is found. Why? The input parameter real of mroute_extract_openvpn_sockaddr is initialized from the source IP address and port of the received packet, and this real is the key used to query the multi_instance hash table. After the client's IP address changes, of course nothing is found; even if the new key lands in a bucket whose collision chain holds other entries, the comparison fails and NULL is returned! Now the key point: since the lookup fails only because the packet's source IP/port changed, we can simply ignore the query key, that is, make the lookup always return a result. This idea is the key to solving the problem quickly. The lookup fails because no value corresponds to the key; if the key can find the value, it succeeds. The problem is thus reduced to making the key find the value, and deriving that key from the packet buffer becomes a separate problem. Isn't this working backward from effect to cause? I used this method in high-school physics competitions! The idea is very simple, but few people use it. Many people modify the OpenVPN protocol from the very beginning and then debug everything together at the end. That is common practice for R&D tasks whose design has already been approved, but for pre-research or exploratory development it is highly undesirable! You don't even know whether your idea works within OpenVPN's existing framework, so how can you start by changing the code? R&D is written with an ampersand rather than as "RD" because R and D are not really one department; at the very least, their people approach problems differently. R focuses on causal reasoning, working backward from effect to cause, feasibility verification and testing; D focuses on design, code quality, schedule control, project management and the various process models (iterative, waterfall ...). So we redefine hash_function and hash_compare so that the lookup always returns a value!
The reasoning is this: once the hash value and the comparison result are hard-coded, if changing the client's IP address no longer causes an error, then the hash lookup has been shown to be independent of the source IP address and port of the received packet. What remains is to replace the fixed key with one taken from the header of the received OpenVPN packet. My new hash setup is as follows:
m->hash = hash_init (t->options.real_hash_size,
                     fake_addr_hash_function,
                     fake_addr_compare_function);

Where:
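#include <stdbool.h>
#include <stdint.h>

/* Feasibility hack: every key hashes to the same value, hence the same bucket... */
uint32_t fake_addr_hash_function(const void *key, uint32_t iv)
{
    return 0x10101010;
}

/* ...and every comparison succeeds, so a lookup returns the first instance in that bucket. */
bool fake_addr_compare_function(const void *key1, const void *key2)
{
    return true;
}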
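To see why the original lookup misses and why the hard-coded pair makes it hit, here is a self-contained toy chained hash table in C with pluggable hash and compare callbacks shaped like the ones passed to hash_init. It is an illustration under assumptions, not OpenVPN's hash.c: struct addr_key, addr_hash (a simple multiplicative hash, not OpenVPN's), table_add and table_lookup are hypothetical names. With address-based callbacks, a client that moves from 192.168.1.199 to 192.168.1.197 is no longer found; with a constant hash and an always-true compare, every key lands in the same bucket and matches the first entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the source address OpenVPN extracts into 'real'. */
struct addr_key {
    uint32_t ip;    /* IPv4 address, host byte order */
    uint16_t port;
};

typedef uint32_t (*hash_fn)(const void *key, uint32_t iv);
typedef bool (*cmp_fn)(const void *key1, const void *key2);

/* Address-keyed callbacks (a simple multiplicative hash, not OpenVPN's). */
static uint32_t addr_hash(const void *key, uint32_t iv)
{
    const struct addr_key *k = key;
    return (k->ip * 2654435761u) ^ k->port ^ iv;
}

static bool addr_compare(const void *key1, const void *key2)
{
    const struct addr_key *a = key1, *b = key2;
    return a->ip == b->ip && a->port == b->port;
}

/* The hard-coded pair from the feasibility test. */
static uint32_t fake_hash(const void *key, uint32_t iv)
{
    (void)key;
    (void)iv;
    return 0x10101010;
}

static bool fake_compare(const void *key1, const void *key2)
{
    (void)key1;
    (void)key2;
    return true;
}

#define NBUCKETS 16
#define MAXENTRIES 8

struct entry {
    struct addr_key key;
    int client_id;
    struct entry *next;
};

/* Toy chained hash table with pluggable callbacks; lookup returns -1 for "not found". */
struct table {
    hash_fn hash;
    cmp_fn cmp;
    uint32_t iv;
    struct entry *buckets[NBUCKETS];
    struct entry pool[MAXENTRIES];
    size_t used;
};

static bool table_add(struct table *t, struct addr_key key, int client_id)
{
    if (t->used == MAXENTRIES)
        return false;
    struct entry *e = &t->pool[t->used++];
    e->key = key;
    e->client_id = client_id;
    e->next = NULL;
    struct entry **slot = &t->buckets[t->hash(&key, t->iv) % NBUCKETS];
    while (*slot)                      /* append at the tail of the collision chain */
        slot = &(*slot)->next;
    *slot = e;
    return true;
}

static int table_lookup(const struct table *t, const struct addr_key *key)
{
    for (const struct entry *e = t->buckets[t->hash(key, t->iv) % NBUCKETS]; e; e = e->next)
        if (t->cmp(&e->key, key))
            return e->client_id;
    return -1;
}
```

The same property is the hack's limit: once a second client is added, every lookup still returns the first entry in the chain, so the constant functions only prove the point for a single client, and a real key such as the session ID is needed before more clients can connect.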

That is the only change! Now the server maps every received encapsulated IP packet to the single multi_instance. However, packets sent from the server to the client fail to arrive. I connected the OpenVPN client successfully via 192.168.1.199, then changed the client's address to 192.168.1.197 and, from the client machine, pinged the server's virtual IP address. The server logged the following:
Wed Jan 1 00:02:11 2014 us=389812 zhaoya/192.168.1.199:38310 UDPv4 WRITE [77] to 192.168.1.199:38310: P_DATA_V1 kid=0 DATA len=76
The destination address is still 192.168.1.199. Why wasn't it switched to the new address? This is a minor problem: I just need to find where this log line is printed, which is easy. In the process_outgoing_link function in forward.c, note the following code:
ASSERT (link_socket_actual_defined (c->c2.to_link_addr));
So to_link_addr is the key. Its value is set when the OpenVPN client connects and never changes afterward. I just need to update it continuously, that is, unconditionally use the from address of the most recent packet. Both fields live in struct context_2:
struct context_2
{
  ...
  struct link_socket_actual *to_link_addr;  /* IP address of remote */
  struct link_socket_actual from;           /* address of incoming datagram */
  ...
};

The comments say it all! How should it be changed? One option is to replace every use of to_link_addr with &from. Of course I won't do that: this is only a feasibility test, not a real code change, and such sweeping edits would be reckless. Instead, I add a temporary block:
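void
process_outgoing_link (struct context *c)
{
  struct gc_arena gc = gc_new ();

  perf_push (PERF_PROC_OUT_LINK);
#if 1
  /* Temporary feasibility hack (I'd be scolded for admitting it, but I often do this):
     always reply to the source address of the most recently received packet. */
  {
    c->c2.to_link_addr = &c->c2.from;
  }
#endif
  ...
}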
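The temporary assignment updates the reply address unconditionally. A slightly more careful variant, sketched here under my own assumptions, moves the reply address only for packets the caller has already authenticated (for example by HMAC verification or successful decryption), so that a spoofed datagram cannot redirect a client's traffic. struct peer, make_addr and peer_float are hypothetical names; reply_to plays the role of c2.to_link_addr.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-client state; reply_to plays the role of c2.to_link_addr. */
struct peer {
    struct sockaddr_in reply_to;
};

/* Build an IPv4 socket address from dotted-quad text and a host-order port. */
static struct sockaddr_in make_addr(const char *ip, uint16_t port)
{
    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_port = htons(port);
    inet_pton(AF_INET, ip, &a.sin_addr);  /* returns 1 on success; unchecked in this sketch */
    return a;
}

/* Called for every datagram already matched to this peer (e.g. by session ID).
   'authenticated' must be true only if the packet passed HMAC/decryption; how that
   is decided is up to the caller. Returns true if the reply address moved. */
static bool peer_float(struct peer *p, const struct sockaddr_in *from, bool authenticated)
{
    if (!authenticated)
        return false;  /* an unverified packet must not redirect the client's traffic */
    if (p->reply_to.sin_addr.s_addr == from->sin_addr.s_addr &&
        p->reply_to.sin_port == from->sin_port)
        return false;  /* same address as before */
    p->reply_to = *from;
    return true;
}
```

The return value gives the caller a natural place to log an address change once, instead of on every packet.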

At this point, I consider the original idea feasible. Next, I added a 32-bit session ID to the OpenVPN protocol and rewrote the hash functions completely, so that the input key is the session ID extracted from BPTR(buf). I also increased the number of OpenVPN clients to three. It worked just the same, and by then it was 01:35. On a rainy night I might have gone further, but it was a hot early-summer night! The idea of adapting OpenVPN to mobility came to me suddenly. My recent work has had little to do with networking and comes with messy, strict deadlines, so I have few leisurely, elegant moments during work hours and can only pick a rainy night. But the rain never came, so I stopped halfway. What else is there to do? Isn't that all? Not at all. Many details remain unhandled. For example, after a client changes its IP address, the server must update the data structures in that client's multi_instance. The ping sent with every transmission should be replaced by an echo-based ping. The various restart and reconnect mechanisms also need replacing: in mobile environments, many network events that would trigger reconnects and resets in a fixed network are normal, so reconnects and resets must be kept to a minimum. Only two things need to be done: keep sending, and wait! The point of this article is not to prevent the problem from occurring but to make it ignorable, so that it has no impact. You can isolate a problem by adding one more layer! You cannot stop the sky from raining, but you can take an umbrella, put on rain boots, move the activity indoors, or cheer in the rain like me... If you study atmospheric dynamics and fire shells to scatter the clouds so that it won't rain, you have gone off track.
You might eventually become a great scientist, but for now you will still get rained on.
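With a session ID in the header, the fake hash pair can be replaced by functions of the same shape that look only at the ID. This is a sketch under assumptions, not the author's actual patch: struct session_key is hypothetical, the 32-bit session ID is assumed to have already been parsed from the packet header, and the source address is carried along only for logging. The hash uses the MurmurHash3 32-bit finalizer (fmix32) to spread the bits, which is my choice, not OpenVPN's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lookup key. Only session_id identifies the client; the source
   address is kept for logging and is deliberately ignored by hash and compare. */
struct session_key {
    uint32_t session_id;  /* parsed from the OpenVPN packet header (assumed) */
    uint32_t src_ip;      /* host byte order, informational only */
    uint16_t src_port;    /* informational only */
};

/* Same signature as fake_addr_hash_function, but keyed on the session ID. */
uint32_t session_id_hash_function(const void *key, uint32_t iv)
{
    uint32_t h = ((const struct session_key *)key)->session_id ^ iv;
    /* MurmurHash3 32-bit finalizer (fmix32): a bijection that mixes all bits. */
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Same signature as fake_addr_compare_function: equal IDs mean the same client,
   whatever address the packet came from. */
bool session_id_compare_function(const void *key1, const void *key2)
{
    const struct session_key *a = key1;
    const struct session_key *b = key2;
    return a->session_id == b->session_id;
}
```

In the experiment's hash_init call, these two would take the place of fake_addr_hash_function and fake_addr_compare_function. Because fmix32 is invertible, distinct session IDs never collide on the full 32-bit hash value, although they can still share a bucket after the modulo.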
When locating a problem, do not add code details too early; verify feasibility and soundness in the fastest possible way. Then learn to separate primary from secondary problems: reduce the primary problem to its essence, simulate the symptoms of the secondary ones, and avoid spending a great deal of time and effort on secondary issues. You cannot deal with every problem at once. Learning to simulate the phenomena caused by problems that are currently non-essential is a skill, and drawing real conclusions from a simplified environment is a reasoning skill. Here is an example from my early years. Once I spent more than a week on a file-write error and ultimately failed to solve it. Along the way, however, I learned how to recover data from an ext2 file system, something completely unrelated to the task, and not with debugfs but with a program I wrote myself! For my personal curiosity it was no loss, but it delayed the project. Actually, the unluckiest are the coders who, without even being driven by curiosity, burrow into kernel drivers and come away with nothing, or with a bloody nose... Ask me why I like rainy days. Maybe it has something to do with the unstable little condensation nuclei up in the sky. Nonsense! Why do I like rainy days? Actually, I don't know... and that is just like talking about simulation! Some people particularly dislike simulation and prefer to reproduce the real scene, which is time-consuming and often pointless. In fact, a faithful reproduction is impossible: you can only get as close as you can, which means you are still simulating. Simulation at its best is a distillation that strips away everything that need not be simulated! If you have a layered model, you are lucky, because it tells you what to simulate. When a web service fails, you first telnet to its port rather than hunting for faults in the web service itself!
Suppose a server must send each HTTP response out of the same network interface on which the request arrived. You can test this with a ping that specifies the source address; there is no need to set up any HTTP server, because this is the job of IP routing and has nothing to do with HTTP. Precisely because of this, once the main problem has been grasped, the task can be handed to network engineers who know nothing about HTTP. When it comes to innovation, we need to simulate the environments we are not reproducing, that is, simulate the secondary environment in order to solve the main problem. For the OpenVPN modification in this article, if you tried to build a complete version from the start, just understanding and then modifying the code would be painful, the debugging would be slow and miserable, and in the end it might not even have been necessary... Control your variables, especially the ones that change easily: you can only turn one knob at a time! Sometimes people do not like hearing me state a definite conclusion; they would rather I leave some room, because they all know I have not reproduced everything. In fact, the scientific approach is not to reproduce but to simulate! Of course, this does not apply to software engineering, because software is more like social engineering than science! You can never be sure that a piece of software has no bugs; in software engineering you cannot assume a frictionless plane or run a thought experiment. Software that has run normally for 1,000,000 days may crash in the next second! Software people conditioned by this live in a constant state of tension and worry, so naturally they find it hard to accept that a definite conclusion can be drawn from simulation. But even so, when solving a specific problem, as opposed to doing engineering, the idea of simulating instead of reproducing must never be lost!
