Ngx_lua Application Best Practices


Introduction:

The following is distilled from a talk given by Timebug, a systems development engineer at Upyun, at the SegmentFault D-Day technology salon in Nanjing. It covers the Upyun systems team's experience migrating business logic from C modules to ngx_lua, along with some best practices for building on Nginx with ngx_lua.

Upyun WeChat public account: Upaiyun

---------------------------------------------------------------------

ngx_lua is a third-party Nginx module that embeds Lua code into Nginx for execution.

Upyun's CDN uses Nginx as its reverse proxy server, and most of the business logic is now driven by Lua.

I gave brief talks on this topic earlier, at OSC 2014 Beijing and SegmentFault D-Day 2015 Nanjing (the slides are linked from the original post). Due to poor time management on my part, however, the second half of the keynote was never properly expanded on either occasion, which was a bit of a pity. This article is a supplement that tries to cover that second half in written form.

Ngx_lua and OpenResty

OpenResty is a fairly complete web application development framework built on the Nginx core. Besides ngx_lua, it bundles many excellent third-party Nginx C modules and integrates a series of commonly used lua-resty-* libraries for Redis, MySQL, and so on. In particular, the Nginx core and LuaJIT versions that OpenResty depends on are very well tested, with a number of essential patches already applied.

Upyun CDN is not built directly on OpenResty. Instead, following OpenResty's layout, we integrate ngx_lua and the lua-resty-* libraries we need into an Nginx distribution that we maintain ourselves. We do this because we have many C modules of our own and occasionally need to make secondary modifications to the Nginx core, which is somewhat inconvenient when using OpenResty directly. For everyone else who needs ngx_lua, we still strongly recommend using OpenResty.

Performance of Lua

Compared with C modules, Lua modules have a natural advantage in development efficiency, and the language is more expressive. Today, apart from a few business-independent foundation modules where we still prefer C, we use Lua wherever Lua will do. You may worry about the performance of a scripting language here; in our practice, there is no need to. We have rewritten several large business modules, such as hotlink protection, in Lua, and neither pre-launch load testing nor production operation showed any noticeable performance degradation. A big part of the credit goes to LuaJIT, which is dramatically faster than the official Lua VM, and the LuaJIT FFI additionally lets you call C-level functions directly to optimize whatever performance hotspots remain in Lua code.
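To make the FFI point concrete, here is a tiny self-contained sketch (an illustration, not code from our modules) that declares a libc function once and then calls it directly from LuaJIT:

-- Minimal LuaJIT FFI illustration: declare a C function, then call it
-- directly instead of implementing the logic in plain Lua.
local ffi = require "ffi"

ffi.cdef[[
int strncmp(const char *s1, const char *s2, size_t n);
]]

-- Lua strings are passed to C as const char* automatically;
-- this compares only the first 4 bytes.
local function has_http_prefix(s)
    return ffi.C.strncmp(s, "http", 4) == 0
end

print(has_http_prefix("http://upyun.com"))  -- true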

We currently run the latest LuaJIT 2.1 development branch in production; it performs considerably better than the stable release (see LuaJIT's NYI list for specifics). In particular, we recommend the fork maintained by OpenResty, which is safer.

As the slide (omitted here) shows, at runtime LuaJIT translates hot Lua bytecode directly into machine code and caches it.

In addition, TechEmpower's benchmarks of OpenResty show that the Nginx + ngx_lua + LuaJIT combination performs very strongly compared with Node.js, Cowboy, Beego, and the like.

Metadata synchronization and caching

Upyun CDN synchronizes user configuration from the central node to the edge nodes through Redis master-slave replication. Because Redis itself does not support encrypted transport, we additionally run the replication channel through stunnel to encrypt it and keep the data safe in transit.
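For illustration only (the hostname and ports are made up), the stunnel client configuration on an edge node might look roughly like this; the edge Redis slave then replicates from the local tunnel endpoint:

; stunnel.conf on an edge node (sketch): the local Redis slave is
; pointed at 127.0.0.1:6380, and stunnel carries the replication
; traffic to the central master over TLS
[redis-repl]
client = yes
accept = 127.0.0.1:6380
connect = central.example.com:6379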

1) Cache is a cure-all!

Of course, that does not mean the Redis instance on a node can serve directly as the main cache layer. Remember that every fetch from Nginx to Redis costs a network round trip, and even millisecond-level network requests are unacceptable under the huge traffic an edge node handles. So Redis here plays more the role of a data store, while the main cache layer sits in Nginx shared memory.

Given our business characteristics, the cached content does not need to be strictly consistent with the data source and can tolerate a certain amount of staleness, so we simply adopt a passive cache-expiration policy. ngx_lua provides a set of shared-memory APIs (ngx.shared.DICT) that make it easy to expire entries passively by setting an expiration time. It is worth mentioning that when the cache outgrows the pre-allocated memory pool, the ngx.shared.DICT:set method tries to evict existing entries in LRU fashion.

The following code fragment gives a rudimentary implementation. As we will see, it actually has quite a few problems, but the basic structure is representative. Notice that it distinguishes four states: HIT, MISS, HIT_NEGATIVE, and NO_DATA. The first two apply when the data exists; the latter two when it does not. In general, we give NO_DATA a relatively short expiration time, because "data that does not exist" has no fixed bound and such entries can easily fill up the cache.

local redis = require "resty.redis"  -- lua-resty-redis

local metadata = ngx.shared.metadata

-- local key, bucket = ...

local value = metadata:get(key)
if value ~= nil then
    if value == "404" then
        return            -- HIT_NEGATIVE
    else
        return value      -- HIT
    end
end

local rds = redis:new()

local ok, err = rds:connect("127.0.0.1", 6379)
if not ok then
    metadata:set(key, "404", 120)  -- expires in 2 minutes
    return                         -- NO_DATA
end

local res, err = rds:hget("upyun:" .. bucket, ":something")
if not res or res == ngx.null then
    metadata:set(key, "404", 120)  -- expires in 2 minutes
    return                         -- NO_DATA
end

metadata:set(key, res, 300)  -- expires in 5 minutes
rds:set_keepalive()

return res  -- MISS

2) What is the dog-pile effect?

In a caching system, when a cache entry expires while a large number of concurrent requests are arriving, all of those requests fall through to the backend database at the same time, which can bog the server down or even take it down.

Obviously, the code above has this problem: when many requests querying the same key all get nil from the cache, every one of them connects to Redis, until one of them finally re-caches the key's value. There is a time window between the two operations, with no guarantee of atomicity:

local value = metadata:get(key)
if value ~= nil then
    -- HIT or HIT_NEGATIVE
end

-- fetch from Redis ...

A common way to avoid the dog-pile effect is a proactive update policy: a scheduled task actively refreshes the cached values that need to change, so entries never expire and the data is always present. That does not fit our scenario, though. The other common method is locking: only one request at a time is allowed to update the cache, and the other requests wait on the lock until the update completes. This makes the query-and-update sequence atomic, so the time window, and with it the dog-pile effect, disappears.

lua-resty-lock: a non-blocking lock implementation based on shared memory.

First, let's overcome any resistance to locking: this shared-memory lock is actually very lightweight. For one thing, it is non-blocking, meaning that waiting on the lock does not block the Nginx worker process. For another, because the lock is built on shared memory and always carries an expiration time, there is no deadlock to worry about, even if the Nginx worker holding the lock crashes.

With that settled, we simply use this lock when updating the cache, as follows:

1. Check whether the cache has a hit for the key; on a miss, go to step 2.

2. Initialize a resty.lock object and call its lock method on the key. Check the first return value (the time spent waiting for the lock); if it is nil, handle the error accordingly; otherwise go to step 3.

3. Check the cache for the key again; if it is still a miss, go to step 4. Otherwise, release the lock by calling the unlock method and return the cached value.

4. Query the data source (Redis here), write the result into the cache, and finally release the lock by calling the unlock method. (A sketch follows the reference link below.)

For specific code implementation please refer to: https://github.com/openresty/lua-resty-lock#for-cache-locks
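For convenience, here is a minimal sketch of those four steps, modeled on the lua-resty-lock README; the shared dict names ("metadata" and "locks") and the fetch_from_redis helper are assumptions:

local resty_lock = require "resty.lock"
local metadata = ngx.shared.metadata

local function get_with_lock(key)
    -- step 1: check the cache
    local value = metadata:get(key)
    if value ~= nil then
        return value  -- HIT
    end

    -- step 2: lock the key; elapsed is nil on failure
    local lock, err = resty_lock:new("locks")
    if not lock then
        return nil, "failed to create lock: " .. err
    end
    local elapsed, err = lock:lock(key)
    if not elapsed then
        return nil, "failed to acquire lock: " .. err
    end

    -- step 3: someone may have filled the cache while we waited
    value = metadata:get(key)
    if value ~= nil then
        lock:unlock()
        return value
    end

    -- step 4: query the data source, cache the result, release the lock
    local res, err = fetch_from_redis(key)  -- hypothetical helper
    if res then
        metadata:set(key, res, 300)
    end
    lock:unlock()
    return res, err
end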

3) What happens when the data source fails? NO_DATA?

Again taking the code snippet above as an example: when Redis returns an error, the state is neither MISS nor NO_DATA, yet the code lumps it into NO_DATA. That can cause a serious problem. Suppose a production Redis instance goes down: from that moment on, every cache update is marked NO_DATA, even though the old copy, stale as it might be, would still be usable; instead, the cache fills up with empty entries.
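In sketch form, the fix is to treat the error return as its own state rather than folding it into NO_DATA (illustrative only, not our actual implementation):

local res, err = rds:hget("upyun:" .. bucket, ":something")
if err then
    -- the data source is unreachable: do NOT cache "404" here;
    -- prefer serving whatever old copy we still have
    return stale_value               -- hypothetical stale copy
elseif not res or res == ngx.null then
    metadata:set(key, "404", 120)    -- genuinely absent: NO_DATA
    return
end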

So, could we fall back to the stale cached copy in this situation? The answer is yes.

lua-resty-shcache: a complete cache state machine on top of ngx.shared.DICT, with adapter interfaces for the data source and for (de)serialization.

This library solves almost every problem we mentioned above: 1. a built-in cache-lock implementation; 2. serving stale copies when the data source fails (the STALE state).

So if you don't feel like reinventing all this, just use it. It also provides serialization and deserialization hooks. Taking Upyun as an example, our raw metadata format is JSON; to shrink the memory footprint we introduced MessagePack, so what finally sits in Nginx shared memory is a binary byte stream further compacted by MessagePack.
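A sketch of that wiring, following the lua-resty-shcache README; the dict name, the TTLs, the lookup_from_redis helper, and the cmsgpack binding are all assumptions here:

local shcache = require "shcache"
local cmsgpack = require "cmsgpack"

local function load_metadata(key)
    local lookup = function()
        -- fetch the metadata table from the origin (hypothetical helper)
        return lookup_from_redis(key)
    end

    local meta_cache = shcache:new(
        ngx.shared.metadata,
        {
            external_lookup = lookup,
            encode = cmsgpack.pack,    -- Lua table -> binary byte stream
            decode = cmsgpack.unpack,  -- byte stream -> Lua table
        },
        {
            positive_ttl = 300,  -- keep existing data for 5 minutes
            negative_ttl = 120,  -- keep "not found" for 2 minutes
            name = "metadata",   -- label used in logs and stats
        }
    )

    return meta_cache:load(key)
end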

Of course, we added a few things on top. For instance, shcache cannot distinguish "the data does not exist in the data source" from "the data source cannot be reached", so we added a new NET_ERR state to mark the unreachable case.

4) Serialization and deserialization too time-consuming?!

Since ngx.shared.DICT can only hold values as strings (in Lua, strings and byte streams are the same thing), even a cache hit must be deserialized into a Lua table before use. Whether with JSON or MessagePack, serialization and deserialization both cost some CPU.

If your business scenario cannot tolerate even that much overhead, you may want to try https://github.com/openresty/lua-resty-lrucache. It is implemented directly on the LuaJIT FFI and caches Lua tables as-is, so no extra serialization or deserialization is needed. We have not tried this ourselves yet, but if you do, we suggest putting an lrucache layer on top of the shcache shared-memory layer, i.e. a multi-level cache. Note that this extra layer is per-worker rather than shared across workers, so its expiration time should be set shorter.
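A minimal sketch of such a per-worker layer, based on the lua-resty-lrucache README (the sizes, TTLs, and the load_from_shcache helper are assumptions):

local lrucache = require "resty.lrucache"

-- instantiate at module top level, so every request handled by the
-- same worker shares this cache
local c, err = lrucache.new(200)  -- hold at most 200 items
if not c then
    error("failed to create the cache: " .. (err or "unknown"))
end

local function get_meta(key)
    -- level 1: worker-local Lua tables, no deserialization at all
    local meta = c:get(key)
    if meta then
        return meta
    end

    -- level 2: shared-memory cache (e.g. shcache), then the data source
    meta = load_from_shcache(key)  -- hypothetical helper
    if meta then
        -- keep the worker-local copy short-lived, since it cannot be
        -- invalidated across workers
        c:set(key, meta, 60)
    end
    return meta
end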

Node Health Check

Passive health checks and active health checks

Let's first look at Nginx's basic passive health-check mechanism:

upstream api.com {
    server 127.0.0.1:12354 max_fails=15 fail_timeout=30s;
    server 127.0.0.1:12355 max_fails=15 fail_timeout=30s;
    server 127.0.0.1:12356 max_fails=15 fail_timeout=30s;
}

# in the enclosing server/location block (these directives are not
# valid inside upstream {}):
proxy_next_upstream error timeout http_500;
proxy_next_upstream_tries 2;

This is controlled mainly by the max_fails and fail_timeout parameters: if a server accumulates max_fails failures within a fail_timeout window, then for the following fail_timeout period it is considered down, and no requests are forwarded to it during that time.

What counts as a failed attempt worth passing to the next backend is determined by the proxy_next_upstream directive; the default covers only error and timeout. Here we added http_500, so a 500 response from a backend is also treated as a failure.

proxy_next_upstream_tries is a directive introduced in Nginx 1.7.5. It lets you cap the number of retries, which otherwise defaults to the number of servers configured in the upstream (excluding, of course, those marked down).

With passive health checks alone, however, we can never avoid forwarding some real production requests to a backend that may already be down; otherwise we would have no way of noticing when a failed machine comes back. Nginx Plus, the commercial version, does offer active health checks via the health_check directive; we won't expand on that here (it's a tale of tears). Taobao's open-source Tengine also supports this feature and is worth trying.
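For flavor, an active health check in Tengine (via its upstream check module) looks roughly like this; the probe parameters are illustrative:

upstream api.com {
    server 127.0.0.1:12354;
    server 127.0.0.1:12355;

    # probe every 3s; 2 consecutive successes mark a server up,
    # 5 consecutive failures mark it down
    check interval=3000 rise=2 fall=5 timeout=1000 type=http;
    check_http_send "HEAD / HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}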

lua-resty-checkups: a node health-check module implemented in pure Lua.

This module is heavily customized to our own business characteristics, so it is not open source for the time being. The lua-resty-upstream-healthcheck module maintained by agentzh (https://github.com/openresty/lua-resty-upstream-healthcheck) is a lot like ours, although its usage conventions differ in many places; had it existed back then, we might not have rebuilt the wheel :-)

-- app/etc/config.lua

_M.global = {
    checkup_timer_interval = 5,
    checkup_timer_overtime = 60,
}

_M.api = {
    timeout = 2,
    typ = "general",  -- http, redis, mysql, etc.

    cluster = {
        {  -- level 1
            try = 2,
            servers = {
                { host = "127.0.0.1", port = 12354 },
                { host = "127.0.0.1", port = 12355 },
                { host = "127.0.0.1", port = 12356 },
            }
        },
        {  -- level 2
            servers = {
                { host = "127.0.0.1", port = 12360 },
                { host = "127.0.0.1", port = 12361 },
            }
        },
    },
}

The above is merely a configuration example for the module. checkups implements both active and passive health-check mechanisms; the checkup_timer_interval item above sets the interval between active health checks.

Specifically, when an Nginx worker initializes, we create a globally unique timer that polls the monitored backend nodes with heartbeat checks at the configured interval, proactively removing a node from the available list when an anomaly is detected and re-adding it when it recovers. The checkup_timer_overtime item ties into our use of a shared-memory lock: it guarantees that even if the worker running the timer crashes, another worker can start a new timer once that period expires. A surviving timer, of course, keeps refreshing the state of the shared-memory lock. A sketch of the pattern follows.
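Here is a hedged sketch of that globally unique timer pattern, using ngx.timer.at plus a shared dict named "locks"; this illustrates the idea and is not checkups' actual code:

local INTERVAL = 5   -- checkup_timer_interval
local OVERTIME = 60  -- checkup_timer_overtime

local function heartbeat(premature)
    if premature then
        return
    end

    local locks = ngx.shared.locks
    local pid = ngx.worker.pid()

    -- try to become (or stay) the timer owner; add() fails while
    -- another live worker holds the key, and the key expires
    -- OVERTIME seconds after that worker stops refreshing it
    local owner = locks:get("checkups:timer")
    if owner == nil then
        locks:add("checkups:timer", pid, OVERTIME)
        owner = locks:get("checkups:timer")
    end

    if owner == pid then
        locks:set("checkups:timer", pid, OVERTIME)  -- refresh the lock
        -- ... poll the backend nodes and update their status here ...
    end

    local ok, err = ngx.timer.at(INTERVAL, heartbeat)
    if not ok then
        ngx.log(ngx.ERR, "failed to reschedule heartbeat: ", err)
    end
end

-- called from init_worker_by_lua* in every worker
local function init_worker()
    ngx.timer.at(0, heartbeat)
end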

As for passive health checks, they are similar to the mechanism the Nginx core provides, and we modeled our design on it. The one difference is that we support a multi-level server configuration policy. With the two server levels configured above, level 1 is always used by default; only when every level-1 node is down do we switch to level 2. Within each level, nodes are polled round-robin by default, and a configuration item can switch the balancing strategy to consistent hashing. This covers both load-balancing and primary/standby failover modes.

In addition, building on the lua-upstream-nginx-module, checkups can directly read the upstream configuration in nginx.conf and modify the state of a server, so active health checks can also be applied to the Nginx core's own upstream module.
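A small sketch of what that looks like with lua-upstream-nginx-module (the upstream name and peer index are illustrative):

local upstream = require "ngx.upstream"

-- list the upstream blocks defined in nginx.conf
for _, u in ipairs(upstream.get_upstreams()) do
    ngx.say("upstream: ", u)
end

-- mark the first primary peer of "api.com" as down, e.g. after a
-- failed active check (is_backup = false, peer id 0)
local ok, err = upstream.set_peer_down("api.com", false, 0, true)
if not ok then
    ngx.log(ngx.ERR, "failed to set peer down: ", err)
end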

Other

Of course, ngx_lua is applied in many other areas at Upyun, such as streaming uploads and access rate limiting across multiple Nginx instances. The keynote didn't cover those either, so we won't open them up here; another time.

-the end-
