As the Chinese saying goes, there is a pit hidden under every radish.
Our short URL service, 955, serves up to 4 million redirects a day; the main cost is storing and analyzing the user data behind each redirect request. We hit a memory bottleneck, then an IO bottleneck, and at peak ran into the CPU limit, nearly draining the machine. What follows is a summary of the lessons we learned.
Background
Because a short URL service is very hard to monetize, our hardware is especially shabby: it is like dancing in shackles. Naturally, neither our staffing nor our technology can compare with larger sites, so if you want to throw bricks, please throw them gently. Ouch.
Our hardware: a Shanda Cloud micro instance with 1 GB of memory and a single shared CPU core. Later we added an intranet machine with the same configuration for a MongoDB replica set.
Startup hardware costs
Since the project itself can barely generate revenue, survival means squeezing the hardware dry and boldly adopting new technology. Domestic cloud providers generally bill along these dimensions:
Memory: use asynchronous mode instead of multiple synchronous processes
Bandwidth: 2M dual-line; a 301 redirect does not need much bandwidth
Hard disk: cloud disk, billed by capacity
CPU: single core
Therefore, we made the corresponding technology choices:
Nginx: needs no introduction, right?
Tornado: Facebook's open-source asynchronous Python micro-framework
MongoDB: good performance, and low memory overhead when the hot data set is small
Redis: added because MongoDB writes were costing too much IO
Node.js (with CoffeeScript): a late addition; Node.js is asynchronous by nature
Supervisord: process monitoring
(Architecture diagram omitted; see the original post for the full-size image.)
Development and operations
Since only one person is currently invested in development and operations, we can grandly call it DevOps. Sounds sophisticated and international, doesn't it?
User characteristics
The 80/20 rule applies: 20% of the URLs consume 80% of the resources (especially after we enabled statistics by default for all short URLs).
Monitoring first
The first thing many small teams neglect is monitoring, and waiting until users tell you the site will not open is too late. For convenience we use Jiankongbao and Aliyun monitoring (mainly because Aliyun monitoring offers free SMS alerts).
Every time the site becomes unreachable you should track down the cause; if the frequency rises, start planning countermeasures. The load average reflects system load well and tells you whether you have hit a hardware bottleneck.
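As a rough rule of thumb (the helper below is our own illustration, not part of any monitoring tool), a 1-minute load average that stays above the number of CPU cores suggests a hardware bottleneck:

```python
import os

def overloaded() -> bool:
    """Return True if the 1-minute load average exceeds the core count."""
    load_1min, load_5min, load_15min = os.getloadavg()  # Unix only
    return load_1min > (os.cpu_count() or 1)

print(overloaded())
```

On a single-core box like ours, a sustained load average above 1.0 already means requests are queuing.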
When an incident occurs, these tools help inspect system state: htop (locate the problem process), iftop (spot abnormal traffic and IPs), iotop (locate IO bottlenecks), in addition to reading the logs.
If you were asleep when the incident happened, review the monitoring history.
Painful lesson one: hard disk capacity needs headroom
MongoDB refuses to start when the disk runs out of space. Without LVM in place you cannot expand capacity quickly, and domestic clouds are not as smart as Linode, whose control panel offers one-click resizing (although even that feature has been known to corrupt the file system). The likely consequence is several hours of downtime.
Painful lesson two: the maximum number of open file descriptors
Asynchronous mode brings a new, unavoidable problem: the maximum number of open file descriptors. We hit this limit with both Tornado and Nginx. Tornado's symptoms: CPU at 100% and 500 errors appearing in the log; Nginx logged errors and served pages slowly.
To avoid this class of problem, configure ulimit appropriately.
Note that ulimit -n only shows the limit of the current session (important!). The correct approach is to check the limits of the running process: cat /proc/{pid}/limits
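From inside a process you can also read, and raise up to the hard limit, the descriptor limit with Python's resource module. This sketch is our own illustration, equivalent to reading /proc/{pid}/limits:

```python
import os
import resource

# Soft and hard limits on open file descriptors for *this* process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"pid {os.getpid()}: soft={soft}, hard={hard}")

# A process may adjust its own soft limit, but never above the hard limit.
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```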
You also need to set two parameters in the Nginx configuration file:
worker_connections 9999; # set according to your own situation
worker_rlimit_nofile 60000; # set according to your own situation
Our monitoring chart showed Nginx hitting this ceiling: connections were obviously stuck at around 1000, the Linux default limit of 1024. (Chart omitted.)
Painful lesson three: Python is not an inherently asynchronous language
To be honest, building an asynchronous design in Python was not a pleasant process. To avoid potential encoding problems we used Python 3, which brought problems of its own:
Lack of asynchronous support:
The asynchronous Redis driver only supported Python 2 (although, about half a year later, the author of tornado-redis finally added Python 3 support).
Many packages still do not support Python 3; the feeling when pip install fails outright is: dumbfounded.
Bitly's asyncmongo has essentially no documentation, so in the end we could only choose Motor.
Tornado's own documentation is not exhaustive.
Later, some components were developed in Node.js instead, and we only wished we had met it sooner; CoffeeScript's syntactic sugar performs admirably too.
Painful lesson four: choose your database carefully
The database is almost the most critical part of a web application, and the more experienced the engineer, the more cautiously they choose one. Putting all of our pressure on MongoDB was, frankly, too radical.
Normalization and denormalization in MongoDB
Almost anyone with a smattering of MongoDB will tell you not to think about MongoDB in SQL terms, and to model your requirements with embedded documents. But they forget to tell you that an ever-growing embedded document causes an IO bottleneck (refer to page 73 of "Deep Learning MongoDB").
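To make the trade-off concrete, here is a sketch of the two shapes for per-URL click data (the field names are hypothetical, not our actual schema):

```python
# Denormalized: clicks embedded in the URL document. Every hit appends
# to the array, the document grows without bound, and MongoDB keeps
# rewriting/relocating it on disk -> IO pressure.
embedded_url_doc = {
    "_id": "abc123",
    "target": "http://example.com/",
    "clicks": [  # grows by one element per redirect
        {"ts": "2013-09-09T10:00:00", "ip": "1.2.3.4"},
    ],
}

# Normalized: clicks live in their own collection, one small insert
# per hit; the URL document itself never changes size.
url_doc = {"_id": "abc123", "target": "http://example.com/"}
click_doc = {"url_id": "abc123", "ts": "2013-09-09T10:00:00", "ip": "1.2.3.4"}
```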
In fact, many factors weigh on the choice between normalization and denormalization (embedded documents).
MongoDB's weakness in complex queries
For queries that require computation, MongoDB's map-reduce is slow; complex cases are hard to handle with embedded documents; and its documentation is thinner than MySQL's. Young man, don't think about replacing the database the first time you run into a problem with MySQL.
As far as this project is concerned, the statistics module clearly struggles to produce multiple reports quickly.
Don't wait for a fire before you remember the MongoDB replica set
If MongoDB write pressure is high and you have not sharded, simply adding machines will not relieve the write pressure; it only helps if read pressure is the problem.
Moving from a standalone instance to a replica set requires locking the database at the very least, and the application code must be modified as well. If you plan to switch to a replica set, prepare ahead of time.
In the end, our approach was to keep frequently updated data in Redis and flush it into the database periodically; the effect was obvious.
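The pattern is simple enough to sketch. The class below is our own illustration, not the actual code: in production the increments go to Redis (INCR) and the flush runs on a timer such as Tornado's PeriodicCallback, but the idea is the same -- hot counters accumulate in a fast store and reach the database in one batch.

```python
from collections import Counter
from typing import Callable, Dict

class WriteBehindCounters:
    """Accumulate hot counters (e.g. click counts) in a fast store,
    then periodically flush the deltas to the database in one batch."""

    def __init__(self, persist: Callable[[Dict[str, int]], None]):
        self._pending = Counter()
        self._persist = persist  # e.g. a bulk MongoDB $inc update

    def incr(self, key: str, n: int = 1) -> None:
        self._pending[key] += n  # cheap increment, no database IO

    def flush(self) -> None:
        if self._pending:
            self._persist(dict(self._pending))  # one bulk write
            self._pending.clear()
```

One flush replaces thousands of tiny writes, which is exactly where our MongoDB IO was going.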
Use Redis correctly
Control the memory, control the startup cost
If you want to save money, don't put everything into Redis, even data that does not look like much: over a long run it occupies a surprising amount of memory. In MongoDB only the hot data occupies memory, and the 80/20 rule applies again: hot data is only about 20%.
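Two redis.conf directives help keep that cost bounded (the values here are arbitrary examples, not a recommendation):

```
maxmemory 100mb
maxmemory-policy allkeys-lru
```

With allkeys-lru, Redis evicts the least recently used keys once the cap is reached, which matches the 80/20 access pattern described above.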
Of course, if you are made of money, feel free to walk on by!
Don't use pub/sub as a queue
Do not use pub/sub as a queue if you cannot afford to lose data: messages published to a channel are lost whenever the subscriber process restarts. Use LPUSH and BRPOP to implement a queue instead.
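A minimal sketch of that queue, written against a redis-py style client (the client is passed in, so the example does not assume a live server):

```python
import json

def enqueue(client, queue: str, item: dict) -> None:
    """Producer: push onto the head of the list. Unlike PUBLISH,
    nothing is lost if no consumer is currently listening."""
    client.lpush(queue, json.dumps(item))

def dequeue(client, queue: str, timeout: int = 0):
    """Consumer: block until an item arrives at the tail
    (timeout=0 means block forever, as in redis-py's brpop)."""
    popped = client.brpop(queue, timeout=timeout)
    if popped is None:  # timed out, queue still empty
        return None
    _key, raw = popped  # brpop returns a (key, value) pair
    return json.loads(raw)
```

Items pushed while the consumer is down simply wait in the list, which is the durability pub/sub cannot give you.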
I am fed up with domestic clouds
Intranet hosts cannot reach the external network at all. Want to run apt-get update? Buy bandwidth temporarily.
A DDoS attack? They cut your network outright, without any notice, leaving you baffled.
IO performance is terrible; the machine hangs at roughly 5-6 MB/s of reads and writes. Then again, Aliyun seems even worse.
One last piece of advice
"Young man, read more books and more newspapers, think more and learn more." -- Wanfeng
I can already see plenty of people wanting to flame me. Come at me: my Weibo is @dai-jie. If I am wrong, I will fix it, I will fix it...
Original link: http://www.pmtoo.com/opinion/2013/0909/3553.html