Introduction to OneAPM's Practices in Express Projects
[Editor's note] The OneAPM operations team recently found this article on GitHub and is sharing it with everyone. Its author, Mr. Wang Yu, has been using our products since early 2015 and is a loyal OneAPM user.
OneAPM is an excellent performance monitoring platform. Why use performance monitoring at all? Not to show off how cool we are, but because we want to know about a problem the moment it occurs. Finding and fixing a problem quietly, before anyone notices, is far more comfortable than fixing it in an overtime fire drill.
However, some people like to say, "Some problems won't hurt anything if you just leave them alone." But I believe that wherever there is smoke on a server, there will eventually be fire.
Others like to say, "My code simply doesn't produce bugs." That, however, is just bragging.
Rather than argue about it, let's just get monitoring set up.
The monitoring service of OneAPM mainly consists of the following parts:
Application Insight: application monitoring
Browser Insight: browser client monitoring
Mobile Insight: mobile client monitoring
Infrastructure Insight: server monitoring
To monitor your project with OneAPM, first register a developer account at OneAPM.com.
Application Insight: Application Monitoring
After logging in to the platform, choose a probe based on your project's language. My project uses Express, so I chose the Node.js probe. OneAPM documents the installation in detail; it essentially comes down to running the following in the project directory:
```shell
npm install oneapm --registry http://npm.oneapm.com
```
Then copy the configuration file out of node_modules/oneapm into the project and fill in your License Key.
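I won't reproduce OneAPM's exact file here. As a rough, hedged sketch (all field names below are assumptions modeled on similar Node.js agents; check the file shipped in node_modules/oneapm for the real ones), the part you edit typically looks something like:

```javascript
// oneapm.js -- HYPOTHETICAL sketch of the agent config file; the
// field names are assumptions, not taken from OneAPM's shipped file.
exports.config = {
  app_name: ['my-express-app'],    // name shown in the OneAPM panel
  license_key: 'YOUR_LICENSE_KEY', // from your developer account page
  logging: {
    level: 'info'                  // the agent's own log verbosity
  }
};
```

Agents in this family are usually loaded before anything else, for example by putting the require call for the agent on the first line of app.js, so that it can instrument modules as they are loaded.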
Once the probe is installed and has had a few minutes to collect data, various charts appear in the panel.
The first thing to note is the response time chart.
This chart gives you a general impression of where the server's time goes. We can see that the project was slowest around the evening of August 13, when a request took about 1.25 s. Purple occupies the vast majority of the chart; that is time spent in external services.
The window in the upper-right corner is the Apdex score.
Apdex is an index for evaluating user satisfaction: it tells us whether users are happy with our response speed. The value 1 [0.5] in the top-right corner means that 100% of users are satisfied, using a threshold of 0.5 seconds. With OneAPM's default settings, a request under 0.5 seconds counts as satisfied, 0.5-2 seconds as tolerable, and over 2 seconds as unsatisfied.
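The score follows the standard Apdex formula: (satisfied + tolerating / 2) / total samples. A minimal JavaScript sketch of the calculation (the function name and sample data are mine; OneAPM computes this for you):

```javascript
// Apdex = (satisfied + tolerating / 2) / total.
// With threshold T = 0.5 s: <= T is satisfied, <= 4T (2 s) is
// tolerating, anything slower counts as unsatisfied.
function apdex(responseTimesSec, t) {
  var satisfied = 0;
  var tolerating = 0;
  responseTimesSec.forEach(function (rt) {
    if (rt <= t) satisfied++;
    else if (rt <= 4 * t) tolerating++;
  });
  return (satisfied + tolerating / 2) / responseTimesSec.length;
}
```

With two fast requests, one tolerable one, and one slow one, `apdex([0.3, 0.4, 1.0, 3.0], 0.5)` yields (2 + 0.5) / 4 = 0.625; a score of 1 means every request came in under the threshold.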
CPM (calls per minute) chart
This chart shows throughput. At its peak the project handled about 80 requests per minute, with an average of 17.88 requests per minute.
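CPM is simply throughput bucketed per minute. A rough sketch of how peak and average figures like these come out of raw request timestamps (the helper name is mine, for illustration only):

```javascript
// Group request timestamps (in ms) into one-minute buckets, then
// report the busiest minute and the average requests per minute.
function cpm(timestampsMs) {
  var buckets = {};
  timestampsMs.forEach(function (t) {
    var minute = Math.floor(t / 60000);
    buckets[minute] = (buckets[minute] || 0) + 1;
  });
  var counts = Object.keys(buckets).map(function (k) { return buckets[k]; });
  return {
    peak: Math.max.apply(null, counts),
    avg: counts.reduce(function (a, b) { return a + b; }, 0) / counts.length
  };
}
```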
Web transaction chart
This is a very important chart: it shows the worst-performing web transactions. From each URL we can trace back to the controller function in the code and find the performance bottleneck in that interface.
Let's take a closer look at one request, the first one, express/POST/api/ex... (hover the cursor over it to display the full URL; this one is actually Expressjs/POST/api/exams/signup-all).
Click it to view the details of the interface.
Some of the charts here show only the throughput and execution time of this one interface; their meaning is the same as described earlier, just scoped to a single interface.

[Figure: the transaction's "breakdown" chart]
This breakdown chart shows the interface's calls to external applications and databases. We can see that every call to this interface makes 37 requests over HTTP to an external service, xxxxxtct.com, accounting for 96.88% of the execution time; the two database queries account for only 0.49% and 0.07% respectively.
From this we immediately know what to optimize. For this interface, the bottleneck is clearly the 37 HTTP requests sent to xxxxxtct.com. That host is actually one of our own subsystems, so if I add an interface to the subsystem that merges the content of the current 37 requests into one, this performance problem is solved.
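The fix can be sketched as follows (the endpoint paths are hypothetical, made up for illustration; only the host comes from the text above): instead of one round trip per item, hand the whole list to a single batch endpoint on the subsystem.

```javascript
// Before: one HTTP request per item -> 37 round trips per call.
function urlsPerItem(ids) {
  return ids.map(function (id) {
    return 'http://xxxxxtct.com/api/exam/' + id; // hypothetical path
  });
}

// After: one batch endpoint takes every id at once -> 1 round trip.
function batchUrl(ids) {
  return 'http://xxxxxtct.com/api/exams?ids=' + ids.join(','); // hypothetical path
}
```

Since the subsystem is under our control, adding the batch endpoint is cheap, and the 96.88% external-call share should shrink dramatically once 37 round trips collapse into one.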
In addition, OneAPM's Application Insight provides a system topology view, bottleneck search by web transaction, bottleneck search by SQL, the specific execution time of each external service (very important for finding out who is dragging us down), and background service monitoring.
Finally, let's talk about the error rate chart. What follows is my personal experience.
When a system exception is thrown, Express may crash. Here are two examples.
```javascript
exports.show = function(req, res) {
  a.b; // a == undefined
};
```
This throws an exception, but the service keeps running.
```javascript
exports.show = function(req, res) {
  request.post({ url: 'xxx-service.com' }, function(err, response, body) {
    a.b; // a == undefined
  });
};
```
This throws an exception inside the callback, and the service crashes.
The OneAPM probe is started by the Express program and lives and dies with the Express process: if Express crashes, OneAPM crashes with it, so the error message can never be sent back. The conclusion: when an exception is thrown inside a callback, no in-process probe can collect it, because nothing at that layer can catch it.
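This can be demonstrated without Express at all: a synchronous throw is catchable by whoever wraps the call, while a throw inside an async callback escapes every outer try/catch. The function names below are mine, for illustration (asyncHandler is defined but deliberately never invoked, since calling it would crash the process):

```javascript
// A synchronous throw can be caught by a wrapper -- this is the
// layer where the framework (and the probe) intercepts example 1.
function syncHandler() {
  try {
    undefined.b; // TypeError, like a.b when a is undefined
    return 'no error';
  } catch (e) {
    return 'caught: ' + e.name;
  }
}

// An async throw escapes the wrapper: by the time the callback
// runs, the outer try/catch has long since returned. The only
// try/catch that could help is one inside the callback itself.
function asyncHandler() {
  try {
    setTimeout(function () {
      undefined.b; // would crash the process; the catch below never fires
    }, 0);
    return 'scheduled';
  } catch (e) {
    return 'never reached';
  }
}
```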
Of course, we have excellent process management tools such as pm2 that automatically restart the service after a crash... but we want the error message immediately... and even though the exception is kept in pm2's error.log, nobody can stare at an error log all day.
To solve this problem, I wrote a small piece of code that collects error logs; I hope it helps you.
```javascript
var pm2 = require('pm2');
var Slack = require('slack-node');

pm2.launchBus(function(err, bus) {
  console.log('connected');
  bus.on('log:err', function(data) {
    var webhookUri = "{your slack webhook}";
    var slack = new Slack();
    slack.setWebhook(webhookUri);
    slack.webhook({
      channel: "#general",
      username: "cq-tct",
      icon_emoji: ":ghost:",
      text: data.data
    }, function(err, response) {
      console.log(response);
    });
  });
});
```
Save this snippet as err_notifier.js in the project root and run it after each service start:

```shell
node err_notifier.js
```

This way Slack receives errors immediately; even if the service crashes, the message still gets sent.
Another tool, Slack, is used here. Slack is an instant-messaging tool for office collaboration; you've probably heard of it at least in passing (it's the startup that was valued at USD 1.1 billion within half a year and at 2.8 billion within a year). HipChat is a similar product abroad; I don't know of a well-known domestic equivalent.
First, apply for a team on Slack, create a room, enable a webhook for the room, and assign the webhook address to webhookUri. Then, no matter where we are, whenever the project reports an error, we immediately receive the error log pushed through Slack.
Of course, you can swap the push channel for HipChat, email, or SMS.
Finally
Although OneAPM can pave the way for you in the early stages of development, having monitoring does not mean you can be careless (the moment the project falls over, it is obvious at a glance in OneAPM).
I think the most reliable approach is: strictly following a style guide when writing code + a monitoring system + unit tests with 100% coverage + several integration tests + a reliable release process.
Conclusion: OneAPM sincerely thanks Mr. Wang for his support of our products. We will keep working hard to deliver more value to our users.