Acer Software's Java Thread Monitoring Journey

Last Update:2016-11-03 Source: Internet

Author: User



Acer software Xu to

Hi, I'm from Haihong Information Technology Co., Ltd. (hereinafter referred to as Macro-wei software, or Acer software). Founded in 2005, Acer software is a high-tech software company focused on e-commerce ERP software development. It is committed to providing large network and e-commerce enterprises with professional, comprehensive, tailor-made enterprise ERP management software and application solutions.

[Image: http://static.oschina.net/uploads/space/2016/1103/143541_FS24_1792703.png]

Acer's e-commerce ERP software runs on Alibaba Cloud. SLB sits in front, and on ECS we use HAProxy with JBoss (multiple processes) to form the cluster. Both the ERP side and the interface systems are implemented in Java. With the explosion of e-commerce business in recent years, Java performance problems have become increasingly prominent: the ERP side sometimes hangs for no apparent reason, and the interface system sometimes dies in the middle of a run or its process appears stuck.

[Image: http://static.oschina.net/uploads/space/2016/1103/143710_meV4_1792703.png]

Acer e-commerce ERP software architecture

We tried many ways to monitor the state of Java threads. We started with tools such as jstat and jps to read the state of the JVM, and we also tried the Zabbix Java gateway, but none of these met the performance-monitoring requirements of an e-commerce ERP product. Later we had scripts call these tools to read the state and write it back to Zabbix through its API for recording and alerting. That still did not solve the problem of stuck Java threads, because this data only covers memory state and GC status; operations still had no idea what each thread was actually doing.
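The script-based collection described above can be sketched roughly as follows. This is a minimal illustration, not the team's actual script: it parses the two-line output of `jstat -gcutil <pid>` into a dictionary that a separate sender could forward to Zabbix. The function name `parse_gcutil` and the sample values are assumptions, and because the column set differs between JDK versions, the parser simply pairs whatever header it finds with the values below it.

```python
def parse_gcutil(output: str) -> dict:
    """Pair the header row of `jstat -gcutil` output with its value row."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one value row")
    header, values = rows[0], rows[-1]
    if len(header) != len(values):
        raise ValueError("header and value rows have different lengths")
    metrics = {}
    for name, raw in zip(header, values):
        # jstat prints "-" for columns that do not apply; keep those as None.
        metrics[name] = None if raw == "-" else float(raw)
    return metrics


# Illustrative sample in the JDK 8 column layout (the numbers are made up).
SAMPLE = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  96.88  41.25  63.10  94.73  91.02    215    3.412     4    1.870    5.282
"""

if __name__ == "__main__":
    metrics = parse_gcutil(SAMPLE)
    # Old-generation occupancy (%) and the number of full GCs so far.
    print(metrics["O"], metrics["FGC"])
```

Every value collected this way describes memory or GC. Nothing in it says which thread is blocked or what it is executing, which is the gap the paragraph above describes.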

So we tried JProfiler to analyze specific cases, but JProfiler is very resource-intensive and cannot be used for debugging in a production environment. As a result, many incidents could not be debugged because there was no data, and this situation lasted for a long time. Our eventual workaround was to write a monitoring crawler that runs on every machine: when it cannot fetch the hook, it triggers a restart of the Java process and sends an email alert.
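A crawler of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the original tool: the class name `Watchdog`, the "three consecutive failures" threshold, and the `restart` and `alert` callbacks are all hypothetical. The article only says that a hook that cannot be fetched leads to a restart and an email.

```python
import urllib.request
from dataclasses import dataclass


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of the health page, or 0 if it cannot be fetched."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except Exception:
        # Timeouts, refused connections and HTTP errors all count as a failure.
        return 0


@dataclass
class Watchdog:
    url: str
    max_failures: int = 3  # assumed threshold; the article does not give one
    failures: int = 0

    def check(self, fetch=fetch_status) -> str:
        """Probe once and return 'ok', 'wait' or 'restart'."""
        if fetch(self.url) == 200:
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0
            return "restart"
        return "wait"


def run_once(dog: Watchdog, restart, alert, fetch=fetch_status) -> str:
    """Probe the page; on 'restart', call the restart and alert callbacks."""
    decision = dog.check(fetch)
    if decision == "restart":
        restart()
        alert(f"health page {dog.url} unreachable, Java process restarted")
    return decision
```

In practice `restart` would wrap whatever command restarts the JBoss instance and `alert` would send the email; both are passed in so the decision logic can be exercised without touching a real process.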

Later we met engineers from Perspective Bao (http://www.toushibao.com), the application performance management product of Cloud Intelligence. They customized Java thread APM monitoring for our e-commerce ERP task system and integrated it seamlessly with our existing scheduled tasks, which are implemented with Java Quartz.

[Image: http://static.oschina.net/uploads/space/2016/1103/143748_SOre_1792703.png]

The Perspective Bao agent is inserted during JBoss startup, and the business startup script reads a configuration file to decide whether monitoring is enabled for that business. Once monitoring is on, the Perspective Bao dashboard shows the detailed running state of every Java process. From the main process down to the sub-processes derived from it, we can see exactly which method is slow and which method is hung.

[Image: http://static.oschina.net/uploads/space/2016/1103/143808_hwPN_1792703.png]

Perspective Bao collects and analyzes performance data for runtime code, SQL execution, and API calls. It drills down to the code level to locate performance bottlenecks and analyze the causes of performance degradation. From a large volume of business requests, it helps developers and operations capture and analyze the code execution logic and state behind real user actions, such as the longest execution times and slow queries.

Unlike traditional agent installation and configuration, the Smart Agent provided by Perspective Bao offers true one-click installation. Users do not have to deal with complicated downloads and parameter configuration for different operating system versions and services. The Smart Agent automatically discovers all services, applications, and runtime code environments on the machine; once the user confirms, the system installs the matching version of the monitoring plug-in. The whole process is automated and requires no manual configuration.

For maintaining and upgrading complex systems, Perspective Bao also provides a convenient update mechanism and monitors the Smart Agent's health in real time. When the system environment changes, there is no need to reinstall or redeploy, which effectively reduces overall maintenance costs in a cluster environment.

Q&A:

Q: Is inserting the Perspective Bao agent simple?

A:

[Image: http://static.oschina.net/uploads/space/2016/1103/143844_Jlev_1792703.png]

Inserting the Perspective Bao agent is very simple. As long as the Java version is supported, the matching probe file is selected automatically, with no manual download or configuration; the insertion process is fully automated.

[Image: http://static.oschina.net/uploads/space/2016/1103/143915_FAWZ_1792703.png]

Of course, you can decide whether to insert the probe as needed. The image above shows the switch we added to our script, for reference.
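A switch of this kind can be sketched as follows. This is not the script from the screenshot: the `[monitoring]` section, the `enabled` key, and the agent path are hypothetical, and the sketch assumes the probe is attached through the JVM's standard `-javaagent:` option, which the article does not confirm.

```python
import configparser

# Hypothetical location of the probe jar; the real path is not given in the article.
AGENT_JAR = "/opt/apm/agent.jar"


def build_java_opts(config_text: str, base_opts: list) -> list:
    """Return the JVM options, adding the agent only when monitoring is enabled."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # A missing section or key means monitoring stays off.
    enabled = parser.getboolean("monitoring", "enabled", fallback=False)
    opts = list(base_opts)
    if enabled:
        opts.append(f"-javaagent:{AGENT_JAR}")
    return opts


if __name__ == "__main__":
    config = "[monitoring]\nenabled = true\n"
    print(" ".join(build_java_opts(config, ["-Xms1g", "-Xmx2g"])))
```

Keeping the decision in a configuration file means each business can turn monitoring on or off without editing the startup script itself.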

Q: What do your crawlers crawl?

A:

[Image: http://static.oschina.net/uploads/space/2016/1103/143935_IugL_1792703.png]

[Image: http://static.oschina.net/uploads/space/2016/1103/143949_Dh5T_1792703.png]

Our crawlers crawl the health pages listed in their configuration files. The cluster configuration files on each machine are generated and distributed by our operations system, and the scripts are synchronized to each machine via Salt.
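A configuration-driven crawl along these lines could be sketched as below. The JSON layout, the key names `instances`, `name` and `health_url`, and the URLs are assumptions for illustration; the article does not describe the real file format.

```python
import json

# Hypothetical per-machine configuration, as the operations system might distribute it.
SAMPLE_CONFIG = """
{
  "instances": [
    {"name": "erp-1", "health_url": "http://127.0.0.1:8080/health"},
    {"name": "api-1", "health_url": "http://127.0.0.1:8180/health"}
  ]
}
"""


def load_targets(config_text: str) -> list:
    """Return (name, url) pairs from a JSON document of the assumed shape."""
    config = json.loads(config_text)
    return [(item["name"], item["health_url"]) for item in config["instances"]]


def crawl(targets, fetch) -> dict:
    """Fetch every health page; map each instance name to True if it returned 200."""
    results = {}
    for name, url in targets:
        try:
            results[name] = fetch(url) == 200
        except Exception:
            # Any error while fetching counts as an unhealthy instance.
            results[name] = False
    return results
```

Here `fetch` is any callable that takes a URL and returns an HTTP status code, so the same loop works with a real HTTP client or with a stub in tests.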

Q: How do you handle Java Quartz scheduled tasks? You just said the hooks the crawler fetches are health pages; do scheduled tasks count as well?

A: Monitoring scheduled tasks has always been a headache for us. We used a compromise: we run Quartz inside a JBoss servlet. Before Perspective Bao, I could only tell whether the process was gone or dead. If something went wrong inside the code of a task dispatched by Quartz, we would not know, because the health check still returned normally at that point; we could only judge from the state of the task pool.

Q: Is there a specific strategy for crawling health pages with Python? For example, how is an alarm triggered, and when?

A: It is a page hook. The script can run from cron or in the background; it supports both.

Q: Is there a process recovery policy after the alarm?

A: Simple and crude: kill and restart. Because there is a load balancer in front, one failing instance is not a big problem. On the business side we also tested TBSchedule, a task system open-sourced by Taobao (https://github.com/taobao/TBSchedule), but later found that tasks still got stuck, so we brought in Perspective Bao to dig deeper into the code problems. Operations now uses Tornado, Flower, and Celery, which has been pleasant to use and very stable. I think the key is not which framework you use but how the code performs, so to find code problems, APM is a must.

Q: How is JVM off-heap memory monitored on the server?

A: At present, off-heap memory is not monitored. We do little in that area; as long as the simple, crude approach handles the problem, that is enough.

Q: If the server's JVM heap memory is stable but overall memory usage is somewhat high, is that a problem?

A: This has to be judged together with the business. If memory grows in step with the business growth curve, there is usually not much of a problem. If the business is not growing but memory keeps increasing and then triggers full GCs, you need to consider whether the code has a memory leak. Generally speaking, high memory usage on Linux is quite normal as long as full GCs are not frequent. With slow growth, nothing may happen for several days of running, but if the JVM suddenly performs a long full GC, the application will suddenly freeze.
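The rule of thumb in this answer can be written down as a small heuristic. The sketch below is an assumption-laden illustration, not a diagnostic tool: it takes the old-generation occupancy measured right after each full GC (for example the `O` column of `jstat -gcutil`) and flags a possible leak only when that floor keeps rising while the business load is not growing. The function name and the four-sample window are hypothetical.

```python
def suspect_leak(post_full_gc_old_pct, business_growing, min_samples=4):
    """Return True when old-gen usage after full GCs keeps rising without business growth."""
    if business_growing:
        # Memory that grows in step with the business is expected.
        return False
    if len(post_full_gc_old_pct) < min_samples:
        # Too few full GCs observed to say anything.
        return False
    recent = post_full_gc_old_pct[-min_samples:]
    # Strictly increasing floor: each full GC reclaims less than the one before.
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


if __name__ == "__main__":
    print(suspect_leak([40.0, 48.5, 57.2, 66.0], business_growing=False))
```

A positive result is only a prompt to look at the code with a profiler or an APM tool; a rising floor can also come from legitimately growing caches.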

Q: In a clustered service deployment, do you automatically restart the application after it has run for a while to free the memory consumed by the JVM?

A: No. The JVM reclaims memory on its own; you just need to find the right memory configuration for your business.

Q: Does your cluster restart applications at regular intervals?

A: Our cluster does not restart on a schedule, but we release very frequently, with two release windows a week, which amounts to two restarts a week.


Cloud Intelligence is a business operations and maintenance solution provider. Its products Monitoring Bao (www.jiankongbao.com), Perspective Bao (www.toushibao.com), and Pressure Measurement Bao (www.yacebao.com) provide one-stop application performance monitoring, management, and testing services to hundreds of thousands of users in e-commerce, mobile internet, advertising media, online games, education, healthcare, finance and securities, enterprise, and other industries.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
