The first five minutes of troubleshooting a server

Source: Internet
Author: User

When our team handled operations, optimization, and scaling at our previous company, we dealt with poorly performing systems and infrastructure of all sizes, most of them large (such as CNN or the World Bank). Add tight deadlines, exotic technology stacks, and missing information and documentation, and the process was usually painful and left us with lasting memories.

When a server fails, the cause is rarely obvious. We usually start with the following steps:

I. Get as much context about the problem as possible

Do not rush straight to the server. First find out how much is already known about the server and what exactly the fault looks like. Otherwise you will probably be shooting in the dark.

The following questions must be answered:

  • What exactly is the fault? No response? Errors?

  • When was the fault first noticed?

  • Can the fault be reproduced?

  • Is there a pattern (for example, does it happen once every hour)?

  • What were the latest changes to the platform (code, servers, etc.)?

  • Which specific user groups are affected (logged in, logged out, located in a certain region...)?

  • Can any architecture documentation (physical and logical) be found?

  • Is there a monitoring platform available? (For example, Munin, Zabbix, Nagios, New Relic... anything will do.)

  • Are there any logs to look at? (For example, Loggly, Airbrake, Graylog...)

The last two are the most convenient sources of information, but don't get your hopes up: more often than not, neither exists, and all you can do is keep digging.
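The question about patterns can be checked mechanically once a few fault times have been collected, whether by hand or from logs. The following is a minimal sketch, not part of the original article: the function name `recurring_interval`, the five-minute tolerance, and the sample timestamps are all illustrative assumptions.

```python
from datetime import datetime, timedelta
from statistics import median


def recurring_interval(timestamps, tolerance=timedelta(minutes=5)):
    """Return the typical gap between faults if they recur regularly, else None.

    `timestamps` is an iterable of datetime objects, one per observed fault.
    The faults count as regular when every gap between consecutive faults
    lies within `tolerance` of the median gap (an assumed, simple criterion).
    """
    ordered = sorted(timestamps)
    if len(ordered) < 3:
        return None  # too few observations to call anything a pattern
    # Gaps between consecutive faults, in seconds.
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    if all(abs(gap - typical) <= tolerance.total_seconds() for gap in gaps):
        return timedelta(seconds=typical)
    return None


# Hypothetical fault times noted down while interviewing the reporter.
faults = [
    datetime(2024, 1, 1, 1, 2),
    datetime(2024, 1, 1, 2, 1),
    datetime(2024, 1, 1, 3, 3),
    datetime(2024, 1, 1, 4, 0),
]
print(recurring_interval(faults))  # prints 0:59:00
```

A result close to one hour would point toward something scheduled, such as a cron job or a periodic batch task, while `None` suggests the faults are not tied to a fixed schedule.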
