Site Failure: Troubleshooting Steps

Source: Internet
Author: User

As an operations engineer at a medium-sized website, I recently lived through a real outage. Below are the troubleshooting steps I consider ideal, based on my own experience, followed by comments from another engineer online.

When the website goes down:

1. Ping the IP of the main site. Ping may be disabled on the server, so a missing reply is not conclusive. If there is no reply, the problem may be in the data center network, so ping the data center gateway next.
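Because ping may be blocked, a TCP connection attempt to the web port is another way to test reachability. The following is a minimal Python sketch under that assumption; the function name tcp_reachable and its default timeout are illustrative and do not come from the original post.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, calling tcp_reachable with the main site IP and port 80, and then with the gateway address, gives a rough idea of where the path breaks. A False result only says the handshake failed; it does not say why.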

2. If the data center network is fine, I look at what is actually failing: a server exception or an Nginx error. Then I check the hardware. My site is a simple Nginx load balancer behind an external firewall, so next I look at access.log and tally the suspicious IPs and behavior for the period of the outage. If it is an attack, I start by blacklisting the suspicious IPs.
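The tally of client IPs in access.log can be sketched in a few lines of Python. This assumes the log uses the default Nginx "combined" format, where the client address is the first whitespace-separated field; the function name top_client_ips is illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_client_ips(lines: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Count requests per client IP and return the busiest ones first."""
    counts: Counter = Counter()
    for line in lines:
        # Assumption: the client address is the first field of each line.
        fields = line.split(maxsplit=1)
        if fields:
            counts[fields[0]] += 1
    return counts.most_common(limit)
```

Passing an open file object for access.log as lines works, since the function only iterates line by line. A high request count alone does not prove an attack; it only points at addresses worth a closer look before blacklisting.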

3. Run tracert to check for routing problems on the path to the site. Could it be a cross-carrier issue: does access fail from the Unicom network, or from Telecom? Also check whether DNS has been hijacked.
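One rough way to check for DNS hijacking is to compare what the resolver returns with the addresses you know the site should have. The Python sketch below assumes you keep such a list of expected addresses; the function names are illustrative.

```python
import socket
from typing import Iterable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return {info[4][0] for info in infos}


def unexpected_addresses(resolved: Iterable[str], expected: Iterable[str]) -> Set[str]:
    """Return resolved addresses that are not in the expected set."""
    return set(resolved) - set(expected)
```

A non-empty result from unexpected_addresses(resolve_ipv4(name), expected) is a reason to investigate, not proof of hijacking: a CDN or carrier-specific DNS views can legitimately return different addresses. Running the check from both a Unicom and a Telecom line would show whether the answer differs by carrier.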

4. Next I look at the application servers. My site runs on Tomcat, so I check whether the Tomcat process has become a zombie and read the logs. In general, as long as the load balancing is healthy (LVS troubleshooting is omitted here), HTTP requests should not pile up on a single server. If they do, the cause may be the load balancing weights, or the memory settings of Tomcat (or whichever web container is in use).

5. You can also log in to a single node and test it directly. If internal forwarding is involved, check with curl from inside the network, or use HttpRequest to send GET and POST requests; a returned status code of 200 means the node is OK.
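The per-node status check can be sketched with the Python standard library. This example only issues GET requests and assumes each backend node is reachable at its own internal URL; the function names and any URLs you pass in are illustrative.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for a GET of url, or None if no response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def failing_nodes(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> Dict[str, Optional[int]]:
    """Map each URL that did not answer 200 to its status (None if unreachable)."""
    results = {url: fetch(url) for url in urls}
    return {url: status for url, status in results.items() if status != 200}
```

The fetch parameter exists so the check can be exercised without a network. Note that urlopen follows redirects, so the reported code is that of the final response.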

An expert's explanation, and the best approach:

"Senior" Royal Park--Big bro 2016/8/2 21:54:06

I would look at monitoring first, because my monitoring already performs basically all of the checks you listed.

Use the monitoring data to narrow the scope of the investigation first, then locate the fault and troubleshoot it in a targeted way. By the time you have worked through your whole procedure, the business will probably have been down for a while.


"Senior" Royal Park--Big bro 2016/8/2 21:55:54

Respond quickly and minimize the impact first. That is what you should do.

"Senior" Royal Park--Big bro 2016/8/2 21:56:09

Set the problem aside for now and restore the business first.

"Senior" Royal Park--Big bro 2016/8/2 21:56:23

The business is what matters; the problem can be investigated afterwards.

"Senior" Royal Park--Big bro 2016/8/2 21:56:41

Because you have logs and monitoring data, you can take your time analyzing exactly what caused the interruption.

"Senior" Royal Park--Big bro

When you take over the operations work as a whole, you should plan ahead for how to restore service immediately if the website goes down. At large companies, recovery is invisible to users. At small companies there may be a slight impact because of various constraints.

"Senior" Royal Park--Big bro 2016/8/2 21:59:55

If you wait until the website is down before you start checking everything, you are already too late.

"Senior" Royal Park--Big bro 2016/8/2 22:00:56

Personal opinion, for reference only.

