Linux Troubleshooting: The First 5 Minutes

Source: Internet
Author: User

Understand the background of the problem as far as possible
Do not rush straight to the server. First find out how much is already known about it and what exactly the failure looks like; otherwise you will probably end up poking around aimlessly. The following questions must be answered:
    • What are the symptoms of the fault? No response? Errors?
    • When was the fault discovered?
    • Can the fault be reproduced?
    • Is there a pattern to its occurrence (e.g. once an hour)?
    • What was the last update to the platform (code, servers)?
    • Which specific groups of users are affected (logged in, logged out, a particular region ...)?
    • Is documentation of the infrastructure (physical, logical) available?
    • Is there a monitoring platform available?
    • Is there a log to view?
The last two are the most convenient sources of information, but do not count on them: they are often missing, and then all you can do is keep exploring.

Who is there?
    $ w
    $ last

Use these two commands to see who is online and which users have logged in recently. This is not a critical step, but it is best not to debug the system while other users are working on it. As the saying goes, one mountain cannot hold two tigers.

What happened before?
    $ history

Look at the commands previously executed on the server. It never hurts to check, and combined with the information gathered so far it should be of some use. As an admin, take care not to abuse your privileges to violate other people's privacy. Note that to see when the commands were executed, you need to set the HISTTIMEFORMAT environment variable before viewing the history. (PS: my own addition) HISTTIMEFORMAT can be set with the following command:
    $ export HISTTIMEFORMAT="%F %T "
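When HISTTIMEFORMAT is set, bash also writes a `#<epoch>` comment line before each command in the history file, so the timestamps survive across sessions. A minimal sketch, assuming a history file in that format; the helper name `history_epochs` is made up for illustration:

```shell
# history_epochs FILE
# Pair each "#<epoch>" timestamp line in a bash history file with the
# command that follows it and print "<epoch> <command>".
# Commands that have no preceding timestamp line are printed with "-".
history_epochs() {
    awk '
        /^#[0-9]+$/ { ts = substr($0, 2); next }   # remember the timestamp
        {
            stamp = (ts == "") ? "-" : ts
            print stamp, $0
            ts = ""                                 # a timestamp applies once
        }
    ' "$1"
}
```

For example, `history_epochs ~/.bash_history | tail` shows the most recent commands; with GNU date, an epoch value can be converted using `date -d @<epoch>`.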


Which processes are running now?
    $ pstree -a
    $ ps aux
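Because ps aux prints one line per process, its output is easy to narrow down through a pipe. A minimal sketch, assuming the usual ps aux layout with a single header line; `ps_filter` is a hypothetical helper name:

```shell
# ps_filter PATTERN
# Read "ps aux" output on stdin; keep the header line plus every line
# that matches PATTERN (an awk extended regular expression).
ps_filter() {
    awk -v pat="$1" 'NR == 1 || $0 ~ pat'
}

# Typical use on a live system:
#   ps aux | ps_filter 'mysqld|php-fpm'
```

Keeping the header makes the columns readable after filtering. On a live system the filter's own awk process may appear among the matches, since its command line contains the pattern.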

pstree -a and ps aux both show the existing processes. The output of ps aux is comparatively cluttered (PS: I personally disagree; ps aux output can be filtered through a pipe with grep, which pstree does not lend itself to), while the output of pstree -a is relatively short and clear: you can see the running processes and how they relate to each other.

Listening network services
    $ netstat -ntlp
    $ netstat -nulp
    $ netstat -nxlp
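To get a quick inventory of listening TCP ports, the output of netstat -ntlp can be reduced to port and owning program. A minimal sketch, assuming the usual net-tools column layout (State in the sixth column, PID/Program name in the seventh); `listening_ports` is a made-up name:

```shell
# listening_ports: read "netstat -ntlp" output on stdin and print
# "<port> <pid/program>" for every TCP socket in LISTEN state,
# sorted by port number. Only the first word of the program column is kept.
listening_ports() {
    awk '$6 == "LISTEN" {
        n = split($4, parts, ":")   # the port is the last colon-separated piece
        print parts[n], $7
    }' | sort -n
}

# Typical use on a live system (root is needed to see other users' programs):
#   netstat -ntlp | listening_ports
```

UDP sockets have no State column, so the output of netstat -nulp would need a different field test.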

I usually run these three netstat commands separately, because I do not want to see every service listed in one big heap. netstat -nalp works too. Identify all running services and check whether they are supposed to be running. Look at each listening port. We generally recommend running few services on each server and adding servers when necessary. If you find a server with an unusually large number of listening ports open, make a note of it and come back when you have time to clean up and reorganize the server.

CPU and memory
    $ free -m
    $ uptime
    $ top

Check the following:
    • Is there any free memory? Is the server swapping between memory and disk?
    • Is there any CPU capacity left? How many cores does the server have? Are some cores overloaded?
    • What is causing the highest load on the server? What is the load average?
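Two of these checks are easy to script. The sketch below compares the 1-minute load average with the core count and reads the used swap from free -m output. Treating a load above the number of cores as "overloaded" is a common rule of thumb and an assumption here, and both function names are hypothetical:

```shell
# load_status LOAD1 CORES
# Print "overloaded" when the 1-minute load average exceeds the number
# of CPU cores, otherwise "ok".
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load + 0 > cores + 0) print "overloaded"; else print "ok"
    }'
}

# swap_used_mb: read "free -m" output on stdin and print the used swap in MB
# (third column of the "Swap:" line).
swap_used_mb() {
    awk '$1 == "Swap:" { print $3 }'
}

# On a live Linux system (assumes /proc/loadavg and nproc are available):
#   load_status "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
#   free -m | swap_used_mb
```

A non-zero swap figure alone is not proof of active swapping; top or repeated runs show whether it is growing.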
Hardware
    $ lspci
    $ dmidecode
    $ ethtool
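ethtool expects an interface name. As a minimal sketch, assuming an interface called eth0 (a placeholder) and the usual `Speed:` and `Duplex:` lines in its output, the link settings can be extracted like this; `link_summary` is a made-up helper name:

```shell
# link_summary: read "ethtool <iface>" output on stdin and print the
# negotiated speed and duplex mode as "speed=<...> duplex=<...>".
link_summary() {
    awk -F ': *' '
        /^[ \t]*Speed:/  { speed = $2 }
        /^[ \t]*Duplex:/ { duplex = $2 }
        END { print "speed=" speed, "duplex=" duplex }
    '
}

# Typical use on a live system (eth0 is an assumed interface name):
#   ethtool eth0 | link_summary
```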

I mainly use ethtool to check whether the network card is configured properly: is it running in half duplex? Is the speed only 10 Mb/s? Are there any TX/RX errors?

IO performance
    $ dstat --top-io --top-bio

I list only dstat here because it is the tool I actually use and like best. Use it to see who is doing the IO: is MySQL eating all the system resources? Or is it your PHP processes?

System logs and kernel messages
    $ dmesg
    $ less /var/log/auth.log

    • Check for error and warning messages; for example, are there too many connections?
    • Are there any hardware errors or file system errors?
    • See whether the timing of these errors matches the suspicions raised earlier
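To line error times up with earlier findings, it helps to count suspicious log lines per hour. A minimal sketch, assuming the traditional syslog timestamp format ("Mon DD HH:MM:SS host message") and a simple keyword match; `errors_per_hour` is a hypothetical name:

```shell
# errors_per_hour: read syslog-style lines on stdin and count the lines
# that mention an error, warning, or failure, grouped by hour.
# Output: "<count> <Mon> <DD> <HH>:00", one line per hour.
errors_per_hour() {
    grep -iE 'error|warn|fail' |
        awk '{ print $1, $2, substr($3, 1, 2) ":00" }' |
        sort | uniq -c |
        awk '{ print $1, $2, $3, $4 }'   # normalize the padding added by uniq -c
}

# Typical use on a live system:
#   errors_per_hour < /var/log/auth.log
```

Logs written with ISO 8601 timestamps would need a different field split.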
Scheduled Tasks
    $ ls /etc/cron* | cat
    $ for user in $(cat /etc/passwd | awk -F ":" '{print $1}'); do crontab -l -u "$user"; done
(PS: I rewrote some of the Linux commands here. Even though this is a repost, it should carry a bit of my own style.)
    • Is a scheduled task running too often?
    • Have some users set up hidden scheduled tasks?
    • Was a backup task running at the time of the failure?
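The first question can be checked mechanically by looking at the minute field of each crontab entry. A minimal sketch, assuming user crontab syntax (five time fields followed by the command); the five-minute threshold and the name `frequent_cron_jobs` are arbitrary choices for illustration:

```shell
# frequent_cron_jobs: read user crontab lines on stdin and print the ones
# whose minute field is "*" or "*/N" with N <= 5, i.e. jobs that fire at
# least every five minutes.
frequent_cron_jobs() {
    awk '
        /^[ \t]*#/ { next }                        # skip comments
        NF == 0 { next }                           # skip blank lines
        $1 ~ /^[A-Za-z_][A-Za-z0-9_]*=/ { next }   # skip VAR=value settings
        $1 == "*" { print; next }                  # runs every minute
        $1 ~ /^\*\/[0-9]+$/ {
            step = substr($1, 3) + 0               # the N in "*/N"
            if (step <= 5) print
        }
    '
}

# Typical use on a live system (run as root to read other users' crontabs):
#   crontab -l -u "$user" | frequent_cron_jobs
```

Entries using minute lists or ranges (for example `0,5,10`) are not detected by this simple test.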
Application logs
There is a lot to analyze here, but as an ops person you probably will not have time to study it all carefully. Focus on the obvious problems, for example in a typical LAMP application environment:
    • Apache & Nginx: look through the access and error logs, searching directly for 5xx errors, then check for limit_zone errors
    • MySQL: look for error messages in mysql.log; check whether a table is corrupted, whether an InnoDB repair process is running, and whether there are disk/index/query problems
    • PHP-FPM: if a php-slow log is configured, look through it for error messages
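Searching for 5xx errors can be reduced to a one-line summary per status code. A minimal sketch, assuming the access log uses the Apache/Nginx "combined" format, where the HTTP status is the ninth whitespace-separated field; `count_5xx` is a made-up name:

```shell
# count_5xx: read access-log lines on stdin and print "<status> <count>"
# for every 5xx status code seen, sorted by status.
count_5xx() {
    awk '$9 ~ /^5[0-9][0-9]$/ { n[$9]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Typical use (the log path is an assumption and varies between systems):
#   count_5xx < /var/log/nginx/access.log
```

A custom log format would move the status to a different field, so check one line by hand before trusting the counts.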
Conclusion
After these 5 minutes, you should have a clearer picture of the following:
    • What is running on the server?
    • Does the fault appear to be related to IO, hardware, the network, or system configuration?
    • Does the fault show characteristics you are familiar with, such as poorly used database indexes or too many Apache background processes?
