33 Tips for Troubleshooting Frequently Encountered Linux Operations Problems

Last Update:2018-03-03 Source: Internet

Author: User

Anyone who works in Linux operations will run into problems and failures sooner or later. Learning from them, tracking down the problem, and summarizing and analyzing the root cause is a good habit for a Linux operations engineer. Every technical breakthrough comes with frustration as well as satisfaction, but persistence pays off in accumulated experience, and that is the real reward of practice.

The following summarizes failures I have run into in my own projects, together with their workarounds. See whether they resonate with you and help you.

Part 1: Common problems and solutions

1. Shell script does not execute
Problem: One day a developer colleague asked me to look at his shell script, which simply refused to run and kept failing. The script was very simple and had no obvious errors, yet it reported ": bad interpreter: No such file or directory".
Seeing this error, I asked whether he had written the script on Windows and then uploaded it to the Linux server. Sure enough, he had.
Cause: On DOS/Windows, a line in a text file ends with \r\n, while on *nix systems it ends with \n. A text file edited on DOS/Windows and moved to *nix therefore carries an extra ^M (carriage return) at the end of every line.
Solution:
1) Write the script again on Linux;
2) In vi, run :%s/\r//g or :%s/^M//g (type ^M as Ctrl+V, Ctrl+M).
Note: sh -x script_name traces the script step by step and echoes each command as it runs, which helps when troubleshooting complex scripts.
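
As a rough sketch of the same fix without opening an editor, the carriage returns can be stripped with tr. The helper names strip_cr and has_cr are hypothetical and only serve this illustration.

```shell
#!/bin/sh
# strip_cr FILE: remove every carriage return (\r) from FILE in place.
# Hypothetical helper; assumes FILE is a text file such as a shell script.
# The result is written back with cat so the file keeps its permissions.
strip_cr() {
    tmp="$1.nocr.$$"
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
}

# has_cr FILE: succeed if FILE still contains at least one carriage return.
has_cr() {
    [ "$(tr -cd '\r' < "$1" | wc -c)" -gt 0 ]
}
```

A typical use would be to run has_cr on the uploaded script first and call strip_cr only when it reports a match.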

2. Controlling crontab output
Problem:
The /var/spool/clientmqueue directory takes up more than 100 GB.
Cause:
When a program run by cron produces output, that output is mailed to the user who owns the crontab. Sendmail was not running, so the messages piled up as files in /var/spool/clientmqueue, where they can eventually fill the disk.
Solution:
1) Delete the files manually from inside that directory: ls | xargs rm -f;
2) Permanent fix: append >/dev/null 2>&1 to the command that cron runs automatically.
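
The effect of the redirection can be checked outside cron. The sketch below uses a hypothetical stand-in job (noisy_job) and a hypothetical wrapper (run_quiet); the crontab line in the comment uses a made-up script path.

```shell
#!/bin/sh
# noisy_job stands in for any command run from cron (hypothetical name).
noisy_job() {
    echo "progress message"        # written to stdout
    echo "warning message" >&2     # written to stderr
}

# run_quiet CMD [ARGS...]: run CMD with both output streams discarded.
# Order matters: stdout is pointed at /dev/null first, then stderr is
# duplicated onto stdout, so neither stream reaches cron's mailer.
run_quiet() {
    "$@" >/dev/null 2>&1
}

# The equivalent crontab entry (hypothetical script path) would be:
# 30 2 * * * /usr/local/bin/noisy_job.sh >/dev/null 2>&1
```

Writing the redirections in the opposite order (2>&1 >/dev/null) would still let stderr through, so the order shown is the one to copy.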

3. Telnet/SSH is very slow
Problem:
One day a developer colleague reported that access from 10.50 to the memcached service on 10.52 was failing and asked us to check the network, the service, and the system. The system and the service were normal, and ping from 10.50 to 10.52 was normal, but telnet from 10.50 to 10.52 was very slow. We also found that the machine's nameserver was not working.
Cause:
Because your PC doesn't do a reverse DNS lookup on your IP, then ... when you telnet/ftp into your Linux box, it'll do a DNS lookup on you.
Solution:
1) Edit /etc/hosts so that hostname and IP correspond;
2) Comment out the nameserver in /etc/resolv.conf, or find a "live" nameserver.
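
A minimal sketch of both fixes follows. The function names are hypothetical, and disable_nameservers only prints the edited text so that the real /etc/resolv.conf is left untouched until the output has been reviewed.

```shell
#!/bin/sh
# disable_nameservers FILE: print a resolv.conf-style FILE with every active
# "nameserver" line commented out. Nothing is modified on disk.
disable_nameservers() {
    sed 's/^[[:space:]]*nameserver[[:space:]]/#&/' "$1"
}

# hosts_line IP NAME: format one /etc/hosts entry (IP, tab, hostname).
hosts_line() {
    printf '%s\t%s\n' "$1" "$2"
}
```

For example, hosts_line could be used to generate the entry for the memcached host, which is then appended to /etc/hosts by hand.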

4. Read-only file system
Problem:
A colleague's attempt to create a table in MySQL failed with the following message:
mysql> create table wosontest (colddname1 char(1));
ERROR 1005 (HY000): Can't create table 'wosontest' (errno: 30)
The MySQL user privileges and the related directory permissions were fine. perror 30 reports: OS error code 30: Read-only file system
Possible causes:
1) File system corruption;
2) A bad disk;
3) An error in the fstab file, such as a wrong file system type (for example, NTFS written as FAT), wrong mount options, or spelling mistakes.
Solution:
1) Because this was a test machine, it recovered after a reboot;
2) Reports online say the problem can also be solved with mount.
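
Before rebooting, it can help to confirm which file systems are actually mounted read-only. The following is a small sketch with a hypothetical function name; it reads /proc/mounts by default, or any file in the same format.

```shell
#!/bin/sh
# list_ro_mounts [FILE]: print the mount points whose option list contains
# the "ro" flag. FILE defaults to /proc/mounts and must use its format:
#   device mountpoint fstype options dump pass
list_ro_mounts() {
    awk '$4 ~ /(^|,)ro(,|$)/ { print $2 }' "${1:-/proc/mounts}"
}
```

If the MySQL data partition shows up in the output, one common use of mount here (an assumption about what the online advice refers to) is mount -o remount,rw on that mount point, after the underlying disk or fstab problem has been dealt with.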

5. Disk space not released after a file is deleted
Problem:
One day we found that df -h on a machine showed 90 GB of disk space in use, while du -sh /* showed that everything added up to only 30 GB. Puzzling.
Cause:
Someone probably deleted a file with rm while it was still being written. The file was removed, but its disk space was not released.
Solution:
1) The simplest fix is to reboot the system or restart the related service.
2) Kill the process
/usr/sbin/lsof | grep deleted
ora 25575 data 33u REG 65,65 4294983680 /oradata/datapre/undotbs009.dbf (deleted)
From the lsof output we can see that the process with PID 25575 holds the file /oradata/datapre/undotbs009.dbf open on file descriptor (FD) 33. Having found the file, we can release the occupied space by ending the process, or without killing it by truncating the file through its descriptor: echo > /proc/25575/fd/33
3) To remove a file that is still being written, it is generally better to empty it with cat /dev/null > file.
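
The difference between removing and emptying a file can be sketched as follows. The function names are hypothetical, and the second helper assumes lsof is installed.

```shell
#!/bin/sh
# truncate_in_place FILE: empty FILE without removing it, so a process that
# still has it open keeps writing to the same (now empty) file and the
# blocks it used are returned to the file system.
truncate_in_place() {
    : > "$1"
}

# list_deleted_open_files: show files that have been deleted but are still
# held open. Assumes lsof is installed; +L1 selects open files whose link
# count is below 1.
list_deleted_open_files() {
    lsof +L1
}
```

Because truncate_in_place keeps the same inode, the writing process does not need to be restarted.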

6. Improving performance when cleaning up files with find
Problem:
The /tmp directory contains a large number of temporary files named picture_*. A job runs at 2:30 every night to clean up the files from more than a day earlier. Previously the following script ran from crontab, but it proved very inefficient: the load soared on every run and affected other services.
#!/bin/sh
find /tmp -name "picture_*" -mtime +1 -exec rm -f {} \;
Cause:
The directory contains a very large number of files, and using find this way is very resource-intensive.
Solution:
#!/bin/sh
cd /tmp
# ls -l prints the day of the month space-padded, which is what %e produces
time=$(date -d "2 day ago" "+%b %e")
ls -l | grep "picture" | grep "$time" | awk '{print $NF}' | xargs rm -rf
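
An alternative that keeps find but avoids starting one rm process per file is sketched below. This is a different technique from the ls/grep pipeline above, not the author's method, and the function name is hypothetical; -maxdepth assumes GNU or BSD find.

```shell
#!/bin/sh
# clean_old_pictures DIR: delete regular files named picture_* directly
# under DIR that match -mtime +1, i.e. whose age, counted in whole 24-hour
# periods, is greater than one.
# "-exec ... {} +" hands many file names to each rm invocation instead of
# running rm once per file as "-exec ... {} \;" does.
clean_old_pictures() {
    find "$1" -maxdepth 1 -type f -name 'picture_*' -mtime +1 -exec rm -f {} +
}
```

Called as clean_old_pictures /tmp from the nightly job, it removes the same kind of files as the original script while touching nothing in subdirectories.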

7. Unable to obtain the gateway MAC address
Problem:
The network from 2.14 to 3.65 (mapped address 2.141) is unreachable, but other machines on the 3 segment can reach 3.65 without problems.
Cause:
# arp
Address HWtype HWaddress Flags Mask Iface
192.168.3.254 ether (incomplete) CM bond0
On the surface, the machine cannot automatically obtain the gateway's MAC address. The network engineer said it was a problem with the network equipment; the details are unclear.
Solution:
Bind the ARP entry statically: arp -i bond0 -s 192.168.3.254 00:00:5e:00:01:64

8. A case where the HTTP service cannot start
Problem: One day a developer colleague said that httpd in the site's front-end environment would not start, so I logged in to take a look. It reported the following error:
/etc/init.d/httpd start
Starting httpd: [Sat Jan 29 17:49:00 2011] [warn] module antibot_module is already loaded, skipping
Use proxy forward as remote ip: true.
Antibot exclude pattern: .*.[(js|css|jpg|gif|png)]
Antibot seed check pattern: login
(98)Address already in use: make_sock: could not bind to address [::]:7080
(98)Address already in use: make_sock: could not bind to address 0.0.0.0:7080
no listening sockets available, shutting down
Unable to open logs [FAILED]
Cause:
1) The port is in use: on the surface port 7080 appears to be occupied, but netstat -npl | grep 7080 showed that it was not;
2) The port is declared more than once in the configuration, for example when both of the following files contain Listen 7080:
/etc/httpd/conf/httpd.conf
/etc/httpd/conf.d/t.10086.cn.conf
Solution:
Comment out the Listen 7080 line in /etc/httpd/conf.d/t.10086.cn.conf and restart httpd. OK.
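
Duplicate Listen lines are easy to miss by eye. The sketch below searches a configuration tree for them; the function names are hypothetical, --include assumes GNU grep, and the match is case-sensitive, so it assumes the directive is spelled "Listen".

```shell
#!/bin/sh
# find_listen_directives DIR: print every active (uncommented) Listen line
# in the *.conf files under DIR as file:line:text.
find_listen_directives() {
    grep -rnE --include='*.conf' '^[[:space:]]*Listen[[:space:]]' "$1"
}

# duplicate_listen_ports DIR: print each port (or address:port) value that
# is declared by more than one Listen line under DIR.
duplicate_listen_ports() {
    grep -rhE --include='*.conf' '^[[:space:]]*Listen[[:space:]]' "$1" |
        awk '{ print $2 }' | sort | uniq -d
}
```

Running duplicate_listen_ports /etc/httpd in a case like this one should print the port that is declared twice, and find_listen_directives then shows which files declare it.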

9. Too many open files
Problem:
The error "Too many open files" is reported.
Solution:
The ultimate solution:
echo "" >> /etc/security/limits.conf
echo "* soft nproc 65535" >> /etc/security/limits.conf
echo "* hard nproc 65535" >> /etc/security/limits.conf
echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
echo "" >> /root/.bash_profile
echo "ulimit -n 65535" >> /root/.bash_profile
echo "ulimit -u 65535" >> /root/.bash_profile
Finally, reboot the machine or run ulimit -u 65535 && ulimit -n 65535
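
Appending with echo adds the same lines again every time the commands are re-run. A small guard avoids that; append_once is a hypothetical helper name, shown here only as a sketch.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE only if an identical line is
# not already present, so repeating the setup does not duplicate entries.
# grep -x matches whole lines, -F treats LINE as a fixed string (the "*"
# in limits.conf entries is not a pattern), and -- protects lines that
# start with a dash.
append_once() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (run as root):
# append_once /etc/security/limits.conf '* soft nofile 65535'
# append_once /etc/security/limits.conf '* hard nofile 65535'
```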

10. Disk space problems caused by ibdata1 and mysql-bin
Problem:
A disk-space alarm fired on 2.51. The ibdata1 file and the mysql-bin logs were taking up too much space (ibdata1 more than 120 GB, mysql-bin more than 80 GB).
Cause:
ibdata1 is the InnoDB shared tablespace file. It stores the data and indexes of InnoDB tables, whereas the table files in the database's own directory hold only the table structure.
The InnoDB storage engine manages tablespaces in two ways:
1) Shared tablespace (which can be split into several smaller tablespace files). This is what most of our databases currently use;
2) Independent tablespaces, where each table has its own tablespace (disk file).
Each approach has its pros and cons:
① Shared tablespace:
Advantage: The tablespace can be divided into multiple files placed on different disks (the tablespace file size is not constrained by table size, and a single table can be spread across different files).
Disadvantage: All data and indexes live in one file, which keeps growing as the data grows. Although the large file can be split into several smaller ones, tables and indexes are stored intermixed in the tablespace, so heavy deletion from one table leaves many gaps in it. With a shared tablespace, space that has been allocated cannot be shrunk back. If the tablespace grows because of a temporary index build or a temporary table, that space cannot be reclaimed even after the related table is dropped.
② Independent tablespaces: set innodb_file_per_table in the configuration file (my.cnf).
Characteristics: Each table has its own tablespace, which stores that table's data and indexes.
Advantage: The disk space of a tablespace can be reclaimed. DROP TABLE reclaims the tablespace automatically, and after a large amount of data has been deleted from a table, the space can be reclaimed with: ALTER TABLE tbl_name ENGINE=INNODB;
Disadvantage: If a single table grows too large, for example beyond 100 GB, performance suffers. With a shared tablespace the data can be split across files, but a query that spans too wide a range and touches many files is also slow. With independent tablespaces, partitioned tables can mitigate the problem to some extent. In addition, when independent tablespaces are enabled, the innodb_open_files parameter needs to be tuned appropriately.
Solution:
1) ibdata1 is too large: the only way is to dump the data to SQL statements and then rebuild the database from the dump.
2) The mysql-bin logs are too large:
① Manual removal:
Delete all logs before a given log file: mysql> PURGE MASTER LOGS TO 'mysql-bin.010';
Delete all logs before a given time: mysql> PURGE MASTER LOGS BEFORE '2010-12-22 13:00:00';
② In /etc/my.cnf, keep only N days of binary logs:
expire_logs_days=30 # number of days after which binary logs are deleted automatically

Part 2: Troubleshooting summary table

No.	Symptom	Analysis and resolution
1	At the start of a Linux installation, the hard disk is not detected and the installation cannot proceed	Enter the CMOS setup, find the hard-disk options, and set the disk to compatibility mode
2	After the hard disk is partitioned during installation, the installation cannot continue	The partition layout does not meet the installer's requirements. You may have forgotten to create the root partition or the swap partition, which differs from a Windows installation
3	During installation the package selection was confusing; afterwards the system did not meet our requirements: some needed components were missing and unwanted ones were installed	Too little familiarity with Linux. After installing it several times, this becomes second nature
4	While configuring the proxy server, some filtering rules did not take effect	(1) Check that the required modules loaded successfully (2) Check that the default policy is set appropriately (3) Check the iptables command syntax (4) The order of the filtering rules may be wrong and need adjusting
5	After the proxy server and firewall are configured, the services start and the Internet is reachable, but services in the DMZ cannot be accessed	Stop the iptables service and test access again. If access still fails, check connectivity. If access works, the iptables rules are at fault; concentrate on the configuration and order of the filtering rules
6	After reconfiguring the iptables filtering rules and restarting the iptables service, all the original rules were lost	(1) Edit /etc/sysconfig/iptables-config and change IPTABLES_SAVE_ON_RESTART="no" to "yes" (2) Save the rules with iptables-save > /etc/sysconfig/iptables
7	After VLANs are configured on the switch, the external network is unreachable	The VLAN gateway is not set or is set incorrectly
8	While configuring the DNS service, the named service fails to start	Possible causes: (1) required files are missing from the /etc/named directory (2) required files are missing from the /var/named directory (3) permission problems with the named account. Workaround: copy the missing files into place, and set the ownership of the boot file to the named user and group
9	While configuring the DNS service, domain names or IP addresses are not resolved correctly	(1) Check and correct the syntax and record settings in the forward and reverse zone files under /var/named (2) The zone declarations in /etc/named.conf are misconfigured (3) Check whether the bind-chroot package is installed; if it is, the zone database files should be checked in /var/named/chroot/var/named (4) Check that /etc/resolv.conf specifies the correct nameserver
10	When the dhcpd service starts, it reports "No subnet declaration for eth0 (10.10.10.2)"	The IP address of eth0 is not set or does not fall within a scope of the DHCP service. eth0 must be given an IP address within a declared scope
11	Several scopes are configured for the DHCP service, but addresses are assigned from only one of them	The host has only one network interface card. With 3 scopes, you need to configure 3 interfaces, eth0, eth1, and eth2, one for each scope. This is a configuration method that uses a superscope
12	MySQL installation fails, repeatedly reporting software dependencies, so the packages cannot be installed	The package being installed needs additional components or shared libraries. Installing MySQL from RPM packages is cumbersome: it requires many packages with strong dependencies among them. Find and install the required packages as prompted, paying attention to the installation order
13	When testing the web service, no page appears for the main site even though the connection to the server succeeds	The DocumentRoot option in the main configuration file httpd.conf is set improperly, for example /var/www/html/; the trailing "/" must not be added
14	Remote clients cannot access the Samba shared directory, although the share tests successfully locally	Stop the iptables service
15	Samba's smb service has started successfully, but accessing a Samba share returns "NT_STATUS_BAD_NETWORK_NAME"	The shared directory has not been created or does not exist
16	Samba's smb service has started successfully, but access returns "NT_STATUS_ACCESS_DENIED"	Access is denied. The login username or password may be wrong, or iptables may be running; turn off the firewall
17	Samba's smb service has started successfully, but access returns "NT_STATUS_LOGON_FAILURE"	The current user is not allowed to access this shared directory; the share is configured to allow only specific users
18	The FTP service is configured for local user uploads, but uploads to the target directory are rejected	The user account may not have write permission on the upload directory
19	Local accounts are allowed to log in to FTP, but root cannot and gets "OOPS: cannot change directory:/root", while other local accounts log in normally	Check whether SELinux is enabled and disable it: edit /etc/selinux/config and change SELINUX=enforcing to SELINUX=disabled
20	The mail client can send messages but cannot receive them	Check whether the POP3 service is running
21	The mount command hangs for a long time when mounting an NFS share, although the NFS service is normal	The portmap service is not running; it must be started
22	Mounting the NFS share succeeds in a local test but fails from other client hosts	Stop the iptables service and test again

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
