33 Tips for Troubleshooting and Handling Common Linux Operations Problems

Source: Internet
Author: User

Anyone working in Linux operations will run into problems and failures sooner or later. Learning from them, finding the problem, and summarizing and analyzing the cause of each failure is a good habit for a Linux operations engineer. Every technical breakthrough comes with frustration as well as satisfaction, but we keep at it and build up experience along the way; that is the rich return practice gives us.

Below is a summary of failures I have met in my projects and how I worked around them. See whether they ring a bell and whether they help you.

First: Common Problems and Solutions

1. Shell script will not execute
Problem: One day a developer asked me to look at his shell script, which refused to run no matter what and reported an error. The script was very simple and had no obvious mistakes, yet it failed with ": bad interpreter: No such file or directory".
Seeing this error, I asked him whether he had written the script on Windows and then uploaded it to the Linux server. Sure enough, he had.
Cause: In DOS/Windows a text file's line ending is \r\n (CRLF), while on *nix systems it is \n (LF). When a text file edited on DOS/Windows is moved to *nix, every line carries an extra ^M (carriage return), so the interpreter named on the #! line (for example /bin/sh^M) does not exist.
Solution:
1) Rewrite the script on Linux;
2) In vi, run :%s/\r//g or :%s/^M//g (type ^M by pressing Ctrl+V, then Ctrl+M).
Tip: sh -x scriptname runs the script while echoing each command as it executes, which helps troubleshoot complex script problems.
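Besides vi, the conversion can be scripted. The sketch below is a minimal POSIX sh version, assuming the file is plain text; has_crlf and strip_crlf are hypothetical helper names, not standard commands. It writes the result back with cat rather than mv so the script keeps its original permissions, including the executable bit.

```shell
#!/bin/sh
# Count the lines in a file that still end with a carriage return (^M).
has_crlf() {
    grep -c "$(printf '\r')\$" "$1"
}

# Convert a file from DOS/Windows (CRLF) to Unix (LF) line endings in place.
# Assumption: the file is plain text; tr deletes every carriage return,
# including any embedded mid-line (rare in scripts).
strip_crlf() {
    tmp=$(mktemp) || return 1
    # Overwrite the original file's contents so its mode bits are preserved.
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

Running has_crlf first confirms the diagnosis; after strip_crlf it should report 0.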


2. Controlling crontab output
Problem:
The /var/spool/clientmqueue directory takes up more than 100G.
Cause:
Any output from a program run by cron is mailed to the cron job's owner. Because sendmail was not running, the messages piled up as files in /var/spool/clientmqueue, and over time they can fill the disk.
Solution:
1) Delete them manually: cd /var/spool/clientmqueue && ls | xargs rm -f
2) Permanent fix: append > /dev/null 2>&1 to each command that cron runs, so its output is discarded.
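The order of the two redirections matters, which is easy to check in a shell. In the sketch below, noisy_job is a hypothetical function standing in for a cron job that writes to both stdout and stderr.

```shell
#!/bin/sh
# Hypothetical job that writes to both stdout and stderr, as many cron jobs do.
noisy_job() {
    echo "progress message"
    echo "warning message" >&2
}

# '> /dev/null 2>&1' first points stdout at /dev/null, then points stderr
# at wherever stdout now goes, so nothing is left for cron to mail.
quiet=$(noisy_job > /dev/null 2>&1)

# The reversed order '2>&1 > /dev/null' sends stderr to the *old* stdout,
# so the warning is still captured here (and would still be mailed by cron).
leaky=$(noisy_job 2>&1 > /dev/null)
```

In the crontab itself an entry would then look like 30 2 * * * /path/to/job.sh > /dev/null 2>&1, where the schedule and path are placeholders.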


3. Telnet/SSH is very slow
Problem:
One day a developer reported that access from 10.50 to the memcached service on 10.52 was abnormal and asked us to check the network, service and system. The system and the service were both normal, and 10.50 could ping 10.52 normally, but telnet from 10.50 to 10.52 was very slow. We also found that the machine's nameserver was not working.
Cause:
When you telnet/ftp into the Linux box, it performs a reverse DNS lookup on your IP address. With a dead nameserver, every connection has to wait for that lookup to time out.
Solution:
1) Edit /etc/hosts so that the hostname and IP address correspond;
2) Comment out the nameserver in /etc/resolv.conf, or point it at a working nameserver.
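Fix 1) can be applied idempotently so repeated runs do not pile up duplicate lines. This sketch assumes the standard "IP name" hosts format; ensure_host_entry is a hypothetical helper, and 192.0.2.50 / app-50 are placeholder values from the documentation address range, not the hosts in this case.

```shell
#!/bin/sh
# Ensure an "IP hostname" mapping exists in a hosts file, so reverse lookups
# for that client are answered locally instead of waiting on a dead nameserver.
# The file path is a parameter so the sketch can be tried on a copy first;
# in production it would be /etc/hosts (which needs root).
ensure_host_entry() {
    hosts_file=$1 ip=$2 name=$3
    # Match the IP as the whole first field, not as a prefix of a longer address.
    if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$hosts_file"; then
        return 0   # already mapped; leave the file untouched
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}
```

After the change, getent hosts 192.0.2.50 should return the name immediately instead of hanging.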


4. Read-only file system
Problem:
A colleague failed to create a table in MySQL, with the following message:
    mysql> create table wosontest (colddname1 char(1));
    ERROR 1005 (HY000): Can't create table 'wosontest' (errno: 30)
The MySQL user privileges and the relevant directory permissions were fine. Running perror 30 explains the code: OS error code 30: Read-only file system
Possible causes:
1) File system corruption;
2) A bad disk;
3) Errors in /etc/fstab, such as the wrong file system type (NTFS written as FAT), wrong mount options, spelling mistakes, and so on.
Solution:
1) Because this was a test machine, we simply rebooted it and it recovered;
2) Reports online say that remounting with mount can also solve it.
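Before rebooting, it helps to confirm which mount actually went read-only. The sketch below assumes Linux /proc/mounts (fields: device, mount point, type, options); is_readonly_mount is a hypothetical helper that returns 0 for a read-only mount, 1 for read-write, and 2 if the path is not a mount point. It does not decode escaped characters (such as \040 for spaces) in mount point names.

```shell
#!/bin/sh
# Report whether a mount point is currently mounted read-only.
# Reads /proc/mounts by default; a second argument lets you point it
# at a saved copy for testing.
is_readonly_mount() {
    mnt=$1 mounts=${2:-/proc/mounts}
    awk -v m="$mnt" '
        $2 == m {
            n = split($4, opts, ",")
            ro = 0
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
            last = ro      # the last matching line wins (stacked mounts)
            seen = 1
        }
        END { if (!seen) exit 2; exit !last }
    ' "$mounts"
}
```

If the kernel switched the file system to read-only because of errors, it is safer to check it with fsck before remounting with mount -o remount,rw on the affected mount point.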


5. Disk space not released after deleting a file
Problem:
One day df -h on a machine showed 90G of disk space in use, while du -sh /* added up to only 30G.
Cause:
Someone probably used rm to delete a file that a process was still writing. The file's name is gone, but its disk space is not released until the process closes it.
Solution:
1) The simplest fix is to restart the system or the service involved.
2) Find the process that still holds the file:
    /usr/sbin/lsof | grep deleted
    ora 25575 data 33u REG 65,65 4294983680 /oradata/datapre/undotbs009.dbf (deleted)
The lsof output shows that process 25575 holds /oradata/datapre/undotbs009.dbf open on file descriptor (FD) 33. Once the file is found, the space can be released by ending the process, or without killing it by truncating the file through its descriptor: echo > /proc/25575/fd/33
3) In general, for a file that is still being written, empty it with cat /dev/null > file instead of deleting it with rm.
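lsof is not always installed. The sketch below, assuming Linux /proc, reproduces lsof | grep deleted by reading the /proc/<pid>/fd links, and truncates a descriptor through /proc just as the echo command in step 2) does. list_deleted_open_files and truncate_deleted_fd are hypothetical helper names.

```shell
#!/bin/sh
# List open file descriptors whose target file has been deleted, like
# "lsof | grep deleted" but without lsof (Linux /proc only).
# Output: PID FD PATH. Run as root to see other users' processes.
list_deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#/proc/}; pid=${pid%%/*}
                printf '%s %s %s\n' "$pid" "${fd##*/}" "$target"
                ;;
        esac
    done
}

# Free the space held by one such descriptor by truncating the file to zero
# bytes through /proc, without killing the process (args: PID FD).
truncate_deleted_fd() {
    : > "/proc/$1/fd/$2"
}
```

For the case above this would be truncate_deleted_fd 25575 33. The writing process keeps its own file offset, so later writes may leave a sparse file rather than one that grows from zero.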


6. Improving find performance when cleaning up files
Problem:
The /tmp directory holds a large number of temporary files named picture_*, and a job cleans up the previous day's files at 2:30 every night. It used to run the following script from crontab, but the script was very inefficient: the load spiked on every run and affected other services.
    #!/bin/sh
    find /tmp -name "picture_*" -mtime +1 -exec rm -f {} \;
Cause:
The directory contains a huge number of files, and -exec rm -f {} \; starts a separate rm process for every matching file, which consumes a lot of resources.
Solution:
    #!/bin/sh
    cd /tmp || exit 1
    # %e pads single-digit days with a space, matching the date column of ls -l
    time=$(date -d "2 days ago" "+%b %e")
    ls -l | grep "picture" | grep "$time " | awk '{print $NF}' | xargs rm -rf
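Parsing ls -l output is fragile, since it depends on the locale and on the date format ls chooses. An alternative sketch keeps find but batches deletions with -exec rm -f {} +, which POSIX find supports, so rm runs once per large group of files instead of once per file. cleanup_pictures is a hypothetical name, and -maxdepth 1 is a GNU/BSD find extension assumed to be available.

```shell
#!/bin/sh
# Delete picture_* files older than one day from a directory, batching many
# file names into each rm invocation ("-exec ... {} +") instead of forking
# one rm per file ("-exec ... {} \;").
# The directory is a parameter (default /tmp) so the sketch can be tested.
cleanup_pictures() {
    dir=${1:-/tmp}
    # -maxdepth 1 keeps find out of subdirectories;
    # -type f avoids touching directories or sockets that happen to match.
    find "$dir" -maxdepth 1 -type f -name 'picture_*' -mtime +1 -exec rm -f {} +
}
```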


7. Unable to get the gateway's MAC address
Problem:
The network from 2.14 to 3.65 (mapped address 2.141) was down, while other machines on the 3.x segment could reach 3.65 normally.
Cause:
    # arp
    Address          HWtype  HWaddress     Flags Mask  Iface
    192.168.3.254    ether   (incomplete)  CM          bond0
On the surface, the machine could not automatically obtain the gateway's MAC address. The network engineer said it was a problem with the network equipment; the exact cause was never pinned down.
Solution:
Bind the gateway's ARP entry statically: arp -i bond0 -s 192.168.3.254 00:00:5e:00:01:64
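To spot this condition quickly, unresolved entries can be read straight from the kernel's ARP table. A minimal sketch, assuming Linux /proc/net/arp, whose columns are IP address, HW type, Flags, HW address, Mask and Device, and where an unresolved entry has Flags 0x0; incomplete_arp_entries is a hypothetical helper.

```shell
#!/bin/sh
# List ARP entries that never resolved to a MAC address (Flags 0x0),
# printing "IP DEVICE" for each one. Reads /proc/net/arp by default;
# a first argument lets you run it against a saved copy for testing.
incomplete_arp_entries() {
    # NR > 1 skips the header line.
    awk 'NR > 1 && $3 == "0x0" { print $1, $6 }' "${1:-/proc/net/arp}"
}
```

A static binding made with arp -s is lost on reboot, so it has to be re-applied from a startup script. On systems with iproute2, the equivalent command is ip neigh replace 192.168.3.254 lladdr 00:00:5e:00:01:64 dev bond0 nud permanent.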


8. A case where the HTTP service would not start
Problem: One day a developer said httpd in the site's front-end environment would not start, so I logged in to take a look. It reported the following errors:
    /etc/init.d/httpd start
    Starting httpd: [Sat Jan 29 17:49:00 2011] [warn] module antibot_module is already loaded, skipping
    Use proxy forward as remote ip: true.
    antibot exclude pattern: .*.[(js|css|jpg|gif|png)]
    antibot seed check pattern: login
    (98)Address already in use: make_sock: could not bind to address [::]:7080
    (98)Address already in use: make_sock: could not bind to address 0.0.0.0:7080
    no listening sockets available, shutting down
    Unable to open logs [FAILED]
Cause:
1) Is the port occupied? On the surface port 7080 appears to be in use, but netstat -npl | grep 7080 showed that nothing was listening on 7080;
2) The port is declared more than once in the configuration. Here both of the following files contained Listen 7080:
    /etc/httpd/conf/httpd.conf
    /etc/httpd/conf.d/t.10086.cn.conf
Solution:
Comment out Listen 7080 in /etc/httpd/conf.d/t.10086.cn.conf and restart httpd; it then started normally.
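With many files under conf.d, a duplicate like this is easy to miss. The sketch below greps the given config files for Listen directives, prints each with its file name, and flags any value declared more than once. find_duplicate_listen is a hypothetical helper, and it compares only the directive text, so Listen 7080 and Listen 0.0.0.0:7080 are not treated as the same.

```shell
#!/bin/sh
# Print every Listen directive found in the given Apache config files, then
# flag any port/address that is declared more than once.
find_duplicate_listen() {
    # -H prefixes each match with its file name; commented-out lines are
    # skipped because Listen must be the first word on the line.
    grep -iH '^[[:space:]]*Listen[[:space:]]' "$@" |
    awk '{ print; count[$NF]++ }
         END { for (p in count) if (count[p] > 1) print "DUPLICATE: " p }'
}
```

For this case it would be run as find_duplicate_listen /etc/httpd/conf/httpd.conf /etc/httpd/conf.d/*.conf.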


9. Too many open files
Problem:
A "Too many open files" error was reported.
Solution:
A permanent fix is to raise the limits:
    echo "" >> /etc/security/limits.conf
    echo "* soft nproc 65535" >> /etc/security/limits.conf
    echo "* hard nproc 65535" >> /etc/security/limits.conf
    echo "* soft nofile 65535" >> /etc/security/limits.conf
    echo "* hard nofile 65535" >> /etc/security/limits.conf
    echo "" >> /root/.bash_profile
    echo "ulimit -n 65535" >> /root/.bash_profile
    echo "ulimit -u 65535" >> /root/.bash_profile
Finally, restart the machine, or apply the limits to the current shell with ulimit -u 65535 && ulimit -n 65535.
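Before raising limits, it is worth checking whether the process is really near its limit or is leaking descriptors. A minimal sketch, assuming Linux /proc/<pid>/fd and /proc/<pid>/limits; fd_usage is a hypothetical helper.

```shell
#!/bin/sh
# Show how many file descriptors a process has open versus its soft limit.
# Linux-only: reads /proc/<pid>/fd and /proc/<pid>/limits.
fd_usage() {
    pid=$1
    # Arithmetic expansion strips any padding that wc might add.
    open=$(( $(ls "/proc/$pid/fd" 2>/dev/null | wc -l) ))
    # The "Max open files" row is: Max open files <soft> <hard> files
    soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "pid=$pid open=$open soft_limit=$soft"
}
```

limits.conf is applied by pam_limits at login, so services started before the change keep their old limits until they are restarted.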


10. Disk space problems caused by ibdata1 and mysql-bin
Problem:
Disk space alarm on 2.51: ibdata1 and the mysql-bin logs were taking up too much space (ibdata1 over 120G, mysql-bin over 80G).
Cause:
ibdata1 is InnoDB's shared tablespace file. It stores the data and indexes of InnoDB tables, while the table files in each database's folder hold only the table structure.
The InnoDB storage engine manages tablespaces in one of two ways:
1) Shared tablespace (which can be split into several smaller tablespace files); this is how most of our current databases work;
2) File-per-table tablespace, where each table has its own tablespace (disk file).
Each approach has its pros and cons:
① Shared tablespace:
Pros: The tablespace can be split into multiple files on different disks (tablespace file size is not limited by table size, and one table can be spread across several files).
Cons: All data and indexes are stored together, so as the data grows the file becomes very large. Although it can be split into several smaller files, multiple tables and indexes are stored mixed together in the tablespace, so after a large amount of data is deleted from a table, the tablespace is left with many gaps. With a shared tablespace, space that has been allocated cannot be reclaimed: when the tablespace grows to hold a temporary index or temporary table, that space cannot be shrunk even after the table is dropped.
② File-per-table tablespace: enabled with innodb_file_per_table in the configuration file (my.cnf).
Features: Each table has its own tablespace, which holds that table's data and indexes.
Pros: The disk space used by a table's tablespace can be reclaimed. DROP TABLE reclaims it automatically, and after deleting a large amount of data from a table, the space can be reclaimed with ALTER TABLE tbl_name ENGINE=InnoDB;
Cons: If a single table grows too large, for example beyond 100G, performance suffers. A shared tablespace can split it across files, but that has its own problem: queries that cover a wide range must access several files and become slower. With file-per-table, consider partitioned tables to mitigate the problem to some extent. Also, when file-per-table mode is enabled, tune the innodb_open_files parameter accordingly.
Solution:
1) ibdata1 too large: the only fix is to dump the data as SQL, then rebuild the database and reload it.
2) mysql-bin logs too large:
① Remove them manually:
Delete all logs before a given log file: mysql> purge master logs to 'mysql-bin.010';
Delete all logs before a given time: mysql> purge master logs before '2010-12-22 13:00:00';
② Keep only N days of binlogs by setting this under [mysqld] in /etc/my.cnf:
expire_logs_days = 30    # number of days after which binary logs are deleted automatically
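The time-based purge can also be written relative to the current time instead of with a fixed timestamp. The sketch below only builds the SQL text; binlog_purge_sql is a hypothetical helper, and it uses PURGE BINARY LOGS, the documented form of which PURGE MASTER LOGS is a synonym.

```shell
#!/bin/sh
# Build the SQL that purges binary logs older than N days.
# binlog_purge_sql is a hypothetical helper; it only prints the statement.
binlog_purge_sql() {
    days=$1
    # Reject anything that is not a plain non-negative integer.
    case $days in
        ''|*[!0-9]*) echo "usage: binlog_purge_sql DAYS" >&2; return 1 ;;
    esac
    printf "PURGE BINARY LOGS BEFORE NOW() - INTERVAL %s DAY;\n" "$days"
}

# Hypothetical usage, assuming a MySQL account allowed to purge logs:
#   binlog_purge_sql 7 | mysql -u root -p
```

On a replication source, check that no replica is still reading the logs about to be purged.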

Second: Troubleshooting Summary Table

Each entry below gives the serial number, the point of failure, and its analysis and resolution.

1. Point of failure: At the start of a Linux installation the hard disk cannot be found, so the installation cannot proceed.
Analysis and resolution: Enter the CMOS setup, find the hard-disk options, and set the disk to compatibility mode.

2. Point of failure: During a Linux installation, setup cannot continue after the hard disk has been partitioned.
Analysis and resolution: The partitioning does not meet the installation requirements. You may have forgotten to create a root partition or a swap partition, which differs from installing Windows.

3. Point of failure: During installation the package selection was confusing; afterwards the system did not meet our requirements, with some needed components missing and unwanted ones installed.
Analysis and resolution: Too little familiarity with Linux. After installing it a few more times you will master it naturally.

4. Point of failure: While configuring the proxy server, some filtering rules did not take effect.
Analysis and resolution: (1) Check whether the required modules loaded successfully; (2) check that the default policy is set appropriately; (3) check the iptables command syntax; (4) the order of the filtering rules may be wrong and need adjusting.

5. Point of failure: After the proxy server and firewall are configured, the services start and the Internet is reachable, but services in the DMZ cannot be accessed.
Analysis and resolution: Stop the iptables service and test again. If access still fails, check network connectivity; if it now works, the iptables rules are at fault, so review the filtering rules' configuration and order.

6. Point of failure: After configuring iptables filtering rules and restarting the iptables service, all the configured rules were lost.
Analysis and resolution: (1) Edit /etc/sysconfig/iptables-config and change IPTABLES_SAVE_ON_RESTART="no" to "yes"; (2) save the rules with iptables-save > /etc/sysconfig/iptables.

7. Point of failure: After dividing the switch into VLANs, hosts cannot reach the external network.
Analysis and resolution: The VLAN gateway is not set or is set incorrectly.

8. Point of failure: While configuring DNS, the named service fails to start.
Analysis and resolution: Possible causes: (1) required files are missing from the /etc/named directory; (2) required files are missing from the /var/named directory; (3) permission problems with the named account. Fix: copy the missing files into place, and make the configuration files owned by the named user and group.

9. Point of failure: While configuring DNS, domain names or IP addresses cannot be resolved correctly.
Analysis and resolution: (1) Check and fix the syntax and record settings in the forward and reverse zone files under /var/named; (2) check the zone declarations in /etc/named.conf for configuration errors; (3) check whether the bind-chroot package is installed; if it is, check the zone database files in /var/named/chroot/var/named instead; (4) check that /etc/resolv.conf specifies the correct nameserver.

10. Point of failure: When the dhcpd service starts, it reports "No subnet declaration for eth0 (10.10.10.2)".
Analysis and resolution: eth0's IP address is not set, or does not fall within any scope declared for the DHCP service. Give eth0 an IP address inside a declared scope.

11. Point of failure: Several scopes are configured for DHCP, but addresses are assigned from only one of them; the others never assign successfully.
Analysis and resolution: The host has only one network interface. With 3 scopes, you need 3 interfaces, eth0, eth1 and eth2, one per scope. Alternatively, use a superscope configuration.

12. Point of failure: MySQL installation fails with repeated software-dependency errors, so the package cannot be installed.
Analysis and resolution: The package needs other components or shared libraries. Installing MySQL from RPM packages is cumbersome: many packages are required and the dependencies between them are significant. Find and install the required packages as the error messages indicate, paying attention to the installation order.

13. Point of failure: When testing the web service, the connection to the server succeeds, but no page appears when accessing the main site.
Analysis and resolution: The DocumentRoot option in the main configuration file httpd.conf is set incorrectly; for example, it should be written as /var/www/html without the trailing "/".

14. Point of failure: Remote clients cannot access the Samba share, although the share tests fine locally.
Analysis and resolution: Stop the iptables service.

15. Point of failure: Samba's smb service has started successfully, but accessing a share gives the error "NT_STATUS_BAD_NETWORK_NAME".
Analysis and resolution: The shared directory has not been created or does not exist.

16. Point of failure: Samba's smb service has started successfully, but access gives the error "NT_STATUS_ACCESS_DENIED".
Analysis and resolution: Access is denied. The login user name or password may be wrong, or iptables is running, in which case stop the firewall.

17. Point of failure: Samba's smb service has started successfully, but access gives the error "NT_STATUS_LOGON_FAILURE".
Analysis and resolution: The current user is not allowed to access this share; the share is configured to allow only specific users.

18. Point of failure: FTP is configured for local users to upload, but uploads to the target directory are rejected.
Analysis and resolution: The user account may not have write permission on the upload directory.

19. Point of failure: FTP is configured to allow local accounts to log in. Other local accounts can log in, but root cannot and gets the error "OOPS: cannot change directory:/root".
Analysis and resolution: Check whether SELinux is enabled. To disable it, edit /etc/selinux/config and change SELINUX=enforcing to SELINUX=disabled.

20. Point of failure: A mail client can send messages but cannot receive them.
Analysis and resolution: Check whether the POP3 service is running.

21. Point of failure: Mounting an NFS share with mount hangs for a long time, although the NFS service itself is working.
Analysis and resolution: The portmap service is not running; it must be started.

22. Point of failure: Mounting the NFS share locally succeeds, but mounts from other client hosts fail.
Analysis and resolution: Stop the iptables service and test again.
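Items 6 and 19 both come down to changing one KEY=value line in a config file. A minimal sketch, assuming the simple KEY=value format of /etc/sysconfig/iptables-config and /etc/selinux/config; set_config_line is a hypothetical helper that replaces an existing line for the key or appends one, leaving commented-out lines alone.

```shell
#!/bin/sh
# Set KEY=VALUE in a simple KEY=value config file: replace the existing
# line(s) for KEY, or append one if the key is absent. VALUE is written
# verbatim, so include quotes in it if the file format expects them.
set_config_line() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # index(...) == 1 means the line starts with "KEY=", so "#KEY=" and
    # longer keys sharing the prefix (e.g. SELINUXTYPE) are not touched.
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; done = 1; next }
        { print }
        END { if (!done) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For item 6 this would be set_config_line /etc/sysconfig/iptables-config IPTABLES_SAVE_ON_RESTART '"yes"' (the quotes are part of the value in that file). For item 19 it would be set_config_line /etc/selinux/config SELINUX disabled, which takes effect after a reboot.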
