Troubleshooting the Linux Operating System for an Online Entertainment Platform

Author: parallel lines of crossover
Troubleshooting during system installation

Troubleshooting during system installation mainly deals with Linux installation failures and installation errors. The usual causes are motherboard chip compatibility and driver support for external devices.

Motherboard chip compatibility
Motherboard chip compatibility refers to whether the Linux operating system can correctly identify the chips integrated on the motherboard and load the correct drivers for them. Compatibility problems fall roughly into two cases. In the first, the chip is completely unsupported: Linux cannot identify it, or no Linux driver exists for it. In the second, the operating system can identify the chip but needs additional kernel patches or drivers, and those drivers are still in the test phase, so stability and performance are not guaranteed.
There are two ways to deal with motherboard chip compatibility problems:
1. Choose a motherboard from a well-known manufacturer, preferably with an Intel chipset. Intel generally does much better at Linux driver development than other motherboard chipset vendors such as VIA and SiS, and the Linux drivers for Intel chipsets are generally much more stable and perform much better.
2. Choose a newer Linux release. The compressed kernel image that boots from the installation CD of a newer release carries more drivers for new chipsets, so new chipsets and other hardware are easier to support. Note, however, that driver development for Linux is generally slow. If even a newer release does not include the driver you need, a driver built for that release is hard to find.

External device driver support problems
The external devices here are mainly NICs, RAID cards, and SCSI cards. They are discussed separately below.
A NIC driver mainly supports the core chip of the NIC. Not all NICs have Linux drivers. Some chip manufacturers never intended their chips to be used on Linux and develop drivers only for Windows, so you cannot use such a device on Linux unless you are willing to write your own driver. Other chips are simply too old to be usable. A typical example is the TMI Tamarack TC9020 chip used in the TP-LINK TG-3220: the manufacturer has closed down, so no new drivers are available.
Common NIC chips, such as the Realtek 8139/8169, the Intel series, and the 3Com series, are supported directly without installing additional drivers, because the drivers ship with the distribution. A driver shipped with the distribution is generally a kernel module whose file name ends in .o, such as eepro100.o. These built-in drivers are located in the /lib/modules/<kernel version>/kernel/drivers directory. Before installing a device, check by device type whether the system already includes its driver. If not, download the Linux driver for the device and install it.
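The check described above can be sketched as a small shell function. This is only an illustration: the function name find_driver_module is made up for this example, and the default path assumes the standard /lib/modules layout for the running kernel (2.4 kernels use .o files, later kernels use .ko).

```shell
# find_driver_module NAME [MODULE_DIR]
# Hypothetical helper: print the path of every module file called NAME
# found under MODULE_DIR. MODULE_DIR defaults to the driver tree of the
# running kernel.
find_driver_module() {
    name=$1
    dir=${2:-/lib/modules/$(uname -r)/kernel/drivers}
    # 2.4 kernels ship NAME.o; 2.6 and later ship NAME.ko (possibly compressed).
    find "$dir" -type f \( -name "$name.o" -o -name "$name.ko*" \) 2>/dev/null
}
```

For example, find_driver_module eepro100 prints nothing when the running kernel does not ship that module, which suggests a separate driver package is needed.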
Installing a driver generally takes three steps. The first is to configure the build for the running kernel version and compile the source code to generate the kernel module. The second is to insert the generated kernel module with insmod. The third is to copy the kernel module to the system's default kernel module directory and edit the /etc/modules.conf file so that the module is inserted automatically when the system starts. The following example installs a gigabit NIC with a BCM5700-series chipset on Red Hat 9.0.
Step 1: Obtain the driver package from the driver disc or from the network, for example bcm5700-6.0.2.tar.gz. It can be copied to the Linux host over sftp (insert an 8139 NIC first to get network access). Unpack the driver package:
tar zxf bcm5700-6.0.2.tar.gz
Go to the source code directory of the unpacked package:
cd src
Compile the source code there. The bcm5700.o file is generated.

Step 2: Insert the kernel module with insmod:
insmod bcm5700.o

Step 3: Copy the kernel module to the /lib/modules/2.4.24/kernel/drivers directory.
Edit the /etc/modules.conf file so that the system inserts the module at startup:
vi /etc/modules.conf
Add the line:
alias eth0 bcm5700

Restart the machine. When the kudzu service starts, it detects the NIC, which can then be configured.
Drivers for other external devices are installed in the same way as the NIC driver above.
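Editing /etc/modules.conf by hand works, but when the step is scripted it helps to avoid adding the same alias twice. The following is a minimal sketch under that assumption. The function name add_module_alias is hypothetical, and the configuration file path is passed in so the function can be tried on a copy first.

```shell
# add_module_alias CONF ALIAS MODULE
# Hypothetical helper: append "alias ALIAS MODULE" to CONF unless an
# identical line is already present.
add_module_alias() {
    conf=$1
    alias_name=$2
    module=$3
    # Skip the append when the exact alias line already exists.
    if grep -Eq "^[[:space:]]*alias[[:space:]]+$alias_name[[:space:]]+$module[[:space:]]*\$" "$conf" 2>/dev/null; then
        return 0
    fi
    printf 'alias %s %s\n' "$alias_name" "$module" >> "$conf"
}
```

Calling add_module_alias /etc/modules.conf eth0 bcm5700 twice leaves a single alias line. The sketch does not remove an older alias that maps eth0 to a different module, so that case still needs a manual edit.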

Installing RAID card, SCSI card, and SATA drivers
Storage devices are a special case, so their installation is treated separately from general external device drivers. There are two situations. In the first, the system boots from an IDE hard disk, and the SCSI devices are added only after the system is installed. (RAID cards and SATA devices are either emulated as SCSI devices or behave like them, so all of these are referred to as SCSI devices here and below.) In the second, the system itself is installed on and boots from the SCSI device, so the driver must be loaded during system installation. Otherwise, the installer cannot find a valid storage device to install to.

Installing a SCSI device driver on a system booted from IDE
Here the system boots from IDE, and the SCSI device is just another external device that needs a driver. Follow the general external device driver installation method described in the previous section. Note that if the SCSI device is connected before its driver is installed, the system may crash, because a kernel that cannot drive the device correctly is easily destabilized. We therefore recommend unplugging the SCSI card before installing the driver. If the SCSI device is integrated on the motherboard, disable it in the motherboard BIOS settings. After the driver is installed, shut down the computer, plug in the SCSI device or enable it in the BIOS, and then start the computer again.
The driver installation steps themselves are the same as for common external devices and are not repeated here.

Installing the driver when the system is installed on a SCSI device
If the system must be installed on a SCSI device, such as a SCSI hard disk, an IDE RAID card, or a SATA hard disk, a special method is needed to install the device driver. This step is not required for common SCSI devices. For example, the Adaptec aic7xxx series SCSI interface cards and Intel's ata_piix series SATA interface chipsets have drivers in all mainstream Linux distributions, so these devices are identified and supported by the kernel image that is decompressed when the installation CD boots. For less common chips, you need to install the driver manually. The following example installs the driver for the VIA VT8237/6420 series SATA RAID controller.
Step 1: Get the driver package, for example from the driver disc or from the network. Currently, the VIA official website provides a driver package for the Red Hat system. Unpack the package with WinRAR on Windows, and copy the Driver directory from the package over sftp to a server with Red Hat 9.0 installed.
Step 2: Generate the driver disk image (.img file).
chmod +x dd.sh
./dd.sh
This generates a viamraid.img image file in the driverdisk directory. Note that VIA currently provides a driver only for Red Hat 9.0, and it cannot be installed on RHAS. In addition, this step works only on a Linux machine running Red Hat 9.0. If it is run on another version of the operating system, the installer cannot read the driver information from the generated .img file. If your existing Linux server is not Red Hat 9.0, use the ready-made viamraid.img file in the Driverdisk subdirectory of the Driver directory.
Step 3: Write the generated .img file to a floppy disk.
Insert a blank floppy disk with write protection disabled into the drive.
dd if=driverdisk/viamraid.img of=/dev/fd0
If you do not have an existing Linux system, you can use disk imaging software on Windows to write the viamraid.img file provided in the Driverdisk subdirectory to the floppy disk.
Step 4: Boot from the Red Hat 9.0 installation CD. At the boot prompt, either enter linux dd, or enter nothing and wait for the system to ask for the driver disk. If you boot the installer with linux dd, the system asks "Do you have a driver disk?". Select Yes, then select /dev/fd0, and the system loads the driver information into the kernel. The installer can then find the SATA hard disk, and installation proceeds as usual.
Note that the VT8237 driver currently provided by VIA performs poorly on Linux. This is caused by the driver design and cannot be improved through settings. We therefore do not recommend this motherboard for Linux servers.

Troubleshooting during System Configuration

Troubleshooting during system configuration mainly covers the services that the online entertainment platform depends on and several commonly used system configuration files.

/etc/fstab configuration
The /etc/fstab file is the configuration file for automatic file system mounting. During startup, the Linux operating system mounts file systems automatically according to fstab, and mtab records what is currently mounted. fstab is the file you normally work with.
The structure of /etc/fstab is as follows:
LABEL=/                 /                 ext3    defaults        1 1
none                    /dev/pts          devpts  gid=5,mode=620  0 0
LABEL=/home/download    /home/download    ext3    defaults        1 2
LABEL=/home/menu        /home/menu        ext3    defaults        1 2
LABEL=/home/mp3         /home/mp3         ext3    defaults        1 2
none                    /proc             proc    defaults        0 0
none                    /dev/shm          tmpfs   defaults        0 0
/dev/hda8               swap              swap    defaults        0 0
/dev/hdc1               /game             ext3    defaults        1 2
/dev/hdd1               /game2            ext3    defaults        1 2

Each line describes one mount. The fields are:
device name, mount point, file system type, options, dump (backup) flag, fsck check order
Pay attention to the relationship between the device name and the mount point. If the mount point does not exist, create it manually with the directory creation command:
mkdir /test
This creates a mount point named /test.

Also note that if an entry listed in fstab fails to mount, the system cannot start normally. You are usually prompted to enter the root password to enter file system repair mode, or to press Ctrl+D to restart. In this case, enter the root password, edit the fstab file, and comment out the failing entry by putting a "#" in front of it.
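Commenting out the failing entry can also be done with a short filter rather than an editor. The sketch below is an assumption-laden illustration: the function name comment_out_mount is made up, and it prints the modified file to standard output instead of editing fstab in place, so the result can be reviewed before it replaces the original.

```shell
# comment_out_mount FSTAB MOUNT_POINT
# Hypothetical helper: print FSTAB with the entry whose mount point
# (second field) equals MOUNT_POINT prefixed by "#".
comment_out_mount() {
    awk -v mp="$2" '
        # Leave existing comments alone; comment out the matching entry.
        $1 !~ /^#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "$1"
}
```

A possible use is comment_out_mount /etc/fstab /game > /tmp/fstab.new, followed by a manual check and a copy back. In repair mode the root file system is often mounted read-only, so a remount such as mount -o remount,rw / is usually needed before the file can be saved.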

/etc/modules.conf configuration file
The /etc/modules.conf file controls which kernel modules the system inserts automatically. If the driver for a peripheral device is built as a loadable module and is not compiled into the kernel, you must insert the module manually with the insmod command to drive the device. If you want the system to insert the module automatically at startup and set its parameters, configure the module in this file.
The structure of /etc/modules.conf is as follows:
alias eth0 e1000
alias scsi_hostadapter ata_piix
alias usb-controller usb-uhci
alias sound-slot-0 i810_audio
alias usb-controller1 ehci-hcd

Each line in /etc/modules.conf either names a kernel module to insert or sets parameters for a module. Only automatic insertion of loadable modules is discussed here. In the first line of the example, alias eth0 means that the device alias eth0 is associated with the module, and e1000 means that the module file e1000.o in the system's default kernel module directory is the one inserted.
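To see which module an alias resolves to, the file can be queried directly. This is a small sketch, not a system tool: module_for_alias is a hypothetical name, and the function handles only simple "alias NAME MODULE" lines like those in the example above.

```shell
# module_for_alias CONF ALIAS
# Hypothetical helper: print the module that ALIAS maps to in CONF
# (the first match wins), or nothing if the alias is not defined.
module_for_alias() {
    awk -v a="$2" '$1 == "alias" && $2 == a { print $3; exit }' "$1"
}
```

With the example file, module_for_alias /etc/modules.conf eth0 prints e1000.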

/etc/sysconfig/network-scripts/ and /etc/sysconfig/networking directories
The /etc/sysconfig/network-scripts/ and /etc/sysconfig/networking directories store the system's network configuration files, mainly the IP address, subnet mask, MAC address, and gateway of each NIC. Note that files such as ifcfg-ethX are stored in three places: /etc/sysconfig/network-scripts/, /etc/sysconfig/networking/devices/, and /etc/sysconfig/networking/profiles/default/. Make sure that the copies of each ifcfg-ethX file are consistent. Otherwise, the network may fail to connect.
Each ifcfg-ethX file corresponds to the interface ethX. For example, the configuration file for the first NIC, eth0, is ifcfg-eth0. To assign multiple IP addresses to one NIC, create a file such as ifcfg-eth0:1 in this directory and base its content on ifcfg-eth0.
The structure of ifcfg-eth0 is as follows:
# 3Com Corporation | 3c905C-TX/TX-M [Tornado]
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
HWADDR=00:06:5B:BE:0C:AD

Note that if a computer had two NICs during installation and one is removed afterwards, the system sometimes fails to detect the hardware change, and the configuration files must be corrected by hand. In that case, delete the stale ifcfg-ethX files from all three directories. Otherwise, the configuration becomes inconsistent.
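Because the same ifcfg-ethX file lives in three directories, a quick comparison can reveal copies that have drifted apart. The following is a rough sketch: the function name ifcfg_consistent and the optional ROOT prefix (useful for testing against a copy of /etc) are assumptions made for this example.

```shell
# ifcfg_consistent IFACE [ROOT]
# Hypothetical helper: succeed when every existing copy of ifcfg-IFACE
# is byte-identical to the one in network-scripts. Differing copies are
# listed on standard output.
ifcfg_consistent() {
    iface=$1
    root=${2:-}
    ref="$root/etc/sysconfig/network-scripts/ifcfg-$iface"
    status=0
    for dir in networking/devices networking/profiles/default; do
        copy="$root/etc/sysconfig/$dir/ifcfg-$iface"
        # A missing copy is not treated as an inconsistency here.
        [ -e "$copy" ] || continue
        if ! cmp -s "$ref" "$copy"; then
            echo "differs: $copy"
            status=1
        fi
    done
    return $status
}
```

Running ifcfg_consistent eth0 on the server prints nothing and returns success when the copies match.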

/etc/samba/smb.conf configuration file
/etc/samba/smb.conf is the main configuration file of the Samba service and a file we work with often. Although the installation program can complete all of the configuration, you should still be familiar with this file so that you can handle special situations.
The content of smb.conf is as follows:
[global]
unix charset = cp936
dos charset = cp936
display charset = cp936
netbios name = ECOFE2
server string = Samba Server %v
encrypt passwords = Yes
map to guest = Bad User
log file = /var/log/samba/log.%m
max log size = 50
socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
printcap name = cups
dns proxy = No
guest account = admin
valid users = admin, user, super, oface, cface, update
admin users = admin, super, update
write list = admin, super, update
printer admin = @adm
printing = cups

[menu$]
path = /home/menu
write list = admin, user, update
read only = No
browseable = No

[mp3$]
path = /home/mp3
browseable = No

[game$]
path = /game
browseable = No

[game2$]
path = /game2
browseable = No
Because this file is important, the main settings are explained in detail below.
unix charset = cp936
dos charset = cp936
display charset = cp936
These three lines ensure that Samba 3.0 displays Chinese characters correctly. If they are removed, Chinese characters appear garbled.
netbios name = ECOFE2
This setting specifies the server's NetBIOS machine name. Do not change it. Otherwise, other machines cannot access the server normally.
valid users = admin, user, super, oface, cface, update
admin users = admin, super, update
write list = admin, super, update
These three lines are very important. Only the user names listed in valid users can access the server, and all other access is denied. admin users specifies which users have administrative permissions, and write list specifies which users have write permission. These two lines are generally the same.
[game$]
path = /game
browseable = No
This section defines a share. To define a share, enclose the share name in square brackets. A name that ends with the $ symbol is a hidden share. Then use path = /game to specify the mount point or directory to share.
The remaining settings are optional, but they are included for completeness.
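As a quick check of which hidden shares a configuration file defines, the section headers ending in $ can be extracted with sed. This is only a sketch, and hidden_shares is a hypothetical name. For a full syntax check of smb.conf, Samba's own testparm utility is the usual tool.

```shell
# hidden_shares SMB_CONF
# Hypothetical helper: print the name of every share section whose name
# ends with "$", for example [game$].
hidden_shares() {
    # Capture the text between the brackets when it ends with a literal "$".
    sed -n 's/^[[:space:]]*\[[[:space:]]*\([^]]*\$\)[[:space:]]*\][[:space:]]*$/\1/p' "$1"
}
```

Against the configuration shown above, the function would list menu$, mp3$, game$, and game2$, one per line.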

Troubleshooting emergencies

Unexpected faults may occur while the server is running, for example a server that will not start. In general, if the fault can be repaired quickly, repair it. If the repair is troublesome, reinstalling is more efficient: a reinstall takes about 20 minutes, which is better than spending hours debugging.

Startup failure caused by file system or hard disk damage
When the server has been shut down abnormally several times, or the hard disk has developed bad sectors, the system may fail to start normally and prompt you to enter the root password to enter file system repair mode, or to press Ctrl+D to restart.
When a file system was not unmounted cleanly, the system records this and checks the file system structure automatically at the next startup. In addition, with ext3, even if the file system is always mounted and unmounted cleanly, the system forces a check once every 20 mounts. These checks normally pass, but the time they take depends on the number of files. A check can take more than 10 minutes, so be patient. Do not force a restart during the check, because that causes greater damage to the file system.
If the automatic check fails, the system prompts you to enter file system repair mode for a manual check, or to press Ctrl+D to restart. Enter the root password, and the system boots into repair mode. Then follow the prompts and manually check the partitions that could not be checked automatically. For example, if the system reports that /dev/hdc1 could not be checked automatically, enter:
fsck.ext3 /dev/hdc1
Wait patiently until the check is complete, and then enter exit to restart the computer. If the system warns that the file system is mounted and asks whether to force the check, select N, unmount the partition manually with umount, and then run the fsck command again.
In some cases the file system is too badly damaged to repair. The check reports errors continuously, and there is no point in repeating it: even if the check completes, all of the data ends up in lost+found, cannot be restored, and is effectively lost. In this case, solve the problem by re-copying the hard disk directly.
If the hard disk has bad sectors, the system may crash. The kernel prints warnings on the screen reporting DMA errors, or the system stops responding and the keyboard lights flash continuously. In this case, replace the hard disk that the messages point to. While the disk is being replaced, the server can keep working on the undamaged disks, unless the damaged disk is the system disk. Note that if you remove a hard disk without changing the fstab file, the system cannot start: it reports that a file system is missing and prompts you to enter file system repair mode. You only need to remove the entries for the removed disk from fstab.
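Before running fsck on a partition by hand, it is worth confirming that the partition is not mounted. The following minimal sketch reads a mounts table to decide. The function name is_mounted is made up for this example, and /proc/mounts is assumed as the default source.

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical helper: succeed when DEVICE appears as a mounted device
# in MOUNTS_FILE (default: /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Possible use in repair mode (commented out because it needs root and
# a real device):
#   if is_mounted /dev/hdc1; then umount /dev/hdc1; fi
#   fsck.ext3 /dev/hdc1
```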

Booting into single-user mode
In many cases you need to enter single-user mode to repair the system, for example when the root password is lost, or when a faulty auto-start service makes the server hang while starting services.
The file servers we install are generally booted by GRUB. To boot into single-user mode under GRUB, wait for the boot menu (it gives you 10 seconds to choose a kernel), select the kernel you want to boot into single-user mode, and press the "e" key. An editing screen appears with a line similar to the following:
kernel /boot/vmlinuz-2.4.21-15.EL ro root=LABEL=/
Append single to the end of the line, as shown below:
kernel /boot/vmlinuz-2.4.21-15.EL ro root=LABEL=/ single
Press Enter to confirm, and then press "b" on the kernel entry. The system boots into single-user mode, where you can change the root password with the passwd command or perform other repairs.
If the server is booted by LILO, the procedure is simpler. When the system prompts you to select a kernel, press Esc to get the boot: prompt, press Tab to list the bootable kernels, and then type the kernel name followed by a space and single. The details are not covered here.

Network troubleshooting
A network fault here means only that the server itself cannot connect to the network. Faults elsewhere in the overall network structure are excluded. Generally, the server is considered to have a network fault when it cannot ping a machine in the same subnet. It is also possible that the server can ping the peer, but the peer cannot access services on the server.
If the server cannot ping the peer, first check whether the physical network cabling is sound.
After the cabling check is complete, check whether the switch that the server connects to has VLAN restrictions.
After confirming that the external conditions are normal, check the NIC settings by referring to the NIC configuration section above. You can also run the ifconfig command to check whether the NIC has been assigned the correct IP address. Correct ifconfig output looks like this:
eth0      Link encap:Ethernet  HWaddr 00:0A:5E:3C:BD:94
          inet addr:  Bcast:  Mask:
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:113 errors:0 dropped:0 overruns:1 frame:0
          TX packets:66 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:18479 (18.0 Kb)  TX bytes:8075 (7.8 Kb)
          Interrupt:11 Base address:0xa000

lo        Link encap:Local Loopback
          inet addr:  Mask:
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:10 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:700 (700.0 B)  TX bytes:700 (700.0 B)
If the inet addr field for eth0 does not show the correct IP address, the configuration files are probably inconsistent. In this case, you can use a command of the form ifconfig eth0 <IP address> netmask <netmask> up to bring up eth0 temporarily, and then test whether communication is normal.
In this configuration, pay attention to the netmask. An IP address is always used together with its netmask, so an incorrect netmask can make communication fail.
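The effect of the netmask can be illustrated with a small calculation: two hosts talk directly only when their addresses, masked with the netmask, give the same network. The sketch below uses plain shell arithmetic. The function names and the sample addresses in the usage note are illustrative assumptions, not values from this server.

```shell
# ip_to_int A.B.C.D
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet IP1 IP2 NETMASK
# Succeed when both addresses fall in the same network under NETMASK.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

For instance, same_subnet 192.168.1.10 192.168.2.10 255.255.255.0 fails, while the same pair with 255.255.0.0 succeeds, which is why a wrong netmask can make a neighbor in the same subnet unreachable.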
In an environment with multi-layer switches or multiple network segments, you must also pay attention to the routing configuration. For these questions, see the network knowledge section.
If all of the above checks pass, check whether a firewall policy is applied on the server:
iptables -L

Another kind of network failure occurs when the NIC cannot be found because its driver has been lost. This is generally caused by a poor-quality network adapter. If it happens, refer to the driver installation section. Alternatively, shut down the computer, unplug the NIC, and boot so that kudzu detects the hardware change and removes the NIC driver. Then shut down the computer again, reinsert the NIC, and boot so that kudzu installs the NIC driver automatically.

Performance troubleshooting
If the server's performance has degraded seriously, or performance is poor right after installation, first refer to the installation section above to see whether a hardware driver is causing the problem. If the hardware is known to be compatible, perform the following steps.
First, test whether the read speed of the hard disk is normal.
hdparm -tT /dev/hda
The first figure in the output is the speed of reads from the cache. It mainly reflects the performance of the motherboard. An ordinary motherboard, such as an 845-series board, has a cached read speed of about-MB per second. The recommended server motherboards reach about 1 GB per second. This is the difference between a server motherboard and an ordinary one. This figure is also affected by memory size and hard disk speed.
The second figure is the speed of reads directly from the hard disk. Under normal circumstances, a 7200 rpm SATA hard disk reads at around 55 MB per second.
If the measured figures differ greatly from these normal values, use the lspci command to check whether the IDE controller chip is identified correctly. In general the IDE driver is installed correctly. If it is not, the cause is usually a hardware compatibility problem.
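When the test is repeated on several servers, the disk figure can be pulled out of the hdparm output and compared with an expected minimum. The sketch below is hedged in two ways: the function names are hypothetical, and it assumes the usual output line containing "buffered disk reads" and ending in "MB/sec", which may differ between hdparm versions.

```shell
# disk_read_mbps
# Hypothetical helper: read `hdparm -tT` output on standard input and
# print the buffered disk read rate in MB/sec.
disk_read_mbps() {
    awk '/buffered disk reads/ {
        n = split($0, parts, "=")      # the rate follows the last "="
        rate = parts[n]
        sub(/MB\/sec.*/, "", rate)     # drop the unit
        gsub(/[ \t]/, "", rate)        # drop surrounding blanks
        print rate
    }'
}

# below_threshold RATE MIN
# Succeed when RATE is numerically smaller than MIN.
below_threshold() {
    awk -v r="$1" -v m="$2" 'BEGIN { exit !(r + 0 < m + 0) }'
}
```

A possible use, run as root, is rate=$(hdparm -tT /dev/hda | disk_read_mbps) followed by below_threshold "$rate" 40 to flag a disk that reads well below the roughly 55 MB per second mentioned above. The threshold of 40 is an arbitrary example value.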

Troubleshooting of service faults
The service discussed here is mainly Samba. The Samba service consists of two daemons, smbd and nmbd. smbd is the main service that provides SMB file transfer, and nmbd is the NetBIOS name service.
A typical service failure is that the server's IP address can be pinged, but its NetBIOS machine name, such as ecofe2, cannot. In that case the server's SMB shares also cannot be accessed by machine name. The failure may occur because the entire Samba service is not started, that is, neither smbd nor nmbd is running, or because smbd is running but nmbd is not. If smbd is running and nmbd is not, you can still access the file server by address, for example \\<server IP address>\game$.
The command to check whether the service is running properly is:
service smb status
Under normal conditions the output is:
smbd (pid 2657) is running...
nmbd (pid 2661) is running...
If either daemon has stopped, restart the service with the following command:
service smb restart
Shutting down SMB services: [OK]
Shutting down NMB services: [OK]
Starting SMB services: [OK]
Starting NMB services: [OK]
There are two points to note.
First, NetBIOS names are broadcast at intervals, not continuously. If a workstation starts up in the gap between two broadcasts, it may be unable to find ecofe2 for a short time, although the workstation can also query the IP address of a NetBIOS machine name by broadcast. There may also be two machines named ecofe2 on the network with different IP and MAC addresses. This can happen when several servers are plugged into the company network for debugging. In that case, restart the Samba service once.
Second, if two NICs are active on the server and both have IP addresses, but only one is connected to the network, the nmbd service terminates abnormally after a period of time. To solve this, either connect the other NIC to the network or disable it. You can disable an idle NIC in the BIOS, delete its configuration file, or change the ONBOOT option in its configuration file to no.
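Because nmbd can stop on its own in the second situation, a periodic check of the status output can catch it. The following is a minimal sketch. The function name stopped_daemons is hypothetical, and it assumes that a running daemon is reported with the words "is running", as in the output shown earlier, and that the daemon name is the first word of each line.

```shell
# stopped_daemons
# Hypothetical helper: read `service smb status` output on standard
# input and print the name of every daemon not reported as running.
stopped_daemons() {
    awk 'NF > 0 && $0 !~ /is running/ { print $1 }'
}

# Possible use (commented out because it needs the smb init script):
#   if [ -n "$(service smb status | stopped_daemons)" ]; then
#       service smb restart
#   fi
```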