[Reprinted] O&M career paths: how to get into the operations and maintenance (O&M) industry, and the essential skills for O&M engineers


Preface: this is a reprint of an earlier article by Chen Hao on security O&M. It is well written, and the author is very approachable: when I ran into a problem, he replied quickly on QQ.

Getting started is the hardest part; once you are through the door, everything else comes more easily. Those who keep practicing improve steadily, and those who do not are quickly weeded out. So stay diligent.

 

 

--------------------------------------------------

 

Automated IT O&M relies on a range of tools for management and maintenance, so that servers can be kept running reliably.
Introduction to common Linux O&M tools

1. rsync. This tool is used in many places to keep several servers synchronized.
Our company uses it to synchronize the game server and client files across servers. A few example articles:

Rsync hardening (manually changing the port when a firewall is enabled) and synchronizing only the desired servers through a script

Inotify + rsync + mutt + msmtp: automatically synchronize Linux files or directories and email the administrator
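
To make the "non-default port, only the desired servers" idea concrete, here is a minimal Python sketch that only builds the rsync command lines and prints them. The host addresses, the paths, and port 2222 are made up for illustration; the flags used (-a, -z, --delete, -e) are standard rsync options.

```python
import shlex


def build_rsync_command(src, host, dest, ssh_port=22, delete=False):
    """Return the argv list that pushes src to host:dest over SSH."""
    cmd = ["rsync", "-az"]  # -a: archive mode, -z: compress during transfer
    if delete:
        cmd.append("--delete")  # also remove target files missing from src
    if ssh_port != 22:
        cmd += ["-e", f"ssh -p {ssh_port}"]  # SSH listening on a non-default port
    cmd += [src, f"{host}:{dest}"]
    return cmd


def sync_plan(src, hosts, dest, ssh_port=22):
    """Return one printable rsync command line per selected host."""
    return [shlex.join(build_rsync_command(src, h, dest, ssh_port)) for h in hosts]


# Hypothetical hosts and paths; the commands are printed, not executed.
for line in sync_plan("/data/game/", ["10.0.0.5", "10.0.0.6"], "/data/game/", ssh_port=2222):
    print(line)
```

Printing the plan first and passing the argv list to subprocess.run only after reviewing it keeps a typo from wiping the wrong directory, which matters most when --delete is in play.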

2. Network services. There are many kinds of services and every company uses a different set, but the basics must be mastered: FTP, DNS, Samba, and email are worth at least a look. LAMP and LNMP must be known well. That does not mean merely being able to install them; you must be familiar with their configuration, because the web server is the most critical part of most companies' infrastructure. You must know nginx and apache, especially nginx, and some companies still use tomcat, so it is best to learn that too. You do not need to worry too much about network services, though. The company's environment is usually already set up, and when a new server is added or something needs changing there will be documentation to refer to, so you will not be left guessing. Even so, you should know the relevant configuration well and have compiled and installed the software from source many times, and you should know what the modules do, especially the PHP modules.
These first two points are only the foundation and the prerequisites. They can hardly be called tools; the tools you really need to master come next.

Samba file sharing service (shared scripts that make your work easier)

Installing apache for Linux web services (source compilation and custom services)

FTP (with virtual users, each with its own independent configuration)

Building a DHCP server on Linux

 

3. Shell scripting plus one other scripting language. This is a must for O&M staff; without it you cannot really get started. At the very least you must be able to write system management scripts, the simplest being a script that monitors CPU and memory usage. That is the most basic requirement. Do not assume that writing number-guessing games or calculators helps much: those are only learning exercises, and system scripts are what matter. The second scripting language is optional. The usual choices are the three Ps: python, perl, and php. Leave php aside unless you want to do development. I personally suggest python, since automated O&M is hard to achieve without a language like it, while perl is very strong at text processing. Learning either one of them is enough.

Shell (1): introduction to complex scripts, with examples and explanations

Shell (2): getting started with complex script examples (a calculator)
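
As an illustration of the CPU monitoring script mentioned in point 3, here is a minimal Python sketch that computes CPU utilization from two samples of /proc/stat. It assumes a reasonably modern Linux kernel where the aggregate "cpu" line carries at least the user, nice, system, idle, iowait, irq, softirq, and steal counters; the function names are my own.

```python
def parse_cpu_times(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # user nice system idle iowait irq softirq steal
            return [int(field) for field in line.split()[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_busy_percent(before_text, after_text):
    """Percentage of CPU time spent busy between two /proc/stat samples."""
    before = parse_cpu_times(before_text)
    after = parse_cpu_times(after_text)
    total = sum(after) - sum(before)
    # idle and iowait (indexes 3 and 4) both count as "not busy"
    idle = (after[3] + after[4]) - (before[3] + before[4])
    if total <= 0:
        return 0.0
    return 100.0 * (total - idle) / total
```

In a real script you would read /proc/stat, sleep for a second, read it again, and pass both texts to cpu_busy_percent. Keeping the parsing separate from the file access makes the logic easy to check with canned samples.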
4. sed and awk must be mastered, and with them regular expressions. This is the painful part, since regular expressions are among the hardest things to learn, but sed and awk together are very powerful for processing text and filtering web content. They are usually used from within shell scripts, so learn them alongside point 3.
Sed concise tutorial
5. Text processing commands such as sort, tr, cut, paste, uniq, and tee must be learned too, again together with point 3.
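
A typical use of these commands is a pipeline such as cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head, which counts how often each value of the first field appears. The following is a rough Python equivalent, offered only as a sketch; the sample log lines are invented.

```python
from collections import Counter


def top_fields(lines, field=0, sep=None, n=3):
    """Count the values of one column and return the n most common.

    Roughly what `cut | sort | uniq -c | sort -rn | head -n` does.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(sep)  # sep=None splits on any whitespace
        if len(parts) > field:
            counts[parts[field]] += 1
    return counts.most_common(n)


# Invented access-log lines: client address first, then the request.
sample = [
    "1.1.1.1 GET /index.html",
    "2.2.2.2 GET /a.png",
    "1.1.1.1 GET /b.png",
]
print(top_fields(sample))
```

The shell pipeline is shorter for a one-off check; the Python form becomes worthwhile once the counting has to feed into further logic, such as alerting.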


6. Databases: mysql first. Do not ask why not sqlserver or oracle: mysql is by far the most used on Linux. You must know how to insert, delete, update, and query. The rest may not be needed, because O&M staff mostly run queries; optimization and development-oriented statements are unlikely to come your way.
Mysql (manual compilation in detail, insert/delete/update/query, authorization, and backup and restore)


7. Firewalls: you cannot get by without learning them, and they are a difficult topic. They are easy to talk about but hard to do well, and the most important thing is to understand the rules. If you have studied CCNA it will come more easily, because iptables also has a NAT table and the principle is the same. The FILTER table is used the most, and you are certainly not qualified if you have not learned it.
Firewall (1): host firewall

Firewall (2): SNAT and DNAT
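
"Understanding the rules" mostly means understanding evaluation order: within a filter chain, rules are checked top to bottom, the first matching rule with a terminating target decides the packet's fate, and the chain policy applies when nothing matches. The toy model below illustrates only that idea. It is not iptables, and the dictionary-based rule format is invented for this sketch.

```python
def evaluate_chain(rules, packet, policy="DROP"):
    """Return the target of the first rule matching the packet, else the policy.

    rules:  list of (match, target) pairs, where match is a dict of field -> value
    packet: dict describing the packet, e.g. {"proto": "tcp", "dport": 22}
    """
    for match, target in rules:
        if all(packet.get(field) == value for field, value in match.items()):
            return target  # first match wins; later rules are never consulted
    return policy


# Order matters: the specific ACCEPT must come before the broad DROP.
input_chain = [
    ({"proto": "tcp", "dport": 22}, "ACCEPT"),
    ({"proto": "tcp"}, "DROP"),
]
print(evaluate_chain(input_chain, {"proto": "tcp", "dport": 22}))
print(evaluate_chain(input_chain, {"proto": "tcp", "dport": 80}))
```

Swapping the two rules would drop SSH traffic as well, which is the classic way to lock yourself out of a remote server.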


8. Monitoring tools are very important. I suggest studying three: cacti, nagios, and zabbix. Enterprises mostly use nagios and zabbix. Learn them all if you can, but nagios is a little harder because it involves writing your own scripts for automated checks, and that is the difficult part.

CentOS 6.2 + Nginx + Nagios, with SMS and QQ email alerts

Centralized server monitoring with Cacti
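
The scripts nagios expects follow a simple convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN); anything after a "|" in the line is treated as performance data. Here is a minimal sketch of the threshold logic in Python. The function name and the thresholds are assumptions made for illustration.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard plugin exit codes


def check_percent(label, value, warn, crit):
    """Map a percentage to a plugin exit code and a one-line status message."""
    if value >= crit:
        code, word = CRITICAL, "CRITICAL"
    elif value >= warn:
        code, word = WARNING, "WARNING"
    else:
        code, word = OK, "OK"
    # Text after '|' is performance data: label=value;warn;crit
    message = f"{word} - {label} is {value:.1f}% | {label}={value:.1f}%;{warn};{crit}"
    return code, message


# A real plugin would measure the value, print the message,
# and call sys.exit(code) so that nagios sees the exit status.
code, message = check_percent("disk", 91.5, 80, 90)
print(code, message)
```

Returning the code instead of exiting inside the function keeps the threshold logic testable without spawning a process.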

9. Clusters and hot standby. This is very important and you must understand it, but you will probably not get to work on it when you first join a company, because newcomers are not allowed to touch it. There are many cluster tools. LVS is required, and it is best to also learn nginx clustering, reverse proxying, and hot standby. Hot standby can be done with many tools; my company, for example, developed its own. MySQL hot standby, that is, master-slave replication, also needs to be learned. None of this is hard to describe, but learning the whole process properly is not easy, and just following the steps mechanically teaches little.

Mysql master-slave and dual-master synchronization: what to do if a server goes down unexpectedly?

Mysql high-performance stress testing (a summary of long experience)

Nginx cache configuration and troubleshooting

10. Data backup. You cannot do without it. There are many tools, but at the very least you need to understand how RAID works, in particular 1+0 and 0+1, which are the most common in enterprises. Do the experiments yourself. There are also many backup tools, such as tar and dump.
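
Part of understanding RAID is knowing how much usable space each level leaves you. The sketch below works that out for a few common levels, assuming identical disks and ignoring filesystem and controller overhead; the function name is my own.

```python
def raid_usable_gb(level, disks, disk_gb):
    """Usable capacity in GB for an array of identical disks."""
    if level == "0":            # striping only, no redundancy
        minimum, usable = 2, disks * disk_gb
    elif level == "1":          # every disk holds a full copy
        minimum, usable = 2, disk_gb
    elif level == "5":          # one disk's worth of parity
        minimum, usable = 3, (disks - 1) * disk_gb
    elif level in ("10", "01"):  # mirroring combined with striping
        if disks % 2:
            raise ValueError("RAID 1+0 / 0+1 needs an even number of disks")
        minimum, usable = 4, disks // 2 * disk_gb
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} disks")
    return usable


print(raid_usable_gb("10", 4, 1000))
```

1+0 and 0+1 give the same capacity; they differ in fault tolerance, because a stripe of mirrors (1+0) survives more combinations of disk failures than a mirror of stripes (0+1).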
That is enough: these 10 points will get you started. Some skills, such as the more advanced parts of apache and nginx, system optimization, service optimization, and program optimization, are hard to learn before you are actually on the job, so take these 10 points first. Expect to need at least three months to get comfortable, and the scripting part will be the hardest. I suggest learning shell first and picking up a second scripting language after you start work.

Common MySQL commands


The above are the tools a Linux O&M engineer needs to master. There are many more, but they are very hard to learn outside a real working environment. To repeat: the tools mentioned here are really skills, not graphical tools of the kind found on windows or ubuntu, which are of no use here. Also, do not install a graphical interface when learning Linux, so that the virtual machine does not eat too much memory. We do not recommend installing Linux on a physical machine just for learning.

 

 

 

--------------------------------------------------

 

 

Linux O&M engineers

Accumulated experience
After more than four years in O&M, it feels like leveling up in a game: with each level, my knowledge and my O&M practice have changed a lot, and I have learned many new things.

Becoming an O&M engineer is a process of being pushed until you become good. The precondition is that you can endure the competition and keep a keen sense for where the industry is heading. This year, for example, big data and artificial intelligence are very popular (and so is Python).

I wrote about basic O&M skills earlier and found that many people benefited from it. Here I describe my experience over the past four years: more than two years of game O&M, more than one year of security O&M, and one year of big data O&M. I am not an expert in every one of these industries, but my familiarity and proficiency are reasonable.

 

 

For topology details, see:


Intermediate
From my experience of being interviewed and of interviewing others, some people think O&M just means deploying software and setting a few basic options.

For example: after installing LAMP and LNMP, they feel they have mastered deployment. In fact one-click installation scripts are available for most of this, so it is no highlight in an interviewer's eyes. A company's architecture is usually already deployed and you rarely need to change it. Even if you have installed LNMP, are you familiar with the principles behind it, and with Nginx and MySQL optimization?

Another example: in one interview, the interviewer said that since I was familiar with the LNMP architecture, I should explain the role of the Nginx reverse proxy.
The expected answer is not how to tune the software and its configuration, but how a reverse proxy improves website performance at a deeper level.

1. A reverse proxy can be understood as a layer-7 (application layer) load balancer. With load balancing in place, the server cluster can easily be scaled out horizontally, which raises the cluster's overall concurrency and its capacity to withstand load.
2. A reverse proxy server usually has a local cache. Caching static resources effectively reduces the load on the backend servers and improves performance.
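
The two points above can be sketched in a few lines of Python. This is a toy model of the idea, not how Nginx is implemented: requests are spread over the backends round-robin, and responses under an assumed /static/ prefix are cached so repeated requests never reach a backend. The class name, the prefix, and the fetch callback are all invented for illustration.

```python
import itertools


class ReverseProxy:
    """Toy reverse proxy: round-robin load balancing plus a static-file cache."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)  # endless round-robin iterator
        self._cache = {}
        self.backend_hits = 0  # how many requests actually reached a backend

    def handle(self, path, fetch):
        """Serve path, calling fetch(backend, path) only on a cache miss."""
        if path in self._cache:
            return self._cache[path]
        backend = next(self._cycle)
        self.backend_hits += 1
        body = fetch(backend, path)
        if path.startswith("/static/"):  # assumed rule: only static files are cached
            self._cache[path] = body
        return body


# A stand-in for a real HTTP request to the chosen backend.
def fake_fetch(backend, path):
    return f"{backend} served {path}"


proxy = ReverseProxy(["app1", "app2"])
for path in ["/login", "/static/logo.png", "/static/logo.png", "/cart"]:
    print(proxy.handle(path, fake_fetch))
print("requests that reached a backend:", proxy.backend_hits)
```

Adding a third backend to the list is all it takes to scale out in this model, which is the horizontal scaling point 1 describes; the cache is why the second request for the logo costs the backends nothing.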

The following describes the core skills that O&M staff need to master at work.
Note that these are learned on the job and are hard to pick up through study alone.

1. Troubleshooting comes first
● Analyze why a program cannot run or does not run as expected: trace the running program and inspect its system calls.
● Analyze system bottlenecks in depth.

View remaining memory:

free -m
# -/+ buffers/cache:       6458       1649
# 6458 MB is the memory really in use; 1649 MB is the memory really available (free + cache + buffers)
# Linux uses otherwise idle memory as cache, so to keep Linux fast, leave enough memory for the cache
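
The same "really available" figure can be computed from /proc/meminfo by adding MemFree, Buffers, and Cached. Below is a minimal Python sketch; the sample values are invented and chosen only so that the result matches the 6458/1649 figures above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_summary_mb(text):
    """Return (really_used, really_free) in MB, counting cache as free."""
    mem = parse_meminfo(text)
    free_kb = mem["MemFree"] + mem["Buffers"] + mem["Cached"]
    used_kb = mem["MemTotal"] - free_kb
    return used_kb // 1024, free_kb // 1024


# Invented sample; on a real host read the text from /proc/meminfo.
SAMPLE = """MemTotal:        8301568 kB
MemFree:          102400 kB
Buffers:          204800 kB
Cached:          1381376 kB
"""
print(memory_summary_mb(SAMPLE))
```

Newer kernels also expose a MemAvailable line in /proc/meminfo, which is the kernel's own estimate and is usually the better number to monitor when it is present.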


System Information:

uname -a              # view Linux kernel version information
cat /proc/version     # view the kernel version
cat /etc/issue        # view the system version
lsb_release -a        # view the system version (requires centos-release)
locale -a             # list all available locales
locale                # show the locale settings in the current environment
date                  # check the time
who                   # users currently logged in
w                     # users currently logged in
whoami                # view the current user name
logname               # view the initial login user name
uptime                # view how long the server has been running
sar -n DEV 1 10       # view network interface traffic
dmesg                 # display boot messages
lsmod                 # view loaded kernel modules


Hardware information:

more /proc/cpuinfo                                               # view CPU information
lscpu                                                            # view CPU information
cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c            # view CPU model and number of logical cores
getconf LONG_BIT                                                 # word size (32 or 64 bit) the system runs in
cat /proc/cpuinfo | grep 'physical id' | sort | uniq -c          # number of physical CPUs
cat /proc/cpuinfo | grep flags | grep ' lm ' | wc -l             # a result greater than 0 means 64-bit is supported
cat /proc/cpuinfo | grep flags                                   # check virtualization support: pae means paravirtualization, Intel VT (vmx) means full virtualization
more /proc/meminfo                                               # view memory information
dmidecode                                                        # view full hardware information
dmidecode | grep "Product Name"                                  # view server model
dmidecode | grep -P -A5 "Memory\s+Device" | grep Size | grep -v Range   # view memory slots
cat /proc/mdstat                                                 # view software RAID information
cat /proc/scsi/scsi                                              # view Dell hardware RAID information (IBM and HP need the vendor's tools)
lspci                                                            # view hardware information
lspci | grep RAID                                                # check whether RAID is supported
lspci -vvv | grep Ethernet                                       # view the network card model
lspci -vvv | grep Kernel | grep driver                           # view the driver modules
modinfo tg2                                                      # view driver version (driver module)
ethtool -i em1                                                   # view the network card driver version
ethtool em1                                                      # view the network card settings
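
When the same facts are needed inside a script, parsing /proc/cpuinfo directly is often easier than chaining grep and cut. Here is a small Python sketch under that assumption (the function name and the sample text are invented). It also checks the lm flag as a whole word, since a plain substring search would be fooled by flags such as lahf_lm.

```python
def cpu_summary(cpuinfo_text):
    """Summarize /proc/cpuinfo-style text: core counts, models, 64-bit support."""
    logical = 0
    physical_ids = set()
    models = set()
    has_lm = False
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_ids.add(value)
        elif key == "model name":
            models.add(value)
        elif key == "flags" and "lm" in value.split():
            has_lm = True  # "lm" (long mode) as a whole word means 64-bit capable
    return {
        "logical_cpus": logical,
        "physical_cpus": len(physical_ids),
        "models": sorted(models),
        "supports_64bit": has_lm,
    }


# Invented two-core sample; on a real host read the text from /proc/cpuinfo.
SAMPLE = (
    "processor\t: 0\nmodel name\t: Example CPU\nphysical id\t: 0\nflags\t\t: fpu vme lm\n\n"
    "processor\t: 1\nmodel name\t: Example CPU\nphysical id\t: 0\nflags\t\t: fpu vme lm\n"
)
print(cpu_summary(SAMPLE))
```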


● Analyze web logs with analysis tools (for example, firewall software).
● Analyze system performance bottlenecks (IO, memory, CPU; common tools include the Shift-key shortcuts in the top command, plus sar, vmstat, iostat, and ipcs).

Common log management commands:

history                                   # command history; 1000 entries are kept by default
HISTTIMEFORMAT="%Y-%m-%d %H:%M:%S "       # make the history command show timestamps
history -c                                # clear the recorded commands
cat $HOME/.bash_history                   # the command history file
lastb -a                                  # list users who failed to log in; clear the binary log with: echo > /var/log/btmp
last                                      # list users who logged in; clear the binary log with: echo > /var/log/wtmp (the default file)
who /var/log/wtmp                         # view information about logged-in users
lastlog                                   # last login of each user
tail -f /var/log/messages                 # system log
tail -f /var/log/secure                   # SSH log
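
A common follow-up to reading /var/log/secure is counting failed SSH logins per source address. The sketch below assumes the usual OpenSSH message "Failed password for [invalid user] NAME from ADDRESS port N"; the sample lines are invented, and other sshd versions or locales may word the message differently.

```python
import re
from collections import Counter

# Matches the usual OpenSSH failure message; group 1 is the user, group 2 the address.
FAILED_RE = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+")


def failed_logins_by_ip(lines):
    """Count failed SSH password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Invented sample lines in the style of /var/log/secure.
SAMPLE_LOG = [
    "Jan  1 10:00:01 host sshd[1001]: Failed password for root from 203.0.113.9 port 40022 ssh2",
    "Jan  1 10:00:03 host sshd[1002]: Failed password for invalid user admin from 203.0.113.9 port 40023 ssh2",
    "Jan  1 10:00:09 host sshd[1003]: Accepted password for ops from 198.51.100.7 port 50111 ssh2",
    "Jan  1 10:00:12 host sshd[1004]: Failed password for ops from 198.51.100.7 port 50112 ssh2",
]
print(failed_logins_by_ip(SAMPLE_LOG).most_common())
```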


2. Optimization
Optimization is probably the most sought-after O&M skill. Salaries for people who do it well are high, but optimization carries risk. Finding an article online and changing a configuration file or parameter to match is not optimization, and it can easily cause downtime.

Optimization means improving software and website performance based on the actual hardware of the production environment. I can only offer a partial explanation. When I tuned mysql and tomcat, I too looked up parameters in online articles and the official documentation, then tested performance on a virtual machine.

There is cost optimization and there is performance optimization. Here are the jvm parameters I used to tune tomcat (deploy them to production only after testing; remember, never optimize without monitoring):
-     standard options; every jvm should support them
-X    non-standard options; each jvm implementation differs
-XX:  unstable options; they may be removed in the next version
Serial collector: single-threaded, serial collection
Parallel collector: multithreaded collection

Start jvisualvm.exe to monitor the jvm and analyze dumps when memory overflows
-Xms: initial heap size
-Xmx: maximum heap size
-Xss: thread stack size
-XX:NewSize=n: sets the young generation size
-XX:NewRatio=n: sets the ratio of the old generation to the young generation. For example, 3 means young:old = 1:3, so the young generation is 1/4 of the combined young and old generations.
-XX:SurvivorRatio=n: ratio of the Eden region to one Survivor region in the young generation (there are two Survivor regions).
-XX:MaxPermSize=n: sets the maximum permanent generation size (the permanent generation was replaced by Metaspace in Java 8).
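
To see what the two ratio options mean in numbers, the following Python sketch applies the textbook formulas: young = heap / (NewRatio + 1), and each Survivor region = young / (SurvivorRatio + 2). It is only an approximation under those assumptions; a real jvm aligns region sizes and may resize them adaptively, so measured values such as the ones printed by -XX:+PrintGCDetails will differ.

```python
def generation_sizes(heap_mb, new_ratio, survivor_ratio):
    """Approximate generation sizes in MB from the heap size and the two ratios."""
    young = heap_mb / (new_ratio + 1)         # NewRatio is old:young
    old = heap_mb - young
    survivor = young / (survivor_ratio + 2)   # Eden plus two Survivor regions
    eden = young - 2 * survivor
    return {"young": young, "old": old, "eden": eden, "survivor": survivor}


# Example: a 4000 MB heap with NewRatio=3 and SurvivorRatio=8.
print(generation_sizes(4000, 3, 8))
```

With NewRatio=3 the young generation is a quarter of the heap, and with SurvivorRatio=8 Eden takes eight tenths of the young generation while each Survivor region takes one tenth.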

Collector settings
-XX:+UseSerialGC: sets the serial collector.
-XX:+UseParallelGC: sets the parallel collector.
-XX:+UseConcMarkSweepGC: sets the concurrent collector.

Garbage collection statistics
-XX:+PrintGC
-XX:+PrintGCDetails
-Xloggc:filename

Tomcat optimization (first verify which jvm the server is running)
Set JAVA_OPTS=
-Xms4g
-Xmx4g
-Xss512k
-XX:+AggressiveOpts                  enables the aggressive optimization options
-XX:+UseBiasedLocking                enables biased locking to optimize lock handling
-XX:PermSize=64m                     initial permanent generation size; set it larger, up to the 300m maximum, when many classes are loaded
-XX:MaxPermSize=300m
-XX:+DisableExplicitGC               explicit System.gc() calls are ignored
-XX:+UseConcMarkSweepGC              uses CMS to shorten response time: concurrent collection with low pauses
-XX:+UseParNewGC                     collects the young generation in parallel
-XX:+CMSParallelRemarkEnabled        reduces remark time when UseParNewGC is used
-XX:+UseCMSCompactAtFullCollection   compacts the old generation to reduce fragmentation when the concurrent collector is used
-XX:LargePageSizeInBytes=128m        memory page size; larger pages improve performance
-XX:+UseFastAccessorMethods          compiles get/set methods to native code
-Djava.awt.headless=true             fixes a problem that can occur when tomcat handles graphics on linux

Memory Optimization:


"C:\Program Files\Java\jdk1.8.0_31\bin\java" -XX:+DoEscapeAnalysis -XX:+EliminateAllocations -XX:+UseTLAB -XX:+PrintGCDetails -Didea.launcher.port=7540 "-Didea.launcher.bin.path=E:\java\IntelliJ IDEA 2016.3\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_31\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_31\jre\lib\rt.jar;E:\java\new\out\production\new;E:\java\IntelliJ IDEA 2016.3\lib\idea_rt.jar" com.intellij.rt.execution.application.AppMain aa.T02
Heap
 PSYoungGen      total 38400K, used 3994K [0x00000000d5d80000, 0x00000000d8800000, 0x0000000100000000)
  eden space 33280K, 12% used [0x00000000d5d80000,0x00000000d61668b8,0x00000000d7e00000)
  from space 5120K, 0% used [0x00000000d8300000,0x00000000d8300000,0x00000000d8800000)
  to   space 5120K, 0% used [0x00000000d7e00000,0x00000000d7e00000,0x00000000d8300000)
 ParOldGen       total 87552K, used 0K [0x0000000081800000, 0x0000000086d80000, 0x00000000d5d80000)
  object space 87552K, 0% used [0x0000000081800000,0x0000000081800000,0x0000000086d80000)
 Metaspace       used 3072K, capacity 4494K, committed 4864K, reserved 1056768K
  class space    used 329K, capacity 386K, committed 512K, reserved 1048576K
Heap
 PSYoungGen      total 38400K, used 1147K [0x00000000d5d80000, 0x00000000d8800000, 0x0000000100000000)
  eden space 33280K, 3% used [0x00000000d5d80000,0x00000000d5e9ecb8,0x00000000d7e00000)
  from space 5120K, 0% used [0x00000000d8300000,0x00000000d8300000,0x00000000d8800000)
  to   space 5120K, 0% used [0x00000000d7e00000,0x00000000d7e00000,0x00000000d8300000)
 ParOldGen       total 87552K, used 0K [0x0000000081800000, 0x0000000086d80000, 0x00000000d5d80000)
  object space 87552K, 0% used [0x0000000081800000,0x0000000081800000,0x0000000086d80000)
 Metaspace       used 3072K, capacity 4494K, committed 4864K, reserved 1056768K
  class space    used 330K, capacity 386K, committed 512K, reserved 1048576K
The thread-local allocation buffer (TLAB) is carved out of Eden, so Eden usage is higher when it is enabled.



Test result: tomcat with no jvm parameters handled about 435 requests per second; after tuning it handled about 605 per second.


3. Development skills
Shell and python come first. There are now many tasks that shell cannot handle or handles very inefficiently, so python is the best choice for automation. Typical job postings ask for the ability to write shell, python, or perl scripts; my personal choice is python.

Python is easy to read and understand.

Python has a rich set of server management tools: configuration management (saltstack), batch execution (fabric, saltstack), monitoring (Zenoss, nagios plug-ins), virtualization management (python-libvirt), process management (supervisor), cloud computing (openstack), and more. Most system C libraries also have python bindings.

Procedures should be built into the management system: written as programs that become part of the system, rather than kept as a pile of independent scripts that are reused ad hoc.

With the arrival of cloud computing, small and medium enterprises no longer need O&M staff, and in large companies O&M staff without engineering and development ability are not competitive.

If you want a raise, learn python, learn python, learn python. (Important things are said three times.)
I am learning python myself, and I am converting my earlier shell script examples into python scripts.

Python notes: python example manual (keep it at hand)
Download link: http://down.51cto.com/data/2329173

4. Awareness
1) Security awareness:
O&M staff hold high privileges, so the security of accounts and private keys must be guaranteed.
● Store credentials with an encryption tool, for example truecrypt or 1password.
● Keep them in local storage rather than on network drives, or use lastpass.
● Put a passphrase on the ssh private key.

2) Awareness of sharpening your tools:
Before any operation or configuration change, first understand the principle behind it, then carry it out. As the saying goes, sharpening the axe does not delay cutting the firewood, and what you understand once carries over to similar operations.

3) Planning awareness:
Complex changes, such as those involving multiple hosts and SAN storage, are best preceded by an operation plan: write a plan document that spells out every command in detail, then ask experienced colleagues to review it. This maximizes the safety of the whole operation. For an important customer's business system, also prepare a rollback plan, so that if the change fails the business can be restored quickly.
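
One way to make a rollback plan concrete is to pair every step of the change with its own undo action and to unwind the completed steps in reverse order as soon as one fails. The Python sketch below shows only that control flow; the step names and the use of plain callables are assumptions made for illustration, and real steps would run reviewed commands.

```python
def run_change(steps):
    """Apply steps in order; on failure, roll back completed steps in reverse.

    steps: list of (name, apply, rollback) tuples of callables.
    Returns (succeeded, log).
    """
    done, log = [], []
    for name, apply, rollback in steps:
        try:
            apply()
        except Exception as exc:
            log.append(f"FAILED {name}: {exc}")
            for prev_name, prev_rollback in reversed(done):
                prev_rollback()
                log.append(f"rolled back {prev_name}")
            return False, log
        done.append((name, rollback))
        log.append(f"applied {name}")
    return True, log


# Hypothetical change: the second step fails, so the first one is undone.
state = []


def broken_step():
    raise RuntimeError("config test failed")


ok, log = run_change([
    ("stop service", lambda: state.append("stopped"), lambda: state.remove("stopped")),
    ("deploy config", broken_step, lambda: None),
])
print(ok, log, state)
```

The log doubles as the record of what was actually done, which is useful when writing up the case afterwards.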

4) Record-keeping and sharing awareness:
When you meet an unusual case, write up the process and your analysis. It makes the case easy to look up later, and sharing it with colleagues spreads the knowledge so that others can avoid the same detours.

5) Monitoring awareness:
Monitoring is very important for O&M: it is the eye that detects system anomalies. O&M work should therefore go hand in hand with monitoring.

6) Business awareness:
Know what kind of business each host serves and how the services on different hosts relate to each other. All maintenance work exists to serve the business. When a service is interrupted, you can then quickly identify the group of hosts involved, narrow the scope of troubleshooting, and locate the fault as fast as possible.

Attached is a diagram of the O&M approach:

See also: Security O&M philosophy (half god, half fairy, and migrant worker)

Awareness is very important. Being familiar with many technologies does not mean you can do without O&M awareness. In fact, managers care a great deal about it: whether data is backed up, how permissions are allocated, platform testing, response time to faults, and so on are all matters of awareness. Do not think of yourself as a master just because you have learned a lot of technology. If you treat a platform fault as none of your business, or handle a seemingly simple problem however you like without telling other departments, that is a lack of awareness. Managers look less at your technical skill than at your O&M awareness. Without it, your skills count for little, and people in other departments will stop cooperating with you.

IT is a demanding line of work in which learning never ends. If you stop learning, you will be left behind. While you are still young, keep adding to your own value; otherwise you will not only miss out on raises but may be unable to stay in the industry at all.

This world is quietly punishing those who do not change...

 

