The Growth Path of a Linux Operations Engineer

Source: Internet
Author: User

Original address: 78059331

Beginner Section

[Figure: topology of common Linux operations tools]

1 rsync Tools

The rsync tool is widely used to keep files synchronized across several servers. Our company uses it to synchronize the game server and client files across our servers. Here are some of my articles on it:

    • rsync hardening (manually changing the port and enabling the firewall), and using a script to synchronize only the servers that need it

http://chenhao6.blog.51cto.com/6228054/1322579

    • Using inotify + rsync + mutt + msmtp to automatically sync Linux files or directories when they change and to email the administrator

http://chenhao6.blog.51cto.com/6228054/1298375
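The next example is a rough illustration of "synchronizing only the required servers via scripting". This minimal Python sketch pushes a directory to an rsync daemon module on a selected subset of hosts. The host addresses, the module name `game` and port 2873 are hypothetical placeholders. Commands are only printed unless `run=True` is passed.

```python
import shlex
import subprocess

def build_rsync_cmd(src, host, module, port=873, dry_run=False):
    """Build an rsync command that pushes `src` to the daemon module `module` on `host`."""
    cmd = ["rsync", "-az", "--delete", f"--port={port}"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying anything
    # "host::module" is rsync's syntax for talking to an rsync daemon
    cmd += [src, f"{host}::{module}"]
    return cmd

def sync_selected(src, inventory, module, port=873, dry_run=False, run=False):
    """Sync only the hosts whose inventory flag is True; return the commands used."""
    commands = []
    for host, needs_sync in inventory.items():
        if not needs_sync:
            continue  # skip servers that do not need this update
        cmd = build_rsync_cmd(src, host, module, port, dry_run)
        commands.append(cmd)
        if run:
            subprocess.run(cmd, check=True)
        else:
            print(shlex.join(cmd))
    return commands

if __name__ == "__main__":
    # Hypothetical inventory: only the first two game servers need the update
    inventory = {"10.0.0.11": True, "10.0.0.12": True, "10.0.0.13": False}
    sync_selected("/data/game/", inventory, module="game", port=2873)
```

A few details matter when running this for real:

- The trailing slash on `/data/game/` copies the directory's contents rather than the directory itself.
- `--delete` removes files on the target that no longer exist in the source, so a pass with `dry_run=True` is a sensible first step.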

2 Network Services

There are many kinds of network services, and every company uses a different mix, but you must master the basic ones. FTP, DNS, Samba and mail only need to be understood. LAMP and LNMP, however, you must know well. By that I don't mean merely being able to set them up; you need to know their configuration inside out. The web server is the most critical part of most companies' infrastructure, so you need to be familiar with Nginx and Apache, and especially Nginx. Some companies also use Tomcat, so it is best to learn that too.

In fact, you don't need to worry too much about network services. In most companies the environment is already set up. Even if you have to build a new server or rework an existing one, the company will have documentation for you to follow rather than letting you improvise. Still, you must learn the relevant configuration. Software is mostly compiled and installed from source, so you should be familiar with the build modules, especially PHP's.
These two points are only the basics, a prerequisite rather than real tools; the real tools to master follow.

    • Samba file-sharing service (sharing scripts makes your work easier)

http://chenhao6.blog.51cto.com/6228054/1218028

    • Installing the Apache web server on Linux (compiling from source and defining a custom service)

http://chenhao6.blog.51cto.com/6228054/1223484

    • FTP (supporting virtual users, each with its own configuration)

http://chenhao6.blog.51cto.com/6228054/1219713

    • Build a DHCP server under Linux

http://chenhao6.blog.51cto.com/6228054/1217232

3 Scripting Languages

You should know shell scripting plus one other scripting language. Shell is mandatory for operations staff; you cannot do the job without it. At a minimum you should be able to write system administration scripts, the simplest being one that monitors CPU and memory usage; that is the baseline. Don't bother writing number-guessing games or calculators beyond learning purposes, because they have no practical use; writing system scripts is what matters.

The second scripting language is optional and is usually one of the "3 Ps": Python, Perl or PHP. You don't need to consider PHP unless you want to do development. I personally suggest Python, which makes it easier to automate operations, while Perl is very powerful for text processing. Either way, learning one of the two is enough.

    • Shell (1): from getting started to complex scripts, with my own examples and explanations

http://chenhao6.blog.51cto.com/6228054/1230337

    • Shell (2): from getting started to a complex script example (a calculator)

http://chenhao6.blog.51cto.com/6228054/1232070
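The text suggests a CPU and memory monitoring script as the first useful exercise, so here is a minimal Python sketch of one. It assumes a Linux host and reads `/proc/stat` and `/proc/meminfo` directly. The 80% alert threshold is a hypothetical value. A shell version would typically parse the same files, or the output of `top` and `free`.

```python
import os
import time

def parse_meminfo(text):
    """Turn /proc/meminfo lines like 'MemTotal:  16318620 kB' into {name: value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info

def memory_used_percent(info):
    total = info["MemTotal"]
    # MemAvailable exists on kernels >= 3.14; otherwise fall back to free + buffers + cache
    available = info.get("MemAvailable",
                         info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0))
    return 100.0 * (total - available) / total

def parse_cpu_times(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = stat_text.splitlines()[0].split()
    # user nice system idle iowait irq softirq steal (guest time is already in user/nice)
    values = [int(v) for v in fields[1:9]]
    values += [0] * (8 - len(values))  # very old kernels report fewer columns
    idle = values[3] + values[4]  # idle + iowait
    return idle, sum(values)

def cpu_used_percent(before, after):
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    return 100.0 * (total_delta - idle_delta) / total_delta if total_delta else 0.0

def read(path):
    with open(path) as f:
        return f.read()

if __name__ == "__main__" and os.path.exists("/proc/stat"):
    threshold = 80.0  # hypothetical alert threshold
    before = parse_cpu_times(read("/proc/stat"))
    time.sleep(1)
    after = parse_cpu_times(read("/proc/stat"))
    cpu = cpu_used_percent(before, after)
    mem = memory_used_percent(parse_meminfo(read("/proc/meminfo")))
    status = "ALERT" if max(cpu, mem) >= threshold else "OK"
    print(f"{status}: cpu={cpu:.1f}% mem={mem:.1f}%")
```

The CPU figure needs two samples because `/proc/stat` holds cumulative counters since boot. Only the difference over the one-second interval reflects current load.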

4 sed and awk Tools

These two tools must be mastered along with regular expressions. Regular expressions are painful and the hardest part to learn, but combined with sed and awk they are very powerful for processing text and filtering web content. sed and awk are usually used together with shell scripts, so you will pick up point 4 while learning point 3.

    • A concise sed tutorial

https://coolshell.cn/articles/9104.html
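To show what these one-liners do, here is a small Python sketch that mirrors two typical commands on an Nginx access log in the default "combined" format. The sample log line and the 404 filter are illustrative assumptions. In practice you would usually just run the sed and awk commands shown in the comments.

```python
import re

# sed -E 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/x.x.x.x/g' access.log   (mask IPv4 addresses)
IPV4 = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")

def mask_ips(line):
    return IPV4.sub("x.x.x.x", line)

# awk '$9 == 404 {print $7}' access.log   (URLs that returned 404)
def urls_with_status(lines, status):
    for line in lines:
        fields = line.split()  # like awk's default: split on runs of whitespace
        # awk fields are 1-based, so $7 is fields[6] and $9 is fields[8]
        if len(fields) > 8 and fields[8] == str(status):
            yield fields[6]

if __name__ == "__main__":
    sample = ('192.168.1.10 - - [10/Oct/2023:13:55:36 +0800] '
              '"GET /index.html HTTP/1.1" 404 153 "-" "curl/7.68.0"')
    print(mask_ips(sample))
    print(list(urls_with_status([sample], 404)))
```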

5 Text Processing Commands

You must learn sort, tr, cut, paste, uniq, tee and similar commands, again in combination with the scripting from point 3.
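A classic combination of these commands is counting the most frequent values in a log column. The Python sketch below does the same job as the pipeline in its docstring. The field index and the sample lines are illustrative assumptions.

```python
from collections import Counter

def top_values(lines, field=0, n=10):
    """Python counterpart of a classic pipeline, e.g. for field 0 (the client IP):
        cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head -n 10
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(" ")  # cut -d' ' splits on single spaces
        if len(parts) > field and parts[field]:
            counts[parts[field]] += 1
    return counts.most_common(n)  # [(value, count), ...], highest count first

if __name__ == "__main__":
    sample = ["10.0.0.1 - a\n", "10.0.0.2 - b\n", "10.0.0.1 - c\n"]
    for value, count in top_values(sample):
        print(f"{count:7d} {value}")
```

The shell pipeline needs `sort` before `uniq -c` because `uniq` only collapses adjacent duplicates. A `Counter` counts without sorting first.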

6 Database

MySQL is the first choice. Don't ask why not SQL Server or Oracle: on Linux, MySQL is by far the most widely used. Learn the basic create, read, update and delete operations, especially queries. Other areas may not be necessary, because operations staff mostly run queries, and you won't be asked to handle query optimization or development statements.

    • MySQL (compiling from source step by step, plus CRUD operations, privilege grants, and backup and restore)

http://chenhao6.blog.51cto.com/6228054/1225129

7 Firewall

The firewall is another hard topic, though it can also be called easy; the most important thing is to understand the rules. Those who have studied for the CCNA may have an easier time, because iptables also has a nat table that works on the same principles, although the filter table is the one used most. In any case, you are not qualified if you haven't learned it.

    • Firewall (1): host-based firewall

http://chenhao6.blog.51cto.com/6228054/1239306

    • Firewall (2): SNAT and DNAT

http://chenhao6.blog.51cto.com/6228054/1240714
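To make the rule order concrete, here is a Python sketch that only generates iptables commands and does not apply them. It covers a simple host firewall plus one SNAT rule and one DNAT rule. The interface name `eth0`, the addresses and the ports are hypothetical. SNAT and DNAT only route traffic if IP forwarding (`net.ipv4.ip_forward = 1`) and matching FORWARD rules are also in place; those are not shown.

```python
def host_firewall_rules(tcp_ports):
    """filter table: allow loopback, replies, and the listed TCP ports; drop the rest."""
    rules = [
        "iptables -A INPUT -i lo -j ACCEPT",
        "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
    ]
    rules += [f"iptables -A INPUT -p tcp --dport {port} -j ACCEPT" for port in tcp_ports]
    # Set the DROP policy last, after SSH is allowed, so a remote session is not cut off
    rules.append("iptables -P INPUT DROP")
    return rules

def snat_rule(lan_cidr, out_iface, public_ip):
    """nat table: rewrite the source address of LAN traffic leaving via out_iface."""
    return (f"iptables -t nat -A POSTROUTING -s {lan_cidr} -o {out_iface} "
            f"-j SNAT --to-source {public_ip}")

def dnat_rule(in_iface, public_port, inner_ip, inner_port):
    """nat table: forward a public TCP port to an internal server."""
    return (f"iptables -t nat -A PREROUTING -i {in_iface} -p tcp --dport {public_port} "
            f"-j DNAT --to-destination {inner_ip}:{inner_port}")

if __name__ == "__main__":
    for rule in host_firewall_rules([22, 80, 443]):
        print(rule)
    print(snat_rule("192.168.10.0/24", "eth0", "203.0.113.10"))
    print(dnat_rule("eth0", 8080, "192.168.10.20", 80))
```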

8 Monitoring Tools

I personally suggest learning these three: Cacti, Nagios and Zabbix. Nagios and Zabbix are probably the most widely used in companies, so learn them. Nagios is somewhat harder, because automated monitoring with it means writing your own check scripts, and that is where the difficulty lies.

    • CentOS 6.2 + Nginx + Nagios, with mobile SMS and QQ email alerts

http://chenhao6.blog.51cto.com/6228054/1323192

    • Centralized server monitoring with Cacti

http://chenhao6.blog.51cto.com/6228054/1249302
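Nagios plugins are ordinary programs that print one status line and signal the result through their exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. Here is a minimal Python sketch of a disk-usage check that follows that convention. The 80%/90% thresholds are hypothetical, and the performance data after `|` is kept to a single value.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

def classify(used_pct, warn, crit):
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK

def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, output_line) in the usual Nagios plugin format."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    code = classify(used_pct, warn, crit)
    # "text | perfdata", where perfdata is label=value[UOM];warn;crit
    return code, (f"DISK {LABELS[code]} - {path} {used_pct:.1f}% used"
                  f" | used={used_pct:.1f}%;{warn};{crit}")

def main(argv):
    """Print the status line and return the exit code Nagios should see."""
    code, line = check_disk(argv[1] if len(argv) > 1 else "/")
    print(line)
    return code
```

A wrapper script would finish with `sys.exit(main(sys.argv))`, so that Nagios receives the exit code as well as the text.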

9 Clusters and Hot Standby

This is very important and you must understand it, even though once you join a company you probably won't be allowed near it, because newcomers are rarely given access. There are many cluster tools; the one most worth learning is LVS, which is a must. It is also best to learn Nginx clustering, reverse proxying and hot standby. Hot standby can be achieved with many more tools; my company, for example, developed its own. You should also learn MySQL hot standby, that is, master-slave replication. Understanding the whole process is not easy, but don't let that put you off.

    • MySQL master-slave and dual-master replication, and what to do if a server crashes and replication stops

http://chenhao6.blog.51cto.com/6228054/1325247

    • MySQL high-performance stress testing (a summary that took a long time to compile)

http://chenhao6.blog.51cto.com/6228054/1314418

    • Nginx cache configuration and error troubleshooting

http://chenhao6.blog.51cto.com/6228054/1329106
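When replication stops, the first thing to check is the replica's status. Here is a minimal Python sketch that parses the vertical (`\G`) output of `SHOW SLAVE STATUS` and flags the usual warning signs. It assumes the `mysql` client can log in without prompting, for example via an option file. The 60-second lag limit is a hypothetical threshold. Newer MySQL versions also offer `SHOW REPLICA STATUS`, whose fields are renamed (`Replica_IO_Running`, `Seconds_Behind_Source` and so on).

```python
import subprocess

def fetch_status():
    """Run the mysql client; assumes credentials come from an option file such as ~/.my.cnf."""
    result = subprocess.run(["mysql", "-e", "SHOW SLAVE STATUS\\G"],
                            capture_output=True, text=True, check=True)
    return result.stdout

def parse_vertical(text):
    """Parse 'Key: Value' lines of \\G output into a dict; the '*** 1. row ***' line has no colon."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status

def replication_problems(status, max_lag=60):
    problems = []
    for key in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(key) != "Yes":
            problems.append(f"{key}={status.get(key)}")
    lag = status.get("Seconds_Behind_Master")
    if lag is None or lag == "NULL":
        problems.append("Seconds_Behind_Master=NULL")  # replication is not running
    elif int(lag) > max_lag:
        problems.append(f"lag {lag}s > {max_lag}s")
    for key in ("Last_IO_Error", "Last_SQL_Error"):
        if status.get(key):
            problems.append(f"{key}: {status[key]}")
    return problems

if __name__ == "__main__":
    for problem in replication_problems(parse_vertical(fetch_status())):
        print("PROBLEM:", problem)
```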

10 Data Backup

There are many tools, but you should at least understand how RAID works, especially the commonly used RAID 1+0 and 0+1, and try them out yourself. For backup tools such as tar and dump, the more you learn the better.
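Here is a small illustration of tar-based backups using Python's standard `tarfile` module. The sketch writes a timestamped `.tar.gz` of one directory and keeps only the newest N archives. The directory names and the retention count are hypothetical. This does not replace RAID or off-host copies: RAID protects against disk failure, not against deleted or corrupted files.

```python
import tarfile
import time
from pathlib import Path

def make_backup(src_dir, dest_dir, now=None):
    """Create dest_dir/<name>-YYYYmmdd-HHMMSS.tar.gz from src_dir and return its path."""
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", now or time.localtime())
    archive = dest / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(src), arcname=src.name)  # store paths relative to the parent directory
    return archive

def prune_backups(dest_dir, name, keep):
    """Delete all but the newest `keep` archives for `name`; return the deleted paths."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # The timestamp format sorts lexically in chronological order
    archives = sorted(Path(dest_dir).glob(f"{name}-[0-9]*.tar.gz"))
    removed = archives[:-keep]
    for path in removed:
        path.unlink()
    return removed
```

Typical use is a daily cron job that calls `make_backup("/var/www", "/backup")` and then `prune_backups("/backup", "www", keep=7)`.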

Once you have learned the 10 points above, you should be able to get started. Some technologies are harder to learn. Apache and Nginx, for example, involve important topics such as system tuning, service optimization and application optimization, which are hard to learn before you meet them at work. So learn the first 10 points first; expect that to take at least three months. The scripting part will be very laborious. I suggest learning shell first and picking up another scripting language after you start working; that works better.

These are the tools a Linux operations engineer needs to master. There are many more, but they are hard to learn outside a real work environment. Finally, note that "tools" here means skills, not graphical tools like those on Windows or Ubuntu. When learning Linux, don't install a graphical desktop, so that your virtual machine doesn't use too much memory. I also strongly advise against installing Linux directly on your physical machine for practice; you won't get the same learning effect.

Intermediate Section

This part summarizes my own interview experience and that of others. First, here is a topology diagram of the operations mindset:

[Figure: topology of the operations mindset]

Some people think operations simply means deploying some software and setting up a few basic functions.

For example: "I can install LAMP and LNMP; I have mastered how to deploy them." In fact, the Internet is full of one-click install scripts for these, and there is no real technical skill involved; in an interviewer's eyes, this is not a highlight. By the time you join a company, its environment and architecture are generally already deployed, and you will rarely need to change them. Even if you can install an LNMP stack, do you understand how it works? Do you know how to optimize Nginx? Do you know how to optimize MySQL?

Another example from one of my interviews: the interviewer asked, "Since you're familiar with the LNMP architecture, what is the role of an Nginx reverse proxy?"

Don't just describe the software and its configuration; talk as much as you can about how to optimize it and how to improve the site's performance.

    • A reverse proxy can be understood as layer-7 (application-layer) load balancing. Load balancing makes it easy to scale the server cluster out horizontally, raising the cluster's overall concurrency and its ability to handle load.
    • A reverse proxy server usually has a local cache. Caching static resources effectively reduces the load on the back-end servers and so improves performance.
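Here is the load-balancing point as a Python sketch of smooth weighted round-robin, similar to the scheme Nginx uses for weighted upstream servers. On each pick, every backend's weight is added to its running score. The highest score wins, and the total weight is then subtracted from the winner. Backend names and weights are hypothetical. The sketch ignores health checks, failures and connection counts.

```python
class SmoothWeightedRoundRobin:
    def __init__(self, weights):
        self.weights = dict(weights)                  # backend -> configured weight
        self.current = {b: 0 for b in self.weights}   # running score per backend
        self.total = sum(self.weights.values())

    def pick(self):
        for backend, weight in self.weights.items():
            self.current[backend] += weight
        best = max(self.current, key=self.current.get)  # first backend wins ties
        self.current[best] -= self.total
        return best

if __name__ == "__main__":
    lb = SmoothWeightedRoundRobin({"web1": 5, "web2": 1, "web3": 1})
    print([lb.pick() for _ in range(7)])
```

With weights 5:1:1, the first seven picks are web1, web1, web2, web1, web3, web1, web1. The heavy backend still gets 5 of every 7 requests, but they are interleaved rather than sent in a burst.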

Now let's talk about the core skills operations engineers need to master at work. Note that these are learned on the job and are hard to master through study alone.

1 Troubleshooting

    • Trace programs that don't run, or don't run as expected (for example with strace), and review their system calls.
    • Analyze system bottlenecks in more depth.

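To view the remaining memory:

free -m

# -/+ buffers/cache:       6458       1649

# 6458 MB is the memory actually in use; 1649 MB is the memory really still available (free memory + cache + buffers)

# Linux uses all otherwise unused memory as cache, so to keep Linux running fast you need to leave enough memory for the cache (newer versions of free no longer print the -/+ buffers/cache line; check the "available" column instead)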

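System information: [Figure: common commands for viewing system information]

Hardware information: [Figure: common commands for viewing hardware information]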

    • Analyze web logs with a log analysis system (e.g., firewall software)
    • Analyze system performance bottlenecks (I/O, memory, CPU). Common tools: top (with its Shift-key shortcuts, such as Shift+P to sort by CPU and Shift+M to sort by memory), sar, vmstat, iostat and ipcs
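Here is a rough Python sketch of how `vmstat` output maps onto the three bottleneck types. It parses one sample and applies a few rules of thumb:

- A run queue (`r`) longer than the number of CPUs suggests CPU pressure.
- Non-zero swap-in or swap-out (`si`/`so`) suggests memory pressure.
- A high `wa` value suggests the system is waiting on I/O.

The 20% iowait limit is a hypothetical threshold, not a standard.

```python
import os
import subprocess

def sample_vmstat():
    """Run `vmstat 1 2`; the first data row is an average since boot, the second is live."""
    return subprocess.run(["vmstat", "1", "2"], capture_output=True,
                          text=True, check=True).stdout

def parse_vmstat(output):
    """Map vmstat's column names (second header line) to the values of the last row."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[1].split()
    return dict(zip(header, (int(v) for v in lines[-1].split())))

def diagnose(sample, cpus=None, iowait_limit=20):
    cpus = cpus or os.cpu_count() or 1
    findings = []
    if sample["r"] > cpus:
        findings.append(f"CPU: run queue {sample['r']} > {cpus} CPUs")
    if sample["si"] or sample["so"]:
        findings.append(f"memory: swapping (si={sample['si']}, so={sample['so']})")
    if sample["wa"] >= iowait_limit:
        findings.append(f"I/O: iowait {sample['wa']}% >= {iowait_limit}%")
    return findings
```

On a live host, `diagnose(parse_vmstat(sample_vmstat()))` returns an empty list when none of the rules fire. Any finding is a hint to dig further with `top`, `iostat` or `sar`, not a verdict.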

Common log management commands: [Figure: list of common log management commands]

2 Optimization

Optimization is arguably the most sought-after operations skill; operations engineers who can optimize are generally very well paid. Optimization also carries risk. Finding an article on the Internet and changing configuration files or parameters to match it is not optimization, and it can easily cause downtime.

Optimization means tuning specific parts of the system based on the real production environment and its hardware, to improve software and website performance. This is an area where I only have half-knowledge. The MySQL and Tomcat parameters I optimized came from online articles and the official documentation; I then tested them on a virtual machine and checked the resulting performance.

There is cost optimization and performance optimization. Below are the JVM parameters I used to optimize Tomcat; they were tested before being put into production. Remember: no monitoring, no tuning.

-: standard parameters, which all JVMs should support

-X: non-standard parameters, different in each JVM implementation

-XX: unstable parameters, which may be removed in the next version

Serial collector: single-threaded

Parallel collector: multithreaded

Use jvisualvm.exe to monitor the JVM and take a heap dump on memory overflow

-Xms: initial heap size

-Xmx: maximum heap size

-Xss: thread stack size

-XX:NewSize=n: young generation size

-XX:NewRatio=n: ratio of the young generation to the old generation. For example, 3 means young:old = 1:3, so the young generation takes 1/4 of the combined young and old generations.

-XX:SurvivorRatio=n: ratio of Eden to one survivor space in the young generation (there are two survivor spaces). For example, 8 means Eden:S0:S1 = 8:1:1.

-XX:MaxPermSize=n: permanent generation size (the permanent generation was removed in Java 8 and replaced by Metaspace)
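The ratios above are easier to read as arithmetic. This small Python sketch splits a heap the way `-XX:NewRatio` and `-XX:SurvivorRatio` describe. The young generation is heap / (NewRatio + 1), and within it Eden : survivor : survivor = SurvivorRatio : 1 : 1. The sketch ignores `-XX:NewSize` and `-Xmn`, which override NewRatio, as well as any adaptive resizing the collector may do at runtime, so treat the numbers as a starting estimate.

```python
def heap_layout(heap_mb, new_ratio, survivor_ratio):
    """Return sizes in MB implied by -Xmx, -XX:NewRatio and -XX:SurvivorRatio."""
    young = heap_mb / (new_ratio + 1)          # NewRatio = old / young
    old = heap_mb - young
    survivor = young / (survivor_ratio + 2)    # two survivor spaces of equal size
    eden = young - 2 * survivor                # SurvivorRatio = eden / one survivor
    return {"young": young, "old": old, "eden": eden, "survivor": survivor}

if __name__ == "__main__":
    # e.g. -Xmx4g -XX:NewRatio=3 -XX:SurvivorRatio=8
    for name, size in heap_layout(4096, 3, 8).items():
        print(f"{name:>8}: {size:.1f} MB")
```

For a 4 GB heap, NewRatio=3 gives a 1024 MB young generation (1/4 of the heap, matching the 1:3 example) and a 3072 MB old generation. SurvivorRatio=8 then makes each survivor space 102.4 MB, 1/10 of the young generation, and leaves 819.2 MB for Eden.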

Collector settings

-XX:+UseSerialGC: use the serial collector

-XX:+UseParallelGC: use the parallel collector

-XX:+UseConcMarkSweepGC: use the concurrent (CMS) collector

GC statistics

-XX:+PrintGC

-XX:+PrintGCDetails

-Xloggc:filename

For Tomcat optimization, first confirm how many JVMs are running on the machine. (These settings target older JDKs: the PermGen options only apply up to Java 7, and the CMS options were removed in Java 14.)

set JAVA_OPTS=

-Xms4g

-Xmx4g

-Xss512k

-XX:+AggressiveOpts: aggressive optimization; enables all optimization features

-XX:+UseBiasedLocking: enables biased locking; almost always worth enabling

-XX:PermSize=64m: initial permanent generation size; with a maximum of 300m, set it a little larger if many classes are loaded

-XX:MaxPermSize=300m

-XX:+DisableExplicitGC: ignore explicit System.gc() calls

-XX:+UseConcMarkSweepGC: use CMS to shorten response time; concurrent collection with low pauses

-XX:+UseParNewGC: collect the young generation in parallel

-XX:+CMSParallelRemarkEnabled: when UseParNewGC is in use, minimize remark time

-XX:+UseCMSCompactAtFullCollection: when using the concurrent collector, compact the old generation to reduce fragmentation

-XX:LargePageSizeInBytes=128m: memory page size; can improve performance

-XX:+UseFastAccessorMethods: compile get/set methods to native code

-Djava.awt.headless=true: fixes a problem that can occur when Tomcat processes images under Linux
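Because the text stresses confirming how many JVMs run on the machine, a quick sanity check is whether all of them fit in RAM. This rough Python sketch estimates one JVM's footprint from its `JAVA_OPTS` as max heap + max permanent generation + thread stack size × thread count. The thread count of 200 is an assumption; it is Tomcat's default `maxThreads` for an HTTP connector. On Java 8 and later there is no MaxPermSize, so that term is simply zero. Real usage is higher because of metaspace, code cache, GC structures and other native memory.

```python
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_size(text):
    """Parse JVM size strings such as '4g', '512k' or '300m' into bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

def estimate_bytes(java_opts, threads=200):
    """Rough estimate: -Xmx + -XX:MaxPermSize + -Xss * threads (missing options count as 0)."""
    prefixes = {"-Xmx": "heap", "-XX:MaxPermSize=": "perm", "-Xss": "stack"}
    sizes = {"heap": 0, "perm": 0, "stack": 0}
    for opt in java_opts.split():
        for prefix, key in prefixes.items():
            if opt.startswith(prefix):
                sizes[key] = parse_size(opt[len(prefix):])
    return sizes["heap"] + sizes["perm"] + sizes["stack"] * threads

if __name__ == "__main__":
    opts = "-Xms4g -Xmx4g -Xss512k -XX:PermSize=64m -XX:MaxPermSize=300m"
    per_jvm = estimate_bytes(opts) // 1024 ** 2
    jvms = 2  # hypothetical: two Tomcat instances on the same host
    print(f"~{per_jvm} MB per JVM, ~{per_jvm * jvms} MB for {jvms} JVMs")
```

With the settings listed above, each JVM comes to roughly 4096 + 300 + 100 = 4496 MB before native overhead.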

Memory tuning:

Before tuning, with no parameters set, Tomcat handled about 605 per second; after tuning, about 435 per second, a result nearly 3 times better.

3 Development Skills

Between shell and Python, Python is the best choice for automation now that shell either can't meet your needs or is too inefficient. Job postings today generally ask for the ability to write shell, Python or Perl scripts; my personal choice is still Python.

Python is quick to learn and easy to read. It has a very rich set of server management tools:

- configuration management (SaltStack)
- batch execution (Fabric, SaltStack)
- monitoring (Zenoss, Nagios plugins)
- virtualization management (python-libvirt)
- process management (Supervisor)
- cloud computing (OpenStack)

Most system C libraries also have Python bindings.
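As a taste of what batch-execution tools like Fabric or SaltStack automate, here is a minimal standard-library Python sketch that runs one command on many hosts in parallel over `ssh`. It assumes key-based SSH login is already set up; `BatchMode=yes` makes ssh fail instead of prompting for a password. Host names are hypothetical. The `runner` parameter exists so the logic can be tested without real servers.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def ssh_run(host, command, timeout=60):
    """Run `command` on `host` via the ssh client; return (exit_code, combined_output)."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, command]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 255, str(exc)  # 255 mirrors ssh's own exit code for connection errors
    return proc.returncode, proc.stdout + proc.stderr

def run_on_hosts(hosts, command, runner=ssh_run, workers=10):
    """Run the command on all hosts concurrently; return {host: (exit_code, output)}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda host: runner(host, command), hosts))
    return dict(zip(hosts, results))
```

For example, `run_on_hosts(["web1", "web2"], "uptime")` returns one `(exit_code, output)` pair per host. Filtering for non-zero exit codes gives the list of machines that need attention.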

Once a procedure is settled, it should ultimately be brought into the system management platform: written as a program and made part of the system, rather than left as a scattered collection of standalone scripts.

With the arrival of cloud computing, small and medium-sized companies may not need dedicated operations staff at all. At large companies, operations engineers without software development skills will not be competitive.

Most importantly, learning Python can raise your salary, raise your salary, raise your salary. (Important things are worth saying three times.) I am learning Python myself and am converting my earlier shell scripts into Python scripts.

Recommended Python notes: Python Example Manual

Download Link: http://down.51cto.com/data/2329173

Awareness Section

1 Security Awareness

Operations staff hold extensive privileges, so make sure your accounts and private keys are secure.

    • It is best to store them with an encryption tool such as TrueCrypt or 1Password.
    • Store them locally. Do not use a cloud drive, and using LastPass or similar services is not recommended either.
    • Protect SSH private keys with a passphrase.
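To audit the last point, here is a Python sketch that reports private keys in `~/.ssh` stored without a passphrase. It relies on two documented details:

- Keys in OpenSSH's own format (`-----BEGIN OPENSSH PRIVATE KEY-----`) start with the magic string `openssh-key-v1\0`. It is followed by a length-prefixed cipher name, which is `none` for unencrypted keys.
- Older PEM keys carry a `Proc-Type: 4,ENCRYPTED` header when they are encrypted.

Other key formats are not handled.

```python
import base64
import struct
from pathlib import Path

OPENSSH_MAGIC = b"openssh-key-v1\x00"

def key_is_encrypted(text):
    """Return True if a private key (given as text) is protected by a passphrase."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines:
        raise ValueError("empty key file")
    if lines[0] == "-----BEGIN OPENSSH PRIVATE KEY-----":
        blob = base64.b64decode("".join(lines[1:-1]))
        if not blob.startswith(OPENSSH_MAGIC):
            raise ValueError("not an openssh-key-v1 key")
        offset = len(OPENSSH_MAGIC)
        (length,) = struct.unpack(">I", blob[offset:offset + 4])  # uint32 length prefix
        cipher = blob[offset + 4:offset + 4 + length]
        return cipher != b"none"
    if lines[0] == "-----BEGIN ENCRYPTED PRIVATE KEY-----":  # encrypted PKCS#8
        return True
    # Legacy PEM such as "BEGIN RSA PRIVATE KEY": encryption is declared in a header
    return any(line.startswith("Proc-Type:") and "ENCRYPTED" in line for line in lines)

def unprotected_keys(ssh_dir="~/.ssh"):
    """List private key files (id_*, excluding .pub) that have no passphrase."""
    found = []
    for path in sorted(Path(ssh_dir).expanduser().glob("id_*")):
        if path.suffix == ".pub":
            continue
        try:
            if not key_is_encrypted(path.read_text()):
                found.append(path)
        except (OSError, ValueError, struct.error):
            continue  # unreadable or unrecognised file: skip rather than guess
    return found

if __name__ == "__main__":
    for path in unprotected_keys():
        print(f"WARNING: {path} has no passphrase")
```

A key flagged by this check can be given a passphrase in place with `ssh-keygen -p -f <keyfile>`.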

2 Preparation Awareness

Before carrying out any operation or configuration change, it is best to understand how it works. As the saying goes, "sharpening the axe does not delay the cutting of firewood." Once you understand the principle, you can also apply it to similar operations.

3 Planning Awareness

For complex changes, such as those involving multiple hosts and SAN storage, it is best to draw up an operation plan. Write a plan document that details every command, then ask an expert to review it; this maximizes the safety of the whole operation. For an important customer business system, it is best to have a fallback plan as well, so that if the change fails, the business can be rolled back within a short time.
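The fallback idea can be built into the change procedure itself. In this minimal Python sketch, each step of a change plan is paired with its undo action. If a step fails, the steps already completed are rolled back in reverse order. The `Step` structure and log format are illustrative, not any specific tool's API. The sketch assumes a failed step has left nothing to undo.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    name: str
    apply: Callable[[], None]     # the change itself
    rollback: Callable[[], None]  # how to undo it

def run_change(steps: List[Step], log: Callable[[str], None] = print) -> bool:
    """Apply steps in order; on failure undo the completed steps newest-first."""
    done: List[Step] = []
    for step in steps:
        log(f"APPLY    {step.name}")
        try:
            step.apply()
        except Exception as exc:
            log(f"FAILED   {step.name}: {exc}")
            for previous in reversed(done):
                log(f"ROLLBACK {previous.name}")
                previous.rollback()
            return False
        done.append(step)
    return True
```

Writing each step with an explicit rollback forces the plan document to answer "how do we undo this?" for every command before the change window starts.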

4 Documentation and Sharing Awareness

When you run into an unusual case, write up the process and your analysis. That makes it easy to look back later and to share with colleagues. Spreading the knowledge helps everyone avoid detours in the future.

5 Monitoring Awareness

Monitoring is very important in operations: it is the eye that detects all kinds of system anomalies, so operations work should be closely tied to monitoring.

6 Business Awareness

Try to understand what services run on each host and how the services on different hosts depend on one another. All maintenance work exists to keep hosts delivering business services. When a service goes down, knowing immediately which group of hosts it is associated with lets you narrow the scope of troubleshooting and locate the fault as fast as possible.
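Knowing which hosts a service depends on can be written down as data. This Python sketch does a breadth-first walk over a service-to-host map and a host-to-dependency map. It lists the hosts to check when a service breaks, nearest first. The service and host names are made up; in practice the maps would come from your own inventory documentation.

```python
from collections import deque

def hosts_to_check(service, service_hosts, host_deps):
    """Breadth-first list of hosts serving `service` plus everything they depend on."""
    start = list(service_hosts.get(service, []))
    visited = set(start)
    queue = deque(start)
    order = []
    while queue:
        host = queue.popleft()
        order.append(host)
        for dep in host_deps.get(host, []):
            if dep not in visited:  # shared dependencies are listed only once
                visited.add(dep)
                queue.append(dep)
    return order

if __name__ == "__main__":
    # Hypothetical inventory
    service_hosts = {"shop-web": ["web1", "web2"]}
    host_deps = {"web1": ["db1", "cache1"], "web2": ["db1", "cache1"], "db1": ["db2"]}
    print(hosts_to_check("shop-web", service_hosts, host_deps))
```

For the sample inventory, the order is web1, web2, db1, cache1, db2. The hosts serving the broken service come first, then the ones they depend on directly, then indirect dependencies.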

Being technically strong and knowing many technologies well does not mean you can do without operational awareness. In fact, leadership cares a great deal about it. Are there proper backups? How are permissions allocated? Is there platform testing? What is the incident response time? These are all matters of awareness, and learning a lot of technology does not by itself make you an expert. Suppose a fault is found on the platform and you don't take it seriously: you think the problem is simple, handle it yourself, and don't report it to other departments. Leadership then judges you not on your technical skill but on your operational awareness. Without that awareness, however strong your technology, it is useless, and it will only make it hard for people in other departments to work with you.

Know that this line of work is hard: it demands endless learning, and if you stop learning you will be eliminated. If you don't want to be eliminated while still young, you have to keep investing in yourself. Otherwise, not only will you not get a raise, you may not be able to stay in this field at all.

The world quietly punishes those who do not change...
