What Should O&M Engineers Learn? [For Beginners]

Source: Internet
Author: User
Tags: nginx, reverse proxy

Many colleagues who are new to Linux O&M don't understand what O&M engineers actually do, or why they are learning all these tools. Today I'd like to sum it up, in the hope of giving people who are about to enter the field an overall picture. The tools named below are free, open-source Linux tools worth mastering.

 

What O&M engineers do

Summary

1. Ensure the long-term stable operation of services (such as website servers and game servers).

2. Ensure data security and reliability (such as usernames and passwords, game data, blog articles, and transaction data).

 

With that in mind, let's go through what O&M engineers need to learn.


I. Ensuring long-term stable business operation

If something goes wrong, users will complain.

 

1. What is the business running on?

Web servers are generally Apache, nginx, or Tomcat. A MySQL database is also needed to store user passwords and other data, and many applications need PHP to run. Deploying an LNMP or LAMP environment (Linux plus nginx or Apache, MySQL, and PHP) is therefore a required skill.

 

2. How do I find out about a problem in time?

This requires monitoring software that sends you emails or text messages. Commonly used tools include Zabbix and Nagios. A mail program, Sendmail or Postfix, is also needed to send the alerts.
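To make the idea concrete, here is a minimal sketch of what a monitoring check boils down to: probe a service, and if the probe fails, prepare an alert email. It is not how Zabbix or Nagios are implemented; the function names, host names, and addresses are made up for illustration.

```python
import socket
from email.message import EmailMessage


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name lookup failed.
        return False


def build_alert(host: str, port: int, sender: str, recipient: str) -> EmailMessage:
    """Build (but do not send) an alert email for a failed check."""
    msg = EmailMessage()
    msg["Subject"] = f"ALERT: {host}:{port} is not reachable"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"TCP check against {host}:{port} failed.")
    return msg
```

Assuming a local mail server such as Postfix is listening on the machine, the message could then be handed over with `smtplib.SMTP("localhost").send_message(msg)`.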

 

3. I receive an alert at home, but the server only has an intranet IP address. How can I fix the problem?

Set up OpenVPN, PPTP, or Openswan at the company, then connect to the intranet over VPN from home and solve problems 24 hours a day... Alas, you get up in the middle of the night to fix things, with no extra pay.

 

 

II. Ensuring data security and reliability

If something goes wrong, your boss will call you in for a talk.

 

1. What if you sometimes need to change database content by hand?

You need to know the basic MySQL statements for adding, querying, modifying, and deleting data.
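The four basic operations look roughly like this. Note the assumption: the sketch uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server; the INSERT, SELECT, UPDATE, and DELETE statements have the same shape in MySQL. The `users` table and its columns are invented for the example.

```python
import sqlite3

# Stand-in database: sqlite3 in memory instead of a real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# Add a row.
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    ("alice", "alice@example.com"),
)

# Query it back.
row = conn.execute(
    "SELECT name, email FROM users WHERE name = ?", ("alice",)
).fetchone()

# Modify it.
conn.execute(
    "UPDATE users SET email = ? WHERE name = ?",
    ("alice@new.example.com", "alice"),
)
updated = conn.execute(
    "SELECT email FROM users WHERE name = ?", ("alice",)
).fetchone()

# Delete it.
conn.execute("DELETE FROM users WHERE name = ?", ("alice",))
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.commit()
```

Always put a WHERE clause on UPDATE and DELETE when editing production data by hand; without it, every row in the table is affected.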

 

2. What if the database server hardware is broken?

You need a standby (slave) database that can take over temporarily, so learn MySQL master-slave replication.

 

3. What should I do if I want to restore the database?

Schedule regular full backups of the MySQL data with crond so that they are available for restoration. To restore to a specific point in time, you also need to learn MySQL incremental backup and recovery.
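A scheduled full backup could be sketched as follows. This is only one possible shape, under stated assumptions: the backup directory, the file naming scheme, and the script path in the cron comment are hypothetical, and MySQL credentials are assumed to come from an option file such as ~/.my.cnf. The function only builds the mysqldump command line; it does not run it.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Tuple


def build_dump_command(backup_dir: str, when: datetime) -> Tuple[List[str], Path]:
    """Return the mysqldump argv and the dated output path; nothing is executed."""
    out_path = Path(backup_dir) / f"full-{when:%Y%m%d}.sql"
    argv = [
        "mysqldump",
        "--all-databases",       # dump every database on the server
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={out_path}",
    ]
    return argv, out_path


# To actually run the backup, pass argv to subprocess.run(argv, check=True).
# A hypothetical crontab entry that runs such a script every day at 03:00:
#   0 3 * * * /usr/bin/python3 /opt/scripts/full_backup.py
```

Putting the date in the file name keeps each day's dump separate, which makes it easy to pick the right one when restoring.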

 

4. What if the server holding user-uploaded images or files breaks?

Scheduled backups may not be enough. Use rsync with inotify for real-time backup, so that whenever the master server fails, all images have already been backed up and can be recovered.

 

5. How do you guard against attackers and harden the servers?

An SSH port that is open to everyone is an easy target for outsiders. Therefore, allow SSH access only from the company's IP addresses or from a jump host IP address. Use iptables to enforce this.
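As a rough illustration of the allowlist idea, the sketch below generates the iptables commands instead of applying them, so you can review them first. The function name is made up, and the source addresses used with it are documentation-range placeholders, not real company addresses.

```python
import ipaddress
from typing import Iterable, List


def ssh_allowlist_rules(allowed_sources: Iterable[str], ssh_port: int = 22) -> List[str]:
    """Return iptables commands that accept SSH only from the given sources."""
    rules = []
    for source in allowed_sources:
        # ip_network() validates the input and normalises it to CIDR form.
        network = ipaddress.ip_network(source, strict=False)
        rules.append(
            f"iptables -A INPUT -p tcp --dport {ssh_port} -s {network} -j ACCEPT"
        )
    # Anything that did not match an ACCEPT rule above is dropped.
    rules.append(f"iptables -A INPUT -p tcp --dport {ssh_port} -j DROP")
    return rules
```

Rule order matters: iptables evaluates a chain from top to bottom, so the ACCEPT rules must come before the final DROP. Test such rules from a second session before closing your current one, or you may lock yourself out.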

 

III. High Performance

A small company may make it big one day. And if it doesn't, you can always move to a large company.

 

1. How do we let more and more users access our website?

We need multiple web servers, but how do we balance the load across them? This calls for an nginx reverse proxy, LVS + keepalived, or haproxy + heartbeat.
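The simplest balancing policy is round-robin, which is also what an nginx upstream block does by default: requests are handed to the backends in turn. Here is a toy sketch of that policy only (no health checks, no weights); the backend addresses are invented.

```python
import itertools
from typing import Iterator, List


def round_robin(backends: List[str]) -> Iterator[str]:
    """Yield backends in order, forever, starting again after the last one."""
    return itertools.cycle(backends)


# Hypothetical web servers behind the load balancer.
backends = ["10.0.0.11:80", "10.0.0.12:80", "10.0.0.13:80"]
picker = round_robin(backends)

# Each incoming request takes the next backend in the rotation.
assignments = [next(picker) for _ in range(5)]
```

Real load balancers add what this sketch leaves out: removing failed backends from the rotation and, with keepalived or heartbeat, failing over when the balancer itself dies.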

 

2. Users register, post articles, and leave more and more comments. What if a single database cannot take the load?

Database load splits into reads and writes. If writes are the bottleneck, split the tables across multiple databases. If reads are the bottleneck, use MySQL-proxy read/write splitting to spread the read load. A simpler and more convenient method is to keep frequently read database content in memory, using memcache or redis.
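The in-memory approach usually follows the pattern sketched here: look in the cache first, and go to the database only on a miss. This is an illustration, not the memcache or redis client API; a plain dict stands in for the cache server, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Read-through helper: serve from memory, fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}  # stand-in for memcache or redis
        self._load_from_db = load_from_db
        self.db_reads = 0  # counts how often the database was actually hit

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]  # cache hit: no database work
        self.db_reads += 1
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value  # remember it for the next reader
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing to the database so stale data is not served."""
        self._cache.pop(key, None)
```

The trade-off is freshness: whenever the database row changes, the cached copy has to be invalidated or given an expiry time, otherwise readers keep seeing the old value.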

 

3. What should I do if the disks cannot keep up with uploads and downloads from N users?

Combine multiple disks into a RAID array, or use a distributed file system such as MFS (MooseFS) or GlusterFS to improve disk read/write capacity.

 

4. The website has a lot of images, and users keep complaining that it loads too slowly. What should I do?

Cache the images in front of the web servers with Squid or Varnish to speed up access as much as possible. Of course, purchasing a commercial CDN is the best option.

 

5. Carriers are a big problem. The bandwidth between them seems to be very small. Why is it so slow for China Unicom users to access my website hosted on China Telecom?

Use BIND to build your own DNS server, point the website's DNS records at it, and configure resolution rules so that Unicom client IP addresses resolve to the site's Unicom address and Telecom client IP addresses resolve to its Telecom address. The experience will be much better.
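The resolution rule amounts to answering differently depending on which network the client's address belongs to. In BIND this is typically configured with views and match-clients lists; the sketch below only mimics the decision logic. All networks and addresses are documentation-range placeholders, not real carrier ranges.

```python
import ipaddress

# Hypothetical client networks per carrier (placeholders, not real ranges).
CARRIER_NETWORKS = {
    "unicom": [ipaddress.ip_network("198.51.100.0/24")],
    "telecom": [ipaddress.ip_network("203.0.113.0/24")],
}

# Hypothetical addresses of the website on each carrier's line.
SITE_ADDRESSES = {"unicom": "192.0.2.10", "telecom": "192.0.2.20"}

# Clients that match no list get this carrier's address.
DEFAULT_CARRIER = "telecom"


def resolve_for_client(client_ip: str) -> str:
    """Return the site address that a client on this network should receive."""
    addr = ipaddress.ip_address(client_ip)
    for carrier, networks in CARRIER_NETWORKS.items():
        if any(addr in net for net in networks):
            return SITE_ADDRESSES[carrier]
    return SITE_ADDRESSES[DEFAULT_CARRIER]
```

In practice the hard part is maintaining accurate address lists for each carrier; the lookup itself is this simple.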

 

 

IV. Automation

Ultimate goal: let the machines do all the work while the people sit idle.


1. The company bought 100 new servers but owns only one portable optical drive. How long will it take to install all the systems?

Use Kickstart or Cobbler to install the systems remotely and automatically.

 

2. After each machine is installed, many settings need tuning: file descriptor limits, ports, software installation, and so on. How do you avoid doing it all by hand?

Learn shell scripting quickly; it will save you a great deal of work.


3. After the systems are installed, every login asks for a password. How many passwords will you have to type?

Use expect to read the password prompt automatically, enter the password, and execute commands.


4. What should I do if I want to release new code to the online servers in batches?

Use SaltStack, Puppet, or Ansible.

 

V. Others

1. The test environment needs five servers, but the company has only one idle server?

Learn Xen, KVM, or Docker, and virtualize one machine into several servers to solve the resource problem. Docker is especially recommended: when a developer asks you for a new environment, you can deliver it in minutes.


2. Do O&M staff always end up responsible for version control and access control of the developers' code?

Use SVN or Git. This is a must.

 

End:

Now let's look back: what do O&M engineers do day to day?

1. Resolve alarm faults at any time.

2. Update the business program.

3. Write scripts for monitoring or for any other task that can be automated.

4. Improve the O&M architecture, deploy open-source tools that are easier to use and perform better, and define O&M process standards.

5. Miscellaneous, such as switch adjustment, system installation, and new environment deployment.


We strongly recommend the Oldboy blog:

Summary of maintenance tools that O&M personnel must be familiar with

http://oldboy.blog.51cto.com/2561410/775056

 

 

 


This article is from the "zhuyun" blog. Please do not reproduce it!
