Cloud-era ops engineers face higher skill requirements

Source: Internet
Author: User

Recently, Yelp SRE Dmitriy Samovskiy published an article titled "The Operation of the new Era", in which he briefly described how the focus and role of operations have changed in the context of cloud computing.

Six years ago, Dmitriy wrote an article on the DevOps trend, arguing that system administrators need to grow beyond simple scripting and focus on server stability and uptime. Over the past six years, however, cloud computing and related technologies have transformed operations work, so Dmitriy revisits what that work looks like today. The core points of his article follow.

  1. Why is operations changing? Will the change continue?

There are two main reasons for this:

    • The rise of IaaS cloud services has profoundly changed operations. Infrastructure is now code, and ops no longer means tending traditional servers. Cloud vendors can standardize everything and package it for customers as a service.
    • Ops people themselves have gained more software development skills. They are no longer confined to scripting; they are becoming better developers and taking on more responsibilities.
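
To make "infrastructure is code" concrete, here is a minimal sketch in Python. It is not taken from Dmitriy's article or from any real provisioning tool: the `ServerSpec` schema, the `plan` function, and the role names are all hypothetical. The idea it illustrates is that the desired infrastructure is declared as data, and a program works out what has to change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Desired state for one group of servers (hypothetical schema)."""
    role: str
    count: int


def plan(desired: list[ServerSpec], running: dict[str, int]) -> list[str]:
    """Compare the desired state with what is running and list the changes needed."""
    actions = []
    for spec in desired:
        diff = spec.count - running.get(spec.role, 0)
        if diff > 0:
            actions.append(f"create {diff} x {spec.role}")
        elif diff < 0:
            actions.append(f"destroy {-diff} x {spec.role}")
    return actions


# Assumed example values: the declared target and the current fleet.
desired_state = [ServerSpec("web", 4), ServerSpec("worker", 2)]
currently_running = {"web": 2, "worker": 3}

for action in plan(desired_state, currently_running):
    print(action)
```

Because the target state lives in a file rather than in someone's head, it can be reviewed, versioned, and applied repeatedly like any other code.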

Development and operations will become increasingly hard to separate. Instead of relying on a dedicated operations team, each project's development team can operate its own product. This change in roles does not mean that operations disappears; on the contrary, operational skills, knowledge, and experience are still required.

  2. Scalability has become the focus of operations work

In the earlier server era, the operations team's main task was to build the environment and keep production stable. Today, the team's focus has shifted to improving the product's scalability. If scalability is poor, traffic load can cause a range of problems (session conflicts, user congestion, and data collection size mismatch). With good scalability, the product runs safely and efficiently. This is especially important for high-risk businesses such as those in finance.

As services keep growing, managing systems by hand becomes impossible, and automated operations becomes unavoidable. One view in the industry is that DevOps is the only way to automate operations; Dmitriy believes that DevOps is a culture that evolves naturally as a business grows.

  3. Operations capability has become an enterprise's technical foundation

Previously, development engineers were responsible only for building the product, not for work that improves development efficiency, such as code reuse, implementation patterns, user libraries, and core APIs. Now, some companies with a DevOps culture have merged these efforts.

Some large websites such as Facebook and Google have full-time SREs (Site Reliability Engineers), a role also known in China as application operations. Their responsibilities include capacity planning and implementation, cluster deployment, data center fault tolerance, load balancing, and monitoring.
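
As an illustration of the capacity planning mentioned above, the following is a rough sketch and not a method described in the article. The function name `servers_needed`, the 30% headroom default, and the traffic figures are assumptions chosen for the example.

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float, headroom: float = 0.3) -> int:
    """Servers required to serve peak traffic while keeping spare capacity.

    headroom is the fraction of each server's capacity deliberately left unused.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_rps = rps_per_server * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)


# Assumed figures: 12,000 requests/s at peak, 500 requests/s per server.
print(servers_needed(peak_rps=12_000, rps_per_server=500))  # prints 35
```

Real capacity plans also account for growth forecasts, failure domains, and uneven load, which this arithmetic deliberately leaves out.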

  4. Companies at different stages of development have different operations needs

    • Start-up phase

What a company needs most in the start-up phase is to develop features that meet market demand, not to get caught up in operations details. The company does not yet have many users, and operations would consume too many technical resources, so more energy should go into rapid iteration and new feature development. In this phase, companies are advised to adopt NoOps.

    • Rapid expansion phase

In the Internet age, traditional businesses are starting to provide services directly to their customers, but older operations practices cannot handle large traffic loads. At this point, operations needs to work at Internet scale (web scale). Web scale is a new concept relative to traditional IT architecture: it means the system can handle large volumes of computation, withstand high load, tolerate faults well, support continuous deployment and delivery, and be operated efficiently.
