What Is The Difference Between SRE and DevOps?

Source: Internet
Author: User
DevOps and SRE
Although DevOps and SRE are now extremely popular, I have found that there are still some misunderstandings about these two positions, so I have organized some of my insights into this article for your reference.

The most common misunderstandings:

DevOps is a new and therefore advanced concept
SRE is an advanced version of DevOps
O&M engineers can easily become DevOps engineers
Definitions of DevOps and SRE
DevOps is literally a combination of Dev (development) and Ops (operation and maintenance). Strictly speaking, DevOps is defined as follows (via DevOps-Wikipedia):

DevOps (a combination of Development and Operations) is a culture, movement, or convention that values communication and cooperation between "software developers (Dev)" and "IT operations technicians (Ops)."
SRE stands for Site Reliability Engineering. It was first proposed by Google and developed through Google's engineering practice. Google also published a book of the same name, "Site Reliability Engineering", which spread the concept widely among Internet engineers.

Google explained SRE (via Site Reliability Engineering-Wikipedia):

Site reliability engineering (SRE) is a discipline that incorporates aspects of software engineering and applies that to operations whose goals are to create ultra-scalable and highly reliable software systems.
In my own words:

A site reliability engineer is a software engineer who is committed to building "highly scalable and highly available systems" and who treats that as a guiding principle.
By definition, DevOps is a culture, a movement, and a set of practices, whereas SRE is a position with strict job requirements. Culture is a soft definition that leaves plenty of room for invented concepts, while the precise definition of SRE leaves less room for imagination (perhaps also because the threshold for SRE is high). According to Google, SRE engineers practice DevOps culture. This view is correct, but in the domestic (Chinese) industry DevOps has gradually been split off into a standalone DevOps engineer position, so in this article I focus on comparing the two positions: DevOps engineer and SRE engineer.

Background and history of both
Internet demand spawned DevOps. In the most traditional software companies there was only Dev and no Ops; at that time, Ops may have been little more than technical support staff. Development followed the waterfall model: requirements analysis, system design, development, testing, delivery, and operation. A traditional software release was a heavyweight operation, and once a version was released, Dev was almost never directly involved in operating it. Readers born in the 1980s may remember that QQ shipped one major version a year: QQ 2000, 2003, 2004, and so on. In that setting Ops did not need frequent direct contact with Dev, and some purely offline businesses had no Ops position at all.

After the Internet wave, software evolved from traditional desktop software into websites and mobile applications. Core business logic, such as transactions and social activity, is no longer completed on the user's desktop but on back-end servers. This gives Internet companies a huge amount of room to operate: they can change business logic at any time, which encourages rapid, iterative business change. Even so, Dev and Ops remained sharply divided. Ops did not care how the code worked, and Dev did not know how the code ran on the server.

In 2009, while the industry was still immersed in the joy of being able to release a version every week, Flickr proposed the concept of deploying 10+ times a day, which greatly shocked the industry. Flickr proposed several core concepts:

The business develops rapidly, so embrace change and move fast in small steps
The goal of Ops is not website stability and speed for their own sake, but to enable rapid business development
Improve the Dev/Ops connection with automation tools: code version management, monitoring
Efficient communication: IRC / IM robots (the ChatBot routines popular today were already being used by Flickr 10 years ago)
A communication culture of trust, transparency, efficiency, and mutual assistance
It is really hard to imagine: the DevOps ideas that various training companies and well-known influencers are promoting today were already laid out vividly in a slide deck from 2009. Classics never go out of date; their wisdom still shines under the dust. Some people equate DevOps with O&M automation, but that is only its surface. The goal of DevOps is to increase the delivery speed of business systems and to provide the tools, systems, and services that support it. The meanings that some individuals and training institutions embellish and derive from it stray from the essence of DevOps.

Next, let's talk about the history of SRE. The SRE team dates back to 2003, although the term became widely known only later. In 2003, Google's Ben Treynor recruited several software engineers to form a team whose purpose was to make Google's production services more stable, robust, and reliable. Unlike small and medium-sized companies, Google serves more than one billion users, and even brief service unavailability can have fatal consequences. So Google, being ahead of its time, gave birth to SRE. This position serves large-scale cluster services; small teams do not need it (and may not be able to recruit real SREs anyway). After several years of exploration at Google, the SRE team began to write up their experience online and published the book in 2016.

The functions of the two are different
Many companies now carve out the DevOps functions as a separate role and call it DevOps engineer. Let's take a look at what DevOps engineers care about. DevOps culture is aimed at delivery speed, so DevOps engineers naturally care about the entire life cycle of the software or service. Start from a simple formula: speed = total amount / time. Put in engineering terms, that is: delivery speed = ((functional characteristics * engineering quality) / delivery time) * delivery risk.
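To make the formula above concrete, here is a minimal Python sketch. It transcribes the formula literally, including delivery risk as a multiplier, exactly as written. The function name, parameter names, and the sample scores are hypothetical; the text does not say how any of the factors should be measured.

```python
def delivery_speed(features: float, quality: float,
                   delivery_time: float, risk: float) -> float:
    """Literal transcription of the delivery speed formula in the text.

    All four inputs are hypothetical scores; the text defines no units.
    """
    if delivery_time <= 0:
        raise ValueError("delivery_time must be positive")
    return (features * quality) / delivery_time * risk


# Same feature scope, quality score, and risk score; only delivery time changes.
slow = delivery_speed(features=10, quality=0.5, delivery_time=5, risk=1.0)
fast = delivery_speed(features=10, quality=0.5, delivery_time=2.5, risk=1.0)
print(slow, fast)
```

With the feature scope held fixed, halving the delivery time doubles the result, which illustrates why the remaining factors are the ones an engineer can actually move.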

Functional features are left to the product manager and project manager to manage. DevOps engineers need to care about the remaining factors: engineering quality / delivery time / delivery risk. The functions of DevOps engineers are as follows:

Manage the entire life cycle of the application (requirements, design, development, QA, release, operation)
Focus on improving the efficiency of the whole process: identify the bottlenecks and resolve them
Design and development of automated operation and maintenance platform (standardization, automation, platformization)
Support operation and maintenance system, including virtualization technology, resource management technology, monitoring technology, network technology
The key words of SRE are "high scalability" and "high availability". High scalability means that when the number of users explodes, the application system and its supporting services (server resources, network systems, database resources) can be expanded simply by adding instances, without changing the system architecture or upgrading the performance of individual machines. High availability means that when any link in the application architecture becomes unavailable, for example when application services, gateways, databases, or other systems go down, the entire system can recover and resume service within a predictable time. Of course, since we are talking about "high" availability, that time is generally expected to be on the order of minutes. The SRE function can be summarized as follows:

Provide selection, design, development, capacity planning, tuning, and troubleshooting for applications, middleware, and infrastructure
Provide business systems with decision-making based on availability and scalability, and participate in business system design and implementation
Locate, deal with, and manage failures, and optimize related components that lead to failures
Improve resource utilization of various components
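The two key words above, scaling by adding instances and recovering within minutes, can be illustrated with some simple arithmetic. The following is a rough Python sketch, not a capacity planning method from the original article; the function names, the 25% headroom default, the 30-day period, and the sample numbers are all assumptions chosen for illustration.

```python
import math

MINUTES_PER_30_DAYS = 30 * 24 * 60  # assumed measurement period


def instances_needed(peak_rps: float, rps_per_instance: float,
                     headroom: float = 0.25) -> int:
    """Horizontal scaling: identical instances needed for peak load plus headroom."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    return math.ceil(peak_rps * (1 + headroom) / rps_per_instance)


def availability(downtime_minutes: float,
                 period_minutes: float = MINUTES_PER_30_DAYS) -> float:
    """Fraction of the period during which the service was up."""
    return 1 - downtime_minutes / period_minutes


def allowed_downtime_minutes(target: float,
                             period_minutes: float = MINUTES_PER_30_DAYS) -> float:
    """Downtime budget implied by an availability target such as 0.999."""
    return (1 - target) * period_minutes


# Hypothetical service: 10,000 requests/s at peak, 500 requests/s per instance.
print(instances_needed(peak_rps=10_000, rps_per_instance=500))
# Downtime budget for a 99.9% target over 30 days, in minutes.
print(round(allowed_downtime_minutes(0.999), 1))
```

A 99.9% target over 30 days leaves roughly 43 minutes of total downtime, which is why recovery time has to be measured in minutes rather than hours.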
Different job content
Different responsibilities lead to different day-to-day work for the two positions. I list the work of the DevOps engineer and the SRE engineer as follows:

DevOps
Define the application lifecycle management system and reshape the process around it
Develop and manage the development platform used by development engineers and QA engineers
Develop and manage the release system
Develop, select, and manage the monitoring and alerting systems
Develop and manage the permission system
Develop, select, and manage the CMDB
Change management
Failure management

SRE
Change management
Failure management
Define SLA service standards
Develop, select, and manage various middleware
Develop and manage the distributed monitoring system
Develop and manage the distributed tracing system
Develop and manage performance monitoring and profiling systems (dtrace, flame graphs)
Develop, select, and provide training on performance tuning tools
It's an interesting comparison. Both DevOps and SRE care about the application lifecycle, especially the changes and failures within it. But DevOps work mainly serves the development process: a DevOps team usually provides a tool chain that includes development tools, version management tools, CI (continuous integration) tools, CD (continuous delivery) tools, alerting tools, and fault handling tools. The SRE team pays more attention to changes, failures, performance, and capacity, which involves the specific business. Its tool chain includes capacity measurement tools, logging tools, call-chain tracing tools, performance metrics tools, monitoring and alerting tools, and so on.

The relationship between DevOps and SRE
DevOps is first and foremost a culture and only later became an independent position, whereas SRE was clearly a position from the beginning. Many people confuse DevOps and SRE because of their superficial resemblance: both seem to be about tools, and their automation requirements look similar. Some developers even reduce this kind of operation and maintenance work to "server + tool + automation". That is like the blind men touching the elephant: seeing one part and mistaking it for the whole.

In terms of skills, both require strong O&M skills. In terms of career ceiling, DevOps engineers may lack some of the specialized skills of SRE: computer architecture; high-throughput and high-concurrency optimization; scalable system design; complex system design; and troubleshooting of business systems. Both require soft skills, but SRE faces higher complexity, greater challenges, and higher requirements.

DevOps is universally relevant. Every modern Internet company needs DevOps, but not every team has high availability and high scalability requirements, and those that do not have no need for SRE. Once DevOps engineers master the relevant skills, they have the opportunity to develop into SRE engineers. A qualified SRE engineer, for their part, can move into a DevOps role if they have to.

In terms of professional background, both DevOps and SRE engineers need a background in R&D. The former needs to develop a tool chain, and the latter needs strong architectural design experience. If an operation and maintenance engineer wants to move into DevOps or SRE, they need to build up the relevant technical knowledge. After all, setting up Jenkins + Kubernetes does not by itself make you a DevOps / SRE engineer.

So, have these common misunderstandings been cleared up? Thank you for reading this far. Finally, the skill points of the two kinds of engineer are attached, and I hope that readers who are interested in becoming one of them will keep working hard.