Many R&D organizations in China still manage their teams through separation of duties, which leads to problems such as inefficient cooperation between teams and mutual blame. This article discusses how DevOps and separation of duties can be combined to improve the efficiency of the R&D team.
Introduction
How to perfectly integrate DevOps and separation of duties?
DevOps and separation of duties (SoD, also called segregation of duties) are usually at odds. DevOps aims to remove obstacles and minimize handovers, while separation of duties aims to minimize risk by increasing isolation.
Operating a DevOps team in a highly regulated industry (such as finance or healthcare) can be challenging, because the regulator expects that only requested, approved, and fully tested changes will enter the production environment. In these cases, the main control method is usually separation of duties.
Segregation of Duties
What is separation of duties?
Separation of duties (SoD) is the concept of having more than one person required to complete a task. In business, sharing a single task among more than one individual is an **internal control method** intended to **prevent fraud and error**.
-- Wikipedia
In software engineering, this basically means that the person or team who develops the code cannot approve or deploy that code themselves. The aim is to prevent the accidental or malicious release of unauthorized code into the production environment.
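The rule can be expressed as a simple check. The following is a minimal sketch, not a reference implementation: the `Change` record, the `violates_sod` function, and the user names are all hypothetical and only illustrate the idea that the author of a change may neither approve nor deploy it.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """A hypothetical change request on its way to production."""
    change_id: str
    author: str
    approvers: set[str] = field(default_factory=set)


def violates_sod(change: Change, deployer: str) -> list[str]:
    """Return the reasons why this deployment would break separation of duties."""
    reasons = []
    if not change.approvers:
        reasons.append("change has no approval")
    if change.author in change.approvers:
        reasons.append("author approved their own change")
    if deployer == change.author:
        reasons.append("author is deploying their own change")
    return reasons


change = Change("CHG-1", author="alice", approvers={"bob"})
print(violates_sod(change, deployer="carol"))  # an empty list means no violation
print(violates_sod(change, deployer="alice"))
```

An empty result means the deployment respects SoD; any entry in the list is a reason to block it.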
In contrast, DevOps combines the two separate functions of development and operations. A single team develops and tests the code, and also deploys and supports it. Separation, by definition, means setting people or things apart from one another, and it remains one of the most common ways of controlling the production environment.
What are the disadvantages of implementing separation of duties in DevOps teams?
SoD slows the team down by adding unnecessary handovers, and it may introduce errors. Every handover requires a transfer of information, which not only reduces speed but can also pass on incorrect information. Handovers affect not only the deployment of changes but also the handling of incidents in the production environment. In that case, who is better placed to respond to the incident than the person (or team) responsible for the change?
Separation of duties signals a lack of trust in the team, which encourages a culture of fear. A DevOps team needs to be autonomous in order to obtain all the value and speed advantages this philosophy promises. To achieve that autonomy, the team needs to be trusted to do the right thing. When they do something wrong, they are still held accountable and expected to correct the error. Remember, with great power comes great responsibility.
SoD cannot prevent collusion. Collusion occurs when someone (or a team) deliberately gets unauthorized changes pushed to production, whether out of malice or because of an emergency. For example, they might say: "We don't have time for a full regression test because we need to release tomorrow. Can you sign off right away?" No matter how many control measures you have, you cannot get rid of this problem.
What if you have no choice but to implement SoD?
As mentioned at the beginning of the article, there are some highly regulated industries that do not allow DevOps teams to work completely autonomously. If you find yourself in this situation, here are some things your team should pay attention to.
Minimize the number of handovers
One of the main principles of DevOps is to improve your workflow so that tasks are performed as efficiently as possible. In DevOps, the focus is usually on the SDLC (in practice, CI/CD), but minimizing handovers can be applied to most processes in the team. Start by evaluating the existing process and mapping out how its workflows relate to one another; then simplify the process down to the minimum number of steps required to achieve the goal. Once the new workflow is settled, you can add automation. But remember, even though automation is important and usually a good idea, it makes no sense to automate a bad or redundant process. So first map the workflow, then optimize it, and only then automate.
Translator's note: SDLC stands for systems development life cycle, also known as the software life cycle. It describes the entire process of an information system, from planning and creation through testing to final deployment.
Automation
Automate your build, test and deployment. By automating delivery pipelines, you can minimize the risk of human error.
Continuous integration: a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build and unit tests, which allows the team to discover problems early.
Continuous delivery: a natural extension of continuous integration in which the team ensures that every change to the system is releasable and that any version can be released at the push of a button. Continuous delivery aims to make releases boring, so that the team can deliver often and get user feedback quickly.
Continuous deployment: a further extension of continuous delivery in which every version that passes the stages above is deployed automatically and immediately.
Even if the final production deployment must be carried out by another person or team, you can still achieve continuous delivery, or at least let the R&D organization reach a production-ready state as quickly as possible.
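A rough sketch of this arrangement: every stage up to "production-ready" runs automatically, and only the final production step waits for someone other than the author. The `run_pipeline` function, the stage names, and the users are assumptions made for illustration; the lambdas stand in for real build, test, and deployment commands.

```python
from typing import Callable, Optional

Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage], author: str, approver: Optional[str] = None) -> str:
    """Run the automated stages, then decide whether production deployment may proceed."""
    for name, step in stages:
        if not step():
            return f"failed at {name}"
    # Every automated stage passed: the build is production-ready (continuous delivery).
    if approver is None or approver == author:
        return "production-ready, awaiting independent approval"
    return f"deployed to production (approved by {approver})"


# Placeholder stages; a real pipeline would call the build tool, test runner, and so on.
stages = [
    ("build", lambda: True),
    ("unit tests", lambda: True),
    ("deploy to staging", lambda: True),
]

print(run_pipeline(stages, author="alice"))
print(run_pipeline(stages, author="alice", approver="bob"))
```

The handover is reduced to a single approval at the very end, while everything before it stays within the team and is fully automated.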
Remove external dependencies
Eliminate reliance on external teams and try to keep all approval rights within the team. External dependencies, such as needing the approval of specific stakeholders or having a separate team perform the deployment, slow the process down. This is not to say that these steps should be skipped or ignored, but that members of the team itself should be able to complete them. Going back to DevOps principles again, the team should include full-stack or T-shaped developers who can step in wherever needed to perform the team's various tasks.
The main reason for eliminating these external dependencies is that it is usually easier to communicate with people on the team than with people outside it. In addition, the people on the team already know what has happened and what needs to be done, without needing the context explained. Both of these benefits increase the speed of the team.
If you cannot remove the dependency, see whether you can bring the relevant people into your team. For example, if you need the company's business stakeholders to approve your release before deployment, see whether your product owner (PO) can perform this function on behalf of the business. In this case, SoD still exists (because the PO usually does not take part directly in feature development) and the team no longer depends on people outside the team.
Implementing safety nets is more important than control
While still maintaining a high level of risk aversion, one of the best ways to minimize handovers and deliver quickly is to implement safety nets rather than controls. What does that mean?
In the circus, trapeze artists usually perform their acrobatics above a safety net. The alternative is to attach them to safety harnesses, but this obviously restricts their movement. Safety nets, on the other hand, allow them to move as smoothly and efficiently as possible, while ensuring that if something goes wrong and they fall, they land safely in the net and avoid serious injury.
Don't get me wrong, controls definitely have their time and place. For example, mandatory inspections are carried out before every flight takes off, because in this case we cannot afford any major accident and the result of a failure would be devastating. This is not only because of the high material cost (planes are not cheap), but more importantly because life is priceless.
For those who build software, in most cases nobody will lose their life if the software fails. Failure is therefore acceptable in these situations, as long as you can fail "safely", recover quickly, and learn from the failure so that it is not repeated.
In software development, safety nets include high-quality monitoring of the systems in the production environment. Monitoring lets the team understand the current state of the system and deal with incidents, or avoid them altogether, before users are affected. In the event of a major incident, the service can be restored by a rapid rollback. Blue/green deployment is another applicable safety net.
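As a minimal sketch of monitoring used as a safety net, the functions below decide from recent error-rate samples whether to fall back to the previous release. The function names, the 5% threshold, and the three-sample window are illustrative assumptions, not recommended values.

```python
def should_roll_back(error_rates: list[float], threshold: float = 0.05, window: int = 3) -> bool:
    """True if the last `window` samples all exceed the error-rate threshold."""
    recent = error_rates[-window:]
    return len(recent) == window and all(rate > threshold for rate in recent)


def version_to_serve(release_history: list[str], error_rates: list[float]) -> str:
    """Return the version that should be live after inspecting the metrics."""
    current = release_history[-1]
    if should_roll_back(error_rates) and len(release_history) > 1:
        return release_history[-2]  # roll back to the previous release
    return current


print(version_to_serve(["v1", "v2"], [0.01, 0.20, 0.30, 0.40]))  # sustained errors
print(version_to_serve(["v1", "v2"], [0.20, 0.01, 0.30]))        # a transient spike
```

Requiring several consecutive bad samples keeps a single transient spike from triggering a rollback.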
Blue/green deployment
This technique is a well-known (but underused) cloud deployment pattern that minimizes downtime during releases and provides a quick way to roll back (that is, a safety net) when problems occur.
Martin Fowler (one of the original authors of the Agile Manifesto) explains it as follows:
One of the challenges of automating deployment is the cut-over itself: taking software from the final stage of testing to live production. You usually need to do this quickly in order to minimize downtime. The blue/green deployment approach does this by ensuring that you have two production environments that are as identical as possible. At any time, one of them, say blue, is live. As you prepare a new release of your software, you do the final stage of testing in the green environment. Once the software is working in the green environment, you switch the router so that all incoming requests go to the green environment, and the blue environment is now idle.
Blue/green deployment also gives you a rapid way to roll back. If anything goes wrong, you switch the router back to the blue environment.
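The cut-over and rollback described above can be sketched as a toy router. `BlueGreenRouter` and its methods are hypothetical names for illustration; in a real system, the switch would be a load balancer, DNS, or routing change.

```python
from typing import Optional


class BlueGreenRouter:
    """Toy router that sends all traffic to exactly one of two environments."""

    def __init__(self) -> None:
        self.versions: dict[str, Optional[str]] = {"blue": None, "green": None}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        """Install a new version on the idle environment; live traffic is untouched."""
        self.versions[self.idle] = version

    def switch(self) -> None:
        """Cut over: the idle environment becomes live. Calling it again rolls back."""
        self.live = self.idle

    def handle_request(self) -> str:
        return f"served by {self.live} ({self.versions[self.live]})"


router = BlueGreenRouter()
router.versions["blue"] = "v1"   # blue starts out live with the current release
router.deploy_to_idle("v2")      # the new release goes to green for final testing
router.switch()                  # cut over to green
print(router.handle_request())   # served by green (v2)
router.switch()                  # something went wrong: roll back to blue
print(router.handle_request())   # served by blue (v1)
```

Because the previous environment is left untouched after the cut-over, rolling back is the same cheap operation as releasing.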
The point here is not to wait until the software is considered perfect before releasing it. With a kill-switch mechanism in place, we can let every version go into production, because we can quickly restore service when needed. So the emphasis is on rapid recovery rather than on trying to avoid (inevitable) failure.
Conclusion
When it comes to controlling software delivery, separation of duties should be the last resort. Unless failure may result in loss of life, or regulators make it mandatory, you should avoid separation of duties if you want to get the maximum benefit from DevOps principles and practices. That said, if you are forced to use SoD in a team, the following techniques will allow you to implement DevOps successfully, even if not perfectly.
* Minimize the number of handovers: optimize your workflow by minimizing the number of steps and required personnel.
* Automation: automate everything, but only after you have confirmed what should be automated. Remember, continuous integration, continuous delivery, and continuous deployment are your friends.
* Remove external dependencies: It is easier to work with people on the team, so please remove or minimize external dependencies.
* Safety nets over controls: focus on recovering from failure rather than trying to avoid failure completely.