Effective communication and cooperation between the operations and maintenance (Ops) team and other technical teams
Source: Internet
Author: User
I. Technical team division and coordination issues
In an IT company, the technical department as a whole is responsible for a product from idea to delivery to the user. Look inside the department, however, and you find different technical teams responsible for different parts or stages. The usual split is a product team, a development team, a testing team and an operations team. In Internet companies the operations team is generally further divided into infrastructure operations and product operations. Infrastructure operations is responsible for the infrastructure (racks, network, hardware) and operating system installation, and provides infrastructure services for all of the company's products. Product operations is responsible for handling problems in online products, deploying code, and acting as the interface to development.
Different technical teams usually belong to different departments and sit in different office areas, so there is plenty of communication within a team but little between teams. Each team forms its own working habits and rhythm and has its own concerns. It generally knows only the overall role of the teams it interfaces with, not the difficulties and challenges those teams face. In addition, if the company is large enough, each team is split into smaller teams (infrastructure operations, for example, usually has a system team, a network team and an IDC team), which makes communication between teams even harder.
From product planning to launch, work generally passes through the teams in this order: the development team collects product requirements, sets a timetable and then develops the code; the testing or quality team tests it; the operations team deploys the new product or new version; and the operations team feeds code defects found during operation back to the development team for repair. At each stage the responsible team does its own part and then passes the ball to the next team; if the next team finds a problem, it kicks the ball back. Go deeper into the different teams and you will hear different complaints.
The infrastructure Ops team often complains:
- Product development has no plan at all; they suddenly want machines brought online and catch us unprepared.
- Every product is in a hurry to go live. Whoever pushes hardest gets served first, but who can say which one really matters most?
- One moment a system has to be reinstalled, the next a broken disk has to be replaced in a rush; we have only just got back from the data center and have to go again.
- The launch was too sudden: there are no switches and no racks, and other machines have to be moved to make room.
- That location has racks and switch ports but no layer-4 devices, and they insist on sitting behind a layer-4 device. There really is nothing we can do.
- We had just put their machines online in one data center when they said they wanted to move to another. What pointless churn.
- How are they using the equipment, that the bandwidth of the connected ports is saturated?
The product operations team will say:
- There is really nothing we can do: the last launch was held up because there were no racks, then no switches, then no layer-4 devices.
- They never tell us when the equipment will be brought online and handed over to us, and they send no one to chase it. They are completely unreliable.
- We had not yet worked out how to use the equipment, but had to apply a month in advance to get it online. Once we had thought it through, they said we had to change data centers again.
- Why does the network keep failing? How did they plan it?
- The code from development is too unreliable. As soon as it launches, users complain, and all we can do is roll back to the old version.
- The developers lack the technical ability to write a version that actually works.
- Development demands a test environment identical to production, which is impossible.
The development team says:
- They will not let us touch the online systems. We do not know what the production environment looks like, so we cannot develop code for it.
- We worked hard for months, and when the launch hit a problem it was rolled back immediately. That is very hard to take.
- The code ran fine in the test environment, or on my machine. How can it have problems online?
- How does testing do its job, missing so many problems?
- We want our product Ops colleagues to help build a test environment identical to the online one.
The testing team may say:
- Developers do not write unit tests for their code.
- We want an automated integration test environment, but for reasons on the development side it never happens.
- The test environment differs from production, so many problems are found only after launch.
- There are still so many unresolved bugs, yet the product team is pressing for launch.
II. The harm done by poor cooperation between technical teams
The conflicts and complaints above differ from team to team, but their impact is similar:
- Product progress is delayed, and the team as a whole struggles to deliver new versions on schedule.
- Many problems appear after launch, affecting user access.
- Team morale is poor.
Recently the Ops team and the development team coordinated poorly on a launch, with the following consequences:
The launch of the new product was delayed by two weeks, when it would normally take one day. The cause was a problem that development had not considered and that the test environment did not reveal. It surfaced only when the product was deployed to more than one machine just before launch: as originally designed, multiple machines could not cooperate to complete the task. The design phase had also not taken the production environment into account, so code had to be adjusted during the launch.
Quality after launch was unstable, with many emergency fixes. The cause was the same.
Hardware had to be added at short notice. One component of the new product uses a new technology that is incompatible with the existing LAMP stack, so it had to be deployed on separate new machines.
Service availability was lowered and problems were left behind.
Because the hardware was added at short notice, and only one machine was added, it became a single point of failure: if that machine fails, the whole service is interrupted. Poor design also created a single point in the integration with other components. Both reduce the availability of the service and have to be fixed later. In addition, installation, service start and stop, and configuration management for the new software component are all done by hand, and time will have to be found later to bring them under automated configuration management.
Team morale suffered. Development, testing and operations all found the launch process painful and complained about one another. If this is not handled well, it will affect future cooperation.
Some of these problems do require particular teams to improve their skills, but if the teams make a concerted effort, the same group of people will certainly produce better results.
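The single point of failure described earlier in this section can be made concrete with a little arithmetic. The sketch below is only an illustration: the 99% per-machine availability is an assumed figure, not one from the incident, and the helper names `series` and `redundant` are hypothetical. It also assumes machines fail independently, which real deployments only approximate.

```python
HOURS_PER_YEAR = 24 * 365


def series(*availabilities):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def redundant(availability, copies):
    """Availability when at least one of `copies` identical machines must be up.

    Assumes failures are independent (an idealization).
    """
    return 1.0 - (1.0 - availability) ** copies


def downtime_hours_per_year(availability):
    """Expected yearly downtime for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Assumed figure for illustration: each machine is up 99% of the time.
single = redundant(0.99, 1)   # the lone, hastily added machine
pair = redundant(0.99, 2)     # the same component with one spare

print(f"single machine: {downtime_hours_per_year(single):.1f} h/year down")
print(f"two machines:   {downtime_hours_per_year(pair):.2f} h/year down")
print(f"two single points in a row: {series(0.99, 0.99):.4f} availability")
```

Under these assumptions, adding a second machine cuts expected downtime by a factor of one hundred, while chaining two single points makes the service less available than either component alone.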
III. How we solved team coordination problems in the past
The first time we ran into this problem we had no time to solve it: after a strategic adjustment at the company, the entire development and system operations teams were transferred to another large department. But when we reorganized the technical team elsewhere, the problem did not recur. In retrospect, our approach was:
- Some developers had accounts on the production servers and could observe how their code ran. A few core developers also had sudo privileges, though of course they did not casually modify server settings.
- Development talked to the system operations team from the start and added data collection and monitoring interfaces to the code, so that after launch it was easy to collect product performance data, monitor running status and raise alarms.
- The production environment also included sandbox and beta environments, so a major version could run in the sandbox for a period before moving from test to production, which made the transition relatively smooth.
- Some developers were temporarily transferred to the system operations team for one to two quarters, solving operational problems of online products alongside their operations colleagues. They came to understand how code runs in production, and on returning they communicated better with operations and wrote code that ran more easily in production.
In this way, although responsibilities were clearly divided between teams, there was a lot of flexibility in the coordination between them. In addition, the core members of the development, operations and testing teams shared a sense of identity: everyone started from the goal of making the company succeed, so the root cause of cooperation problems was absent. This is actually close to the core idea of DevOps, so it is worth revisiting DevOps and drawing on it to solve the problem of team cooperation.
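As an illustration of the data collection and monitoring interfaces mentioned above, here is a minimal sketch using only the Python standard library. It is not the code the original team wrote; the class `Metrics`, the functions `health` and `make_handler`, the `/metrics` and `/health` paths and the 5% error-rate threshold are all assumptions chosen for the example.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler


class Metrics:
    """Thread-safe counters that the application updates as it serves requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self.requests = 0
        self.errors = 0
        self.total_latency_ms = 0.0

    def record(self, latency_ms, ok=True):
        """Record one handled request and whether it succeeded."""
        with self._lock:
            self.requests += 1
            self.total_latency_ms += latency_ms
            if not ok:
                self.errors += 1

    def snapshot(self):
        """Return the current counters as a plain dict (the data collection side)."""
        with self._lock:
            avg = self.total_latency_ms / self.requests if self.requests else 0.0
            return {
                "requests": self.requests,
                "errors": self.errors,
                "avg_latency_ms": avg,
            }


def health(metrics, max_error_rate=0.05):
    """Return 'ok' or 'degraded' so a monitoring system can alarm on it.

    The 5% default threshold is an arbitrary value for this sketch.
    """
    snap = metrics.snapshot()
    if snap["requests"] == 0:
        return "ok"
    rate = snap["errors"] / snap["requests"]
    return "ok" if rate <= max_error_rate else "degraded"


def make_handler(metrics):
    """Build an HTTP handler class exposing /metrics and /health as JSON."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/metrics":
                body, status = metrics.snapshot(), 200
            elif self.path == "/health":
                state = health(metrics)
                body = {"status": state}
                status = 200 if state == "ok" else 503
            else:
                body, status = {"error": "not found"}, 404
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet; a real service would log

    return Handler
```

To try it, the handler can be served with `HTTPServer(("127.0.0.1", 8080), make_handler(metrics)).serve_forever()` from `http.server` (the address and port are arbitrary). The point of the design is that operations can poll `/health` for alarms and scrape `/metrics` for performance data without needing to read the application's internals.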
IV. DevOps
DevOps is a concept that emerged in Europe around 2010. It was first put forward by a group of engineers with cross-disciplinary skills in order to solve the following problems:
- Introducing new features and fixing old problems takes a long time and is full of risk.
- No one is confident that the code will run stably in production; it can only be pushed out with difficulty, and then everyone waits to see whether problems appear.
- Different teams are isolated from one another and cooperate poorly. When a developer is told of a problem, the first response is "it works fine on my machine."
I think the core of DevOps is this: whether you are a developer, a tester, a manager, a DBA, a network engineer or a system administrator, we all work together to provide customers with a stable, high-quality software service and so secure the company's business interests, including our own jobs. DevOps is therefore really a bridge between teams. Teams no longer rely only on online tools to exchange messages at a distance; people regularly leave their own island, visit other people's islands, get to know the others, offer their own ideas and help one another. DevOps is more like a movement, and each company needs to promote communication and collaboration between teams in a way that suits its own characteristics. Work is needed on three fronts:
- People. On the one hand, train staff and encourage them to understand the work of other teams and the challenges they face, so that they can use their own expertise to review and help other teams. On the other hand, recruit some well-rounded technical people to build suitable bridges between teams.
- Process. Involve system operations colleagues in the early stages of research and development to build test environments and verify ideas, or staff some project teams directly with system, development, testing and product people who work toward the launch together. When problems arise, look together for the real root cause rather than shifting blame, and carry the solution into the future development process. Collaboration should also be considered in performance appraisal.
- Tools. In practice, much of what is said about DevOps concerns tools, much as with agile.
Tools for rapid system deployment and automated release of product code are especially important.
To avoid overcorrecting and swinging to the other extreme, steer clear of the following common misconceptions about DevOps:
- DevOps means giving developers root privileges. You can grant developers sudo permission to run specified commands, such as restarting the web service, and let them learn more about the production environment and the health of the product. That does not mean developers manage machines the way administrators do.
- All system administrators need to write code, and all developers need to rack machines. System administration and development still need their own specialists in each area, such as storage, networking, installation and JavaScript. DevOps does not mean people stop doing what they are expert in.
- You have to use a particular tool, or it is not DevOps. Some technologies and automation tools do help drive collaboration across teams, but it is more important to focus on the problems to be solved and choose tools that fit the problem and the organization.
- We need to recruit "DevOps". DevOps is not a new job title.
V. Using DevOps to solve team cooperation problems
Management attends to the communication mechanisms and atmosphere between teams:
- Build a collaborative atmosphere around one goal: the new version runs reliably and stably in production.
- In the early stages of a project, have project, operations, development and testing staff communicate; if possible, sit together and talk face to face.
- Before the project goes live, look beyond functional testing to deployment, backup, monitoring, security and configuration management. The more problems are found early, the fewer arise late, and the less the user experience is affected.
- Establish a regular communication mechanism for the core members of each team.
- Include collaboration between teams in performance appraisal.
Let developers understand the concerns and challenges of operations, and help operations from a development perspective:
- Developers take part in the Ops team's in-house training to understand the online systems: how operations locates and resolves failures, how it monitors the systems, and so on.
- A small number of developers work with Ops on releases to production, so that developers pay attention to and understand how their code runs.
- Modify the code from an operations point of view, to make day-to-day changes, monitoring and alerting easier for operations staff.
- Help Ops modify Puppet configuration templates.
- Help Ops write and modify product release scripts to raise the level of automation.
Let operations staff understand the concerns and challenges of the development process, and improve that process from an operations perspective:
- Operations provides development with a virtual-machine-based test environment inside the company, in which virtual machine installation, configuration management and code release work the same way as in production.
- Developers and testers release versions to the test environment just as Ops does in production.
- Encourage developers and testers to modify Puppet configuration and templates to manage their own virtual machines.
- Build a beta environment within production to which developers can release versions directly, giving the code more buffer layers before the final launch.
- Operations learns the module structure of the code and modifies it from an operations point of view, so that the product is easier to operate once online and better suited to the production environment.
- Operations takes part in continuous integration testing, using its automation knowledge to help automate integration tests.