From: http://news.cnblogs.com/n/74036/
This article is based on Thomas A. Limoncelli's "What is system administration like at Google", with some of my own views added.
How do Google's system administrators work?
Because Google's services run on clusters, system engineers do not deal with hardware tasks such as installing servers. In addition, most of the work has been automated, such as setting up LDAP and load balancers. In contrast, most Internet companies in China today still have to do a lot of repetitive low-level work. For example, if a business database grows too large and needs to be split, the system administrator has to do the following:
- Discuss the current business characteristics with the developers, then develop the splitting scheme and evaluate its risks to the program
- Build a test environment so that the developers can test program compatibility
- Draw up an implementation plan that ensures a smooth business transition without downtime
- Go live late at night
- Monitor operation for 1-2 days
We need to ask whether the work above is really valuable for system administrators and developers. Systems such as Cassandra, which handle the automatic expansion of distributed storage, point to a direction for the industry (although Cassandra's stability still needs to improve).
How do Google's system engineers work?
They are usually on duty for one week at a time, responding to all kinds of problems, such as carrying out the expansion work in the scenario above. For roughly the next five weeks they step away from front-line work to think about how to automate and improve the tasks they ran into during the on-duty week. They write scripts and monitoring programs to deal with recurring problems, or give feedback to the developers so that the application can be improved for automation. The 1:5 split is only an approximate ratio, and the cycle can be arranged flexibly; for example, the rotation can be scheduled by day instead of by week. Once an improvement is finished, the automation program handles most of the work the next time the same scenario comes up. In other companies, SAs are usually stuck on the front line, mechanically repeating the work above, whereas Google reserves a considerable amount of time for system engineers to think about improvements.
This is why Google's system administrators are called SREs (site reliability engineers). SREs constantly optimize the systems they are responsible for. Some focus on operations and maintenance, while others focus on automation tools. All SAs must have some programming or scripting ability.
Therefore, at Google's data scale, the question is not whether automation is necessary, but how to do it better.
Other exciting aspects of working at Google include:
- Collaboration with developers.
- You only need to care about technology. There is a career path within the technical track, so you do not have to move into technical management or another role.
- Colleagues are very smart, and you often feel that you are the least smart person there.
- Many challenges. Google is conservatively estimated to be 2-10 years ahead of the industry, so working here is like being given a magic crystal ball: you can foresee the future of the industry through your own work.
Inspired by Google's approach, here are some automation directions worth studying:
1. Program deployment
For C/C++/Java/PHP/Python/Ruby/C# and other languages: how to release automatically without stopping services, and how to resolve module dependencies cleanly. For example, you may need to update 10 mutually dependent modules at the same time within one day, without stopping the service.
Web container virtualization: multiple services can be deployed on the same web container, isolated from one another so that they do not affect each other.
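One possible way to handle the module dependency problem is to sort the modules topologically and release them in waves. The sketch below is only an illustration, not a description of how Google does it: the module names and the `DEPENDENCIES` table are hypothetical, and it relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical modules: each key depends on the modules in its set.
DEPENDENCIES = {
    "web": {"api", "auth"},
    "api": {"db_client", "cache_client"},
    "auth": {"db_client"},
    "db_client": set(),
    "cache_client": set(),
}


def deployment_waves(dependencies):
    """Group modules into waves.

    Every module in a wave depends only on modules from earlier waves,
    so the modules inside one wave can be released in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(deployment_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Releasing wave by wave means a module is never updated before the modules it depends on, which is one precondition for updating many interdependent modules without stopping the service.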
Automate the operation and maintenance of newly developed service programs. As a rule of thumb, 10 service programs is the watershed: managing fewer than 10 services through repetitive manual operations is not a problem, but more than 10 calls for automated management. Many excellent open-source programs (such as Tokyo Cabinet and Redis) perform very well on a single machine but are hard to deploy at large scale. This is why technical staff at large companies often say that much open-source software does not suit them.
2. Resource deployment
- MySQL
- Distributed file storage
- Cache, taking automatic cache management as an example:
  - Port resource management. Different services use different ports, and different data in the same application use different ports (for the reasons, see the earlier cache-related blog posts).
  - Capacity management. Different data requires different capacities.
  - Dynamic resizing as the application's business grows, for example expanding from 10 GB to 100 GB.
  - Proxy function, such as virtual port mapping: the program accesses a fixed virtual port, so the service does not need to be restarted.
  - Capacity can be expanded at any time, and the application does not need to implement consistent hashing itself; the proxy does it for you.
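To make the consistent hashing point concrete, here is a minimal sketch of the kind of ring a cache proxy could keep internally so that applications do not have to. It is an assumption-laden illustration rather than any particular proxy's implementation: the class name `ConsistentHashRing`, the node names, and the choice of 100 virtual points per node are all hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to cache nodes; adding a node moves only part of the keys."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Place several virtual points per node to spread the load evenly.
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        if not self._points:
            raise LookupError("no cache nodes registered")
        # First point clockwise from the key's hash, wrapping around the ring.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[index]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
    keys = [f"user:{i}" for i in range(10000)]
    before = {key: ring.node_for(key) for key in keys}
    ring.add_node("cache-d:11211")  # expand capacity without touching clients
    moved = sum(1 for key in keys if ring.node_for(key) != before[key])
    print(f"{moved / len(keys):.0%} of keys moved after adding a fourth node")
```

Because the application only ever talks to the proxy's fixed virtual port, a node can be added behind it this way without the application being reconfigured or restarted.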
3. System deployment
- OS
- Reverse proxy and load balancer
- Local partition capacity, batch management
- Releasing and stopping a program. For example, deploying a program to 100 servers with one click.
- Virtualization: easier to deploy than physical servers, with higher resource utilization and more controllable deployment
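As an illustration of the "100 servers with one click" idea, the sketch below rolls a package out batch by batch and halts as soon as a batch contains a failure. It is only a sketch under stated assumptions: `deploy_to_host` is a hypothetical placeholder for the real per-host step (copying files, restarting, health checking), and the host names, package name, and batch size are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_to_host(host, package):
    """Hypothetical placeholder for the real per-host deployment step."""
    return f"{package} deployed to {host}"


def rolling_deploy(hosts, package, batch_size=10, deploy=deploy_to_host):
    """Deploy in batches so that most servers keep serving traffic.

    Returns (done, failed). The rollout stops after the first batch in
    which any host fails, leaving later batches on the old version.
    """
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(hosts), batch_size):
            batch = hosts[start:start + batch_size]
            # Hosts inside one batch are deployed in parallel.
            futures = [(host, pool.submit(deploy, host, package)) for host in batch]
            for host, future in futures:
                try:
                    future.result()
                    done.append(host)
                except Exception:
                    failed.append(host)
            if failed:
                break
    return done, failed


if __name__ == "__main__":
    servers = [f"web-{i:03d}" for i in range(1, 101)]
    done, failed = rolling_deploy(servers, "app-1.2.3")
    print(f"deployed to {len(done)} hosts, {len(failed)} failures")
```

Stopping at the first failed batch keeps a bad release from reaching every server, which matters more as the number of servers grows.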
The basic technology at most Chinese Internet companies is still primitive, which is related to the industry's excessive emphasis on "operating good products". Because basic R&D is usually undervalued, companies can only compete hard in areas with low barriers to entry. The technical gap with Google is more than 10 years.