Many people have probably read James Whittaker's blog or his new book "How Google Tests Software". I don't want to repeat his content here; instead, this post analyzes and compares, from another perspective, how Google guarantees the quality of its products.
I have never worked at Google, so I have no first-hand experience; I am only a bystander analyzing Google's quality practices. My main sources are the Google Testing Blog, chats with friends in Seattle who work at Google on various projects, and James's new book "How Google Tests Software". Onlookers do have one advantage, though: they can see the whole forest, whereas many engineers at large companies focus on a single product or team and see only one tree :-). In any case, these are personal opinions, for reference only.
I mentioned earlier, in the post on Microsoft's quality practices, that most Microsoft products are desktop products, such as Windows, Office, and SQL Server. The defining characteristic of desktop products is that recalls and hotfixes are very expensive, and many critical business systems run on them. This forces Microsoft to invest enormous manpower and resources in testing before release to ensure high product quality. Google, unlike Microsoft, uses a different set of policies to ensure software quality. Before analyzing Google's quality strategy, we need to understand what drives it:
1. Google quality culture: Google started on a university campus. With limited funds, the founders had to use cheap machines and cluster many of them together to increase processing power. The biggest problem with cheap machines is that they frequently crash or die outright. So from the very beginning Google had to build strong fault tolerance: the system had to keep serving even when some machines crashed or were scrapped.
In other words, parts of the system may fail, but the system as a whole must not go down (graceful degradation). Google was forced to build in high fault tolerance from day one, and this later became a huge advantage when running services in data centers. The probability of a hardware error is generally on the order of one in a thousand; with ten thousand machines, the expected number of failed machines is around ten, and the probability that at least one machine has failed is essentially 100%. Today's data centers contain tens of thousands, even more than a hundred thousand, machines, so fault tolerance in the product is not optional but necessary. Google therefore accepts that an individual module may fail or contain bugs, and relies on the system's strong fault tolerance to guarantee the overall quality of the service.
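A quick back-of-the-envelope calculation makes the point concrete (a minimal sketch; the fixed time window and the independence assumption are mine, used only for illustration):

```python
# Rough check of the fleet-failure argument above.
# Assumption (for illustration only): each machine fails independently
# with probability p = 0.001 over some fixed time window.
p = 0.001      # per-machine failure probability (assumed)
n = 10_000     # fleet size mentioned in the text

expected_failures = n * p                 # about 10 machines
prob_at_least_one = 1 - (1 - p) ** n      # about 0.99995

print(f"expected failed machines: {expected_failures:.0f}")
print(f"P(at least one failure):  {prob_at_least_one:.5f}")
```

At this scale, some machine is effectively always failing, which is why Google treats fault tolerance as a hard requirement rather than an option.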
2. Internet products: Google is a successful representative of Internet companies. The defining feature of Internet products is "fast": fast product definition, fast development, fast feedback, and fast death. To use limited test resources effectively, Google believes in another principle: build the right it before you build it right. That is, only after you have confirmed that the product is really the product users need (build the right it) do you start to improve its quality (build it right). The logic is simple: if you don't yet know whether a product is the right one, there is no point wasting resources polishing its quality. As a result, testers on most Google products get involved relatively late, and in the meantime developers have to test their own code to ensure basic quality.
With this understanding of how Google views product quality, it is not hard to understand the testing strategies Google uses:
1. Dev owns quality
Google believes that whoever writes the code and owns the module is responsible for its quality. Developers therefore spend a lot of time testing while they write code, mainly unit testing and module testing. Google firmly believes that software quality is built in from the start, not tested in after the fact. Getting developers to take responsibility for product quality is not easy; Google relies on three main approaches: first, keep the number of testers small, so developers have to test; second, use the Test Certified program to positively influence development and testing habits; third, and most important, build powerful, complete infrastructure so that developers can easily write tests and run automated tests.
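To make "developers own quality" concrete, here is a minimal, generic sketch of a developer-owned module with its unit tests alongside it (plain Python unittest, not Google's internal framework; the module and names are hypothetical):

```python
# pricing.py -- a hypothetical module owned by the developer who wrote it.
def apply_discount(price: float, percent: float) -> float:
    """Return the price after applying a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


# test_pricing.py -- written by the same developer, run on every change.
import unittest

class ApplyDiscountTest(unittest.TestCase):
    def test_regular_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)

if __name__ == "__main__":
    unittest.main()
```

The point is not the framework but the ownership: the person who wrote apply_discount also writes and maintains its tests.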
2. Testers enable developers to test effectively
This is a big change from the traditional responsibilities of testers. Traditionally, testers are primarily responsible for finding bugs in the product. Since Google requires developers to be responsible for quality, it does not need testers in the traditional sense. Google's testers therefore spend most of their time developing test automation, test tools, and test infrastructure, and only a small amount of time actually executing tests. Google later even renamed the test organization from "Test Services" to "Engineering Productivity". Testers are primarily responsible for making it easier for developers to test.
In the last couple of years, however, as its products have become more mature and more complex, Google has begun to strengthen post-development testing. The main reason is that even though developers can do plenty of unit and module testing to ensure the quality of their own modules, many bugs only show up when modules are integrated with each other. Google therefore divides test engineers into two types: one works alongside developers, focusing mainly on unit testing and test tools (the Software Engineer in Test); the other is the user-facing test engineer, who primarily performs user-oriented integration and scenario testing.
3. Continuous Integration
This one needs little discussion: for a team building Internet or service-based products, not using continuous integration is simply inexcusable. Google's continuous integration is an industry leader, with powerful test automation and complete infrastructure, so development and test engineers do not have to spend time deploying builds, running tests, or collecting results; they can focus on development and on writing automated tests. After code is submitted, thousands of test cases run automatically and the results come back quickly for further analysis.
Google also keeps optimizing its existing tools and infrastructure to further improve the efficiency of continuous integration. For example, deciding which test cases to run is one of the biggest headaches in continuous integration: run too many and the run time balloons, reducing efficiency; run too few and you risk missing regressions. Google developed a test-case analysis tool that analyzes the dependencies between code and test cases; after a line of code is modified, the tool determines exactly which test cases must be run, no more and no fewer. Microsoft has similar tools to help testers prioritize test cases, but personally I feel they do not work very well, so I am very curious how Google's tool works and how it is applied in practice.
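I can only guess at how such a tool works internally, but the core idea can be sketched naively: keep a map from source files to the tests that depend on them, and run only the tests whose dependencies intersect the change list (the file names and hard-coded map below are hypothetical, not Google's implementation):

```python
# Naive sketch of dependency-based test selection.
# The dependency map would normally be derived from build metadata or
# coverage data; here it is hard-coded for illustration.
TEST_DEPENDENCIES = {
    "search/ranker.py":  {"search/ranker_test.py", "search/e2e_test.py"},
    "search/indexer.py": {"search/indexer_test.py", "search/e2e_test.py"},
    "ads/billing.py":    {"ads/billing_test.py"},
}

def select_tests(changed_files):
    """Return only the tests whose dependencies intersect the change."""
    selected = set()
    for path in changed_files:
        selected |= TEST_DEPENDENCIES.get(path, set())
    return sorted(selected)

# A change touching only the ranker triggers the ranker and end-to-end
# tests, but leaves the billing tests alone.
print(select_tests(["search/ranker.py"]))
# ['search/e2e_test.py', 'search/ranker_test.py']
```

The hard part in practice is keeping the dependency map accurate and fine-grained enough (down to the level of a changed line) without making its maintenance more expensive than just running everything.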
In addition, continuous integration is built on automation. Test automation is not a silver bullet, but continuous integration is impossible without it. Note that test automation does more than free people from repetitive regression runs; more importantly, it forces developers to think at design time about how the module will be tested automatically, which greatly improves the module's testability (which, as we know, is an important indicator of software quality). Beyond test automation, Google has also developed many tools and platforms that greatly improve testing efficiency.
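One common way the pressure to "design for testability" shows up is dependency injection; the toy example below (my own illustration, not taken from Google) lets a module be tested without touching a real network:

```python
# A module designed with testability in mind: the HTTP client is injected,
# so tests can substitute a fake instead of hitting the real network.
class StatusChecker:
    def __init__(self, http_get):
        self._http_get = http_get   # injected dependency

    def is_healthy(self, url: str) -> bool:
        status, _body = self._http_get(url)
        return status == 200

# In production, http_get would wrap a real HTTP library.
# In a test, a fake is enough -- no network, fast, deterministic.
def fake_http_get(url):
    return (200, "ok") if url.endswith("/healthz") else (500, "boom")

checker = StatusChecker(fake_http_get)
assert checker.is_healthy("http://service/healthz")
assert not checker.is_healthy("http://service/broken")
```

A module that cannot be exercised this way tends to resist automation, which is exactly the design smell that continuous integration exposes early.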
4. Measure everything
Objectively speaking, I don't think there is anything particularly special about the points above, but the following one is where I really learned something: measure everything. From the hardware drives at the bottom, to the CPU, memory, and disk I/O of the operating system, to every API call, all the way up to the user experience at the top, Google monitors and measures all of it. Mining and analyzing this monitoring and measurement data makes it possible to understand how the entire system is running. First, when a bug exists it can be detected in the shortest possible time, and its root cause located quickly from the monitoring data. Second, the detailed data shows clearly where changes need to be made, especially for system performance. Third, it reveals how users actually use the product, providing accurate data and predictions for improving product features. Google's view is: if you can't measure your product/component, don't build it.
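As a toy illustration of measuring at the API-call level (a generic sketch using only the Python standard library, not Google's monitoring stack), each call records its latency and outcome so the data can later be mined:

```python
import time
from collections import defaultdict
from functools import wraps

# Per-endpoint metrics: call count, error count, accumulated latency.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_secs": 0.0})

def measured(name):
    """Decorator that records latency and errors for each call."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            except Exception:
                METRICS[name]["errors"] += 1
                raise
            finally:
                METRICS[name]["calls"] += 1
                METRICS[name]["total_secs"] += time.monotonic() - start
        return wrapper
    return decorate

@measured("search.query")
def handle_query(q):
    return f"results for {q}"

handle_query("hello")
print(METRICS["search.query"])  # e.g. {'calls': 1, 'errors': 0, 'total_secs': ...}
```

Multiply this by every layer of the stack and every user-visible interaction, and you get the kind of data that lets a bug be spotted, and its root cause traced, within minutes of a release.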
Summary
Google is a successful representative of Internet companies, and its quality practices and experience with Internet products are well worth studying for other Internet companies. In the trade-off between release speed and release quality, Google chooses speed: while ensuring a baseline of product quality, it pushes products to market as quickly as possible and then keeps evolving them through a wide range of feedback channels and tools.
In this way, a product that wins users also gets its quality assured over time, while a product that attracts no users can fail fast and fail cheap. Besides Google, there is another company in Seattle that also excels at Internet products, especially online retail and cloud computing services.