Google's success in search is undoubtedly inseparable from its advanced search algorithms. However, there is a little-known secret behind it: Google's experience in running data centers is even more valuable than its search algorithms.
Google does not spend tens or hundreds of millions of dollars on expensive equipment; it spends only millions of dollars on cheap computers to build its infrastructure. After studying hardware costs, Google's engineers found that a few high-end servers cost far more than dozens of simpler commodity servers.
The difficulty with cheap servers, however, lies in coordinating all of this hardware so that the failure of a single machine does not affect the whole system, which must still complete tasks such as returning search results or displaying advertisements.
For this reason, Google thinks in terms of home PCs. A home PC crashes roughly once every three years because of software and hardware defects, so if Google runs thousands of PCs, at least one crashing every day is normal (with 1,000 machines and a three-year mean time between failures, roughly 1,000 / 1,095 ≈ 0.9 failures per day). Such failures are therefore best handled automatically, or the service will inevitably have problems.
To this end, Google, which attracts the world's best computer minds, has developed a large number of software tools to manage this computing equipment.
Google has its own file system, the Google File System (GFS), which is specially optimized for processing large data sets and handles data in 64 MB chunks. More importantly, it is built to cope with disk or network failures at any time: Google's data is replicated in three copies stored on different machines, ensuring that nothing is lost. With these fault-handling measures, commodity PCs can fully shoulder the load of an internet search service.
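The article does not describe GFS internals, but the two properties it mentions (fixed 64 MB chunks and three-way replication) are easy to illustrate. The sketch below is a hypothetical simplification, not Google's implementation; the chunk-placement rule and server names are assumptions.

```python
import hashlib

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as described in the article
REPLICAS = 3                   # each chunk is stored on three different machines

def split_into_chunks(data: bytes) -> list[bytes]:
    """Split a file's bytes into fixed-size chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def place_replicas(chunk_id: str, servers: list[str]) -> list[str]:
    """Pick three distinct servers for a chunk by hashing its id.

    A real system would also weigh rack placement and server load; this
    only illustrates that every chunk lives in three places, so a single
    disk or network failure does not lose data.
    """
    start = int(hashlib.md5(chunk_id.encode()).hexdigest(), 16) % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(REPLICAS)]

servers = [f"chunkserver-{n}" for n in range(8)]
print(place_replicas("file.dat#0", servers))
```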
Google's thousands of PC servers run a stripped-down version of Linux based on Red Hat, with a kernel modified for Google's particular applications.
Google also designed a system that can hold enormous amounts of data while responding to queries quickly. Google divides the entire web index into millions of pieces; in Google's terminology these pieces are called shards, and each shard is replicated so that a copy can take over when a system error occurs.
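How documents are assigned to shards is not described in the article; the following is a hypothetical hash-based assignment, sketched only to show why replicated shards let a query survive a crashed machine. The shard count, replica count, and server names are assumptions.

```python
import hashlib

NUM_SHARDS = 1000          # the article says the web is split into many pieces
REPLICAS_PER_SHARD = 2     # each shard is copied so a failed machine can be bypassed

def shard_for(url: str) -> int:
    """Map a document to a shard by hashing its URL."""
    return int(hashlib.sha1(url.encode()).hexdigest(), 16) % NUM_SHARDS

def servers_for(shard: int, servers: list[str]) -> list[str]:
    """Return the machines holding copies of a shard; a query can be
    answered by any one of them, so a single crash does not block results."""
    return [servers[(shard + i) % len(servers)] for i in range(REPLICAS_PER_SHARD)]

machines = [f"index-server-{n}" for n in range(200)]
s = shard_for("http://example.com/page")
print(s, servers_for(s, machines))
```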
Google built an index of the vocabulary that appears across the web, while its document servers store the pages themselves.
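Such a vocabulary index is conventionally an inverted index: a mapping from each word to the documents that contain it. The toy version below is only an illustration of that idea, assuming the page data is already fetched; it is not Google's data structure.

```python
from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs in which it appears."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

pages = {
    "http://a.example": "google file system stores data",
    "http://b.example": "search results come from the index",
}
index = build_index(pages)
print(index["index"])   # pages containing the word "index"
```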
Another important technological innovation in Google's data center management is software that runs smoothly across thousands of servers. Normally, developing software that runs concurrently on many servers requires specialized programming tools and considerable ingenuity.
Google's programming tool for this is called MapReduce. When a system error occurs, it can automatically restore the interrupted program, which is crucial for keeping costs down. Since last year, Google has been using MapReduce on a large scale.
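MapReduce itself is an internal Google system, but the programming model it popularized is easy to sketch: a map step emits key-value pairs and a reduce step aggregates them by key. The single-machine word-count below is a hypothetical illustration of that model; the real system spreads the map and reduce phases across thousands of servers and restarts failed pieces automatically.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc: str):
    """Map step: emit a (word, 1) pair for every word in a document."""
    return [(word, 1) for word in doc.lower().split()]

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
all_pairs = chain.from_iterable(map_phase(d) for d in docs)
print(reduce_phase(all_pairs))   # e.g. {'the': 3, 'fox': 2, ...}
```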
In addition, Google has developed batch task-scheduling software called the Global Work Queue, which can schedule millions of operations. It divides a job into many smaller computations and assigns them to individual machines for completion.
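The Global Work Queue is likewise internal to Google, so the following is only a toy sketch of the batch-scheduling idea it describes: put many small tasks on a queue and let a pool of workers drain it. All names and the per-task computation are assumptions for illustration.

```python
import queue
import threading

def run_batch(job_items, num_workers: int = 4):
    """Split a large job into small tasks and let worker threads drain the queue."""
    tasks: queue.Queue = queue.Queue()
    for item in job_items:
        tasks.put(item)           # each item is one small unit of computation

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return
            value = item * item   # stand-in for the real per-task computation
            with lock:
                results.append(value)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(sum(run_batch(range(1000))))
```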
To cope with urgent, catastrophic problems, Google has also prepared six rescue trains to respond to emergencies at its data centers. Power cost is another important factor in Google's data center design: buying more low-cost machines raises overall power consumption, so controlling electricity expenditure is a major concern when Google designs a data center.
Source: reposted from eNet Silicon Valley Power