Note: there were several considerations behind choosing a large number of cheap machines:

1. This was an implementation project: the algorithm, the architecture, everything was still experimental and had to be verified, which takes a lot of time. Investing in expensive hardware was out of the question; it had to be low-cost experimental gear.
2. Use Linux and your own algorithms to fully control the hardware, so that basically any box that can run a minimal Linux will do. Functionally, the spider does not need powerful compute; crawling eats CPU time and storage, with long waits and long capture runs. For the spider, quantity matters more than per-machine quality and speed, and that was the right call. Power consumption is also low.
3. Finances were too tight to put money into hardware first. Because so many resources were needed, angel funding could not be counted on to pay for hardware; the experiment would run a long time, and hiring people would already consume a lot of development cost. At that point it was only an idea, with too many open problems and unknown results.
4. Load and balance. With many machines and a distributed layout, the running state of any single box does not hold back overall progress. In that era such a computer was actually considered quite good, and hardware was very expensive in China, so price was also an unavoidable factor in deciding to assemble machines ourselves.
5. The processing logic involves too many computing steps to run on a handful of powerful machines: some machines keep crawling, some keep extracting, some keep indexing, and some keep computing relevance. This is logical (pipeline) parallelism, and the bottleneck is the number of CPUs rather than the strength of any single one. Even a very capable machine handles this kind of workload badly: a small number of high-performance boxes relying on multithreading and time-slice switching does not work, because under heavy computation a machine cannot switch tasks frequently, and even when it can, the result is poor. (A minimal sketch of this stage-per-machine pipeline is given right after this note.)
6. Conclusion: a large number of distributed machines, none of which needs to be high-performance. These characteristics determined that the approach was implementable and feasible, and it has been carried forward to this day.

When Sergey Brin and Larry Page created Google in a backyard garage seven years ago, they certainly did not expect that they were creating another myth of the IT world.
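To make point 5 of the note concrete, here is a minimal sketch of that stage-per-machine layout, written in Python purely for illustration: each stage (fetch, extract, index) runs as its own worker and hands work to the next stage through a queue. In the setup the note describes, each stage would sit on its own low-cost box and the hand-off would be files or a network hop rather than an in-process queue; all names and functions here are hypothetical, not the author's or Google's actual code.

```python
"""A minimal sketch (assumed, illustrative) of pipeline parallelism:
one worker crawls, one extracts, one indexes, each draining a queue."""
from multiprocessing import Process, Queue

SENTINEL = None  # signals "no more work" to the next stage


def fetch_stage(seed_urls, out_q):
    # Stand-in for the crawler boxes: I/O-bound, mostly waiting, little CPU.
    for url in seed_urls:
        page = f"<html>contents of {url}</html>"  # placeholder for a real fetch
        out_q.put((url, page))
    out_q.put(SENTINEL)


def extract_stage(in_q, out_q):
    # Stand-in for the extraction boxes: pull terms out of each fetched page.
    while (item := in_q.get()) is not SENTINEL:
        url, page = item
        terms = page.replace("<html>", "").replace("</html>", "").split()
        out_q.put((url, terms))
    out_q.put(SENTINEL)


def index_stage(in_q):
    # Stand-in for the indexing boxes: build a toy inverted index.
    index = {}
    while (item := in_q.get()) is not SENTINEL:
        url, terms = item
        for term in terms:
            index.setdefault(term, set()).add(url)
    print(f"indexed {len(index)} terms")


if __name__ == "__main__":
    q1, q2 = Queue(), Queue()
    seeds = ["http://example.com/a", "http://example.com/b"]
    stages = [
        Process(target=fetch_stage, args=(seeds, q1)),
        Process(target=extract_stage, args=(q1, q2)),
        Process(target=index_stage, args=(q2,)),
    ]
    for p in stages:
        p.start()
    for p in stages:
        p.join()
```

The point of this layout, as the note argues, is that each stage only has to keep its own queue moving: a slow or failed worker delays its stage, not the whole system, and capacity is added by adding more cheap boxes to a stage rather than by buying a stronger machine.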
Let's take a look at what Google did when it first started:
[Photo] This is Google's back-end server: a 300 MHz Pentium II, M of memory, GB of hard disk. Oh, it's worse than even a low-end host.
An F50 IBM RS/6000 donated by IBM: 4 processors, M of memory, GB of hard disk.
[Photo] 3×9 GB hard drives on the left, 6×4 GB hard drives on the right, attached to the Sun Ultra II.
[Photo] Another GB of storage, also donated by IBM.
[Photo] The Sun Ultra II, with dual MHz processors and M of memory. From here, BackRub (Google's name at the time) reached out to the world.
[Photo] Self-made SCSI disk array, 100 GB.
[Photo] Looking at this tangle, it's hard to imagine how much information was flowing through each of those lines...
[Photo] This is Google: the birth of a giant.