On the question of who leads, hardware or software, a proverb applies: thirty years east of the river, thirty years west of the river; fortune turns in cycles. Software and hardware reinforce and constrain each other. Some architectures that were heavily criticized may turn out to be the most appropriate choice a few years later, and become inappropriate again a few years after that. There are software-defined cycles and there are hardware-defined cycles. The result of a hardware-defined cycle is strength without flexibility, so the software-defined approach begins to brew a comeback; but everything has inertia, and once software is "excessively" defined you find that many things simply cannot be done without hardware acceleration, at which point a new hardware-defined cycle begins, and so the wheel keeps turning. Let us explore this pattern through a few examples.
CPU and OS
CPU and OS are like a couple that never abandons each other, yin embracing yang and yang embracing yin. At first there were no interrupts and no OS: the computer simply executed instructions in sequence from hard-wired programs, which was very inflexible. Then the OS appeared. The CPU first runs the OS as one big loop, which loads the user program to be executed, runs it, and on exit loads the next one. Even the simplest OS requires the CPU to provide at least I/O and clock interrupt mechanisms. Once born, the OS had to keep growing and evolving: single-tasking was not flexible, so multitasking arrived; all tasks sharing one memory space created safety and security problems, so virtual memory had to be introduced; the software grew ever more complex and performance gradually died. At this point the CPU stepped in: "I'll handle virtual memory," providing hardware page tables, dedicated control registers, and dedicated lookup-acceleration hardware (the TLB). The productivity of the multitasking OS was finally released, but performance was still poor and CPU-bound. The CPU kept pushing: hyper-threading let multiple threads of code execute concurrently, thanks to the pipelined design, and later multi-core and multi-socket SMP appeared, forcing the OS to change again. But CPU cores are not always efficient at concurrent multithreading: as software complexity grew, thread synchronization and cache coherency required a great deal of state and data synchronization, and the traditional shared front-side bus was too inefficient, so it gave way to switched fabrics such as Intel QPI; reaching memory through too many hops did not scale either, so the architecture moved to directly connected CPUs with distributed shared memory, which is today's form.

How will it develop next? You can extrapolate along the same inertia. The advent of switched fabrics means CPUs can sit farther and farther apart, as long as there are enough high-speed links; this form is essentially one large NUMA computer. Such a turn of the wheel implies changes in software architecture. High-performance scenarios used to require mainframes and minicomputers, which are extremely expensive simply because they are closed: you cannot spread your resources across a distributed system and customize your applications the way the Internet companies do. Once open large-NUMA systems appear, perhaps the previously "excessively" defined distributed-system ecosystem will be ushered into a new cycle, one in which the once carefully polished distributed systems may be regarded by the next generation of engineers and architects as an unbelievable "wild path": "Look at the architecture they used before. What a trap!" This is just like how we look back at some earlier designs today and find them incredible. Were people back then really that dim? Well, if you went back to that time, you might look even dimmer. No matter who is "dim," one fact never changes: the absolute value of hardware performance keeps going straight up, regardless of whether the architecture is distributed or centralized.
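To make the "one large NUMA computer" form concrete, here is a minimal sketch of NUMA-aware allocation, assuming a Linux system with libnuma available (compile and link with -lnuma); the node number and buffer size are arbitrary choices for illustration. Software written for a flat memory model silently pays fabric latency on such machines, which is exactly the kind of architectural pressure described above.

```c
/* Minimal sketch: place a buffer on a chosen NUMA node with libnuma.
 * Assumes Linux + libnuma; node 0 and 64 MiB are illustrative values. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    printf("Highest NUMA node id: %d\n", numa_max_node());

    /* Allocate 64 MiB bound to node 0: local accesses from CPUs on
     * node 0 are fast, remote accesses pay the fabric's latency. */
    size_t len = 64UL * 1024 * 1024;
    void *buf = numa_alloc_onnode(len, 0);
    if (!buf) {
        perror("numa_alloc_onnode");
        return EXIT_FAILURE;
    }
    memset(buf, 0, len);   /* touch the pages so they are really placed */
    printf("Allocated %zu bytes on node 0\n", len);

    numa_free(buf, len);
    return EXIT_SUCCESS;
}
```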
CPU and VMM
Few expected the VMM to develop to this point; the story goes that it started as a toy and unexpectedly grew big. Many people hold this view, but it is only superficial. Virtual machine technology originated on mainframes and had been used on midrange systems for many years, so the VMM was never just a toy. Mainframes and minicomputers are a closed market, but their technology is genuinely strong; many technologies in the open market actually came from mainframes. The virtual machine is clearly a product of an era in which single-machine performance was in surplus while the aggregate resources of many machines could not be pooled and allocated at fine granularity. The VMM virtualizes the CPU, virtualizes I/O devices, and virtualizes memory. The first implementations were pure software, interpreting every instruction; later designs were optimized, but ultimately they rely on monitoring and trapping sensitive and privileged instructions, maintaining extra shadow page tables per guest to virtualize memory, and copying packets through many layers of memory just to get I/O out. If you want to put this into production, the software cost is unacceptable on every front, so once again the hardware stepped in: the CPU provides hardware assistance, and I/O devices gained SR-IOV/MR-IOV schemes. I always feel this hardware is somewhat "excessively" defined, cheated by software once again. Why? Because it is precisely the inability of hardware resources to be pooled and finely partitioned that created the awkward thing called the VMM, and at this point the hardware seems to have gone mad, piling on a series of complex technologies to support it. In fact the hardware has another road that achieves a similar effect to the VMM: make the hardware itself partitionable, rather than sharding it with software. This road has been tried on minicomputer systems, using bus-level isolation switches to partition different CPU, memory, and I/O slots. The prerequisite for fine-grained partitioning is to reduce the granularity of the hardware itself; relying on ever-stronger single CPUs is not the wiser route. In recent years many-core CPUs keep popping up, and 128 cores in a single CPU is no surprise, but because the ecosystem is immature they are still confined to highly parallel, loosely coupled scenarios such as network packet processing. Another sign is the rise of the ARM ecosystem; there are indications it may well be a bright road, though moving the traditional ecosystem onto it is not so easy. We also see Intel working on its SiPh silicon photonics program, committed to flexible hardware resources. If the granularity becomes fine enough, the VMM could actually exit the stage, and that would be a bloody battle of hardware defeating software.
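As a concrete illustration of the "hardware stepping in to assist the VMM" point, here is a minimal sketch, assuming Linux and the usual /proc/cpuinfo layout, that checks whether the processor advertises Intel VT-x or AMD-V, the CPU-level assists that modern trap-and-emulate VMMs lean on. It only detects the feature flag; whether a hypervisor such as KVM can actually use it is a separate question.

```c
/* Sketch: report whether the CPU advertises hardware virtualization
 * assists by scanning the "flags" line of /proc/cpuinfo on Linux.
 * "vmx" indicates Intel VT-x, "svm" indicates AMD-V. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) {
        perror("fopen /proc/cpuinfo");
        return 1;
    }

    char line[4096];
    int vmx = 0, svm = 0;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) == 0) {
            if (strstr(line, " vmx")) vmx = 1;  /* Intel VT-x */
            if (strstr(line, " svm")) svm = 1;  /* AMD-V */
            break;  /* flags are the same across cores; one line suffices */
        }
    }
    fclose(f);

    printf("Hardware virtualization assist: %s\n",
           vmx ? "Intel VT-x" : svm ? "AMD-V" : "not advertised");
    return 0;
}
```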
Virtual machines and cloud computing
The development of virtual machines drove hardware acceleration, and it is precisely because of that hardware acceleration that virtual machines could be widely deployed, which in turn gave rise to the concept of cloud computing; that is, hardware in turn accelerating software change. And as volume rises, quantity turns into quality: people discover that the VM is a very inefficient form of virtualization. My personal understanding is that the VMM is a kind of false yang that drains the real yin, embodied as far too many unnecessary operating system instances. The operating system already virtualizes multi-task, multi-user operation with threads and processes, and every system call is expensive, so running multiple OS instances on one CPU is undoubtedly a great waste. As mentioned above, this model is the product of an era in which single-machine performance was in surplus while aggregate resources could not be pooled. The cloud computing architecture will break this paradox.

Cloud computing may have been born as a software framework for globally scheduling and managing virtual machine resources, but everything keeps evolving, and cloud computing will eventually find its real mission: large-scale, global resource pooling, allocation, scheduling, management, and monitoring. In other words, a data-center-level OS that does for the data center what a single-machine OS does for one machine. In that case, AaaS (Application as a Service) should be the final state of cloud computing: you open the screen, see a pile of application icons, click in, finish what you want, and exit. Since users do not need IaaS and do not need to face the operating system directly, all those VM instances are unnecessary and merely consume resources. Cloud computing needs to implement a global, process-level scheduling hub for applications rather than scheduling VMs. Think again: why did mainframes need VMs? Because that era did not have today's concept of cloud computing or the XaaS way of thinking. You could say they were dim, but software technology at the time was closed and underdeveloped, so using a VM to partition resources at fine granularity was also a Gordian-knot kind of solution. We also see process-level virtual machines (such as Linux containers) drawing more and more attention in industry. These are the software frameworks of cloud computing, the definition of this macro OS. What does this definition have to do with hardware? I think it will give birth to two changes in hardware form. One is that single-point performance becomes low enough and the granularity fine enough; "low enough" single-point performance may sound surprising, but in the future it can really happen, and many-core CPUs are a very good embryo. The other is local, multi-layer high-speed fabric interconnects: because CPU cores can be arbitrarily partitioned and combined, they must be connected by high-speed buses. Various fabric solutions and products already exist, relatively low-key and obscure but mature, and together with silicon photonics the fabric will quietly extend beyond the rack, providing support for pooling over a very large range.
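The "process-level virtual machine" idea behind Linux containers can be sketched in a few lines of C: instead of booting another OS instance, the kernel gives a process its own namespaces. This is only an illustrative sketch assuming Linux and root privileges; the hostname "mini-container" is a made-up name, and real container runtimes (LXC and friends) add mount, network, and user namespaces plus cgroups on top.

```c
/* Sketch: clone() a child into new UTS and PID namespaces so it sees
 * its own hostname and runs as PID 1 there, with no extra OS instance.
 * Assumes Linux and CAP_SYS_ADMIN (run as root). */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1024 * 1024];

static int child_main(void *arg)
{
    (void)arg;
    sethostname("mini-container", strlen("mini-container"));
    printf("inside: pid=%d (PID 1 of the new namespace)\n", getpid());
    execlp("/bin/sh", "sh", "-c", "hostname", (char *)NULL);
    perror("execlp");          /* only reached if exec fails */
    return 1;
}

int main(void)
{
    int flags = CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD;
    pid_t pid = clone(child_main, child_stack + sizeof child_stack, flags, NULL);
    if (pid < 0) {
        perror("clone");
        return EXIT_FAILURE;
    }
    waitpid(pid, NULL, 0);     /* the host still sees the child's real pid */
    return EXIT_SUCCESS;
}
```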
This change in hardware is likely to affect software architecture in turn: large-scale parallel computing would no longer need remote message-passing mechanisms such as MPI, because messages could be passed directly through fabric-hardware-accelerated FIFO queues, which simplifies programming and could ultimately bring the HPC model to a much wider audience.
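For contrast, here is the kind of explicit message passing that fabric-level queues could, in the scenario above, subsume: a minimal MPI point-to-point exchange. This assumes any standard MPI implementation (for example MPICH or Open MPI) and is run with something like mpirun -np 2.

```c
/* Minimal MPI point-to-point exchange: rank 0 sends an integer to rank 1.
 * This explicit send/receive choreography is what a hardware FIFO exposed
 * directly by the fabric could, in principle, replace. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        int payload = 42;
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload = 0;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```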
Cloud computing, the macro operating system, the data-center-level NUMA machine: everything is possible.
Original link: http://labs.chinamobile.com/news/104494_p2