Origin:http://blog.sina.com.cn/s/blog_8701e6ba0102v333.html
The textbooks say that ARM is a Harvard structure, but I could never quite see it. After analyzing the ARM9 core of the s3c2440, I formed my own view.
My conclusion is: "ARM9 is called a Harvard structure because it has a separate instruction cache and data cache."
At first I suspected that the ARM9 core of the s3c2440 was not a Harvard structure but a von Neumann structure. My reasoning was as follows. A Harvard structure must be able to access instructions and data at the same time, so I assumed its bus structure would be complex and would include four buses: a program address bus, a program data bus, a data address bus, and a data bus for data. But looking at the s3c2440 core, I found that programs and data all end up loaded into RAM, so there are only two buses, and FLASH, RAM, and the peripherals all share these two buses.
However, once I saw that the cache in the CPU core is divided into an instruction cache and a data cache, I understood why ARM9 is said to be a Harvard structure; strictly speaking it is what is now called an "improved Harvard structure." Note that the Harvard structure and the "improved Harvard structure" are very different, so calling ARM9 simply a Harvard structure is imprecise or even wrong. I think that if the two caches were disabled, it could no longer be called a Harvard structure, only a von Neumann structure. The simple explanation for why ARM9 is an "improved Harvard structure" is that the ARM9 CPU accesses the cache directly, and the cache is divided into an instruction cache and a data cache that are independent of each other, so instructions and data can be accessed at the same time, that is, in parallel.
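To make the "Harvard inside, von Neumann outside" idea concrete, here is a minimal Python sketch. It is only a toy cost model, not how ARM9 actually counts cycles: the function name `step_cycles` and the `MISS_PENALTY` value are my own assumptions for illustration.

```python
MISS_PENALTY = 10  # assumed cycles per refill over the external bus (illustrative only)


def step_cycles(fetch_hit, data_hit):
    """Cycles for one step that fetches one instruction and accesses one datum."""
    cycles = 1  # the instruction cache and data cache are probed in the same cycle
    # A miss in either cache is refilled over the single shared external bus,
    # so two misses are paid one after the other.
    misses = (not fetch_hit) + (not data_hit)
    return cycles + misses * MISS_PENALTY


both_hit = step_cycles(True, True)      # behaves like a Harvard structure
both_miss = step_cycles(False, False)   # falls back to the shared bus twice
```

As long as both caches hit, the instruction and the data arrive in parallel; as soon as they miss, everything queues up on the shared bus again, which is the von Neumann side of the design.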
But why design an "improved Harvard structure"? The original von Neumann structure has low data throughput, but its bus structure is simple, so its cost is also low. The Harvard structure has a complex and powerful bus structure, so its data throughput is high and it runs faster, but it is complex to design and implement and its cost is high. The "improved Harvard structure" combines the strengths of both. That is why in ARM9 you can see traces of the von Neumann structure and also traces of the Harvard structure.
Attached below is a good explanatory document I found online:
1. von Neumann structure
The von Neumann structure is also known as the Princeton architecture.
In 1945, von Neumann first put forward the "stored program" concept and the binary principle; computers designed according to this concept and principle later came to be called "von Neumann structure" computers. In the von Neumann structure, the processor keeps instructions and data in the same memory and transfers them over the same bus.
The von Neumann structure processor has the following characteristics:
Must have a memory;
Must have a controller;
There must be an arithmetic unit to perform arithmetic and logical operations;
There must be an input and output device for human-computer communication.
Von Neumann's main contribution was to propose and implement the "stored program" concept. Since both instructions and data are binary codes, and instruction addresses and operand addresses are closely related, choosing this structure was natural. However, because instructions and data share the same bus, the transfer of information becomes the bottleneck that limits computer performance and slows down data processing.
Typically, completing an instruction takes 3 steps: instruction fetch, instruction decode, and instruction execute. The timing of the instruction flow also shows how the von Neumann structure and the Harvard structure differ. Take the simplest case of memory read and write instructions, and suppose instructions 1 to 3 are all load or store instructions. On a von Neumann structure processor, instruction fetches and data accesses go to the same storage space over the same bus, so they cannot overlap; each can only start after the previous one completes.
The ARM7 family includes many CPU variants. Some of them, such as the ARM7TDMI, have no internal cache and are pure von Neumann structures; others, which have internal caches with the data cache and instruction cache separated, use the Harvard structure.
2. Harvard Structure
The Harvard structure is a memory structure that separates program instruction storage from data storage, as shown in Figure 1. The CPU first reads the program instruction from the program instruction memory, decodes it to obtain the data address, then reads the data from the corresponding data memory and carries out the next step (usually execution). Because program instruction storage and data storage are separate, instructions and data can have different widths; for example, the program instructions of Microchip's PIC16 chips are 14 bits wide, while the data is 8 bits wide.
Figure 1 Harvard architecture diagram
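The point about separate storage with different widths can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the class `HarvardMemory` and its method names are hypothetical, and only the 14-bit and 8-bit widths come from the PIC16 example in the text.

```python
class HarvardMemory:
    """Toy model: program and data live in two separate address spaces."""

    PROGRAM_WIDTH = 14  # bits per instruction word (PIC16 figure from the text)
    DATA_WIDTH = 8      # bits per data cell

    def __init__(self, program_words, data_cells):
        self.program = [0] * program_words  # program instruction memory
        self.data = [0] * data_cells        # data memory

    def write_program(self, addr, word):
        if not 0 <= word < (1 << self.PROGRAM_WIDTH):
            raise ValueError("instruction word does not fit in 14 bits")
        self.program[addr] = word

    def write_data(self, addr, value):
        if not 0 <= value < (1 << self.DATA_WIDTH):
            raise ValueError("data value does not fit in 8 bits")
        self.data[addr] = value


mem = HarvardMemory(program_words=16, data_cells=16)
mem.write_program(0, 0x3FFF)  # address 0 of program space
mem.write_data(0, 0xFF)       # address 0 of data space, a different cell
```

Address 0 exists twice here, once in each space, and the two cells do not even have the same width. In a von Neumann structure there would be only one address 0 and one word width.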
A microprocessor with a Harvard structure usually has higher execution efficiency. Program instructions and data are organized and stored separately, so the next instruction can be fetched in advance while the current one is executing.
Central processing units and microcontrollers commonly listed as using the Harvard structure include, in addition to Microchip's PIC series, Motorola's MC68 series, Zilog's Z8 series, Atmel's AVR series, and ARM's ARM9, ARM10, and ARM11.
The Harvard structure keeps program space and data space independent, with the aim of relieving the memory-access bottleneck while a program runs.
For example, in the very common convolution operation, one instruction takes two operands at the same time, and in a pipeline there is also an instruction fetch going on. If program and data are accessed over a single bus, the instruction fetch and the operand fetch conflict, which is very bad for the execution efficiency of loops with heavy computation.
The Harvard structure basically resolves the conflict between instruction fetch and operand fetch. To access the other operand at the same time, however, an enhanced Harvard structure is needed (note that the enhanced Harvard structure and the improved Harvard structure are not the same concept). For example, TI splits the data area again and adds another set of buses, while AD uses an instruction cache so that the instruction area can also hold part of the data.
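The convolution argument is just counting bus accesses, so it can be written down as a small Python sketch. It assumes every bus access takes one cycle and every multiply-accumulate step needs one instruction fetch plus two operand reads; the function `cycles_per_mac` and its parameters are my own illustrative names, not taken from any DSP manual.

```python
from math import ceil


def cycles_per_mac(separate_program_bus, data_buses):
    """Bus cycles for one multiply-accumulate step of a convolution."""
    instruction_fetches = 1
    operand_reads = 2
    if not separate_program_bus:
        # von Neumann: one shared bus carries the fetch and both operands in turn
        return instruction_fetches + operand_reads
    # Harvard: the fetch runs in parallel with the operand reads,
    # and the operand reads are spread over the available data buses
    return max(instruction_fetches, ceil(operand_reads / data_buses))


von_neumann = cycles_per_mac(separate_program_bus=False, data_buses=1)  # 3
harvard = cycles_per_mac(separate_program_bus=True, data_buses=1)       # 2
enhanced = cycles_per_mac(separate_program_bus=True, data_buses=2)      # 1
```

Under these assumptions the plain Harvard structure removes the fetch conflict but still needs two cycles for the two operands; only the enhanced variant with a second data path gets down to one cycle per step.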
Typically, completing an instruction takes 3 steps: instruction fetch, instruction decode, and instruction execute. The timing of the instruction flow also shows how the von Neumann structure and the Harvard structure differ. Take the simplest case of memory read and write instructions, and suppose instructions 1 to 3 are all load or store instructions. On a von Neumann structure processor, instruction fetches and data accesses go to the same storage space over the same bus, so they cannot overlap; each can only start after the previous one completes.
If a Harvard structure handles the same 3 memory-access instructions, instruction fetches and data accesses go through different storage spaces and different buses, so the instructions can overlap in execution. This overcomes the data transfer bottleneck and improves computing speed.
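The timing difference for these three instructions can be checked with a small Python sketch. It is a toy scheduler under my own assumptions: every stage takes one cycle, every instruction accesses data memory in its execute stage, and on a shared bus a fetch has to wait while an earlier instruction is using the bus for data. The name `total_cycles` is hypothetical.

```python
def total_cycles(n_instructions, shared_bus):
    """Cycles to run n load/store instructions through fetch, decode, execute."""
    bus_busy = set()   # cycles in which the bus is taken by a data access
    prev_fetch = -1
    last_execute = -1
    for _ in range(n_instructions):
        fetch = prev_fetch + 1
        # Shared bus: the fetch stalls while an earlier instruction's
        # data access occupies the bus.
        while shared_bus and fetch in bus_busy:
            fetch += 1
        execute = fetch + 2      # decode takes the cycle in between
        bus_busy.add(execute)    # the data access happens in execute
        prev_fetch = fetch
        last_execute = execute
    return last_execute + 1


von_neumann = total_cycles(3, shared_bus=True)   # 7 cycles
harvard = total_cycles(3, shared_bus=False)      # 5 cycles
```

A fully serial machine, as the text describes it, would need 3 x 3 = 9 cycles. This sketch is more generous and lets decode overlap, yet the shared bus still forces the third fetch to stall, while with separate buses the three instructions overlap completely.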
3. The difference between the von Neumann system and the Harvard bus system
The difference between the two is whether the program space and the data space are one and the same. In the von Neumann structure, program space and data space are not separated; in the Harvard structure, they are separate.
Most early microprocessors used the von Neumann structure, with Intel's x86 microprocessors as the typical example. Instruction fetches and operand fetches share the same bus by time-division multiplexing. The drawback is that at high speed the processor cannot fetch an instruction and an operand at the same time, which creates a transfer bottleneck.
Harvard bus technology is represented by DSPs and ARM. With the Harvard bus architecture, the on-chip program space and data space are separate, which allows an instruction and an operand to be fetched at the same time and greatly improves computing power.
DSP chips come in both von Neumann and Harvard hardware structures; the difference is whether program space and data space are separated. DSPs generally use an improved Harvard structure, in which there is not just one separate data space and program space but several, and the details differ between manufacturers' DSP chips. For external addressing the logic is the same; because of external pin constraints, it is generally implemented through the corresponding space selection. In essence the principle is the same.
4. The structural differences between the improved Harvard structure and the Harvard architecture
Compared with a von Neumann structure processor, a Harvard structure processor has two distinct features:
(1). Two separate memory modules store instructions and data respectively, and neither module allows instructions and data to coexist;
(2). Two independent buses serve as dedicated communication paths between the CPU and each memory, and there is no connection between the two buses.
Later, the improved Harvard structure was proposed, and its structural features were as follows:
(1). Two separate memory modules store instructions and data respectively, and neither module allows instructions and data to coexist;
(2). There is one independent address bus and one independent data bus; the common address bus is used to access both memory modules (the program memory module and the data memory module), and the common data bus carries transfers between the CPU and either the program memory module or the data memory module;
(3). The two buses are shared by program memory and data memory.
5. Summary
Whether an architecture counts as Harvard has nothing to do with whether it uses independent buses; what matters is whether instruction space and data space are separate. In the 51 microcontroller (SCM), the data and instruction storage areas are separate but the bus is time-multiplexed, so it belongs to the improved Harvard structure. ARM9 is a Harvard structure, while earlier versions (such as ARM7) are von Neumann structures. A very important reason why early x86 captured the market so quickly was that it relied on the von Neumann bus structure, which is simple to implement and low in cost. Today's processors are von Neumann on the external bus, but because of their internal caches they actually look like an improved Harvard structure on the inside. As for pros and cons, the Harvard structure is complex, places high demands on connecting and handling peripherals, and is poorly suited to external memory expansion, so it was hard for early general-purpose CPUs to use this structure. Microcontrollers, on the other hand, integrate the memory they need on chip, so using the Harvard structure is perfectly workable for them. Today's processors, thanks to their caches, have unified the two very well. (I think this sentence is a classic.)