SHAVE Overview (Pending modification)

Source: Internet
Author: User
Tags: scalar

The SHAVE (Streaming Hybrid Architecture Vector Engine) processor combines a wide and deep register file with a very long instruction word (VLIW) instruction set, which improves code-size efficiency.

For example, each VLIW packet controls multiple functional units, including SIMD units, which improves parallelism and throughput.

SIMD: Single instruction, multiple data (SIMD) is a technique in which one controller drives multiple processing elements, performing the same operation on every element of a set of data (also known as a "data vector") at the same time to achieve spatial parallelism. Typical examples are vector processors and array processors.

For example, with four compute units, a single addition instruction is sent to all four so that they perform their additions at the same time. Computing on the x, y, z, w components of a vector is a typical use of this.
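The four-unit addition described above can be sketched on a host machine. This is only an illustration in plain Python, not SHAVE code; the function name simd_add is hypothetical, and a real SIMD unit would process all lanes in one hardware step rather than in a loop.

```python
def simd_add(a, b):
    """Apply one 'add' to every pair of lanes, as a SIMD unit would in a single step."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]

# Two 4-lane vectors holding (x, y, z, w) components.
p = [1, 2, 3, 4]
q = [10, 20, 30, 40]
result = simd_add(p, q)
print(result)  # [11, 22, 33, 44]
```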

SHAVE supports SIMD instructions on several data types: 8-bit, 16-bit, and 32-bit integers, 16-bit and 32-bit floats, and more.

SHAVE can be programmed in assembly, C, and C++, which requires the Movidius in-house tools moviAsm and moviCompile.

The basic building blocks are described below: IRF, VRF, VAU, SAU, IAU, CMU, PEU, BRU, and LSU.

IRF (Integer Register File)

The IRF contains 32 registers, each 32 bits wide. These registers mainly support integer operations, and they are also used by load and store instructions.

These registers are operated on by the IAU (Integer Arithmetic Unit) and the SAU (Scalar Arithmetic Unit).

In addition, the SAU and IAU provide some SIMD operations on 16-bit and 8-bit integer types.

VRF (Vector Register File)

The VRF also contains 32 registers, each 128 bits wide. These registers are what SHAVE uses for SIMD operations.

These registers are operated on by the VAU (Vector Arithmetic Unit). It supports both integer and floating-point operations, on 8-bit, 16-bit, and 32-bit integers or on floating-point values.
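To make the register width concrete, the following host-side sketch computes how many elements of each width fit into one 128-bit register and packs four 32-bit integers into 16 bytes. The names lane_count and pack_i32_lanes are hypothetical, and the little-endian layout is an assumption made for the illustration, not something stated in the source.

```python
import struct

VRF_REGISTER_BITS = 128  # register width stated in the text

def lane_count(element_bits):
    """Number of elements of the given width that fit in one 128-bit register."""
    return VRF_REGISTER_BITS // element_bits

def pack_i32_lanes(values):
    """Pack four signed 32-bit integers into 16 bytes (little-endian is an assumption)."""
    return struct.pack("<4i", *values)

for bits in (8, 16, 32):
    print(bits, "bit elements:", lane_count(bits), "lanes")

register_image = pack_i32_lanes([1, 2, 3, 4])
print(len(register_image) * 8, "bits")
```

One vector instruction therefore touches 16, 8, or 4 elements at once, depending on the element width.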

SAU (Scalar Arithmetic Unit)

This unit provides floating-point arithmetic on the IRF registers.

In addition to the most common floating-point operations, the unit also implements some complex 16-bit floating-point operations, such as complement, sine, square root, reciprocal square root, cosine, tangent, logarithm, and exponential.

This unit also provides integer operations. Where useful, this capability can be used to perform integer operations on the IRF in parallel.

CMU (Compare and Move Unit)

This unit copies values from one register to another. It supports any combination of registers and multiple bit widths.

This unit also compares data values. A comparison sets multiple condition flags. Comparisons can also be applied to VRF registers in order to compare multiple data elements at once.

LSU (Load Store Unit)

There are two load store units, which load and store data to and from the two register files.

The LSUs work together with the other units and handle multiple data types, as described in the SHAVE ISA documentation.

BRU (Branch Unit)

The BRU provides branching functionality. A SHAVE branch has 5 delay-slot cycles, which can be filled with other instructions.
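The effect of delay slots can be illustrated with a toy model: after a branch is issued, the next five entries still execute before control reaches the target. This is a generic sketch of the delay-slot idea, not a model of the real SHAVE pipeline; the function run, the program encoding, and the treatment of each entry as one cycle are all assumptions.

```python
BRANCH_DELAY_SLOTS = 5  # figure taken from the text

def run(program):
    """Execute a toy program whose entries are ("op", name) or ("branch", target_index).

    After a branch, the next BRANCH_DELAY_SLOTS entries still run
    before the jump takes effect.
    """
    trace = []
    pc = 0
    pending_target = None
    slots_left = 0
    while pc < len(program):
        kind, arg = program[pc]
        pc += 1
        if kind == "branch":
            pending_target, slots_left = arg, BRANCH_DELAY_SLOTS
            trace.append("branch")
            continue
        trace.append(arg)
        if pending_target is not None:
            slots_left -= 1
            if slots_left == 0:
                # All delay slots are used up, so the jump happens now.
                pc, pending_target = pending_target, None
    return trace

program = [
    ("branch", 8),
    ("op", "s1"), ("op", "s2"), ("op", "s3"), ("op", "s4"), ("op", "s5"),
    ("op", "skipped1"), ("op", "skipped2"),
    ("op", "target"),
]
print(run(program))
```

In this model the entries s1 to s5 run even though they follow the branch, while the two entries after them are skipped. Filling those five slots with useful work instead of NOPs is what avoids wasted cycles.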

PEU (Predicated Execution Unit)

The PEU helps implement conditional branching and conditional stores in the LSU and VAU units.

IRF: Integer Register File (32x32)
32 words of 32-bit wide data
The IRF can be accessed by the IAU or the SAU units

SRF: Scalar Register File (32x32)
32 words of 32-bit wide data
The SRF can be accessed by the SAU unit only

VRF: Vector Register File (32x128)
32 words of 128-bit wide data
The VRF can be accessed by the VAU unit only

Here is an example of SHAVE assembly:

; The code is for ISAAC version 1.1
.version
.data Initsection 0x40000000
.int 0x10, 0x10, 100, 100
.data Colorsection 0x90101000
.byte 0xFF, 0xFF, 0xFF ; white
.byte 0x00, 0x00, 0x00 ; black
.data Framesection 0x10008000
.incbin "Frame1.bin"
.code entrypoint 0x1d000000
LSU0.LDIL I0, vcolortable ; load pointer to colors
|| LSU1.LDIH I0, vcolortable
LSU0.LDIL I1, vbmppointer ; load vbmppointer
|| LSU1.LDIH I1, vbmppointer
.end

This assembly consists of two kinds of sections: data sections and code sections. Each section declaration contains:

<sectionName> + <sectionDefaultAddress>

The data sections mainly hold images and test data.

The code sections contain the assembly code to be executed.

Code is organized into pipeline instructions and preprocessing directives.

Instructions joined by "||" are executed simultaneously, within the same cycle.


The assembler accepts both spaces and tabs as whitespace.


Comments start with ";" or "//".
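The syntax rules so far (whitespace, comments, and "||" for same-cycle execution) can be sketched as a small host-side parser that strips comments and groups instructions into per-cycle bundles. This is an illustration only and not part of any Movidius tool; the names strip_comment and group_bundles are hypothetical, and the rule that a line starting with "||" joins the bundle of the preceding line is an assumption drawn from the examples.

```python
def strip_comment(line):
    """Remove a trailing ';' or '//' comment and surrounding whitespace."""
    for marker in (";", "//"):
        index = line.find(marker)
        if index != -1:
            line = line[:index]
    return line.strip()

def group_bundles(lines):
    """Group instructions into bundles; a line starting with '||' joins the previous bundle."""
    bundles = []
    for raw in lines:
        line = strip_comment(raw)
        if not line:
            continue  # blank or comment-only line
        if line.startswith("||"):
            if not bundles:
                raise ValueError("'||' with no preceding instruction")
            bundles[-1].append(line[2:].strip())
        else:
            bundles.append([line])
    return bundles

source = [
    "LSU0.LDIL I0, vcolortable ; load pointer to colors",
    "|| LSU1.LDIH I0, vcolortable",
    "\t// a comment-only line",
    "VAU.ADD.I32 V1, V2, V3",
]
bundles = group_bundles(source)
for cycle, bundle in enumerate(bundles):
    print(cycle, bundle)
```

Each inner list holds the instructions that would issue together in one cycle, so the two LSU instructions share a bundle and the VAU instruction forms the next one.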


A label must be written as [Label]: as in the following example:

LSU0.LDIL I1, Endlabel
|| LSU1.LDIH I1, Endlabel
Endlabel:
VAU.ADD.I32 V1, V2, V3


Symbols in the assembly code are handled as strings and are case-sensitive.

.set Symbolname 0xffff0000
LSU0.LDIL I1, Symbolname
|| LSU1.LDIH I1, (Symbolname >> 16)
; I1 will have the value 0xffff0000

Symbolname will be replaced by 0xffff0000.


Numbers are treated as immediate values and may carry different prefixes: decimal (no prefix), hexadecimal ('0x'), and binary ('0b').
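The three number forms can be illustrated with a small host-side parser. The function name parse_immediate is hypothetical, and the sketch covers only the prefixes listed above; how the real assembler treats other forms is not covered by the source.

```python
def parse_immediate(text):
    """Parse a decimal, '0x' hexadecimal, or '0b' binary literal into an int."""
    text = text.strip().lower()
    if text.startswith("0x"):
        return int(text[2:], 16)
    if text.startswith("0b"):
        return int(text[2:], 2)
    return int(text, 10)

for literal in ("100", "0x10", "0b101", "0xFFFF0000"):
    print(literal, "=", parse_immediate(literal))
```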

The LDIL and LDIH instructions load immediate values:

LDIL: loads the low 16 bits.

LDIH: loads the high 16 bits. If the operand fits in 16 bits, it is loaded as given; if it does not fit in 16 bits, its high 16 bits are loaded (unverified).
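The way an LDIL/LDIH pair builds a 32-bit value can be modeled on a host machine. This is a sketch of the behavior as described above, not a verified model of the hardware: the helper names ldil and ldih are hypothetical, and the assumption that each instruction leaves the other half of the register unchanged is not confirmed by the source.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def ldil(reg, imm):
    """Replace the low 16 bits of reg with the low 16 bits of imm (assumed behavior)."""
    return (reg & ~MASK16 & MASK32) | (imm & MASK16)

def ldih(reg, imm):
    """Replace the high 16 bits of reg (assumed behavior).

    If imm fits in 16 bits it is used as given; otherwise its high 16 bits are used.
    """
    high = imm if imm <= MASK16 else (imm >> 16) & MASK16
    return (reg & MASK16) | (high << 16)

value = 0xFFFF0000
reg = ldil(0, value)          # low half
reg = ldih(reg, value >> 16)  # high half, passed already shifted
print(hex(reg))

# Under the rule described above, passing the unshifted value gives the same result.
same = ldih(ldil(0, value), value)
print(hex(same))
```

Under these assumptions both forms leave 0xffff0000 in the register, which would explain why the examples in this article use LDIH both with and without the explicit shift.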


.set SymbolName1 0x41200000
.set SymbolName2 10f32
LSU0.LDIL I1, SymbolName1
|| LSU1.LDIH I1, (SymbolName1 >> 16)
LSU0.LDIL I2, SymbolName2
|| LSU1.LDIH I2, (SymbolName2 >> 16)
; I1 and I2 would have the same value, 0x41200000

Both I1 and I2 hold 0x41200000, because 10f32 denotes the 32-bit floating-point value 10.0, whose bit pattern is 0x41200000.
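That 10.0 as a 32-bit float has the bit pattern 0x41200000 can be checked on a host machine with Python's struct module. The helper name float32_bits is hypothetical; the split into halves mirrors what the LDIL/LDIH pair loads.

```python
import struct

def float32_bits(value):
    """Return the IEEE 754 single-precision bit pattern of value as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

bits = float32_bits(10.0)
low_half = bits & 0xFFFF   # the part loaded by LDIL
high_half = bits >> 16     # the part loaded by LDIH
print(hex(bits))           # 0x41200000
print(hex(high_half), hex(low_half))
```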


Operands can be separated by a space or by ','.

NOP 4 is equivalent to:

NOP
NOP
NOP
NOP

That is, NOP is executed 4 times.
