Converting an image processing algorithm into an FPGA system design is called algorithm mapping. Implementing a parallel algorithm on an FPGA is different from implementing one on a CPU.
1. Algorithmic System Architecture
Image processing algorithms mainly use two design structures: the pipeline structure and the parallel array structure.
1.1 Pipeline structure
In my opinion, the pipeline structure differs from the serial execution on a CPU that we are used to. It is closer to the instruction pipeline inside a processor, where every stage is synchronized by the clock and passes its result to the next stage on each clock cycle.
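To make the clocked behaviour concrete, here is a minimal Python sketch of a pipeline model. It is only a software illustration under assumptions: the function name `run_pipeline` and the three example stages are hypothetical, and each stage is assumed to have exactly one register that updates once per clock.

```python
def run_pipeline(samples, stages):
    """Model a clocked pipeline with one register per stage.

    On every clock, each stage takes the value held in the previous
    register and stores its own result. None marks an empty slot (bubble).
    """
    regs = [None] * len(stages)
    outputs = []
    # Feed the samples, then add empty cycles so the last samples drain out.
    stream = list(samples) + [None] * (len(stages) - 1)
    for din in stream:
        # Values seen by each stage just before the clock edge.
        prev = [din] + regs[:-1]
        # All registers update together, as they would on a clock edge.
        regs = [None if v is None else f(v) for f, v in zip(stages, prev)]
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs


# Hypothetical three-stage pipeline: add 1, multiply by 2, subtract 3.
example_stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
example_result = run_pipeline([1, 2, 3, 4], example_stages)
```

After the first result appears (one clock per stage), the model delivers one result per clock, which is the property that makes the structure attractive for pixel streams.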
1.2 Parallel array structure
         |--Data1--->Data1-->Data1
         |
InputData------Data2--->Data2-->Data2
         |
         |--Data3--->Data3-->Data3
2. Algorithm conversion
2.1 Constant conversion
On an FPGA, addition and shifting are cheap, but multiplication and division require a multiplier or a divider, and these are valuable resources. When one operand is a constant, we therefore try to convert the operation into shifts and additions, for example:
ex1: dout = din * 255
After conversion: dout = (din << 8) - din
ex2: dout = din * 11
After conversion: dout = din * 2^2 + din * 2^3 - din * 2^0 = din * (2^2 + 2^3 - 2^0), that is, dout = (din << 2) + (din << 3) - din
Here 255 and 11 are constants.
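The two conversions above can be checked with a short Python model. This is a sketch for verification only, not synthesizable code; the function names `mul255` and `mul11` are hypothetical, and `din` is assumed to be an unsigned 8-bit value.

```python
def mul255(din):
    """din * 255 using one shift and one subtraction."""
    return (din << 8) - din


def mul11(din):
    """din * 11 using two shifts, one addition and one subtraction."""
    return (din << 3) + (din << 2) - din


# Exhaustive comparison against ordinary multiplication for 8-bit inputs.
for value in range(256):
    assert mul255(value) == value * 255
    assert mul11(value) == value * 11
```

In hardware the result needs more bits than the input (16 bits for `din * 255` with an 8-bit `din`), so the output width has to be chosen accordingly.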
2.2 Inequality equivalent conversion
This conversion also aims to save resources in the FPGA implementation. For example:
ex1: √a < b (a > 0)
After conversion: b^2 > a (with b > 0)
ex2: a/b > c/d (b > 0, d > 0)
After conversion: a * d > b * c
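As a sanity check on these equivalences, the following Python sketch compares the converted forms with the direct computations over a small range of integers. The function names are hypothetical. One assumption is made explicit: the square-root form is only equivalent when b is positive, so the model tests the sign of b first.

```python
import math
from fractions import Fraction


def sqrt_less_than(a, b):
    """Decide sqrt(a) < b without computing a square root (a >= 0)."""
    # sqrt(a) is never negative, so the comparison can only hold for b > 0.
    return b > 0 and b * b > a


def ratio_greater(a, b, c, d):
    """Decide a/b > c/d without dividing (requires b > 0 and d > 0)."""
    return a * d > b * c


# Compare against the direct forms on small integer inputs.
for a in range(50):
    for b in range(-3, 10):
        assert sqrt_less_than(a, b) == (math.sqrt(a) < b)

for a in range(-5, 6):
    for c in range(-5, 6):
        for b in range(1, 5):
            for d in range(1, 5):
                assert ratio_greater(a, b, c, d) == (Fraction(a, b) > Fraction(c, d))
```

The converted forms trade a square root or two dividers for multipliers, which are usually cheaper but still not free, so the gain depends on the target device.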
2.3 Taking approximate values
The simplest approximation is rounding. If din[N-1:4] is the integer part of a datum din (N being its bit width) and din[3:0] is the fractional part, rounding can be written as:
assign dout = din[3] ? (din[N-1:4] + 1) : din[N-1:4];
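The same rounding rule can be modelled in Python to see what it does to concrete values. This is a sketch under assumptions: `din` is an unsigned fixed-point number with 4 fractional bits, and the names `FRAC_BITS` and `round_fixed` are hypothetical.

```python
FRAC_BITS = 4  # assumed number of fractional bits, matching the text


def round_fixed(din):
    """Round an unsigned fixed-point value to the nearest integer.

    The top fractional bit (weight 0.5) decides whether to round up.
    """
    integer_part = din >> FRAC_BITS
    round_bit = (din >> (FRAC_BITS - 1)) & 1
    return integer_part + round_bit


# 0b1_0111 is 1.4375 and rounds down; 0b1_1000 is 1.5 and rounds up.
rounded_down = round_fixed(0b10111)
rounded_up = round_fixed(0b11000)
```

Note that adding 1 can overflow when the integer part is all ones, so the output may need one extra bit or saturation logic.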
Taylor Formula definition
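For reference, the standard textbook form of Taylor's formula is written out below in LaTeX. The symbols f, x_0, n and R_n follow the usual convention and are an assumption about the notation intended here; f is assumed to be differentiable n + 1 times near x_0.

```latex
f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}\,(x - x_0)^k + R_n(x),
\qquad
R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\,(x - x_0)^{n+1}
\quad \text{for some } \xi \text{ between } x_0 \text{ and } x .
```

Dropping the remainder R_n(x) gives a polynomial approximation whose error shrinks as x approaches x_0.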
Then you can have the following approximate conversions:
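As an illustration of what such conversions can look like, here is a small Python sketch that compares three standard first-order Taylor approximations around x = 0 with the exact functions. The choice of these three functions, the function names and the sample point are assumptions made for this example.

```python
import math


def approx_exp(x):
    """First-order Taylor approximation of e**x near 0."""
    return 1 + x


def approx_sin(x):
    """First-order Taylor approximation of sin(x) near 0."""
    return x


def approx_sqrt1p(x):
    """First-order Taylor approximation of sqrt(1 + x) near 0."""
    return 1 + x / 2  # in hardware, x / 2 is a one-bit right shift


x = 0.05  # sample point, chosen small so the approximations are reasonable
errors = {
    "exp": abs(approx_exp(x) - math.exp(x)),
    "sin": abs(approx_sin(x) - math.sin(x)),
    "sqrt1p": abs(approx_sqrt1p(x) - math.sqrt(1 + x)),
}
```

Approximations of this kind are only usable when the input is known to stay close to the expansion point; the acceptable range should be checked against the precision the algorithm needs.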
3. Construct a lookup table
A lookup table replaces a real-time calculation with a simple query. Using a lookup table for values that are needed frequently at run time can greatly reduce the time cost. For example, the trigonometric functions we use often should be read from a lookup table instead of being computed in real time.
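A minimal Python sketch of a sine lookup table follows. It only models the idea; the table size, the fixed-point scale and the names `ADDR_BITS`, `DATA_BITS`, `SIN_LUT` and `sin_lookup` are all assumptions chosen for illustration. In an FPGA, a table like this would typically be stored in a ROM or block RAM initialized with the precomputed values.

```python
import math

ADDR_BITS = 8                          # assumed: 256 entries for one full period
DATA_BITS = 10                         # assumed: signed 10-bit samples
SCALE = (1 << (DATA_BITS - 1)) - 1     # largest positive value, 511

# Precompute the table once; at run time only the lookup remains.
SIN_LUT = [
    round(math.sin(2 * math.pi * i / (1 << ADDR_BITS)) * SCALE)
    for i in range(1 << ADDR_BITS)
]


def sin_lookup(phase):
    """Return the scaled sine for an integer phase index.

    The index wraps around, so one period maps to 2**ADDR_BITS steps.
    """
    return SIN_LUT[phase & ((1 << ADDR_BITS) - 1)]
```

A larger table gives finer phase resolution at the cost of more memory; a common refinement, not shown here, is to store only a quarter period and use symmetry for the rest.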
Reflection on FPGA algorithm mapping