Last but not least, the convolutional neural network is built on the FPGA.
The FPGA I use is Xilinx's XC6SLX45 (a Spartan-6 part), and the following is the final resource usage:
One of the most important design problems is the two-dimensional convolution, which I solved with the Shift RAM IP core.
There is one catch when using it: some invalid data has to be discarded, because the window outputs are meaningless until the line buffers have filled and the window lies fully inside the image. Specifically:
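To make the windowing concrete, here is a behavioral model in Python (a sketch of the idea, not the author's Verilog): line buffers slide a 5x5 window across the image, and only positions where the window fits entirely inside the image produce valid outputs; everything before that is the "invalid data" to discard.

```python
import numpy as np

def conv2d_line_buffer(img, kernel):
    """Valid-mode 2D convolution via a sliding window, mimicking the
    shift-RAM line-buffer datapath. img: HxW array, kernel: 5x5 array.
    Outputs exist only where the window lies fully inside the image;
    earlier window positions correspond to the invalid data the
    hardware must drop."""
    k = kernel.shape[0]
    h, w = img.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            # Multiply-accumulate over the 5x5 window (25 products,
            # matching the 25 DSP multiplications per convolution).
            out[r, c] = np.sum(img[r:r+k, c:c+k] * kernel)
    return out
```

In hardware the same window is produced incrementally by four line buffers plus shift registers rather than by re-reading the image, but the valid/invalid regions are identical.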
The same windowing structure can also be reused for max pooling.
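As a behavioral sketch of that reuse (again Python, assuming a 2x2 window with stride 2, which the post does not state explicitly), max pooling is the same windowing with `max` in place of multiply-accumulate:

```python
import numpy as np

def maxpool_2x2(img):
    """2x2 max pooling with stride 2, built on the same sliding-window
    idea as the convolution line buffer: form a window, reduce it,
    move on. Assumes even H and W."""
    h, w = img.shape
    out = np.zeros((h // 2, w // 2))
    for r in range(0, h - 1, 2):
        for c in range(0, w - 1, 2):
            out[r // 2, c // 2] = img[r:r+2, c:c+2].max()
    return out
```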
The activation function is ReLU, which takes only one line of Verilog to implement.
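The one-liner itself is Verilog; behaviorally (my Python model, not the author's code), the operation amounts to checking the sign bit of a signed fixed-point value and outputting zero when it is set:

```python
def relu_fixed(x, width=16):
    """ReLU on a signed fixed-point value of the given bit width:
    if the sign bit (MSB) is 1 the input is negative, so output 0;
    otherwise pass the value through unchanged. The 16-bit width is
    an assumption for illustration."""
    sign_bit = (x >> (width - 1)) & 1
    return 0 if sign_bit else x
```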
The entire design uses shift RAM (for the two-dimensional convolution window), block RAM (for data caching and intermediate results), and ROM (to store the image and the parameters). This chip has only 58 DSP slices, and each convolution kernel is 5x5, so one convolution needs 25 DSP multiplications; the parallelism is therefore 2, and computing 2 feature maps at the same time consumes 50 DSPs. To save resources, the second convolution layer reuses the multipliers of the first layer.
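The DSP budget above can be checked with a quick back-of-the-envelope calculation (numbers taken from the text):

```python
# DSP budget on the XC6SLX45 as stated in the post.
dsp_total = 58                              # DSP slices on the chip
kernel_mults = 5 * 5                        # 25 multiplications per 5x5 convolution
parallelism = dsp_total // kernel_mults     # feature maps computed in parallel
dsp_used = parallelism * kernel_mults       # DSPs actually consumed
```

With 58 DSPs and 25 multipliers per kernel, at most two convolutions fit side by side, which is why the second layer has to time-share the first layer's multipliers instead of getting its own.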
The final calculation results:
TensorFlow (exported to a .mat file and viewed in MATLAB):
FPGA (viewed with ChipScope):
Conclusion: the FPGA results match the TensorFlow results up to a scale factor, with some errors from fixed-point rounding; overall, the algorithm is verified to be correct. The FPGA computation takes 7,145 cycles, which at a 100 MHz clock is 71.45 us in total.
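The latency figure follows directly from the cycle count and clock frequency:

```python
# Latency from the numbers in the text: 7,145 cycles at 100 MHz.
cycles = 7145
clock_hz = 100e6                        # 100 MHz clock
latency_us = cycles / clock_hz * 1e6    # convert seconds to microseconds
```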
The project is finished, so here is a short summary (the pitfalls I hit and my overall line of thought)...
PS: If you want to discuss and learn together, you can add me on QQ: 1343395571. As for the code, you will just have to write it yourself, slowly.