FPGA Development (3)

Source: Internet
Author: User

Reproduced

Squeezing the FPGA's on-chip storage resources dry

A long time ago, Privileged Classmate wrote a short blog post, "M4K Usage", about configuration problems with the M4K embedded memory blocks in Cyclone devices. That post pointed out that, besides being limited to 4 Kbit of storage, an M4K block also has a limited number of configurable ports, usually at most 36.


At the time, the post only mentioned the limitation as a reminder to users; it did not discuss how to resolve it or, better, how to avoid it. This article therefore draws on a case Privileged Classmate ran into recently, in which the way the on-chip memories were configured left some on-chip resources underused, and discusses in more depth how to optimize an existing configuration. As the title says, the goal is to squeeze the FPGA's on-chip storage resources dry.


First, a few tricks for checking in Quartus II how much on-chip memory a design uses after synthesis or after place and route. With these, you can know exactly where the storage resources in your own design went, which helps with later maintenance, upgrades, and even a complete redesign. I guess you can't wait, so ready → go!


After a project has been fully compiled, Quartus II opens a new compilation report, starting on the Flow Summary page. You can also open it from the menu bar via Processing → Compilation Report, as shown in Figure 1.




Figure 1


Look at the Flow Summary page shown in Figure 2. Ignore the other entries for now and look only at the circled Total memory bits line: 103,264/165,888 (62%). It shows that the EP2C8Q208C8 device used by Privileged Classmate has 165,888 bits of on-chip memory in total, of which this project uses 103,264 bits, or 62%.




Figure 2


Now let's see where exactly the storage resources are used. As shown in Figure 3, open Analysis & Synthesis → RAM Summary in the compilation report.




Figure 3


The right side of the page, shown in Figure 4, then lists the detailed storage resource allocation. For each memory, this page of the report shows the storage resource type (dedicated on-chip memory blocks or memory built from logic resources; building memory from logic is obviously wasteful or even impractical), the memory mode (RAM, ROM, FIFO, and so on), the bit width and depth, the storage size, and whether an initialization file is mapped to it.




Figure 4


Because this page belongs to the synthesis report, it gives no device-specific information, such as the number of M4K blocks used (the concern raised at the beginning of this article) or exactly which M4K blocks each instantiated memory occupies. Don't worry, a developer's curiosity can still be satisfied. Next, open Fitter → Resource Section → RAM Summary in the compilation report (in the same way as in Figure 3). The result is shown in Figure 5. Sorry, the page width is limited, so the Name column is not fully displayed and only the tip of the Location column peeks out, but that does not matter as long as you get the idea. The Location column lists which specific M4K blocks are used, and the M4Ks column gives the number of M4K blocks used; the other columns are similar, and readers can work through them on their own. With this information, designers should have a clear picture of how much on-chip memory each of their instances actually uses.




Figure 5


An attentive reader will now ask: I know how many M4K blocks each of my memories uses, but how do I know whether I have exceeded the device total? Do I have to wait for a compilation error, or count the blocks on this page by hand against the handbook? Not at all. Just open Fitter → Resource Section → Resource Usage Summary, shown in Figure 6, which lists in detail the usage of every on-chip resource of the FPGA. The circled part is what we need to focus on here. The M4Ks line shows that 26 of the device's 36 M4K blocks are used, an occupancy of 72%. Total block memory bits is the same as in the synthesis report above; strictly speaking, this figure is the absolute amount of on-chip storage the design uses. Finally, Total block memory implementation bits is the amount of on-chip memory that ends up occupied on the FPGA device. Note the wording: occupied, not used. The two words are often interchangeable, but Privileged Classmate wants to distinguish the two concepts here, because the distinction is the subject of this article. The occupancy of this item is the same as that of the M4Ks line, and it must be.




Figure 6
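
As a quick sanity check of the figures above, the following Python sketch recomputes the occupancy numbers. It assumes that every M4K block contributes the same number of bits (derived here by dividing the device total by the block count) and that the implementation bits are simply whole blocks times that size. The names are illustrative and are not part of Quartus II.

```python
# Figures quoted from the compilation report described in the article.
M4K_TOTAL = 36                # M4K blocks in the device
M4K_USED = 26                 # M4K blocks occupied by the design
TOTAL_MEMORY_BITS = 165_888   # total on-chip memory bits of the device
USED_MEMORY_BITS = 103_264    # "Total block memory bits"

# Assumption: all blocks are the same size, so derive it from the totals.
BITS_PER_M4K = TOTAL_MEMORY_BITS // M4K_TOTAL


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Assumption: occupied bits are whole blocks times the block size.
implementation_bits = M4K_USED * BITS_PER_M4K

print(f"bits per M4K block:  {BITS_PER_M4K}")
print(f"M4K occupancy:       {percent(M4K_USED, M4K_TOTAL):.1f}%")
print(f"implementation bits: {implementation_bits} "
      f"({percent(implementation_bits, TOTAL_MEMORY_BITS):.1f}%)")
print(f"bits actually used:  {percent(USED_MEMORY_BITS, TOTAL_MEMORY_BITS):.1f}%")
```

Under these assumptions the block occupancy and the implementation-bit occupancy are the same ratio by construction, which is why the two percentages in the report have to agree.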


Good. As the saying goes, "to do a good job, one must first sharpen one's tools"; now that the tools are in hand, let's get down to business. Privileged Classmate proposes a concept, the FPGA on-chip memory utilization, with the formula (Total block memory bits / Total block memory implementation bits). For this design it is (62% / 72%) = 86.11%, which is a fairly good figure (quietly, between us: this is the design after Privileged Classmate had optimized it).
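
The utilization formula can be written as a one-line function. The sketch below evaluates it twice: once with the rounded percentages used in the article, and once with raw bit counts, assuming 26 occupied blocks of 4,608 bits each (165,888 bits / 36 blocks). The function name is made up for illustration.

```python
def memory_utilization(used: float, occupied: float) -> float:
    """Utilization in percent: used memory divided by occupied memory.

    Both arguments must be in the same unit (bits, or percent of the device).
    """
    return 100.0 * used / occupied


# With the rounded percentages from the report, as in the article.
rounded = memory_utilization(62, 72)

# With raw bit counts; 26 * 4608 is an assumption about how the
# implementation bits are made up (whole M4K blocks only).
exact = memory_utilization(103_264, 26 * 4608)

print(f"from rounded percentages: {rounded:.2f}%")
print(f"from bit counts:          {exact:.2f}%")
```

The two results differ slightly only because the report rounds each percentage to a whole number before the division.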


With these concepts in place, we can try them on a real case; theory is only theory, and improvement comes from practice. The project can simply be rolled back to its original state before optimization. For reasons of space, only the most notable optimization is described here.


In this project there is a stream of 8-bit data in which the 1st and the 1280th data items have to be processed together. The simplest idea is therefore to instantiate a 1280 × 8 bit shift register, and to process the data item being shifted out together with the one about to be shifted in. Configuring the shift register ran into some trouble, however (see Figure 7). The depth of the shift register is the configured number of taps multiplied by the distance value (readers unfamiliar with configuring the shift register can refer to another blog post by Privileged Classmate, "Cyclone M4K Shift Register Use"). The distance value can be configured to at most 256, while 1280 stages are required, so the idea of using a single tap falls through. On second thought, 256 × 5, 128 × 10, and 64 × 20 are all feasible.




Figure 7
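
The three candidate schemes can be enumerated mechanically. This is only a sketch: it assumes power-of-two distances between a hypothetical lower bound of 64 and the maximum of 256 mentioned in the text, which happens to reproduce the three schemes the author lists. The function name and the lower bound are not from the original.

```python
TOTAL_DEPTH = 1280    # required shift register length
MAX_DISTANCE = 256    # largest configurable tap distance, per the article


def tap_schemes(total_depth: int, max_distance: int, min_distance: int = 64):
    """Return (distance, taps) pairs with distance * taps == total_depth.

    Only power-of-two fractions of max_distance are tried (an assumption).
    """
    schemes = []
    distance = max_distance
    while distance >= min_distance:
        if total_depth % distance == 0:
            schemes.append((distance, total_depth // distance))
        distance //= 2
    return schemes


for distance, taps in tap_schemes(TOTAL_DEPTH, MAX_DISTANCE):
    print(f"distance = {distance:3d}, taps needed = {taps}")
# distance = 256, taps needed = 5
# distance = 128, taps needed = 10
# distance =  64, taps needed = 20
```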


The initial configuration was chosen without much thought: the 64 × 20 scheme, that is, taps = 24 and distance = 64. The taps value can only be one of 1/2/3/4/5/6/7/8/12/16/24/32/48/64/96/128, so 24 taps are configured here and bits 159 to 152 of the taps output are used; during synthesis the 4 unused taps are in fact optimized away.


This configuration is shown in Figure 8. If you look carefully, you will see that the resource usage in the lower left corner is 6 M4K.




Figure 8
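
The choice of 24 taps and of bits 159 to 152 follows from two small calculations, sketched below. The list of allowed tap counts is the one quoted in the article; the helper names are hypothetical, and the bit numbering assumes that the first tap occupies the lowest 8 bits of the concatenated taps output.

```python
ALLOWED_TAPS = (1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 24, 32, 48, 64, 96, 128)
DATA_WIDTH = 8  # bits per data item


def smallest_allowed_taps(needed: int) -> int:
    """Return the smallest selectable tap count that is >= needed."""
    for taps in ALLOWED_TAPS:
        if taps >= needed:
            return taps
    raise ValueError(f"no allowed tap count covers {needed} taps")


def tap_bit_range(tap_number: int, width: int = DATA_WIDTH):
    """Return (msb, lsb) of a 1-based tap in the concatenated taps output.

    Assumption: tap 1 sits in the lowest bits.
    """
    lsb = (tap_number - 1) * width
    return lsb + width - 1, lsb


print(smallest_allowed_taps(20))  # 24
print(tap_bit_range(20))          # (159, 152)
```

With distance = 64, 20 taps are needed, but 20 is not selectable, so 24 is the next option and the 20th tap is read from bits 159 to 152.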


With this configuration, use the methods described above to see how the memory resources are used after compilation. Since the focus is on the utilization of the FPGA's on-chip resources, look again at the Fitter → Resource Section → Resource Usage Summary report. As shown in Figure 9, Total block memory bits is unchanged at 62%, but two more M4K blocks are occupied, and the occupancy of both M4Ks and Total block memory implementation bits rises to 78%. The utilization becomes (62% / 78%) = 79.5%, a drop of nearly 7 percentage points. This figure may not look significant, but when resources are tight, this is where there is the most room to manoeuvre.




Figure 9


Back to the problem Privileged Classmate found: how can the utilization be improved here? (In fact, if the project only had to fit in the EP2C8Q, this optimization would not be necessary, but the final design has to be implemented in the backward-compatible EP2C5Q, hence this article.) It is simple, and the hint has already been given above. The shift register's storage utilization is low not because it stores a lot of data (it is only 1280 × 8 bit = 10 Kbit, for which 3 M4K blocks are enough), but because the generated taps occupy too many ports. Configuring 24 taps occupied 6 M4K blocks. What if we configure 12 taps with a distance of 128, or 6 taps with a distance of 256? The answer is shown in Figure 10: both occupy 3 M4K blocks. This change is the whole secret of the successful optimization.




Figure 10
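
The block counts above can be reproduced with a simple model. This is a sketch under two assumptions that the article does not state: the tapped shift register maps to a single RAM that is distance words deep and taps × 8 bits wide, and an M4K block can be configured in the depth × width shapes listed below. The real fitter may pack memories differently, so treat the result as an estimate.

```python
import math

# Assumed M4K aspect ratios as (depth in words, width in bits).
M4K_SHAPES = ((4096, 1), (2048, 2), (1024, 4), (512, 9), (256, 18), (128, 36))


def m4k_blocks(depth: int, width: int) -> int:
    """Estimate the M4K blocks needed for a depth x width memory.

    Tries every block shape and keeps the cheapest tiling.
    """
    return min(
        math.ceil(depth / d) * math.ceil(width / w) for d, w in M4K_SHAPES
    )


def shift_register_blocks(taps: int, distance: int, data_width: int = 8) -> int:
    """Estimate blocks for a tapped shift register (assumed RAM mapping)."""
    return m4k_blocks(depth=distance, width=taps * data_width)


for taps, distance in ((24, 64), (12, 128), (6, 256)):
    print(f"taps = {taps:2d}, distance = {distance:3d}: "
          f"{shift_register_blocks(taps, distance)} M4K")
# taps = 24, distance =  64: 6 M4K
# taps = 12, distance = 128: 3 M4K
# taps =  6, distance = 256: 3 M4K

print(m4k_blocks(depth=1280, width=8))  # 3, the plain 1280 x 8 bit memory
```

In this model the 24-tap configuration is 192 bits wide, so it needs six 36-bit-wide blocks even though each block is only half full in depth, which matches the explanation that the taps, not the data volume, cost the extra blocks.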


As shown in Figure 11, the project does eventually fit in the EP2C5Q. Without optimizations such as the shift register example, however, memory resources would be very tight.




Figure 11


This has already become a long, heavily illustrated article, but a few more issues related to FPGA on-chip storage resources are worth mentioning. Cyclone and Cyclone II have the M4K, Cyclone III has the M9K, some devices have the M512, and who knows whether M32K, M128K, or M1M blocks will appear in the future. In Privileged Classmate's experience with current devices, the larger the block, the more total storage there is (and it certainly serves applications that need a lot of on-chip storage), but the lower the block utilization becomes in projects that need many small memories. The reason is that any user-instantiated memory implemented in M4K blocks, whether it is 8 bits wide and 512, 256, 128, or 64 bytes in size, or even just 1 byte, occupies a whole M4K block. To take a vivid, extreme example: if my design needs two 1 × 8 bit FIFOs (of course nobody would be that silly in practice ^o^), then after compilation they occupy exactly two M4K blocks, no more and no less.


That is the problem, and in most applications it is the constraint that keeps the FPGA on-chip memory utilization defined by Privileged Classmate below 100%. It is also why another recent project by Privileged Classmate, built on the Nios II platform and shown in Figure 12, has low overall utilization: each of the various simple peripherals occupies a bit of on-chip memory without fully using its M9K blocks. Whether the device manufacturers have considered this situation is unknown; perhaps for them, too, it is a case of not being able to have both the fish and the bear's paw.




Figure 12
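
The effect of many small memories on utilization can be illustrated with a capacity-only model. This is a minimal sketch: it assumes 4,608 bits per M4K block (165,888 bits / 36 blocks from the report discussed earlier), ignores width limits, and assumes no two memories share a block. The function name is invented for the example.

```python
import math

BITS_PER_M4K = 4608  # assumption: 165,888 device bits / 36 blocks


def block_usage(memory_sizes_bits):
    """Return (blocks occupied, utilization in percent).

    Each memory takes at least one whole block and shares it with nothing.
    """
    blocks = sum(
        max(1, math.ceil(bits / BITS_PER_M4K)) for bits in memory_sizes_bits
    )
    used_bits = sum(memory_sizes_bits)
    return blocks, 100.0 * used_bits / (blocks * BITS_PER_M4K)


# The extreme example from the text: two FIFOs of 1 x 8 bit each.
blocks, utilization = block_usage([8, 8])
print(f"{blocks} blocks occupied, utilization {utilization:.2f}%")
```

The same function applied to a list of small peripheral buffers shows why a design with many tiny memories can run out of blocks long before it runs out of bits.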
