Thread Grid (GRID)

Source: Internet
Author: User

In parallel programs, organizing the thread grid sensibly can yield a larger speedup. This section looks at how to lay out the thread grid so that a parallel program runs more efficiently.

A thread grid consists of a number of thread blocks arranged in two dimensions, along an x-axis and a y-axis. With t threads per block, a grid of x by y blocks lets a single launch start up to y*x*t threads. Let's work through an example. For simplicity, we first limit the y-axis direction to a single row of blocks.

Suppose we are processing a standard HD picture with a resolution of 1920 x 1080. The number of threads in a block should be an integer multiple of the warp size, that is, an integer multiple of 32. Because the device schedules whole warps, if the block size is not an integer multiple of 32, some of the threads in the last warp go unused. We then also have to add a condition that stops threads from processing elements beyond the end of the row in the x-axis direction. As later sections show, this reduces the performance of the program. To avoid poor memory coalescing, we also try to arrange the data in memory so that it maps one-to-one onto the layout of the threads; if we do not, performance may drop considerably.

Avoid very small thread blocks as far as possible, because they do not make full use of the hardware. In this example we launch 192 threads per block; 192 is typically the smallest block size worth considering. With 192 threads per block, it is easy to work out that processing one row of the image takes 10 thread blocks (1920 / 192 = 10).

We choose 192 because the row width processed along the x-axis is an integer multiple of 192, and 192 is itself an integer multiple of the warp size, which makes programming more convenient. In practice, we try to choose block sizes this way as well.
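
The block-size reasoning above can be sketched in CUDA. This is a minimal illustration, not code from the original article: the kernel name brighten_row, the helpers blocks_for and launch_brighten_row, the 8-bit grayscale pixel format, and a row width of 1920 pixels are all assumptions. It shows the ceiling division used to obtain the block count and the bounds guard that becomes necessary when the block size does not divide the row width evenly.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;          // warp size on current NVIDIA GPUs
constexpr int kThreadsPerBlock = 192;  // block size used in the text
static_assert(kThreadsPerBlock % kWarpSize == 0,
              "block size should be a whole number of warps");

// Hypothetical helper: number of blocks needed to cover `width` elements.
// Ceiling division, so a partially filled last block is still launched.
constexpr int blocks_for(int width, int threads_per_block) {
    return (width + threads_per_block - 1) / threads_per_block;
}

// Hypothetical kernel: one thread per pixel of a single image row.
__global__ void brighten_row(unsigned char* row, int width, int delta) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: only matters when width is not a multiple of blockDim.x.
    // Threads past the end of the row do nothing.
    if (x < width) {
        int v = row[x] + delta;
        v = v < 0 ? 0 : (v > 255 ? 255 : v);  // clamp to the 8-bit range
        row[x] = static_cast<unsigned char>(v);
    }
}

// Launches the kernel on a row that already resides in device memory.
cudaError_t launch_brighten_row(unsigned char* d_row, int width, int delta) {
    assert(d_row != nullptr && width > 0);
    int blocks = blocks_for(width, kThreadsPerBlock);
    brighten_row<<<blocks, kThreadsPerBlock>>>(d_row, width, delta);
    return cudaGetLastError();
}
```

For a 1920-pixel row, blocks_for(1920, 192) is 10 and the guard never rejects a thread. For a 1000-pixel row it is 6, and 152 threads of the last block sit idle behind the guard.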

The thread index runs along the top, on the x-axis, and the row number runs along the y-axis. Each row of the grid handles exactly one row of pixels and contains ten thread blocks, so processing the entire picture takes 1080 rows of blocks, for a total of 1080*10 = 10,800 thread blocks. With one thread per pixel and 192 threads per block, a single launch schedules just over two million threads. This layout is very useful when each pixel or data item is processed on its own, or when an operation works on data within the same row.

On the current Fermi architecture, an SM can handle 8 thread blocks, so at the application level the program above would need 1350 SMs (10,800 thread blocks in total, divided by the 8 thread blocks each SM can schedule) to run fully in parallel. But current Fermi hardware has only 16 SMs (GTX 580), so each SM is assigned 675 thread blocks to process.
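
The indexing and the block counts described above could be expressed as follows. This is a sketch under stated assumptions: a 1920 x 1080 8-bit grayscale image stored row by row in device memory, 16 SMs on the GTX 580, and the hypothetical names invert_image and launch_invert_image. The x index is built from the block and thread position along the x-axis, and the row number is taken directly from the block's y index.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kWidth = 1920;           // assumed image width in pixels
constexpr int kHeight = 1080;          // image height in pixels
constexpr int kThreadsPerBlock = 192;  // threads per block, as in the text
constexpr int kBlocksPerRow = kWidth / kThreadsPerBlock;
constexpr int kTotalBlocks = kBlocksPerRow * kHeight;
constexpr int kBlocksPerSM = 8;        // Fermi figure quoted in the text
constexpr int kNumSMs = 16;            // GTX 580

// The arithmetic from the text, checked at compile time.
static_assert(kBlocksPerRow == 10, "ten blocks cover one row");
static_assert(kTotalBlocks == 10800, "blocks in the whole grid");
static_assert(kTotalBlocks * kThreadsPerBlock == 2073600,
              "one thread per pixel");
static_assert(kTotalBlocks / kBlocksPerSM == 1350,
              "SMs needed for full parallelism");
static_assert(kTotalBlocks / kNumSMs == 675, "blocks per SM on 16 SMs");

// Hypothetical kernel: each thread inverts one pixel.
__global__ void invert_image(unsigned char* image) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column within the row
    int y = blockIdx.y;                             // one grid row per pixel row
    // No bounds guard: 192 divides 1920 exactly, so x is always < kWidth.
    int i = y * kWidth + x;
    image[i] = static_cast<unsigned char>(255 - image[i]);
}

// Launches a 10 x 1080 grid of 192-thread blocks on a device-resident image.
cudaError_t launch_invert_image(unsigned char* d_image) {
    assert(d_image != nullptr);
    dim3 grid(kBlocksPerRow, kHeight);  // 10 blocks across, 1080 rows
    dim3 block(kThreadsPerBlock);       // 192 x 1 threads per block
    invert_image<<<grid, block>>>(d_image);
    return cudaGetLastError();
}
```

Because consecutive threads in a block touch consecutive bytes of the same row, this layout also keeps the one-to-one mapping between threads and memory that coalescing relies on.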

The example above is simple and its data layout is aligned, so a good solution is easy to find. But what if our data is not row-based? As with arrays, data is not limited to a single dimension. In that case we can use a two-dimensional thread block. Many image algorithms, for example, process pixels in 8x8 blocks.
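
A two-dimensional thread block could look like the following sketch. The thresholding operation, the names threshold_image, tiles_for, and launch_threshold_image, and the row-major 8-bit image layout are assumptions chosen for illustration; only the 8x8 block shape comes from the text. Each block covers an 8x8 tile of pixels, and the guard handles images whose width or height is not a multiple of 8.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kTile = 8;  // 8x8 threads per block, as mentioned in the text

// Hypothetical helper: blocks needed along one axis (ceiling division).
constexpr int tiles_for(int extent) {
    return (extent + kTile - 1) / kTile;
}

// Hypothetical kernel: each thread thresholds one pixel of its 8x8 tile.
__global__ void threshold_image(unsigned char* image, int width, int height,
                                unsigned char cutoff) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // pixel column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // pixel row
    // Guard for tiles that hang over the right or bottom edge.
    if (x < width && y < height) {
        int i = y * width + x;
        image[i] = image[i] >= cutoff ? 255 : 0;
    }
}

// Launches one 8x8 block per tile on a device-resident image.
cudaError_t launch_threshold_image(unsigned char* d_image, int width,
                                   int height, unsigned char cutoff) {
    assert(d_image != nullptr && width > 0 && height > 0);
    dim3 block(kTile, kTile);  // 64 threads, i.e. two warps
    dim3 grid(tiles_for(width), tiles_for(height));
    threshold_image<<<grid, block>>>(d_image, width, height, cutoff);
    return cudaGetLastError();
}
```

For a 1920 x 1080 image this gives a grid of 240 x 135 blocks. An 8x8 block holds 64 threads, which is still a multiple of the warp size but smaller than the 192 threads used in the row-based example.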
