Using C++ for Windows Development: Exploring High-Performance Algorithms

Source: Internet
Author: User
Tags: data structures, fast, web

Issues such as coordination, asynchronous behavior, responsiveness, and scalability tend to dominate discussions of concurrency. These are some of the more esoteric topics that developers must consider when designing applications. However, perhaps due to inexperience or a lack of proper performance tools, some equally important topics are often overlooked. Algorithm performance is one example.

At the enterprise level, developers carefully weigh issues such as distributed file systems and caching, clustering, message queues, and databases. But what use is all that deliberation if the core algorithms and data structures are inefficient?

Algorithmic efficiency is not as straightforward as you might think. A well-designed algorithm on a single processor can often outperform an inefficient implementation running on multiple processors. But now that multiprocessor machines are widely available, a well-designed algorithm must also demonstrate measurable scalability and efficiency. To complicate matters, algorithms optimized for a single processor are often difficult to parallelize, while less efficient algorithms can often perform better in a multiprocessor environment.

To illustrate this point, I will use Visual C++ to walk through the development of an algorithm that looks simple at first glance but turns out not to be. Here is the function we need to implement:

void MakeGrayscale(BYTE* bitmap,
          const int width,
          const int height,
          const int stride);

The bitmap parameter points to an image with 32 bits per pixel; that format, again, is the focus of this article. The width and height parameters give the dimensions of the image in pixels. The absolute value of the stride indicates the number of bytes in memory from one row of pixels to the next; there may be padding at the end of each row. The sign of the stride indicates whether the rows are stored top-down (positive stride) or bottom-up (negative stride) in memory.

Let's first identify the starting point. We can use the following structure to represent the pixels in memory:

typedef unsigned char BYTE; // from windef.h
struct Pixel
{
  BYTE Blue;
  BYTE Green;
  BYTE Red;
  BYTE Alpha;
};
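Since the bitmap format is 32 bits per pixel, this struct must pack to exactly four bytes. A quick compile-time check (my addition, not part of the original listing) confirms the layout:

```cpp
typedef unsigned char BYTE; // from windef.h

struct Pixel
{
  BYTE Blue;
  BYTE Green;
  BYTE Red;
  BYTE Alpha;
};

// All four members are single bytes, so there is no padding and the
// struct lines up exactly with one 32-bits-per-pixel value.
static_assert(sizeof(Pixel) == 4, "Pixel must be exactly 32 bits");
```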

A quick Web search tells us that a reasonable grayscale value for a given color can be obtained by mixing 30% red, 59% green, and 11% blue. The following is a simple function that converts a pixel to grayscale:

void MakeGrayscale(Pixel& pixel)
{
  const BYTE scale = static_cast<BYTE>(0.30 * pixel.Red +
                     0.59 * pixel.Green +
                     0.11 * pixel.Blue);
  pixel.Red = scale;
  pixel.Green = scale;
  pixel.Blue = scale;
}
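As a quick sanity check (my own example, not from the original article), a pixel with Blue = 50, Green = 100, Red = 200 mixes to 0.30 × 200 + 0.59 × 100 + 0.11 × 50 = 124.5, which the cast truncates to 124 on all three color channels:

```cpp
typedef unsigned char BYTE;

struct Pixel { BYTE Blue; BYTE Green; BYTE Red; BYTE Alpha; };

// Same conversion as above: a weighted mix of the three color
// channels, truncated to a byte and written back to each channel.
void MakeGrayscale(Pixel& pixel)
{
  const BYTE scale = static_cast<BYTE>(0.30 * pixel.Red +
                                       0.59 * pixel.Green +
                                       0.11 * pixel.Blue);
  pixel.Red   = scale;
  pixel.Green = scale;
  pixel.Blue  = scale;
}
```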

To calculate the byte offset of a particular pixel within a bitmap, you can calculate the product of its horizontal position and pixel size and the product of its vertical position and span, and then add these values:

offset = x * sizeof(Pixel) + y * stride
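Wrapped in a small helper (my naming, not the article's), the formula handles both stride signs, since y * stride simply steps backward through memory when the stride is negative:

```cpp
#include <cstddef>

typedef unsigned char BYTE;

struct Pixel { BYTE Blue; BYTE Green; BYTE Red; BYTE Alpha; };

// Byte offset of pixel (x, y) relative to the pointer to row 0. A
// negative stride (bottom-up bitmap) yields a negative row term, so
// the result can legitimately be negative.
inline std::ptrdiff_t PixelOffset(int x, int y, int stride)
{
  return static_cast<std::ptrdiff_t>(x) *
           static_cast<std::ptrdiff_t>(sizeof(Pixel)) +
         static_cast<std::ptrdiff_t>(y) *
           static_cast<std::ptrdiff_t>(stride);
}
```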

So, how would you implement the MakeGrayscale function? If you dove in without further thought, you might write an algorithm like the one shown in Figure 1. At first glance this seems reasonable, and it appears to handle small bitmaps well enough. But what about larger bitmaps? What about a 20,000 × 20,000 pixel bitmap?

Figure 1 An inefficient single-threaded algorithm

void MakeGrayscale(BYTE* bitmap,
          const int width,
          const int height,
          const int stride)
{
  for (int x = 0; x < width; ++x)
  for (int y = 0; y < height; ++y)
  {
    const int offset = x * sizeof(Pixel) + y * stride;
    Pixel& pixel = *reinterpret_cast<Pixel*>(bitmap + offset);
    MakeGrayscale(pixel);
  }
}

I happen to have a Dell PowerEdge with a quad-core Intel Xeon X3210 processor. The machine has a 2.13 GHz clock speed, a 1066 MHz front-side bus, 8 MB of L2 cache, and various other fancy features. Admittedly, it is not the latest Intel Xeon, but it is a respectable machine. It runs a 64-bit version of Windows Server 2008 and is ideal for performance testing.

With this hardware, I ran the algorithm shown in Figure 1 on a bitmap 20,000 pixels wide and 20,000 pixels high. Averaged over 10 iterations, it took 46 seconds. Admittedly, this bitmap is quite large, occupying about 1.5 GB of memory. But is that really the problem? My server has 4 GB of RAM, so no paging to disk was required. Yet Figure 2 shows the all-too-familiar processor usage view.

Figure 2 Processor usage for the inefficient single-threaded algorithm
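One likely culprit, and this is my own observation rather than a conclusion drawn from the measurements above: the loops in Figure 1 walk the bitmap column by column, so consecutive iterations touch addresses a whole stride apart and defeat the processor cache. A minimal sketch of the same algorithm with the loops interchanged, so that pixels are visited in memory order:

```cpp
typedef unsigned char BYTE;

struct Pixel { BYTE Blue; BYTE Green; BYTE Red; BYTE Alpha; };

void MakeGrayscale(Pixel& pixel)
{
  const BYTE scale = static_cast<BYTE>(0.30 * pixel.Red +
                                       0.59 * pixel.Green +
                                       0.11 * pixel.Blue);
  pixel.Red   = scale;
  pixel.Green = scale;
  pixel.Blue  = scale;
}

// Sketch: identical work to Figure 1, but the outer loop is over rows,
// so each inner iteration touches the next pixel in memory.
void MakeGrayscaleByRow(BYTE* bitmap,
          const int width,
          const int height,
          const int stride)
{
  for (int y = 0; y < height; ++y)
  {
    // One pointer computation per row instead of one per pixel.
    Pixel* row = reinterpret_cast<Pixel*>(bitmap + y * stride);
    for (int x = 0; x < width; ++x)
    {
      MakeGrayscale(row[x]);
    }
  }
}
```

This changes nothing about the arithmetic; it only reorders the memory accesses, which is often enough to make a measurable difference on large bitmaps.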
