Using C++ for Windows Development: Exploring High-Performance Algorithms

Source: Internet
Author: User
Tags: data structures, fast, web

Issues such as coordination, asynchronous behavior, responsiveness, and scalability tend to dominate discussions of concurrency. These are some of the more esoteric topics that developers must consider when designing applications. However, perhaps due to inexperience or a lack of proper performance tools, some equally important topics are often overlooked. Algorithm performance is one example.

At the enterprise level, developers carefully weigh issues such as distributed file systems and caching, clustering, message queues, and databases. But what use is all that deliberation if the core algorithms and data structures are inefficient?

Algorithmic efficiency is not as straightforward as you might think. A well-designed algorithm on a single processor can often outperform an inefficient implementation running on multiple processors. But now that multiprocessor machines are widely available, a well-designed algorithm must also demonstrate measurable scalability and efficiency. To complicate matters, algorithms optimized for a single processor are often difficult to parallelize, while less efficient algorithms can often perform better in a multiprocessor environment.

To illustrate this point, I will use Visual C++ to walk through the development of an algorithm that looks simple at first glance but turns out not to be. Here is the function we need to implement:

void MakeGrayscale(BYTE* bitmap,
          const int width,
          const int height,
          const int stride);

The bitmap parameter points to an image with 32 bits per pixel; that format, again, is the focus of this article. The width and height parameters give the dimensions of the image in pixels. The absolute value of the stride indicates the number of bytes in memory from one row of pixels to the next; there may be padding at the end of each row. The sign of the stride indicates whether the rows are stored top-down (positive stride) or bottom-up (negative stride) in memory.

Let's first identify the starting point. We can use the following structure to represent the pixels in memory:

typedef unsigned char BYTE; // from windef.h
struct Pixel
{
  BYTE Blue;
  BYTE Green;
  BYTE Red;
  BYTE Alpha;
};
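Since the bitmap format is 32 bits per pixel, this struct must pack to exactly four bytes. A quick compile-time check (my addition, not part of the original listing) confirms the layout:

```cpp
typedef unsigned char BYTE; // from windef.h

struct Pixel
{
  BYTE Blue;
  BYTE Green;
  BYTE Red;
  BYTE Alpha;
};

// All four members are single bytes, so there is no padding and the
// struct lines up exactly with one 32-bits-per-pixel value.
static_assert(sizeof(Pixel) == 4, "Pixel must be exactly 32 bits");
```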

A quick Web search tells us that a reasonable grayscale value for a given color can be obtained by mixing 30% red, 59% green, and 11% blue. The following is a simple function that converts a pixel to grayscale:

void MakeGrayscale(Pixel& pixel)
{
  const BYTE scale = static_cast<BYTE>(0.30 * pixel.Red +
                     0.59 * pixel.Green +
                     0.11 * pixel.Blue);
  pixel.Red = scale;
  pixel.Green = scale;
  pixel.Blue = scale;
}
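As a quick sanity check (my own example, not from the original article), a pixel with Blue = 50, Green = 100, Red = 200 mixes to 0.30 × 200 + 0.59 × 100 + 0.11 × 50 = 124.5, which the cast truncates to 124 on all three color channels:

```cpp
typedef unsigned char BYTE;

struct Pixel { BYTE Blue; BYTE Green; BYTE Red; BYTE Alpha; };

// Same conversion as above: a weighted mix of the three color
// channels, truncated to a byte and written back to each channel.
void MakeGrayscale(Pixel& pixel)
{
  const BYTE scale = static_cast<BYTE>(0.30 * pixel.Red +
                                       0.59 * pixel.Green +
                                       0.11 * pixel.Blue);
  pixel.Red   = scale;
  pixel.Green = scale;
  pixel.Blue  = scale;
}
```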

To calculate the byte offset of a particular pixel within a bitmap, you can calculate the product of its horizontal position and pixel size and the product of its vertical position and span, and then add these values:

offset = x * sizeof(Pixel) + y * stride
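Wrapped in a small helper (my naming, not the article's), the formula handles both stride signs, since y * stride simply steps backward through memory when the stride is negative:

```cpp
#include <cstddef>

typedef unsigned char BYTE;

struct Pixel { BYTE Blue; BYTE Green; BYTE Red; BYTE Alpha; };

// Byte offset of pixel (x, y) relative to the pointer to row 0. A
// negative stride (bottom-up bitmap) yields a negative row term, so
// the result can legitimately be negative.
inline std::ptrdiff_t PixelOffset(int x, int y, int stride)
{
  return static_cast<std::ptrdiff_t>(x) *
           static_cast<std::ptrdiff_t>(sizeof(Pixel)) +
         static_cast<std::ptrdiff_t>(y) *
           static_cast<std::ptrdiff_t>(stride);
}
```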

So, how would you implement the MakeGrayscale function? If you dove in without further thought, you might write an algorithm like the one shown in Figure 1. At first glance this seems reasonable, and it appears to handle small bitmaps well enough. But what about larger bitmaps? What about a 20,000 × 20,000 pixel bitmap?

Figure 1 An inefficient single-threaded algorithm

void MakeGrayscale(BYTE* bitmap,
          const int width,
          const int height,
          const int stride)
{
  for (int x = 0; x < width; ++x)
  for (int y = 0; y < height; ++y)
  {
    const int offset = x * sizeof(Pixel) + y * stride;
    Pixel& pixel = *reinterpret_cast<Pixel*>(bitmap + offset);
    MakeGrayscale(pixel);
  }
}

I happen to have a Dell PowerEdge with a quad-core Intel Xeon X3210 processor. The machine has a 2.13 GHz clock speed, a 1066 MHz front-side bus, 8 MB of L2 cache, and various other fancy features. Admittedly, it is not the latest Intel Xeon, but it is a respectable machine. It runs a 64-bit version of Windows Server 2008 and is ideal for performance testing.

With this hardware, I ran the algorithm shown in Figure 1 on a bitmap 20,000 pixels wide and 20,000 pixels high. Averaged over 10 iterations, it took 46 seconds. Admittedly, this bitmap is quite large, occupying about 1.5 GB of memory. But is that really the problem? My server has 4 GB of RAM, so no paging to disk was required. Yet Figure 2 shows the all-too-familiar processor usage view.

Figure 2 Processor usage for the inefficient single-threaded algorithm
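One likely culprit, and this is my own observation rather than a conclusion drawn from the measurements above: the loops in Figure 1 walk the bitmap column by column, so consecutive iterations touch addresses a whole stride apart and defeat the processor cache. A minimal sketch of the same algorithm with the loops interchanged, so that pixels are visited in memory order:

```cpp
typedef unsigned char BYTE;

struct Pixel { BYTE Blue; BYTE Green; BYTE Red; BYTE Alpha; };

void MakeGrayscale(Pixel& pixel)
{
  const BYTE scale = static_cast<BYTE>(0.30 * pixel.Red +
                                       0.59 * pixel.Green +
                                       0.11 * pixel.Blue);
  pixel.Red   = scale;
  pixel.Green = scale;
  pixel.Blue  = scale;
}

// Sketch: identical work to Figure 1, but the outer loop is over rows,
// so each inner iteration touches the next pixel in memory.
void MakeGrayscaleByRow(BYTE* bitmap,
          const int width,
          const int height,
          const int stride)
{
  for (int y = 0; y < height; ++y)
  {
    // One pointer computation per row instead of one per pixel.
    Pixel* row = reinterpret_cast<Pixel*>(bitmap + y * stride);
    for (int x = 0; x < width; ++x)
    {
      MakeGrayscale(row[x]);
    }
  }
}
```

This changes nothing about the arithmetic; it only reorders the memory accesses, which is often enough to make a measurable difference on large bitmaps.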
