Concurrent Memory Allocation and TBB's Solutions

Source: Internet
Author: User
Document directory
  • Memory Allocation Problems
  • Memory Allocators
  • Using an Allocator in an STL Container

Memory allocation is not only a basic programming task, but also a major performance challenge in multi-core programming. In C++, we can replace std::allocator with a custom memory allocator, and Threading Building Blocks (TBB) provides a scalable memory allocator compatible with std::allocator.

Each memory allocator has its own trade-offs. TBB's scalable allocator is designed for scalability and speed. In some cases this comes at a price: virtual address space is wasted. In particular, it can waste significant space when blocks of 9K to 12K are allocated.

Memory Allocation Problems

In multi-threaded programs, ordinary memory allocation can become a serious performance bottleneck. This is because a typical allocator uses a global lock to allocate and release memory blocks from a single global heap, so every thread that allocates or releases memory contends for that lock. Because of this contention, programs that allocate frequently may lose much of the advantage of multiple cores. Programs using the C++ standard library (STL) can be hit especially hard, because their memory allocations often happen behind the scenes.

 

#include <iostream>
#include <vector>
#include <tbb/parallel_for.h>
#include <tbb/tick_count.h>

using namespace tbb;

void alloctask(int) {
    // allocate and free heap space on every call
    std::vector<int> data(100);
}

int main() {
    tick_count t1 = tick_count::now();      // record start time
    parallel_for(0, 100000, 1, alloctask);  // execute alloctask concurrently
    tick_count t2 = tick_count::now();
    std::cout << (t2 - t1).seconds() << std::endl;
    return 0;
}

You can run this code to see how long it takes. We can easily modify it to use the TBB scalable memory allocator, which speeds it up by nearly a factor of two (on my dual-core CPU; more CPU cores should do even better).

Note: If you have read earlier articles about TBB loops, you may find it strange that there is no task_scheduler_init here, and that the parallel_for arguments look different. This code is based on the latest TBB 2.2 release, which no longer requires task_scheduler_init; parallel_for has also gained several convenience overloads.

"False sharing" is another serious problem in concurrent programs. It occurs when memory blocks used by different threads lie close together. Processor cores cache memory in fixed-size units called "cache lines". When threads on different cores write to the same cache line, the hardware must keep the line coherent, and each transfer of the line between cores can easily waste hundreds of clock cycles.

To explain why false sharing causes such a performance loss, consider the extra overhead incurred when two threads access memory that sits close together. Assume a cache line holds 64 bytes and both threads touch the same line.

First, the program defines two arrays, each holding 1000 floats (4 bytes each):

float a_array[1000]; float b_array[1000];


Because the compiler lays variables out sequentially, these two arrays are likely to be adjacent in memory. Now consider the following sequence of actions:

  1. Thread A writes a_array[999];
    The processor loads the 64-byte cache line containing a_array[999].
  2. Thread B writes b_array[0];
    Extra overhead: the processor must flush the cache line, writing a_array[999] back to memory, load the 64 bytes containing b_array[0] into its cache, and invalidate thread A's copy of the line.
  3. Work continues. Thread A writes a_array[999] again;
    Extra overhead: the processor must flush the line again, writing b_array[0] back to memory, reload it for thread A, and invalidate thread B's copy.

As you can see, even though thread A and thread B each use their own memory, the overhead is enormous. The fix for false sharing is to align the arrays on cache-line boundaries.

Memory Allocators

TBB's scalable memory allocators can be used to solve the problems described above. TBB provides two allocators, scalable_allocator and cache_aligned_allocator, defined in tbb/scalable_allocator.h and tbb/cache_aligned_allocator.h respectively.


 

  • scalable_allocator solves the allocation contention but does not completely prevent false sharing. However, because each thread obtains memory from its own memory pool, it avoids false sharing to some degree.
  • cache_aligned_allocator solves both allocation contention and false sharing. Because each allocation is rounded up to a multiple of the cache-line size, it needs more space, especially when many small blocks are allocated. Use cache_aligned_allocator only once you have determined that false sharing is actually a performance bottleneck; benchmarking your program with both allocators is a good way to decide which one to keep.
Using an Allocator in an STL Container

scalable_allocator and cache_aligned_allocator are compatible with std::allocator, so we can use them anywhere std::allocator is accepted. The following example uses cache_aligned_allocator as the allocator for std::vector.


 

std::vector< int, cache_aligned_allocator<int> > data;

Now we can modify the earlier code:

void alloctask(int) {
    // space is allocated and freed as the vector is constructed and destroyed
    std::vector<int, scalable_allocator<int> > data(100);
}

Memory is allocated and released during vector construction and destruction. Running results:

Without the TBB scalable allocator: 0.405 s
With scalable_allocator: 0.1843 s
With cache_aligned_allocator: 0.187084 s
From http://www.52bc.net/html/biancheng/C_C__/20091030/534.html
