Replacing ncnn's OpenMP parallel processing with std::thread, with C++ sample code

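Not long after ncnn was released, I tried to build it for iOS.

I ran into a compile problem with OpenMP.

After searching for a solution without success, I decided to fix it myself.

The fix was to replace OpenMP with std::thread.

ncnn project page: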

https://github.com/Tencent/ncnn

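Later I asked the author of ncnn and learned how to build it for iOS.

So replacing OpenMP with std::thread was only a temporary workaround at the time.

Still, it may be useful in some specific situations, and the code below makes it easy to switch between the two and compare them.

I found some time to write a sample project.

Project page: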

https://github.com/cpuimage/ParallelFor

The complete code:

#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <functional>

#if defined(_OPENMP)
// compile with: /openmp
#include <omp.h>
auto const epoch = omp_get_wtime();
// Seconds elapsed since program start (OpenMP wall clock).
double now() {
    return omp_get_wtime() - epoch;
}
#else
#include <chrono>
#include <thread>
#include <vector>
auto const epoch = std::chrono::steady_clock::now();
// Seconds elapsed since program start (millisecond resolution).
double now() {
    return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - epoch).count() / 1000.0;
}
#endif

// Run fn once and return the elapsed time in seconds.
template<typename FN>
double bench(const FN &fn) {
    auto took = -now();
    return (fn(), took + now());
}

#ifdef _OPENMP
static int processorCount = static_cast<int>(omp_get_num_procs());
#else
static int processorCount = static_cast<int>(std::thread::hardware_concurrency());
#endif

// Call func(i) for every i in [inclusiveFrom, exclusiveTo), in parallel.
static void ParallelFor(int inclusiveFrom, int exclusiveTo, std::function<void(size_t)> func)
{
#if defined(_OPENMP)
#pragma omp parallel for num_threads(processorCount)
    for (int i = inclusiveFrom; i < exclusiveTo; ++i)
    {
        func(i);
    }
    return;
#else
    if (inclusiveFrom >= exclusiveTo)
        return;
    static size_t thread_cnt = 0;
    if (thread_cnt == 0)
    {
        thread_cnt = std::thread::hardware_concurrency();
        // hardware_concurrency() may return 0 when the count is unknown;
        // fall back to one thread to avoid dividing by zero below.
        if (thread_cnt == 0)
            thread_cnt = 1;
    }
    size_t entry_per_thread = (exclusiveTo - inclusiveFrom) / thread_cnt;
    // Fewer entries than threads: run serially.
    if (entry_per_thread < 1)
    {
        for (int i = inclusiveFrom; i < exclusiveTo; ++i)
        {
            func(i);
        }
        return;
    }
    std::vector<std::thread> threads;
    int start_idx, end_idx;
    // One thread per chunk of entry_per_thread indices; a leftover
    // remainder ends up in one extra, shorter chunk.
    for (start_idx = inclusiveFrom; start_idx < exclusiveTo; start_idx += entry_per_thread)
    {
        end_idx = start_idx + entry_per_thread;
        if (end_idx > exclusiveTo)
            end_idx = exclusiveTo;
        threads.emplace_back([&](size_t from, size_t to)
        {
            for (size_t entry_idx = from; entry_idx < to; ++entry_idx)
                func(entry_idx);
        }, start_idx, end_idx);
    }
    // func is captured by reference, so join every thread before returning.
    for (auto& t : threads)
    {
        t.join();
    }
#endif
}

void test_scale(int i, double* a, double* b) {
    a[i] = 4 * b[i];
}

int main()
{
    int N = 10000;
    double* a2 = (double*)calloc(N, sizeof(double));
    double* a1 = (double*)calloc(N, sizeof(double));
    double* b = (double*)calloc(N, sizeof(double));
    if (a1 == NULL || a2 == NULL || b == NULL)
    {
        // free(NULL) is a no-op, so no per-pointer checks are needed.
        free(a1);
        free(a2);
        free(b);
        return -1;
    }
    for (int i = 0; i < N; i++)
    {
        a1[i] = i;
        a2[i] = i;
        b[i] = i;
    }
    // Serial baseline.
    double beforeTime = bench([&] {
        for (int i = 0; i < N; i++)
        {
            test_scale(i, a1, b);
        }
    });
    std::cout << " \nbefore: " << int(beforeTime * 1000) << "ms" << std::endl;
    // Same work through ParallelFor.
    double afterTime = bench([&] {
        ParallelFor(0, N, [a2, b](size_t i)
        {
            test_scale(static_cast<int>(i), a2, b);
        });
    });
    std::cout << " \nafter: " << int(afterTime * 1000) << "ms" << std::endl;
    // Both runs must produce identical results.
    for (int i = 0; i < N; i++)
    {
        if (a1[i] != a2[i]) {
            printf("error %f : %f \t", a1[i], a2[i]);
            getchar();
        }
    }
    free(a1);
    free(a2);
    free(b);
    getchar();
    return 0;
}
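
The std::thread branch of ParallelFor splits the range by integer division, so when the range is not an exact multiple of the thread count, the leftover entries run in one extra, shorter chunk. As a rough sketch of an alternative (this is not code from the sample project, and SplitRange is a hypothetical name), the remainder can be spread across the first few chunks so that no more than `parts` chunks are ever created:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ParallelFor or ncnn): split [from, to)
// into at most `parts` contiguous chunks whose sizes differ by at most one.
static std::vector<std::pair<int, int>> SplitRange(int from, int to, int parts)
{
    assert(parts > 0);
    std::vector<std::pair<int, int>> chunks;
    if (from >= to)
        return chunks;                  // empty range, nothing to do
    const int total = to - from;
    if (parts > total)
        parts = total;                  // never create empty chunks
    const int base = total / parts;     // minimum chunk size
    const int extra = total % parts;    // first `extra` chunks get one more
    int start = from;
    for (int p = 0; p < parts; ++p)
    {
        const int size = base + (p < extra ? 1 : 0);
        chunks.emplace_back(start, start + size);
        start += size;
    }
    return chunks;
}
```

Each (first, second) pair could then be handed to one std::thread in place of the start_idx/end_idx loop. Whether this is measurably faster than the original split depends on the workload.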

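To use OpenMP, add the /openmp compiler option (MSVC; GCC and Clang use -fopenmp). The compiler then defines _OPENMP itself. Defining _OPENMP by hand only selects the OpenMP code path; it does not make the compiler honor the pragmas or link the OpenMP runtime.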
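
To confirm which path a given build actually uses, a small helper like the following can be printed at startup. This is only a sketch: ParallelBackend is a hypothetical name and is not part of the sample project.

```cpp
#include <string>

// Hypothetical helper: reports which branch of ParallelFor was compiled in.
// When OpenMP support is enabled, the compiler defines _OPENMP as an integer
// in yyyymm form (the date of the OpenMP specification it supports).
static std::string ParallelBackend()
{
#if defined(_OPENMP)
    return "openmp " + std::to_string(_OPENMP);
#else
    return "std::thread";
#endif
}
```

For example, `std::cout << ParallelBackend() << std::endl;` at the top of main shows whether the compiler option took effect.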

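Compile as C++11 or later, since the code relies on std::thread and lambdas.

The sample code is fairly simple.

Here is an example of the change in ncnn code: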

    #pragma omp parallel for
    for (int q=0; q<channels; q++)
    {
        const Mat m = src.channel(q);
        Mat borderm = dst.channel(q);
        copy_make_border_image(m, borderm, top, left, type, v);
    }

becomes

    ParallelFor(0, channels, [&](int q) {
        const Mat m = src.channel(q);
        Mat borderm = dst.channel(q);
        copy_make_border_image(m, borderm, top, left, type, v);
    });
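
The same pattern can be tried without ncnn. The sketch below is an illustration under stated assumptions, not ncnn code: ParallelForSketch is a compact stand-in for ParallelFor that starts one thread per index, and PadChannels is a hypothetical per-channel operation on plain vectors. It shows why capturing by reference with [&] is safe in this kind of loop: each iteration touches only its own channel, and every thread is joined before the result is used.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Compact stand-in for ParallelFor: one thread per index. Fine for a handful
// of channels; the sample project chunks the range instead.
static void ParallelForSketch(int from, int to, const std::function<void(int)>& func)
{
    std::vector<std::thread> threads;
    for (int i = from; i < to; ++i)
        threads.emplace_back([&func, i] { func(i); });
    for (auto& t : threads)
        t.join();
}

// Hypothetical per-channel operation (an assumption for illustration):
// pad each channel with `pad` copies of `v` on both ends. pad must be >= 0.
static std::vector<std::vector<float>> PadChannels(
    const std::vector<std::vector<float>>& src, int pad, float v)
{
    const int channels = static_cast<int>(src.size());
    // dst is sized up front, so threads never resize the outer vector.
    std::vector<std::vector<float>> dst(channels);
    // [&] is safe here: iteration q reads src[q] and writes only dst[q],
    // and all threads are joined before dst is returned.
    ParallelForSketch(0, channels, [&](int q) {
        dst[q].assign(src[q].size() + 2 * static_cast<std::size_t>(pad), v);
        for (std::size_t i = 0; i < src[q].size(); ++i)
            dst[q][static_cast<std::size_t>(pad) + i] = src[q][i];
    });
    return dst;
}
```

A loop whose iterations write to shared state (for example, accumulating into one variable, which OpenMP would handle with a reduction clause) cannot be converted this mechanically and needs its own synchronization.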

 

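I had originally planned to set aside some time to convert all of ncnn and publish a modified version.

On reflection, I would rather post the approach here for anyone who needs it.

Doing it yourself is the surest way to get what you need.

If you have other related questions or needs, feel free to email me to discuss them.

My email address is: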
[email protected]
