Some context for the code below:
    Mat img0; // 1280x960 grayscale

    timer.start();
    for (int i = 0; i < img0.rows; i++) {
        vector<uchar> v;                  // fresh vector for every row
        uchar* p = img0.ptr<uchar>(i);    // pointer to the start of row i
        for (int j = 0; j < img0.cols; ++j) {
            v.push_back(p[j]);
        }
    }
    cout << "Single thread " << timer.end() << endl;
and
    timer.start();
    concurrency::parallel_for(0, img0.rows, [&img0](int i) {
        vector<uchar> v;                  // fresh vector for every row
        uchar* p = img0.ptr<uchar>(i);    // pointer to the start of row i
        for (int j = 0; j < img0.cols; ++j) {
            v.push_back(p[j]);
        }
    });
    cout << "Multi thread " << timer.end() << endl;
Results:
Single thread 0.0458856
Multi thread 0.0329856
The speedup is barely noticeable (roughly 1.4x).
My processor is an Intel i5 at 3.10 GHz.
RAM: 8 GB DDR3
EDIT
I tried a slightly different approach.
    // `split` is my custom function that, in this case, splits `img0`
    // into two images: its left and right halves
    vector<Mat> imgs = split(img0, 2, 1);

    timer.start();
    concurrency::parallel_for(0, (int)imgs.size(), [imgs](int i) {
        Mat img = imgs[i];
        vector<uchar> v;                  // one vector per half image
        for (int row = 0; row < img.rows; row++) {
            uchar* p = img.ptr<uchar>(row);
            for (int col = 0; col < img.cols; ++col) {
                v.push_back(p[col]);
            }
        }
    });
    cout << " Multi thread Sectored " << timer.end() << endl;
And I got better results:
Multi thread Sectored 0.0232881
So it looks like I was creating 960 threads when I ran
    parallel_for(0, img0.rows, ...
and that performed poorly.
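To make the per-row versus per-chunk contrast concrete, here is a minimal sketch of the coarse-grained idea. It rests on several assumptions that are not in the code above: it uses standard `std::thread` rather than `concurrency::parallel_for`, a hypothetical `GrayImage` buffer stands in for `cv::Mat`, it splits the image into horizontal bands of rows rather than left and right halves, and the helper names `copy_rows` and `copy_rows_chunked` are made up for illustration. It also reserves capacity up front, which the original loops do not do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a grayscale cv::Mat: a contiguous row-major buffer.
struct GrayImage {
    int rows;
    int cols;
    std::vector<std::uint8_t> data;  // rows * cols bytes

    const std::uint8_t* ptr(int row) const {
        return data.data() + static_cast<std::size_t>(row) * cols;
    }
};

// Copies rows [begin, end) into one vector, reserving the full size up front.
std::vector<std::uint8_t> copy_rows(const GrayImage& img, int begin, int end) {
    std::vector<std::uint8_t> v;
    v.reserve(static_cast<std::size_t>(end - begin) * img.cols);
    for (int r = begin; r < end; ++r) {
        const std::uint8_t* p = img.ptr(r);
        v.insert(v.end(), p, p + img.cols);
    }
    return v;
}

// Splits the rows into num_chunks contiguous bands and processes each band
// on its own thread, so there are num_chunks work items instead of one per row.
std::vector<std::vector<std::uint8_t>> copy_rows_chunked(const GrayImage& img,
                                                         int num_chunks) {
    assert(num_chunks > 0);
    std::vector<std::vector<std::uint8_t>> results(num_chunks);
    std::vector<std::thread> workers;
    workers.reserve(num_chunks);
    const int rows_per_chunk = (img.rows + num_chunks - 1) / num_chunks;  // ceiling
    for (int c = 0; c < num_chunks; ++c) {
        const int begin = std::min(img.rows, c * rows_per_chunk);
        const int end = std::min(img.rows, begin + rows_per_chunk);
        // Each thread writes only to its own slot, so no locking is needed.
        workers.emplace_back([&img, &results, c, begin, end] {
            results[c] = copy_rows(img, begin, end);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

Calling `copy_rows_chunked(img, 2)` would be the rough analogue of the two-image version; a value near `std::thread::hardware_concurrency()` is a common starting point, though whether it helps here would need measuring.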
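(I should add that Kenny's comment is correct. Don't read too much into the specific numbers I gave here. When measuring intervals this short, there is a lot of variance. Overall, though, what I wrote in the edit holds: splitting the image into halves improved performance compared with the old approach.)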
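Given how much intervals this short can vary, one way to get steadier numbers is to repeat each measurement and look at the minimum or median rather than a single run. The sketch below assumes `std::chrono::steady_clock` in place of the `timer` object used above; `time_repeated` and `median_of_sorted` are hypothetical helper names, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `repeats` times and returns the elapsed times in seconds,
// sorted ascending (so front() is the fastest run).
template <typename F>
std::vector<double> time_repeated(F&& work, int repeats) {
    assert(repeats > 0);
    std::vector<double> seconds;
    seconds.reserve(repeats);
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    std::sort(seconds.begin(), seconds.end());
    return seconds;
}

// Median of an already sorted, non-empty sample.
double median_of_sorted(const std::vector<double>& sorted) {
    assert(!sorted.empty());
    const std::size_t n = sorted.size();
    return n % 2 == 1 ? sorted[n / 2]
                      : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
}
```

Each variant (single thread, per-row parallel, split halves) could be wrapped in a lambda and passed to `time_repeated` with, say, 50 repeats, then compared by median. Note that a compiler may optimize away work whose result is never used, so the measured code should produce something observable.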