
How to help gcc vectorize C code


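I have the following C code. The first part just reads a matrix of complex numbers, called M, from standard input. The interesting part is the second part.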

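#include <stdio.h>    /* scanf */
#include <assert.h>   /* assert */
#include <complex.h>  /* complex float */
/* two more #include lines followed here; their header names are missing */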

int main() {
    int n, m, c, d;
    float re, im;

    scanf("%d %d", &n, &m);
    assert(n==m);
    complex float M[n][n];

    for(c=0; c

I compile with gcc -fopt-info-vec-all -O3 -ffast-math -march=bdver2 permanent-in-c.c -lm. The output tells me why almost none of the loops are being vectorized.

The most performance-critical part is lines 47-50 of the source file, which are:

for (i = 0; i < n; i++) {
    v[i] -= 2.*delta[j]*M[j][i];
    prod *= v[i];
}

gcc tells me:

permanent-in-c.c:47:7: note: reduction used in loop.
permanent-in-c.c:47:7: note: Unknown def-use cycle pattern.
permanent-in-c.c:47:7: note: reduction used in loop.
permanent-in-c.c:47:7: note: Unknown def-use cycle pattern.
permanent-in-c.c:47:7: note: Unsupported pattern.
permanent-in-c.c:47:7: note: not vectorized: unsupported use in stmt.
permanent-in-c.c:47:7: note: unexpected pattern.
[...]
permanent-in-c.c:48:26: note: SLP: step doesn't divide the vector-size.
permanent-in-c.c:48:26: note: Unknown alignment for access: IMAGPART_EXPR <*M.4_40[j_202]{lb: 0 sz: pretmp_291 * 4}[i_200]>
permanent-in-c.c:48:26: note: SLP: step doesn't divide the vector-size.
permanent-in-c.c:48:26: note: Unknown alignment for access: REALPART_EXPR <*M.4_40[j_202]{lb: 0 sz: pretmp_291 * 4}[i_200]>
[...]
permanent-in-c.c:48:26: note: Build SLP failed: unrolling required in basic block SLP
permanent-in-c.c:48:26: note: Failed to SLP the basic block.
permanent-in-c.c:48:26: note: not vectorized: failed to find SLP opportunities in basic block.

How can I fix the problems that prevent this part from being vectorized?
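One direction that may be worth trying, offered only as a sketch and not as a confirmed fix for this program: the diagnostics above mention separate REALPART_EXPR and IMAGPART_EXPR accesses and an unrecognized reduction, which suggests that the interleaved complex layout and the running complex product are what the vectorizer stumbles over. The code below separates the two concerns. It assumes that delta[j] is a float, that the real and imaginary parts can be kept in separate float arrays, and that a float constant 2.0f is acceptable in place of the double constant 2. used in the question. The function names and the split arrays are hypothetical and do not appear in the original program.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Reference: the loop from the question, written with C99 complex types. */
static float complex update_and_multiply_complex(size_t n, float complex *v,
                                                 const float complex *row,
                                                 float delta_j)
{
    float complex prod = 1.0f;
    for (size_t i = 0; i < n; i++) {
        v[i] -= 2.0f * delta_j * row[i];
        prod *= v[i];
    }
    return prod;
}

/* Element-wise update on a hypothetical split layout (separate real and
 * imaginary arrays). There is no loop-carried dependency here, so this is
 * the kind of loop a vectorizer can usually handle. */
static void update_split(size_t n, float *restrict v_re, float *restrict v_im,
                         const float *restrict row_re,
                         const float *restrict row_im, float delta_j)
{
    assert(v_re != v_im); /* restrict promises that the arrays do not overlap */
    const float scale = 2.0f * delta_j;
    for (size_t i = 0; i < n; i++) {
        v_re[i] -= scale * row_re[i];
        v_im[i] -= scale * row_im[i];
    }
}

/* Running complex product over the split arrays. Each step depends on the
 * previous one, so this loop stays a serial chain; it is only separated
 * from the update so that the update can be considered on its own. */
static void product_split(size_t n, const float *v_re, const float *v_im,
                          float *prod_re, float *prod_im)
{
    float pr = 1.0f, pi = 0.0f;
    for (size_t i = 0; i < n; i++) {
        const float t = pr * v_re[i] - pi * v_im[i];
        pi = pr * v_im[i] + pi * v_re[i];
        pr = t;
    }
    *prod_re = pr;
    *prod_im = pi;
}
```

Whether gcc actually vectorizes update_split on a given target has to be checked with the same -fopt-info-vec-all output as above. The split costs an extra pass over v, so it is only worthwhile if the vectorized update outweighs that.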


Curiously, this part does get vectorized, but I'm not sure why:

for (j = 0; j 


The full output of gcc -fopt-info-vec-all -O3 -ffast-math -march=bdver2 permanent-in-c.c -lm is at https://bpaste.net/show/18ebc3d66a53.
