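I have the following C code. The first part just reads a matrix of complex numbers from standard input, called matrix M. The interesting part is the second part.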
    #include
    #include
    #include
    #include
    #include
    int main() {
      int n, m, c, d;
      float re, im;
      scanf("%d %d", &n, &m);
      assert(n==m);
      complex float M[n][n];
      for(c=0; c

I compile with

    gcc -fopt-info-vec-all -O3 -ffast-math -march=bdver2 permanent-in-c.c -lm

which explains to me why almost none of the loops are being vectorized. The most performance-critical part is lines 47-50, which are:
    for (i = 0; i < n; i++) {
      v[i] -= 2.*delta[j]*M[j][i];
      prod *= v[i];
    }

gcc tells me:

    permanent-in-c.c:47:7: note: reduction used in loop.
    permanent-in-c.c:47:7: note: Unknown def-use cycle pattern.
    permanent-in-c.c:47:7: note: reduction used in loop.
    permanent-in-c.c:47:7: note: Unknown def-use cycle pattern.
    permanent-in-c.c:47:7: note: Unsupported pattern.
    permanent-in-c.c:47:7: note: not vectorized: unsupported use in stmt.
    permanent-in-c.c:47:7: note: unexpected pattern.
    [...]
    permanent-in-c.c:48:26: note: SLP: step doesn't divide the vector-size.
    permanent-in-c.c:48:26: note: Unknown alignment for access: IMAGPART_EXPR <*M.4_40[j_202]{lb: 0 sz: pretmp_291 * 4}[i_200]>
    permanent-in-c.c:48:26: note: SLP: step doesn't divide the vector-size.
    permanent-in-c.c:48:26: note: Unknown alignment for access: REALPART_EXPR <*M.4_40[j_202]{lb: 0 sz: pretmp_291 * 4}[i_200]>
    [...]
    permanent-in-c.c:48:26: note: Build SLP failed: unrolling required in basic block SLP
    permanent-in-c.c:48:26: note: Failed to SLP the basic block.
    permanent-in-c.c:48:26: note: not vectorized: failed to find SLP opportunities in basic block.

How can I fix the problems that are stopping this part from being vectorized?
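For anyone who wants to reproduce this in isolation, the hot loop can probably be pulled out into a standalone function along the lines of the sketch below. The function name `update_and_multiply` and the parameter names `Mj` and `delta_j` are hypothetical, and the element types are assumptions: `v` and `prod` are taken to be `complex float` like `M`, and `delta` is taken to be `float`. It is only meant to isolate the complex multiply-reduction that gcc complains about, not to reproduce the full program.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical standalone version of the loop on lines 47-50.
   Assumed types: v and one row of M (Mj) are complex float,
   delta_j stands in for delta[j] and is assumed to be float. */
complex float update_and_multiply(int n, complex float v[n],
                                  const complex float Mj[n], float delta_j)
{
    assert(n >= 0);
    complex float prod = 1.0f;
    for (int i = 0; i < n; i++) {
        /* Same update as in the original loop body. */
        v[i] -= 2.*delta_j*Mj[i];
        /* Complex product reduction carried across iterations. */
        prod *= v[i];
    }
    return prod;
}
```

Compiling just this function with the same flags as above should make it easier to see which of the vectorizer notes belong to the reduction and which to the update of `v`.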
Oddly, this part is vectorized, but I am not sure why:
for (j = 0; j
The full output of gcc -fopt-info-vec-all -O3 -ffast-math -march=bdver2 permanent-in-c.c -lm is at https://bpaste.net/show/18ebc3d66a53.