前段时间我写了一小段代码来询问有关访谈的内容,看看人们如何理解缓存和内存的概念:
#include "stdafx.h" #include#include #include #define TOTAL 0x20000000 using namespace std; __int64 count(int INNER, int OUTER) { int a = 0; int* arr = (int*) HeapAlloc(GetProcessHeap(), 0, INNER * sizeof(int)); if (!arr) { cerr << "HeapAlloc failed\n"; return 1; } LARGE_INTEGER freq; LARGE_INTEGER startTime, endTime; __int64 elapsedTime, elapsedMilliseconds; QueryPerformanceFrequency(&freq); QueryPerformanceCounter(&startTime); /* ?????? ?????? */ for (int i = 0; i < OUTER; i++) { for (int j = 0; j < INNER; j++) { a |= i; arr[j] = a; } } /* ????? ?????? */ QueryPerformanceCounter(&endTime); elapsedTime = endTime.QuadPart - startTime.QuadPart; elapsedMilliseconds = (1000 * elapsedTime) / freq.QuadPart; HeapFree(GetProcessHeap(), 0, arr); return elapsedMilliseconds; } int _tmain(int argc, _TCHAR* argv[]) { __int64 time; for (int INNER = 0x10; INNER <= 0x2000000; INNER <<= 1) { int OUTER = TOTAL / INNER; time = count(INNER, OUTER); cout << INNER << "\t" << OUTER << "\t" << time << "\n"; } }
这就是它编译的内容(循环本身):
00401062 xor ecx,ecx 00401064 test ebp,ebp 00401066 jle count+83h (401083h) 00401068 xor eax,eax 0040106A test ebx,ebx 0040106C jle count+7Ch (40107Ch) 0040106E mov edi,edi 00401070 or esi,ecx 00401072 mov dword ptr [edi+eax*4],esi 00401075 add eax,1 00401078 cmp eax,ebx 0040107A jl count+70h (401070h) 0040107C add ecx,1 0040107F cmp ecx,ebp 00401081 jl count+68h (401068h)
这就是程序在我的机器上输出的内容:
LOG2(INNER) LOG2(OUTER) Time, ms 4 25 523 5 24 569 6 23 441 7 22 400 8 21 367 9 20 358 10 19 349 11 18 364 12 17 378 13 16 384 14 15 357 15 14 377 16 13 379 17 12 390 18 11 386 19 10 419 20 9 995 21 8 1,015 22 7 1,038 23 6 1,071 24 5 1,082 25 4 1,119
我请人们解释发生了什么.随着内部阵列的增长,循环次数随着时间的推移而减少.随着内部数组超出缓存,缓存未命中率开始发生,时间增加.到目前为止一切都好.
但是:当INNER数组大小为16(这给我们64字节的数据)时,尽管代码数量更多,但几乎没有性能提升jmps
.它很少(523对569),但可重复.
问题是:为什么这个提升?
可能是因为64是您计算机上的缓存行大小,并且您基本上完全从单个缓存行运行每次迭代.