我目前正在尝试理解堆栈是如何工作的,所以我决定自学一些汇编语言,我正在使用这本书:
http://savannah.nongnu.org/projects/pgubook/
我正在使用Gas并在Linux Mint上进行开发.
我有点困惑:
据我所知,堆栈只是一个数据结构.所以我假设如果我在汇编编码我必须自己实现堆栈.然而,这似乎并非如此,因为有像这样的命令
pushl popl
因此,当在x86架构的汇编中编码并使用Gas语法时:堆栈只是已经实现的数据结构吗?或者它实际上是在硬件级别实现的?或者是别的什么?其他芯片组的大多数汇编语言也已经实现了堆栈吗?
我知道这是一个愚蠢的问题,但实际上我很困惑.
我认为主要是你在program's stack
和... 之间感到困惑any old stack
.
一堆
是一种抽象数据结构,由Last In First Out系统中的信息组成.你把任意物品放到堆叠上,然后再将它们取下来,就像进/出托盘一样,顶部的物品总是被取下的,你总是把它放在上面.
程序堆栈
是堆栈,它是执行期间使用的一部分内存,它通常具有每个程序的静态大小,并且经常用于存储函数参数.在调用函数时将参数压入堆栈,函数直接寻址堆栈或从堆栈中弹出变量.
程序堆栈通常不是硬件(虽然它保存在内存中因此可以这样说),但是指向堆栈当前区域的堆栈指针通常是CPU寄存器.这使得它比LIFO堆栈更灵活,因为您可以更改堆栈寻址的点.
您应该阅读并确保您理解维基百科文章,因为它可以很好地描述您正在处理的硬件堆栈.
本教程还根据旧的16位寄存器解释了堆栈,但可能会有所帮助,另外还有一个特别关于堆栈的堆栈.
来自Nils Pipenbrinck:
值得注意的是,有些处理器没有实现访问和操作堆栈的所有指令(推送,弹出,堆栈指针等),但x86因为它的使用频率而做.在这些情况下,如果你想要一个堆栈,你必须自己实现它(一些MIPS和一些ARM处理器是在没有堆栈的情况下创建的).
例如,在MIP中,推送指令将实现如下:
addi $sp, $sp, -4 # Decrement stack pointer by 4 sw $t0, ($sp) # Save $t0 to stack
并且Pop指令看起来像:
lw $t0, ($sp) # Copy from stack to $t0 addi $sp, $sp, 4 # Increment stack pointer by 4
(我已经对这个答案中的所有代码作了一个要点,以防你想玩它)
在2003年的CS101课程中,我只在asm中做过最基本的事情.而且我从来没有真正"理解"asm和堆栈工作,直到我意识到它基本上就像用C或C++编程一样......但没有局部变量,参数和功能.可能听起来不容易:)让我告诉你(对于x86 asm与英特尔语法).
1.什么是堆栈
堆栈是在启动时为每个线程分配的连续内存块.你可以随意存放.用C++的说法(代码片段#1):
const int STACK_CAPACITY = 1000;
thread_local int stack[STACK_CAPACITY];
2.堆栈的顶部和底部
原则上,您可以将值存储在stack
数组的随机单元格中(代码段#2.1):
cin >> stack[333];
cin >> stack[517];
stack[555] = stack[333] + stack[517];
但是想象一下,要记住哪些细胞stack
已经被使用并且哪些细胞是"免费的"是多么困难.这就是我们将新值存储在堆栈中的原因.
关于(x86)asm的堆栈的一个奇怪的事情是你从最后一个索引开始添加东西并移动到较低的索引:stack [999],然后堆栈[998]等等(片段#2.2):
cin >> stack[999];
cin >> stack[998];
stack[997] = stack[999] + stack[998];
而且(好吧,你现在会感到困惑)"官方"名称stack[999]
是堆栈的底部.
最后使用的单元格(stack[997]
在上面的示例中)称为堆栈顶部(请参阅堆栈顶部位于x86上的位置).
3.堆栈指针(SP)
堆栈不是你的asm代码中唯一可见的东西.您还可以操作CPU寄存器(请参见通用寄存器).它们就像全局变量:
int AX, BX, SP, BP, ...;
int main(){...}
有专用的CPU寄存器(SP)来跟踪添加到堆栈的最后一个元素.顾名思义,它就是一个指针(保存一个像0xAAAABBCC这样的内存地址).但是为了这篇文章的目的,我将它用作索引.
在线程开始时SP == STACK_CAPACITY
,然后在需要时减少它.规则是你不能写入超出堆栈顶部的堆栈单元格,任何索引少于SP无效,因此
首先递减SP 然后将值写入新分配的单元格.
如果您知道将在堆栈中连续添加多个值,则可以预先为所有这些值保留空间(代码段#3):
SP -= 3;
cin >> stack[999];
cin >> stack[998];
stack[997] = stack[999] + stack[998];
注意.现在您可以看到为什么在堆栈上"分配"速度如此之快.你实际上没有分配任何东西(如new
关键字或malloc
),它只是一个整数减少.
4.摆脱局部变量
让我们来看看这个简单的函数(代码片段#4.1):
int triple(int a) {
int result = a * 3;
return result;
}
并在没有局部变量的情况下重写它(片段#4.2):
int triple_noLocals(int a) {
SP -= 1; // move pointer to unused cell, where we can store what we need
stack[SP] = a * 3;
return stack[SP];
}
用法(摘录#4.3):
// SP == 1000
someVar = triple_noLocals(11);
// now SP == 999, but we don't need the value at stack[999] anymore
// and we will move the stack index back, so we can reuse this cell later
SP += 1; // SP == 1000 again
5.按下/弹出
在堆栈顶部添加一个新元素是一种频繁的操作,CPU有一个特殊的指令,push
.我们将像这样实现它(代码片段5.1):
void push(int value) {
--SP;
stack[SP] = value;
}
同样,采用堆栈的顶部元素(代码段5.2):
void pop(int& result) {
result = stack[SP];
++SP; // note that `pop` decreases stack's size
}
push/pop的常用用法模式暂时保存了一些值.比方说,我们在变量中有一些有用的东西myVar
,由于某种原因,我们需要进行覆盖它的计算(代码片段5.3):
int myVar = ...;
push(myVar); // SP == 999
myVar += 10;
... // do something with new value in myVar
pop(myVar); // restore original value, SP == 1000
6.摆脱参数
现在让我们使用stack传递参数(片段#6):
int triple_noL_noParams() { // `a` is at index 999, SP == 999
SP -= 1; // SP == 998, stack[SP + 1] == a
stack[SP] = stack[SP + 1] * 3;
return stack[SP];
}
int main(){
push(11); // SP == 999
assert(triple(11) == triple_noL_noParams());
SP += 2; // cleanup 1 local and 1 parameter
}
7. Getting rid of return
statements
Let's return value in AX register (snippet #7):
void triple_noL_noP_noReturn() { // `a` at 998, SP == 998
SP -= 1; // SP == 997
stack[SP] = stack[SP + 1] * 3;
AX = stack[SP];
SP += 1; // finally we can cleanup locals right in the function body, SP == 998
}
void main(){
... // some code
push(AX); // save AX in case there is something useful there, SP == 999
push(11); // SP == 998
triple_noL_noP_noReturn();
assert(triple(11) == AX);
SP += 1; // cleanup param
// locals were cleaned up in the function body, so we don't need to do it here
pop(AX); // restore AX
...
}
8. Stack base pointer (BP) (also known as frame pointer) and stack frame
Lets take more "advanced" function and rewrite it in our asm-like C++ (snippet #8.1):
int myAlgo(int a, int b) {
int t1 = a * 3;
int t2 = b * 3;
return t1 - t2;
}
void myAlgo_noLPR() { // `a` at 997, `b` at 998, old AX at 999, SP == 997
SP -= 2; // SP == 995
stack[SP + 1] = stack[SP + 2] * 3;
stack[SP] = stack[SP + 3] * 3;
AX = stack[SP + 1] - stack[SP];
SP += 2; // cleanup locals, SP == 997
}
int main(){
push(AX); // SP == 999
push(22); // SP == 998
push(11); // SP == 997
myAlgo_noLPR();
assert(myAlgo(11, 22) == AX);
SP += 2;
pop(AX);
}
Now imagine we decided to introduce new local variable to store result there before returning, as we do in tripple
(snippet #4.1). The body of the function will be (snippet #8.2):
SP -= 3; // SP == 994
stack[SP + 2] = stack[SP + 3] * 3;
stack[SP + 1] = stack[SP + 4] * 3;
stack[SP] = stack[SP + 2] - stack[SP + 1];
AX = stack[SP];
SP += 3;
You see, we had to update every single reference to function parameters and local variables. To avoid that, we need an anchor index, which doesn't change when the stack grows.
We will create the anchor right upon function entry (before we allocate space for locals) by saving current top (value of SP) into BP register. Snippet #8.3:
void myAlgo_noLPR_withAnchor() { // `a` at 997, `b` at 998, SP == 997
push(BP); // save old BP, SP == 996
BP = SP; // create anchor, stack[BP] == old value of BP, now BP == 996
SP -= 2; // SP == 994
stack[BP - 1] = stack[BP + 1] * 3;
stack[BP - 2] = stack[BP + 2] * 3;
AX = stack[BP - 1] - stack[BP - 2];
SP = BP; // cleanup locals, SP == 996
pop(BP); // SP == 997
}
The slice of stack, wich belongs to and is in full control of the function is called function's stack frame. E.g. myAlgo_noLPR_withAnchor
's stack frame is stack[996 .. 994]
(both idexes inclusive).
Frame starts at function's BP (after we've updated it inside function) and lasts until the next stack frame. So the parameters on the stack are part of the caller's stack frame (see note 8a).
Notes:
8a. Wikipedia says otherwise about parameters, but here I adhere to Intel software developer's manual, see vol. 1, section 6.2.4.1 Stack-Frame Base Pointer and Figure 6-2 in section 6.3.2 Far CALL and RET Operation. Function's parameters and stack frame are part of function's activation record (see The gen on function perilogues).
8b. positive offsets from BP point to function parameters and negative offsets point to local variables. That's pretty handy for debugging
8c. stack[BP]
stores the address of the previous stack frame, stack[stack[BP]]
stores pre-previous stack frame and so on. Following this chain, you can discover frames of all the functions in the programm, which didn't return yet. This is how debuggers show you call stack
8d. the first 3 instructions of myAlgo_noLPR_withAnchor
, where we setup the frame (save old BP, update BP, reserve space for locals) are called function prologue
9. Calling conventions
In snippet 8.1 we've pushed parameters for myAlgo
from right to left and returned result in AX
.
We could as well pass params left to right and return in BX
. Or pass params in BX and CX and return in AX. Obviously, caller (main()
) and
called function must agree where and in which order all this stuff is stored.
Calling convention is a set of rules on how parameters are passed and result is returned.
In the code above we've used cdecl calling convention:
Parameters are passed on the stack, with the first argument at the lowest address on the stack at the time of the call (pushed last <...>). The caller is responsible for popping parameters back off the stack after the call.
the return value is placed in AX
EBP and ESP must be preserved by the callee (myAlgo_noLPR_withAnchor
function in our case), such that the caller (main
function) can rely on those registers not having been changed by a call.
All other registers (EAX, <...>) may be freely modified by the callee; if a caller wishes to preserve a value before and after the function call, it must save the value elsewhere (we do this with AX)
(Source: example "32-bit cdecl" from Stack Overflow Documentation; copyright 2016 by icktoofay and Peter Cordes ; licensed under CC BY-SA 3.0. An archive of the full Stack Overflow Documentation content can be found at archive.org, in which this example is indexed by topic ID 3261 and example ID 11196.)
10. Getting rid of function calls
Now the most interesting part. Just like data, executable code is also stored in memory (completely unrelated to memory for stack) and every instruction has an address.
When not commanded otherwise, CPU executes instructions one after another, in the order they are stored in memory. But we can command CPU to "jump" to another location in memory and execute instructions from there on.
In asm it can be any address, and in more high-level languages like C++ you can only jump to addresses marked by labels (there are workarounds but they are not pretty, to say the least).
Let's take this function (snippet #10.1):
int myAlgo_withCalls(int a, int b) {
int t1 = triple(a);
int t2 = triple(b);
return t1 - t2;
}
And instead of calling tripple
C++ way, do the following:
copy whole tripple
's body inside myAlgo
at myAlgo
entry jump over tripple
's code with goto
when we need to execute tripple
's code, save on the stack address of the code line just after tripple
call, so we can return here later and continue execution (PUSH_ADDRESS
macro below)
jump to address of the tripple
function and execute it to the end (3. and 4. together are CALL
macro)
at the end of the tripple
(after we've cleaned up locals), take return address from the top of the stack and jump there (RET
macro)
Because there is no easy way to jump to particular code address in C++, we will use labels to mark places of jumps. I won't go into detail how macros below work, just believe me they do what I say they do (snippet #10.2):
// pushes the address of the code at label's location on the stack
// NOTE1: this gonna work only with 32-bit compiler (so that pointer is 32-bit and fits in int)
// NOTE2: __asm block is specific for Visual C++. In GCC use https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
#define PUSH_ADDRESS(labelName) { \
void* tmpPointer; \
__asm{ mov [tmpPointer], offset labelName } \
push(reinterpret_cast(tmpPointer)); \
}
// why we need indirection, read /sf/ask/17360801/
#define TOKENPASTE(x, y) x ## y
#define TOKENPASTE2(x, y) TOKENPASTE(x, y)
// generates token (not a string) we will use as label name.
// Example: LABEL_NAME(155) will generate token `lbl_155`
#define LABEL_NAME(num) TOKENPASTE2(lbl_, num)
#define CALL_IMPL(funcLabelName, callId) \
PUSH_ADDRESS(LABEL_NAME(callId)); \
goto funcLabelName; \
LABEL_NAME(callId) :
// saves return address on the stack and jumps to label `funcLabelName`
#define CALL(funcLabelName) CALL_IMPL(funcLabelName, __LINE__)
// takes address at the top of stack and jump there
#define RET() { \
int tmpInt; \
pop(tmpInt); \
void* tmpPointer = reinterpret_cast(tmpInt); \
__asm{ jmp tmpPointer } \
}
void myAlgo_asm() {
goto my_algo_start;
triple_label:
push(BP);
BP = SP;
SP -= 1;
// stack[BP] == old BP, stack[BP + 1] == return address
stack[BP - 1] = stack[BP + 2] * 3;
AX = stack[BP - 1];
SP = BP;
pop(BP);
RET();
my_algo_start:
push(BP); // SP == 995
BP = SP; // BP == 995; stack[BP] == old BP,
// stack[BP + 1] == dummy return address,
// `a` at [BP + 2], `b` at [BP + 3]
SP -= 2; // SP == 993
push(AX);
push(stack[BP + 2]);
CALL(triple_label);
stack[BP - 1] = AX;
SP -= 1;
pop(AX);
push(AX);
push(stack[BP + 3]);
CALL(triple_label);
stack[BP - 2] = AX;
SP -= 1;
pop(AX);
AX = stack[BP - 1] - stack[BP - 2];
SP = BP; // cleanup locals, SP == 997
pop(BP);
}
int main() {
push(AX);
push(22);
push(11);
push(7777); // dummy value, so that offsets inside function are like we've pushed return address
myAlgo_asm();
assert(myAlgo_withCalls(11, 22) == AX);
SP += 1; // pop dummy "return address"
SP += 2;
pop(AX);
}
Notes:
10a. because return address is stored on the stack, in principle we can change it. This is how stack smashing attack works
10b. the last 3 instructions at the "end" of triple_label
(cleanup locals, restore old BP, return) are called function's epilogue
11. Assembly
Now let's look at real asm for myAlgo_withCalls
. To do that in Visual Studio:
set build platform to x86
build type: Debug
set break point somewhere inside myAlgo_withCalls
run, and when execution stops at break point press Ctrl + Alt + D
One difference with our asm-like C++ is that asm's stack operate on bytes instead of ints. So to reserve space for one int
, SP will be decremented by 4 bytes.
Here we go (snippet #11.1, line numbers in comments are from the gist):
; 114: int myAlgo_withCalls(int a, int b) {
push ebp ; create stack frame
mov ebp,esp
; return address at (ebp + 4), `a` at (ebp + 8), `b` at (ebp + 12)
sub esp,0D8h ; reserve space for locals. Compiler can reserve more bytes then needed. 0D8h is hexadecimal == 216 decimal
push ebx ; cdecl requires to save all these registers
push esi
push edi
; fill all the space for local variables (from (ebp-0D8h) to (ebp)) with value 0CCCCCCCCh repeated 36h times (36h * 4 == 0D8h)
; see /sf/ask/17360801/
; I guess that's for ease of debugging, so that stack is filled with recognizable values
; 0CCCCCCCCh in binary is 110011001100...
lea edi,[ebp-0D8h]
mov ecx,36h
mov eax,0CCCCCCCCh
rep stos dword ptr es:[edi]
; 115: int t1 = triple(a);
mov eax,dword ptr [ebp+8] ; push parameter `a` on the stack
push eax
call triple (01A13E8h)
add esp,4 ; clean up param
mov dword ptr [ebp-8],eax ; copy result from eax to `t1`
; 116: int t2 = triple(b);
mov eax,dword ptr [ebp+0Ch] ; push `b` (0Ch == 12)
push eax
call triple (01A13E8h)
add esp,4
mov dword ptr [ebp-14h],eax ; t2 = eax
mov eax,dword ptr [ebp-8] ; calculate and store result in eax
sub eax,dword ptr [ebp-14h]
pop edi ; restore registers
pop esi
pop ebx
add esp,0D8h ; check we didn't mess up esp or ebp. this is only for debug builds
cmp ebp,esp
call __RTC_CheckEsp (01A116Dh)
mov esp,ebp ; destroy frame
pop ebp
ret
And asm for tripple
(snippet #11.2):
push ebp
mov ebp,esp
sub esp,0CCh
push ebx
push esi
push edi
lea edi,[ebp-0CCh]
mov ecx,33h
mov eax,0CCCCCCCCh
rep stos dword ptr es:[edi]
imul eax,dword ptr [ebp+8],3
mov dword ptr [ebp-8],eax
mov eax,dword ptr [ebp-8]
pop edi
pop esi
pop ebx
mov esp,ebp
pop ebp
ret
Hope, after reading this post, assembly doesn't look as cryptic as before :)
Here are links from the post's body and some further reading:
Eli Bendersky, Where the top of the stack is on x86 - top/bottom, push/pop, SP, stack frame, calling conventions
Eli Bendersky, Stack frame layout on x86-64 - args passing on x64, stack frame, red zone
University of Mariland, Understanding the Stack - a really well-written introduction to stack concepts. (It's for MIPS (not x86) and in GAS syntax, but this is insignificant for the topic). See other notes on MIPS ISA Programming if interested.
x86 Asm wikibook, General-Purpose Registers
x86 Disassembly wikibook, The Stack
x86 Disassembly wikibook, Functions and Stack Frames
Intel software developer's manuals - I expected it to be really hardcore, but surprisingly it's pretty easy read (though amount of information is overwhelming)
Jonathan de Boyne Pollard, The gen on function perilogues - prologue/epilogue, stack frame/activation record, red zone
关于堆栈是否在硬件中实现,这篇维基百科文章可能有所帮助.
某些处理器系列(例如x86)具有用于操作当前正在执行的线程的堆栈的特殊指令.其他处理器系列(包括PowerPC和MIPS)没有明确的堆栈支持,而是依赖于约定并将堆栈管理委托给操作系统的应用程序二进制接口(ABI).
该文章及其链接到的其他文章可能有助于了解处理器中的堆栈使用情况.