DISASSEMBLY NAVIGATION

In this and the following chapter we cover the heart of what puts the Interactive in IDA Pro, which is, in a nutshell, ease of navigation and ease of manipulation. The focus of this chapter is navigation; specifically, we show how IDA facilitates moving around a disassembly in a logical manner. So far, we have shown that at a basic level IDA simply combines the features of many common reverse engineering tools into an integrated disassembly display. Navigating around the display is one of the essential skills required in order to master IDA. Static disassembly listings offer no inherent navigational capability other than scrolling up and down the listing. Even with the best text editors, such dead listings are very difficult to navigate, as the best they have to offer is generally nothing more than an integrated, grep-style search. As you shall see, IDA's database underpinnings provide for exceptional navigational features.

Basic IDA Navigation

In your initial experience with IDA, you may be happy to make use of nothing more than the navigational features that IDA has to offer. In addition to offering fairly standard search features that you are accustomed to from your use of text editors or word processors, IDA develops and displays a comprehensive list of cross-references that behave in a manner similar to hyperlinks on a web page. The end result is that, in most cases, navigating to locations of interest requires nothing more than a double-click.

Double-Click Navigation

When a program is disassembled, every location in the program is assigned a virtual address. As a result, we can navigate anywhere within a program by providing the virtual address of the location we are interested in visiting. Unfortunately for us, maintaining a catalog of addresses in our head is not a trivial task. This fact motivated early programmers to assign symbolic names to program locations that they wished to reference, making things a whole lot easier on themselves. The assignment of symbolic names to program addresses was not unlike the assignment of mnemonic instruction names to program opcodes; programs became easier to read and write by making them easier to remember.

As we discussed previously, IDA generates symbolic names during the analysis phase by examining a binary's symbol table or by automatically generating a name based on how a location is referenced within the binary. In addition to its symbolic purpose, any name displayed in the disassembly window is a potential navigation target similar to a hyperlink on a web page. The two differences between these names and standard hyperlinks are (1) that the names are never highlighted in any way to indicate that they can be followed and (2) that IDA requires a double-click to follow rather than the single-click required by a hyperlink. We have already seen the use of names in various subwindows such as the Functions, Imports, and Exports windows. Recall that for each of these windows, double-clicking a name caused the disassembly view to jump to the referenced location. This is one example of double-click navigation at work. In the following listing, each of the symbols labeled (1) represents a named navigational target. Double-clicking any of them will cause IDA to relocate the display to the selected location.

For navigational purposes, IDA treats two additional display entities as navigational targets. First, cross-references (shown at (2) here) are treated as navigational targets. Cross-references are generally formatted as a name and a hex offset. The cross-reference at the right of loc_40134E in the previous listing refers to a location that is 0x4D (77 decimal) bytes beyond the start of sub_4012E4. Double-clicking the cross-reference text will jump the display to the referencing location (00401331 in this case). Cross-references are covered in more detail in Chapter 9.

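The offset arithmetic behind a cross-reference can be checked by hand. The following is a minimal sketch in C, assuming the auto-generated name sub_4012E4 encodes the function's start address 0x4012E4, as IDA's sub_ names do; the helper name xref_target is hypothetical and is not part of IDA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an IDA API): resolve a "name+offset"
 * cross-reference given the address that the name stands for. */
uint32_t xref_target(uint32_t symbol_address, uint32_t hex_offset)
{
    /* Guard against wrapping around the 32-bit address space. */
    assert(hex_offset <= UINT32_MAX - symbol_address);
    return symbol_address + hex_offset;
}
```

Calling xref_target(0x4012E4, 0x4D) yields 0x401331, which matches the referencing location 00401331 quoted above.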
The second type of display entity afforded special treatment in a navigational sense is one that uses hexadecimal values. If a displayed hexadecimal value represents a valid virtual address within the binary, then double-clicking the value will reposition the disassembly window to display the selected virtual address. In the listing that follows, double-clicking any of the values indicated by (1) will jump the display, because each is a valid virtual address within the given binary, while double-clicking any of the values indicated by (2) will have no effect.

A final note about double-click navigation concerns the IDA Output window, which is most often used to display informational messages. When a navigational target, as previously described, appears as the first item in a message, double-clicking the message will jump the display to the indicated target.

In the Output window excerpt just shown, the two messages indicated by (1) can be used to navigate to the addresses indicated at the start of the respective messages. Double-clicking any of the other messages, including those at (2), will result in no action at all.

Jump to Address

Occasionally, you will know exactly what address you would like to navigate to, yet no name will be handy in the disassembly window to offer simple double-click navigation. In such a case, you have a few options. The first, and most primitive, option is to use the disassembly window scroll bar to scroll the display up or down until the desired location comes into view. This is usually feasible only when the location you are navigating to is known by its virtual address, since the disassembly window is organized linearly by virtual address. If all you know is a named location such as a subroutine named foobar, then navigating via the scroll bar becomes something of a needle-in-a-haystack search. At that point, you might choose to sort the Functions window alphabetically, scroll to the desired name, and double-click the name. A third option is to use one of IDA's search features available via the Search menu, which typically involves specifying some search criteria before asking IDA to perform a search. In the case of searching for a known location, this is usually overkill.

Ultimately, the easiest way to get to a known disassembly location is to make use of the Jump to Address dialog shown in Figure 6-1.

The Jump to Address dialog is accessed via Jump > Jump to Address, or by using the G hotkey while the disassembly window is active. Thinking of this dialog as the Go dialog may help you remember the associated hotkey. Navigating to any location in the binary is as simple as specifying the address (a name or hex value will do) and clicking OK, which will immediately jump the display to the desired location. Values entered into the dialog are remembered and made available on subsequent use via a drop-down list. This history feature makes returning to previously requested locations somewhat easier.

Navigation History

If we compare IDA's document-navigation functions to those of a web browser, we might equate names and addresses to hyperlinks, as each can be followed relatively easily to view a new location. Another feature IDA shares with traditional web browsers is the concept of forward and backward navigation based on the order in which you navigate the disassembly. Each time you navigate to a new location within a disassembly, your current location is appended to a history list. Two menu operations are available for traversing this list. First, Jump > Jump to Previous Position repositions the disassembly to the most recent entry in the history list. The behavior is conceptually identical to a web browser's back button. The associated hotkey is ESC, and it is one of the most useful hotkeys that you can commit to memory. Be forewarned, however, that using ESC when any window other than the disassembly window is active causes the active window to be closed. (You can always reopen windows that you closed accidentally via View > Open Subviews.) Backward navigation is extremely handy when you have followed a chain of function calls several levels deep and you decide that you want to navigate back to your original position within the disassembly.

Jump > Jump to Next Position is the counterpart operation that moves the disassembly window forward in the history list in a manner similar to a web browser's forward button. For the sake of completeness, the associated hotkey for this operation is CTRL-ENTER, though it tends to be less useful than using ESC for backward navigation.

Finally, two of the more useful toolbar buttons, shown in Figure 6-2, provide the familiar browser-style forward and backward behavior.

Each of the buttons is associated with a drop-down history list that offers you instant access to any location in the navigation history without having to retrace your steps through the entire list.

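The back and forward behavior described above can be modeled with a small browser-style history list. This is only a sketch of the concept, not IDA's actual implementation: the NavHistory type, its function names, the fixed capacity, and the choice to discard forward entries on a new jump are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HISTORY_MAX 64  /* arbitrary capacity chosen for this sketch */

/* Hypothetical model of a navigation history (not IDA's data structure). */
typedef struct {
    uint32_t entries[HISTORY_MAX]; /* visited addresses, oldest first */
    size_t   count;                /* number of valid entries */
    size_t   current;              /* index of the location on display */
} NavHistory;

void nav_init(NavHistory *h, uint32_t start)
{
    h->entries[0] = start;
    h->count = 1;
    h->current = 0;
}

/* Follow a name or address: drop any forward entries (an assumption
 * borrowed from web browsers), then record the new location. */
void nav_jump(NavHistory *h, uint32_t address)
{
    assert(h->current + 1 < HISTORY_MAX);
    h->current += 1;
    h->entries[h->current] = address;
    h->count = h->current + 1;
}

/* ESC-style step backward; stays put at the oldest entry. */
uint32_t nav_back(NavHistory *h)
{
    if (h->current > 0) {
        h->current -= 1;
    }
    return h->entries[h->current];
}

/* CTRL-ENTER-style step forward; stays put at the newest entry. */
uint32_t nav_forward(NavHistory *h)
{
    if (h->current + 1 < h->count) {
        h->current += 1;
    }
    return h->entries[h->current];
}
```

After jumping from a starting address through two more locations, two calls to nav_back return to the start, and nav_forward then revisits the locations in the order they were originally reached.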
Stack Frames

Because IDA Pro is such a low-level analysis tool, many of its features and displays expect the user to be somewhat familiar with the low-level details of compiled languages, many of which center on the specifics of generating machine language and managing the memory used by a high-level program. Therefore, from time to time this book covers some of the theory of compiled programs in order to make sense of the related IDA displays.

One such low-level concept is that of the stack frame. Stack frames are blocks of memory allocated within a program's runtime stack and dedicated to a specific invocation of a function. Programmers typically group executable statements into units called functions (also called procedures, subroutines, or methods). In some cases this may be a requirement of the language being used. In most cases it is considered good programming practice to build programs from such functional units.

When a function is not executing, it typically requires little to no memory. When a function is called, however, it may require memory for several reasons. First, the caller of a function may wish to pass information into the function in the form of parameters (arguments), and these parameters need to be stored somewhere the function can find them. Second, the function may need temporary storage space while performing its task. This temporary space is often allocated by a programmer through the declaration of local variables, which can be used within the function but cannot be accessed once the function has completed.

Compilers utilize stack frames (also called activation records) to make the allocation and deallocation of function parameters and local variables transparent to the programmer. A compiler inserts code to place a function's parameters into the stack frame prior to transferring control to the function itself, at which point the compiler inserts code to allocate enough memory to hold the function's local variables. As a consequence of the way stack frames are constructed, the address to which the function should return is also stored within the new stack frame. A pleasant result of the use of stack frames is that recursion becomes possible, as each recursive call to a function is given its own stack frame, neatly segregating each call from its predecessor. The following steps detail the operations that take place when a function is called:

1. The caller places any parameters required by the function being called into locations as dictated by the calling convention (see "Calling Conventions") employed by the called function. This operation may result in a change to the program stack pointer if parameters are placed on the runtime stack.

2. The caller transfers control to the function being called. This is usually performed with an instruction such as the x86 CALL or the MIPS JAL. A return address is typically saved onto the program stack or in a CPU register.

3. If necessary, the called function takes steps to configure a frame pointer and saves any register values that the caller expects to remain unchanged.

4. The called function allocates space for any local variables that it may require. This is often done by adjusting the program stack pointer to reserve space on the runtime stack.

5. The called function performs its operations, potentially generating a result. In the course of performing its operations, the called function may access the parameters passed to it by the calling function. If the function returns a result, the result is often placed into a specific register or registers that the caller can examine once the function returns.

6. Once the function has completed its operations, any stack space reserved for local variables is released. This is often done by reversing the actions performed in step 4.

7. Any registers whose values were saved (in step 3) on behalf of the caller are restored to their original values. This includes the restoration of the caller's frame pointer register.

8. The called function returns control to the caller. Typical instructions for this include the x86 RET and the MIPS JR instructions. Depending on the calling convention in use, this operation may also serve to clear one or more parameters from the program stack.

9. Once the caller regains control, it may need to remove parameters from the program stack. In such cases a stack adjustment may be required to restore the program stack pointer to the value that it held prior to step 1.

Steps 3 and 4 are so commonly performed upon entry to a function that together they are called the function's prologue. Similarly, steps 6 through 8 are so frequently performed at the end of a function that together they make up the function's epilogue. With the exception of step 5, which represents the body of the function, all of these operations constitute the overhead associated with calling a function.

Calling Conventions

With a basic understanding of what stack frames are, we can take a closer look at exactly how they are structured. The examples that follow reference the x86 architecture and the behavior associated with common x86 compilers such as Microsoft Visual C/C++ or GNU's gcc/g++. One of the most important steps in the creation of a stack frame involves the placement of function parameters onto the stack by the calling function. The calling function must store parameters exactly as the function being called expects to find them; otherwise, serious problems can arise. Functions advertise the manner in which they expect to receive their arguments by selecting and adhering to a specific calling convention.

A calling convention dictates exactly where a caller should place any parameters that a function requires. Calling conventions may require parameters to be placed in specific registers, on the program stack, or in both registers and on the stack. Equally important, when parameters are passed on the program stack, is determining who is responsible for removing them from the stack once the called function has completed. Some calling conventions dictate that the caller is responsible for removing parameters that it placed on the stack, while other calling conventions dictate that the called function will take care of removing the parameters from the stack. Adherence to publicized calling conventions is essential in maintaining the integrity of the program stack pointer.

The C Calling Convention

The default calling convention used by most C compilers for the x86 architecture is called the C calling convention. The _cdecl modifier may be used by C/C++ programs to force compilers to utilize the C calling convention when the default calling convention may have been overridden. We will refer to this calling convention as the cdecl calling convention from here on. The cdecl calling convention specifies that the caller place parameters to a function on the stack in right-to-left order and that the caller (as opposed to the callee) remove the parameters from the stack after the called function completes.

One result of placing parameters on the stack in right-to-left order is that the leftmost (first) parameter of the function will always be on the top of the stack when the function is called. This makes the first parameter easy to find regardless of the number of parameters the function expects, and it makes the cdecl calling convention ideally suited for use with functions that can take a variable number of arguments (such as printf).

Requiring the calling function to remove parameters from the stack means that you will often see instructions that make an adjustment to the program stack pointer immediately following the return from a called function. In the case of functions that can accept a variable number of arguments, the caller is ideally suited to make this adjustment, as the caller knows exactly how many arguments it has chosen to pass to the function and can easily make the correct adjustment, whereas the called function never knows ahead of time how many parameters it may receive and would have a difficult time making the necessary stack adjustment.

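A variadic function in C shows why the first parameter must be easy to find and why only the caller knows the full argument count. The sketch below is illustrative only; sum_ints is a hypothetical function, and how its arguments are physically passed depends on the compiler and target (the stack-based description in this section applies to 32-bit x86 cdecl code).

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic function: adds up `count` int arguments.
 * The callee learns how many arguments follow only from its first
 * (leftmost) parameter, which sits at a fixed, known position. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);
    va_start(args, count);          /* start just past the first parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int); /* fetch the next int argument */
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) passes four values, while sum_ints(1, 42) passes two; in each case it is the call site, not the body of sum_ints, that determines how much argument space was used.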
In the following examples we consider calls to a function having the following prototype:

    void demo_cdecl(int w, int x, int y, int z);

By default, this function will use the cdecl calling convention, expecting the four parameters to be pushed in right-to-left order and requiring the caller to clean the parameters off the stack. A compiler might generate code for a call to this function as follows:

    ; demo_cdecl(1, 2, 3, 4);   // programmer calls demo_cdecl
(1) push 4              ; push parameter z
    push 3              ; push parameter y
    push 2              ; push parameter x
    push 1              ; push parameter w
    call demo_cdecl     ; call the function
(2) add esp, 16         ; adjust esp to its former value

The four push operations beginning at (1) result in a net change to the program stack pointer (ESP) of 16 bytes (4 * sizeof(int) on a 32-bit architecture), which is undone at (2) following the return from demo_cdecl. If demo_cdecl is called 50 times, each call will be followed by an adjustment similar to that at (2). The following example also adheres to the cdecl calling convention while eliminating the need for the caller to explicitly clean parameters off the stack following each call to demo_cdecl.

    ; demo_cdecl(1, 2, 3, 4);   // programmer calls demo_cdecl
    mov [esp+12], 4     ; move parameter z to fourth position on stack
    mov [esp+8], 3      ; move parameter y to third position on stack
    mov [esp+4], 2      ; move parameter x to second position on stack
    mov [esp], 1        ; move parameter w to top of stack
    call demo_cdecl     ; call the function

In this example, the compiler has preallocated storage space for the parameters to demo_cdecl at the top of the stack during the function prologue. When the parameters for demo_cdecl are placed on the stack, there is no change to the program stack pointer, which eliminates the need to adjust the stack pointer when the call to demo_cdecl completes. The GNU compilers (gcc and g++) utilize this technique to place function parameters onto the stack.

Note that either method results in the stack pointer pointing to the leftmost argument when the function is called.

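The stack pointer bookkeeping for the push-based cdecl sequence can be traced with simple arithmetic. The following sketch assumes 4-byte parameters and a stack that grows toward lower addresses, as on 32-bit x86; the function names and the starting ESP value used in the note below are hypothetical. The return address pushed by CALL and popped by RET nets out to zero and is left out of the model.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT 4u  /* sizeof(int) on a 32-bit x86 target */

/* Simulated ESP after the caller pushes `nargs` 4-byte parameters. */
uint32_t esp_after_pushes(uint32_t esp, uint32_t nargs)
{
    assert(esp >= nargs * SLOT);  /* the simulated stack must not underflow */
    return esp - nargs * SLOT;
}

/* Simulated ESP after the caller's cdecl cleanup: add esp, nargs * 4. */
uint32_t esp_after_cdecl_cleanup(uint32_t esp, uint32_t nargs)
{
    return esp + nargs * SLOT;
}
```

Starting from an arbitrary ESP of 0x0012FF80, four pushes lower the value by 16 to 0x0012FF70, and the cleanup for four parameters restores the original value, mirroring the add esp, 16 instruction in the first listing.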
The Standard Calling Convention

Standard in this case is a bit of a misnomer, as it is a name that Microsoft created for its own calling convention marked by the use of the _stdcall modifier in a function declaration, as shown here:

    void _stdcall demo_stdcall(int w, int x, int y);

In order to avoid any confusion surrounding the word standard, we will refer to this calling convention as the stdcall calling convention for the remainder of the book.

As with the cdecl calling convention, stdcall requires that function parameters be placed on the program stack in right-to-left order. The difference when using stdcall is that the called function is responsible for clearing the function parameters from the stack when the function has finished. In order for a function to do this, the function must know exactly how many parameters are on the stack. This is possible only for functions that accept a fixed number of parameters. As a result, variable argument functions such as printf cannot make use of the stdcall calling convention. The demo_stdcall function, for example, expects three integer parameters, occupying a total of 12 bytes on the stack (3 * sizeof(int) on a 32-bit architecture). An x86 compiler can use a special form of the RET instruction to simultaneously pop the return address from the top of the stack and add 12 to the stack pointer to clear the function parameters. In the case of demo_stdcall, we might see the following instruction used to return to the caller:

    ret 12              ; return and clear 12 bytes from the stack

The primary advantage to the use of stdcall is the elimination of code to clean parameters off the stack following every function call, which results in slightly smaller, slightly faster programs. By convention Microsoft utilizes the stdcall convention for all fixed-argument functions exported from shared library (DLL) files. This is an important point to remember if you are attempting to generate function prototypes or binary-compatible replacements for any shared library components.

The fastcall Convention for x86

A variation on the stdcall convention, the fastcall calling convention passes up to two parameters in CPU registers rather than on the program stack. The Microsoft Visual C/C++ and GNU gcc/g++ (version 3.4 and later) compilers recognize the _fastcall modifier in function declarations. When _fastcall is specified, the first two parameters passed to a function will be placed in the ECX and EDX registers, respectively. Any remaining parameters are placed on the stack in right-to-left order similar to stdcall. Also similar to stdcall, fastcall functions are responsible for removing parameters from the stack when they return to their caller. The following declaration demonstrates the use of the _fastcall modifier.

    void _fastcall demo_fastcall(int w, int x, int y, int z);

A compiler might generate the following code in order to call demo_fastcall:

    ; demo_fastcall(1, 2, 3, 4);   // programmer calls demo_fastcall
    push 4              ; move parameter z to second position on stack
    push 3              ; move parameter y to top position on stack
    mov edx, 2          ; move parameter x to edx
    mov ecx, 1          ; move parameter w to ecx
    call demo_fastcall  ; call the function

Note that no stack adjustment is required upon return from the call to demo_fastcall, as demo_fastcall is responsible for clearing parameters y and z from the stack as it returns to the caller. It is important to understand that because two arguments are passed in registers, the called function needs to clear only 8 bytes from the stack even though there are four arguments to the function.

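The cleanup rules for the three conventions covered so far reduce to a little arithmetic. The sketch below assumes every parameter is a 4-byte int on 32-bit x86, as in the demo functions above; the CallConv enum and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

typedef enum { CONV_CDECL, CONV_STDCALL, CONV_FASTCALL } CallConv;

/* Bytes of 4-byte parameters that end up on the stack. For fastcall,
 * the first two parameters travel in ECX and EDX instead. */
unsigned stack_param_bytes(CallConv conv, unsigned nargs)
{
    unsigned in_registers = (conv == CONV_FASTCALL) ? 2u : 0u;
    unsigned on_stack = (nargs > in_registers) ? nargs - in_registers : 0u;

    assert(on_stack <= UINT_MAX / 4u);  /* keep the multiplication in range */
    return on_stack * 4u;
}

/* Operand N of the callee's `ret N`; 0 means a plain `ret`. */
unsigned callee_ret_operand(CallConv conv, unsigned nargs)
{
    return (conv == CONV_CDECL) ? 0u : stack_param_bytes(conv, nargs);
}

/* Bytes the caller adds to ESP after the call returns. */
unsigned caller_cleanup_bytes(CallConv conv, unsigned nargs)
{
    return (conv == CONV_CDECL) ? stack_param_bytes(conv, nargs) : 0u;
}
```

With these definitions, the three-parameter demo_stdcall corresponds to ret 12, the four-parameter demo_fastcall to ret 8, and the four-parameter demo_cdecl to a plain ret followed by add esp, 16 in the caller.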
C++ Calling Conventions

Nonstatic member functions in C++ classes differ from standard functions in that they must make available the this pointer, which points to the object used to invoke the function. The address of the object used to invoke the function must be supplied by the caller and is therefore provided as a parameter when calling nonstatic member functions. The C++ language standard does not specify how this should be passed to nonstatic member functions, so it should come as no surprise that different compilers use different techniques when passing this.

Microsoft Visual C++ offers the thiscall calling convention, which passes this in the ECX register and requires the nonstatic member function to clean parameters off the stack as in stdcall. The GNU g++ compiler treats this as the implied first parameter to any nonstatic member function and behaves in all other respects as if the cdecl convention is being used. Thus, for g++-compiled code, this is placed on top of the stack prior to calling the nonstatic member function, and the caller is responsible for removing parameters (there will always be at least one) from the stack once the function returns. Additional features of compiled C++ are discussed in Chapter 8.

Other Calling Conventions

Complete coverage of every existing calling convention would require a book in its own right. Calling conventions are often language-, compiler-, and CPU-specific, and some research on your part may be required as you encounter code generated by less-common compilers. A few situations deserve special mention, however: optimized code, custom assembly language code, and system calls.

When functions are exported for use by other programmers (such as library functions), it is important that they adhere to well-known calling conventions so that programmers can easily interface to those functions. On the other hand, if a function is intended for internal program use only, then the calling convention used by that function need be known only within that function's program. In such cases, optimizing compilers may choose to use alternate calling conventions in order to generate faster code. Instances in which this may occur include the use of the /GL option with Microsoft Visual C++ and the use of the regparm keyword with GNU gcc/g++.

When programmers go to the trouble of using assembly language, they gain complete control over how parameters will be passed to any functions that they happen to create. Unless they wish to make their functions available to other programmers, assembly language programmers are free to pass parameters in any way they see fit. As a result, you may need to take extra care when analyzing custom assembly code. Custom assembly code is often encountered in obfuscation routines and shellcode.

A system call is a special type of function call used to request an operating system service. System calls usually effect a state transition from user mode to kernel mode in order for the operating system kernel to service the user's request. The manner in which system calls are initiated varies across operating systems and CPUs. For example, Linux x86 system calls may be initiated using the int 0x80 instruction or the sysenter instruction, while other x86 operating systems may use only the sysenter instruction or alternate interrupt numbers. On many x86 systems (Linux being an exception) parameters for system calls are placed on the runtime stack, and a system call number is placed in the EAX register immediately prior to initiating the system call. Linux system calls accept their parameters in specific registers and occasionally in memory when there are more parameters than available registers.

Local Variable Layout

Unlike the calling conventions that dictate the manner in which parameters are passed into a function, there are no conventions that mandate the layout of a function's local variables. When compiling a function, one task a compiler is faced with is to compute the amount of space required by a function's local variables. Another task is to determine whether those variables can be allocated in CPU registers or whether they must be allocated on the program stack. The exact manner in which these allocations are made is irrelevant to both the caller of a function and to any functions that may, in turn, be called. Most notably, it is typically impossible to determine a function's local variable layout based on examination of the function's source code.

Stack Frame Examples

Consider the following function compiled on a 32-bit x86-based computer:

    void bar(int j, int k);   // a function to call
    void demo_stackframe(int a, int b, int c) {
        int x;
        char buffer[64];
        int y;
        int z;
        // body of function not terribly relevant other than
        bar(z, y);
    }

We compute the minimum amount of stack space required for local variables as 76 bytes (three 4-byte integers and a 64-byte buffer). This function could use either stdcall or cdecl, and the stack frame will look the same. Figure 6-3 shows one possible implementation of a stack frame for an invocation of demo_stackframe, assuming that no frame pointer register is used (thus the stack pointer, ESP, serves as the frame pointer). This frame would be set up on entry to demo_stackframe with the one-line prologue:

    sub esp, 76         ; allocate sufficient space for all local variables

The Offset column indicates the base+displacement address required to reference any of the local variables or parameters in the stack frame.

           Variable     Offset
    esp -> z            [esp]       } local variables
           y            [esp+4]
           buffer       [esp+8]
           x            [esp+72]
           saved eip    [esp+76]
           a            [esp+80]    } parameters
           b            [esp+84]
           c            [esp+88]

    Figure 6-3 (ESP-based stack frame)

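The offsets in the ESP-based layout can be cross-checked by describing the frame as a C structure whose first member sits where ESP points. This is a sketch for checking the arithmetic only; the EspFrame type is hypothetical, and a real compiler is free to order locals differently, as noted under Local Variable Layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory image of the frame, lowest address first.
 * Each member's offset is the displacement N in [esp+N]. */
typedef struct {
    int32_t  z;
    int32_t  y;
    char     buffer[64];
    int32_t  x;
    uint32_t saved_eip;   /* return address pushed by CALL */
    int32_t  a;
    int32_t  b;
    int32_t  c;
} EspFrame;

/* Compile-time checks against the offsets listed in the layout. */
static_assert(offsetof(EspFrame, buffer) == 8, "buffer is at [esp+8]");
static_assert(offsetof(EspFrame, x) == 72, "x is at [esp+72]");
static_assert(offsetof(EspFrame, saved_eip) == 76, "saved eip is at [esp+76]");

/* Everything below the return address is local storage, so this is
 * the amount the prologue subtracts from ESP. */
size_t esp_frame_local_bytes(void)
{
    return offsetof(EspFrame, saved_eip);
}
```

Here esp_frame_local_bytes() evaluates to 76, the same figure used by the sub esp, 76 prologue.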
Generating functions that utilize the stack pointer to compute all variable references requires a little more effort on the part of the compiler, as the stack pointer changes frequently and the compiler must make sure that proper offsets are used at all times when referencing any variables within the stack frame. Consider the call made to bar in function demo_stackframe, the code for which is shown here:

(1) push dword [esp+4]  ; push y
(2) push dword [esp+4]  ; push z
    call bar
    add esp, 8          ; cdecl requires caller to clear parameters

The push at (1) correctly pushes local variable y per the offset in Figure 6-3. At first glance it might appear that the push at (2) incorrectly references local variable y a second time. However, because we are dealing with an ESP-based frame and the push at (1) modifies ESP, all of the offsets in Figure 6-3 must be temporarily adjusted each time ESP changes. Following (1), the new offset for local variable z becomes [esp+4], as correctly referenced in the push at (2). When examining functions that reference stack frame variables using the stack pointer, you must be careful to note any changes to the stack pointer and adjust all future variable offsets accordingly. One advantage of using the stack pointer to reference all stack frame variables is that all other registers remain available for other purposes.

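The adjustment described above amounts to adding the number of bytes pushed since the prologue to each variable's original displacement. A minimal sketch, assuming 4-byte pushes on 32-bit x86; the function name current_esp_offset is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement to use in [esp+N] for a variable whose displacement was
 * `frame_offset` right after the prologue, once `bytes_pushed` bytes
 * have been pushed (and not yet popped) since then. */
uint32_t current_esp_offset(uint32_t frame_offset, uint32_t bytes_pushed)
{
    assert(bytes_pushed % 4u == 0u);  /* 32-bit pushes move ESP in 4-byte steps */
    return frame_offset + bytes_pushed;
}
```

Before any push, y is at current_esp_offset(4, 0), which is 4. After y has been pushed, z (originally at [esp]) is at current_esp_offset(0, 4), which is also 4, and that is why both push instructions use the operand [esp+4].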
Once demo_stack-frame bas completed, it needs to return to the caller. Ultimately a ret instruction will be used to pop the desired return address off the top of the stack into the instruction pointer register (EIP in this case). Belote the return address can be popped, the local variables need to be removed from the top of the stack so that the stack pointer correctly points to the saved return address when the ret instruction is executed. For this particular fonction the resulting epilogue becomes
 +
 +
add esp, 76 adjust esp to point to the saved return address
 +
ret return to the caller
 +
 +
At the expense of dedicating a register for use as a frame pointer and some code to conflgure the frame pointer on entry to the function, the job of computing local variable offsets can be made easïer. In x86 programs, the EBP (extended base pointer) register is typïcally dedïcated for use as a stack frame pointer. By default, most compilers generate code to use a frame pointer, though options typïcally exïst for specifyïng that the stack pointer should be used instead. GNU gcc/g++, for example, offers the -fornit-frame-pointer compiler option, which generates functions that do not rely on a fïxed-frame pointer register.
 +
 +
In order to see what the stack frame for demo_stackframe will look like using a dedicated frame pointer, we need to consider this new prologue code:
 +
 +
push ebp        ; save the caller's ebp value
mov ebp, esp    ; make ebp point to the saved register value
sub esp, 76     ; allocate space for local variables
 +
 +
The push instruction saves the value of EBP currently being used by the caller. Functions that adhere to the System V Application Binary Interface for Intel 32-bit Processors are allowed to modify the EAX, ECX, and EDX registers but are required to preserve the caller's values for all other registers. Therefore, if we wish to use EBP as a frame pointer, we must save the current value of EBP before we change it, and we must restore the value of EBP before we return to the caller. If any other registers need to be saved on behalf of the caller (ESI or EDI, for example), compilers may choose to save them at the same time EBP is saved, or they may defer saving them until local variables have been allocated. Thus, there is no standard location within a stack frame for the storage of saved registers.
 +
Once EBP has been saved, it can be changed to point to the current stack location. This is accomplished by the mov instruction, which copies the current value of the stack pointer into EBP. Finally, as in the non-EBP-based stack frame, space for local variables is allocated by the sub instruction. The resulting stack frame layout is shown in Figure 6-4.
 +
 +
Variable     Offset
z            [ebp-76]    <- esp    } local variables
y            [ebp-72]
buffer       [ebp-68]
x            [ebp-4]
saved ebp    [ebp]       <- ebp    } saved register(s)
saved eip    [ebp+4]
a            [ebp+8]               } parameters
b            [ebp+12]
c            [ebp+16]

Figure 6-4: An EBP-based stack frame
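The offsets in Figure 6-4 follow mechanically from the prologue. As a rough sketch, the hypothetical helper below (ebp_frame_layout is an illustrative name, not an IDA or compiler API) rebuilds them, assuming 4-byte words and locals packed in declaration order with no padding, which real compilers do not guarantee.

```python
def ebp_frame_layout(params, local_sizes, word=4):
    """Map each name to its EBP-relative offset in an idealized x86-32 frame."""
    layout = {"saved ebp": 0, "saved eip": word}
    offset = 2 * word                  # first parameter sits just past saved eip
    for name in params:
        layout[name] = offset
        offset += word
    offset = 0
    for name, size in local_sizes:     # assumption: declaration order, no padding
        offset -= size
        layout[name] = offset
    return layout


layout = ebp_frame_layout(
    ["a", "b", "c"],
    [("x", 4), ("buffer", 64), ("y", 4), ("z", 4)],
)
for name, off in sorted(layout.items(), key=lambda item: item[1]):
    print(f"{name:10s} [ebp{off:+d}]")
```

The 76 bytes subtracted from ESP in the prologue are simply the sum of the local sizes (4 + 64 + 4 + 4) under these assumptions.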
 +
 +
With a dedicated frame pointer, all variable offsets are computed relative to the frame pointer register. It is most often (though not necessarily) the case that positive offsets are used to access function parameters, while negative offsets are required to access local variables. With a dedicated frame pointer in use, the stack pointer may be freely changed without affecting the offset to any variables within the frame. The call to function bar can now be implemented as follows:
 +
 +
push dword [ebp-72]   ; push y
push dword [ebp-76]   ; push z
call bar
add esp, 8            ; cdecl requires caller to clear parameters
 +
 +
The fact that the stack pointer has changed following the first push has no effect on the access to local variable z in the succeeding push.
 +
 +
Finally, the use of a frame pointer necessitates a slightly different epilogue once the function completes, as the caller's frame pointer must be restored prior to returning. Local variables must be cleared from the stack before the old value of the frame pointer can be retrieved, but this is made easy by the fact that the current frame pointer points to the old frame pointer. In x86 programs utilizing EBP as a frame pointer, the following code represents a typical epilogue:
 +
 +
mov esp, ebp    ; clears local variables by resetting esp
pop ebp         ; restore the caller's value of ebp
ret             ; pop return address to return to the caller
 +
 +
This operation is so common that the x86 architecture offers the leave instruction as an abbreviated means of accomplishing the same task.
 +
 +
leave           ; copies ebp to esp AND then pops into ebp
ret             ; pop return address to return to the caller
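To check that the two epilogues leave the machine in the same state, a toy register-and-memory model is enough. The Cpu class, the addresses, and the saved values below are all invented for illustration; this is a sketch of the semantics described above, not an emulator.

```python
class Cpu:
    """Toy 32-bit machine with just enough state to run an epilogue."""

    def __init__(self, esp, ebp, memory):
        self.esp, self.ebp, self.eip = esp, ebp, None
        self.memory = dict(memory)

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def mov_esp_ebp(self):
        self.esp = self.ebp          # discard the local variables

    def pop_ebp(self):
        self.ebp = self.pop()        # restore the caller's frame pointer

    def leave(self):
        self.esp = self.ebp          # same two steps fused into one instruction
        self.ebp = self.pop()

    def ret(self):
        self.eip = self.pop()        # pop the return address into EIP

    def state(self):
        return (self.esp, self.ebp, self.eip)


# Hypothetical frame: saved ebp at 0x1000, saved return address at 0x1004.
memory = {0x1000: 0x2000, 0x1004: 0x401234}

long_form = Cpu(esp=0x1000 - 76, ebp=0x1000, memory=memory)
long_form.mov_esp_ebp()
long_form.pop_ebp()
long_form.ret()

short_form = Cpu(esp=0x1000 - 76, ebp=0x1000, memory=memory)
short_form.leave()
short_form.ret()

print(long_form.state() == short_form.state())
```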
 +
 +
While the names of registers and instructions used will certainly differ for other processor architectures, the basic process of building stack frames will remain the same. Regardless of the architecture, you will want to familiarize yourself with typical prologue and epilogue sequences so that you can quickly move on to analyzing more interesting code within functions.
 +
 +
IDA Stack Views
 +
 +
Stack frames are clearly a runtime concept; a stack frame can't exist without a stack and without a running program. While this is true, it doesn't mean that you should ignore the concept of a stack frame when you are performing static analysis with tools such as IDA. All of the code required to set up stack frames for each function is present within a binary. Through careful analysis of this code, we can gain a detailed understanding of the structure of any function's stack frame even when the function is not running. In fact, some of IDA's most sophisticated analysis is performed specifically to determine the layout of stack frames for every function that IDA disassembles. During initial analysis, IDA goes to great lengths to monitor the behavior of the stack pointer over the course of a function by making note of every push or pop operation along with any arithmetic operations that may change the stack pointer, such as adding or subtracting constant values. The first goal of this analysis is to determine the exact size of the local variable area allocated to a function's stack frame. Additional goals include determining whether a dedicated frame pointer is in use in a given function (by recognizing a push ebp/mov ebp, esp sequence, for example) and recognizing all memory references to variables within a function's stack frame. For example, if IDA noted the following instruction in the body of demo_stackframe
 +
 +
mov eax, [ebp+8]
 +
 +
it would understand that the first argument to the function (a in this case) is being loaded into the EAX register (refer to Figure 6-4). Through careful analysis of the stack frame structure, IDA can distinguish between memory references that access function arguments (those that lie below the saved return address) and references that access local variables (those that lie above the saved return address). IDA takes the additional step of determining which memory locations within a stack frame are directly referenced. For example, while the stack frame in Figure 6-4 is 96 bytes in size, there are only seven variables that we are likely to see referenced (four locals and three parameters).
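The stack-pointer monitoring described above can be approximated with a very small pass over instruction text. This is a deliberately naive sketch and not how IDA is implemented: track_stack_pointer and uses_frame_pointer are hypothetical helpers that only understand 4-byte push/pop and add/sub of a decimal constant to ESP.

```python
import re


def track_stack_pointer(instructions):
    """Return (instruction, cumulative ESP delta) pairs for a toy instruction list."""
    delta, trace = 0, []
    for insn in instructions:
        mnemonic, _, operands = insn.partition(" ")
        if mnemonic == "push":
            delta -= 4
        elif mnemonic == "pop":
            delta += 4
        else:
            match = re.fullmatch(r"esp,\s*(\d+)", operands.strip())
            if match and mnemonic == "sub":
                delta -= int(match.group(1))
            elif match and mnemonic == "add":
                delta += int(match.group(1))
        trace.append((insn, delta))
    return trace


def uses_frame_pointer(instructions):
    """Recognize the classic push ebp / mov ebp, esp prologue."""
    return instructions[:2] == ["push ebp", "mov ebp, esp"]


prologue = ["push ebp", "mov ebp, esp", "sub esp, 76"]
trace = track_stack_pointer(prologue)
print(uses_frame_pointer(prologue), trace[-1][1])
```

The final delta covers the 4 bytes of saved EBP plus the 76-byte local variable area, which is the size the first goal of the analysis is after.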
 +
Understanding the behavior of a function often comes down to understanding the types of data that the function manipulates. When reading a disassembly listing, one of the first opportunities that you will have to understand the data a function manipulates is to view the breakdown of the function's stack frame. IDA offers two views into any function's stack frame: a summary view and a detail view. In order to understand these two views, we will refer to the following version of demo_stackframe, which we have compiled using gcc.
 +
 +
void demo_stackframe(int a, int b, int c) {
   int x = c;
   char buffer[64];
   int y = b;
   int z = 10;
   buffer[0] = 'A';
   bar(z, y);
}
 +
 +
In this example, local variables x and y are initialized from parameters c and b, respectively. Local variable z is initialized with the constant value 10, and the first character in the 64-byte local array, named buffer, is initialized to the letter A.
 +
 +
There are many points to cover in IDA's disassembly of this function as we begin to acquaint ourselves with IDA's disassembly notation. We begin by noting that IDA believes this function uses the EBP register as a frame pointer based on analysis of the function prologue. We also learn that gcc has allocated 120 bytes (78h equates to 120) of local variable space in the stack frame. This includes 8 bytes for passing the two parameters to bar, but it is still far greater than the 76 bytes we had estimated previously and demonstrates that compilers occasionally pad the local variable space with extra bytes in order to ensure a particular alignment within the stack frame. IDA also provides a summary stack view that lists every variable that is directly referenced within the stack frame, along with the variable's size and offset distance from the frame pointer.
 +
 +
IDA assigns names to variables based on their location relative to the saved return address. Local variables lie above the saved return address, while function parameters lie below the saved return address. Local variable names are derived using the var_ prefix joined with a hexadecimal suffix that indicates the distance, in bytes, that the variable lies above the saved frame pointer. Local variable var_C, in this case, is a 4-byte (dword) variable that lies 12 bytes above the saved frame pointer ([ebp-0Ch]). Function parameter names are generated using the arg_ prefix combined with a hexadecimal suffix that represents the relative distance from the topmost parameter. Thus the topmost 4-byte parameter would be named arg_0, while successive parameters would be named arg_4, arg_8, arg_C, and so on. In this particular example arg_0 is not listed because the function makes no use of parameter a. Because IDA fails to locate any memory reference to [ebp+8] (the location of the first parameter), arg_0 is not listed in the summary stack view. A quick scan of the summary stack view reveals that there are many stack locations that IDA has failed to name because no direct references to those locations exist in the program code.
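The naming rule just described is easy to express as a function of the EBP-relative offset. The helper below is a hypothetical illustration of that rule for an EBP-based, 32-bit x86 frame; it is not an IDA API, and it simply returns None for the saved EBP and saved return address slots.

```python
def ida_style_name(ebp_offset):
    """Derive a var_/arg_ style name from an EBP-relative offset (x86-32, EBP frame)."""
    if ebp_offset < 0:
        # Locals: hex distance above the saved frame pointer.
        return f"var_{-ebp_offset:X}"
    if ebp_offset >= 8:
        # Parameters: hex distance from the topmost parameter at [ebp+8].
        return f"arg_{ebp_offset - 8:X}"
    return None  # saved ebp / saved return address, not ordinary variables


for offset in (-0x60, -0x0C, 8, 12, 16):
    print(f"[ebp{offset:+d}] -> {ida_style_name(offset)}")
```

Read in reverse, the same rule gives the numeric form of a symbolic operand: arg_8 stands for 8 + 8 = 16, or 10h, bytes past EBP.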
 +
 +
NOTE The only stack variables that IDA will automatically generate names for are those that are directly referenced within a function.
 +
 +
An important difference between IDA's disassembly listing and the stack frame analysis that we performed earlier is the fact that nowhere in the disassembly listing do we see memory references similar to [ebp-12]. Instead, IDA has replaced all constant offsets with symbolic names corresponding to the symbols in the stack view and their relative offsets from the stack frame pointer. This is in keeping with IDA's goal of generating a higher-level disassembly. It is simply easier to deal with symbolic names than numeric constants. In fact, as we will see later, IDA allows us to change the names of any stack variable to whatever we wish, making the names that much easier for us to remember. The summary stack view serves as a map from IDA-generated names to their corresponding stack frame offsets. For example, where the memory reference [ebp+arg_8] appears in the disassembly, [ebp+10h] or [ebp+16] could be used instead. If you prefer numeric offsets, IDA will happily show them to you. Right-clicking arg_8 in the disassembly yields the context-sensitive menu shown in Figure 6-5, which contains several options to change the display format.
 +
 +
In this example, since we have source code available for comparison, we can map the IDA-generated variable names back to the names used in the original source using a variety of clues available in the disassembly.
 +
 +
1. First, demo_stackframe takes three parameters: a, b, and c. These correspond to variables arg_0, arg_4, and arg_8 respectively (though arg_0 is missing in the disassembly because it is never referenced).
 +
 +
2. Local variable x is initialized from parameter c. Thus var_C corresponds to x since it is initialized from arg_8.
 +
 +
3. Similarly, local variable y is initialized from parameter b. Thus, var_5C corresponds to y since it is initialized from arg_4.
 +
 +
4. Local variable z corresponds to var_60 since it is initialized with the value 10.
 +
 +
5. The 64-byte character array buffer begins at var_58 since buffer[0] is initialized with A (ASCII 0x41).
 +
 +
6. The two arguments for the call to bar are moved into the stack rather than being pushed onto the stack. This is typical of current versions of gcc (versions 3.4 and later). IDA recognizes this convention and elects not to create local variable references for the two items at the top of the stack frame.
 +
 +
In addition to the summary stack view, IDA offers a detailed stack frame view in which every byte allocated to a stack frame is accounted for. The detailed view is accessed by double-clicking any variable name associated with a given stack frame. Double-clicking var_C in the previous listing would bring up the stack frame view shown in Figure 6-6 (ESC closes the window).
 +
 +
Because the detailed view accounts for every byte in the stack frame, it occupies significantly more space than the summary view, which lists only referenced variables. The portion of the stack frame shown in Figure 6-6 spans a total of 32 bytes, which represents only a small portion of the entire stack frame. Note that no names are assigned to bytes that are not referenced directly within the function. For example, parameter a, corresponding to arg_0, was never referenced within demo_stackframe. With no memory reference to analyze, IDA opts to do nothing with the corresponding bytes in the stack frame, which occupy offsets +00000008 through +0000000B. On the other hand, arg_4 was directly referenced in the disassembly listing, where its contents were loaded into the 32-bit EAX register. Based on the fact that 32 bits of data were moved, IDA is able to infer that arg_4 is a 4-byte quantity and labels it as such (db defines 1 byte of storage; dw defines 2 bytes of storage, also called a word; and dd defines 4 bytes of storage, also called a double word).
 +
 +
Two special values shown in Figure 6-6 are " s" and " r" (each starts with a leading space). These pseudo variables are IDA's special representation of the saved return address (" r") and the saved register value(s) (" s" representing only EBP in this example). These values are included in the stack frame view for completeness, as every byte in the stack frame is accounted for.
 +
The stack frame view offers a detailed look at the inner workings of compilers. In Figure 6-6 it is clear that the compiler has inserted 8 extra bytes between the saved frame pointer " s" and the local variable x (var_C). These bytes occupy offsets -00000001 through -00000008 in the stack frame. Further, a little math performed on the offset associated with each variable listed in the summary view reveals that the compiler has allocated 76 (rather than 64 per the source code) bytes to the character buffer at var_58. Unless you happen to be a compiler writer yourself or are willing to dig deep into the source code for gcc, all you can do is speculate as to why these extra bytes are allocated in this manner. In most cases we can chalk up the extra bytes to padding for alignment, and usually the presence of these extra bytes has no impact on a program's behavior. After all, if a programmer asks for 64 bytes and is given 76, the program should behave no differently, especially since the programmer shouldn't be using more than the 64 bytes requested. On the other hand, if you happen to be an exploit developer and learn that it is possible to overflow this particular buffer, then you might be very interested in the fact that nothing interesting can even begin to happen until you have supplied at least 76 bytes, which is the effective size of the buffer as far as the compiler is concerned. In Chapter 8 we will return to the stack frame view and its uses in dealing with more complex datatypes such as arrays and structures.
 +
 +
Searching the Database
 +
 +
IDA makes it easy to navigate to things that you know about and designs many of its data displays to summarize specific types of information (names, strings, imports, and so on), making them easy to find as well. However, what features are offered to help you conduct more general searches through your databases? If you take time to review the contents of the Search menu, you will find a long list of options, the majority of which take you to the next item in some category. For example, Search > Next Code moves the cursor to the next location containing an instruction. You may also wish to familiarize yourself with the options available on the Jump menu. For many of these, you are presented with a list of locations to choose from. Jump > Jump to Function, for example, brings up a list of all functions, allowing you to quickly choose one and navigate to it. While these canned search features may often be useful, two types of general-purpose searches are worth more detailed discussion: text searches and binary searches.
 +
 +
Text Searches
 +
 +
IDA text searches amount to substring searches through the disassembly listing view. Text searches are initiated via Search > Text (hotkey: ALT-T), which opens the dialog shown in Figure 6-7. A number of self-explanatory options dictate specific details concerning the search to be performed. As shown, POSIX-style regular expressions are permitted. The Identifier search is somewhat misnamed. In reality it restricts the search to find whole words only and can match any whole word on an assembly line, including opcode mnemonics or constant values. An Identifier search for 401116 would fail to find a symbol named loc_401116.
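The difference between a plain substring search and an Identifier (whole-word) search can be imitated with Python's re module. This sketch assumes that a "word" means what the regex word boundary \b means (letters, digits, and underscore); IDA's own matching rules may differ in detail, and text_search and the sample listing are made up for the example.

```python
import re


def text_search(lines, pattern, identifier=False):
    """Return the lines that contain pattern; identifier=True demands a whole word."""
    escaped = re.escape(pattern)
    regex = re.compile(rf"\b{escaped}\b" if identifier else escaped)
    return [line for line in lines if regex.search(line)]


# Hypothetical disassembly lines, for illustration only.
listing = [
    "loc_401116:",
    "        jmp     short loc_401116",
    "        retn",
]

substring_hits = text_search(listing, "401116")
identifier_hits = text_search(listing, "401116", identifier=True)
mnemonic_hits = text_search(listing, "jmp", identifier=True)
print(len(substring_hits), len(identifier_hits), len(mnemonic_hits))
```

The substring search finds both lines mentioning loc_401116, the whole-word search for 401116 finds none because the digits are only part of a longer word, and a whole-word search for jmp matches the mnemonic.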
 +
 +
Selecting Find all occurrences causes the search results to be opened in a new window, allowing easy navigation to any single match of the search criteria. Finally, the previous search can be repeated to locate the next match using CTRL-T or Search > Next Text.
 +
 +
The Case-sensitive option can be a cause of confusion. For string searches it is fairly straightforward; a search for "hello" will successfully find "HELLO" if Case-sensitive is not selected. Things get a little interesting if you perform a hex search and leave Case-sensitive unchecked. If you conduct a case-insensitive search for E9 41 C3, you may be surprised when your search matches E9 61 C3. The two strings are considered to match because 0x41 corresponds to the character A while 0x61 corresponds to a. So, even though you have specified a hex search, 0x41 is considered equivalent to 0x61 because you failed to specify a case-sensitive search.
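The surprising match is easy to reproduce outside IDA. The sketch below models a case-insensitive byte search by lowercasing both sides, which in Python only affects bytes in the ASCII letter range; binary_search is a hypothetical stand-in for the dialog, not IDA's actual search routine, and the sample data is invented.

```python
def binary_search(data, pattern, case_sensitive=True):
    """Return the first offset of pattern in data, or -1 if it is absent."""
    if not case_sensitive:
        # bytes.lower() folds only ASCII A-Z, so 0x41 becomes 0x61
        # while bytes such as 0xE9 and 0xC3 are left untouched.
        data, pattern = data.lower(), pattern.lower()
    return data.find(pattern)


data = bytes.fromhex("90 E9 61 C3 90")     # made-up bytes containing E9 61 C3
pattern = bytes.fromhex("E9 41 C3")

strict = binary_search(data, pattern, case_sensitive=True)
relaxed = binary_search(data, pattern, case_sensitive=False)
print(strict, relaxed)
```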
 +
 +
Searching for subsequent matches for binary data is done using CTRL-B or Search > Next Sequence of Bytes. Finally, it is not necessary to conduct your binary searches from within the Hex View window. IDA allows you to specify binary search criteria while the disassembly view is active, in which case a successful search will jump the disassembly window to the location whose underlying bytes match the specified search criteria.

Current revision as of 15 August 2019, 21:36

In this and the following chapter we cover the heart of what puts the Interactive in IDA Pro, which is, in a nutshell, ease of navigation and ease of manipulation. The focus of this chapter is navigation; specifically, we show how IDA facilitates moving around a disassembly in a logical manner. So far, we have shown that at a basic level IDA simply combines the features of many common reverse engineering tools into an integrated disassembly display. Navigating around the display is one of the essential skills required in order to master IDA. Static disassembly listings offer no inherent navigational capability other than scrolling up and down the listing. Even with the best text editors, such dead listings are very difficult to navigate, as the best they have to offer is generally nothing more than an integrated, grep-style search. As you shall see, IDA's database underpinnings provide for exceptional navigational features.

Basic IDA Navigation

In your initial experience with IDA, you may be happy to make use of nothing more than the navigational features that IDA has to offer. In addition to offering fairly standard search features that you are accustomed to from your use of text editors or word processors, IDA develops and displays a comprehensive list of cross-references that behave in a manner similar to hyperlinks on a web page. The end result is that, in most cases, navigating to locations of interest requires nothing more than a double-click.

Double-Click Navigation

When a program is disassembled, every location in the program is assigned a virtual address. As a result, we can navigate anywhere within a program by providing the virtual address of the location we are interested in visiting. Unfortunately for us, maintaining a catalog of addresses in our head is not a trivial task. This fact motivated early programmers to assign symbolic names to program locations that they wished to reference, making things a whole lot easier on themselves. The assignment of symbolic names to program addresses was not unlike the assignment of mnemonic instruction names to program opcodes; programs became easier to read and write by making them easier to remember.

As we discussed previously, IDA generates symbolic names during the analysis phase by examining a binary's symbol table or by automatically generating a name based on how a location is referenced within the binary. In addition to its symbolic purpose, any name displayed in the disassembly window is a potential navigation target similar to a hyperlink on a web page. The two differences between these names and standard hyperlinks are (1) that the names are never highlighted in any way to indicate that they can be followed and (2) that IDA requires a double-click to follow rather than the single-click required by a hyperlink. We have already seen the use of names in various subwindows such as the Functions, Imports, and Exports windows. Recall that for each of these windows, double-clicking a name caused the disassembly view to jump to the referenced location. This is one example of the double-click navigation at work. In the following listing, each of the symbols labeled ➊ represents a named navigational target. Double-clicking any of them will cause IDA to relocate the display to the selected location.

For navigational purposes, IDA treats two additional display entities as navigational targets. First, cross-references (shown at ➋ here) are treated as navigational targets. Cross-references are generally formatted as a name and a hex offset. The cross-reference at the right of loc_40134E in the previous listing refers to a location that is 4D hexadecimal, or 77 decimal, bytes beyond the start of sub_4012E4. Double-clicking the cross-reference text will jump the display to the referencing location (00401331 in this case). Cross-references are covered in more detail in Chapter 9.
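The arithmetic behind a cross-reference of this form is a single hexadecimal addition. As a small sketch, the hypothetical resolve_xref helper below assumes the function carries an auto-generated sub_ name whose suffix is its start address, which is what lets the name alone yield the target.

```python
def resolve_xref(xref):
    """Resolve text of the form 'sub_<hexaddr>+<hexoffset>' to an address."""
    name, _, offset = xref.partition("+")
    start = int(name.split("_", 1)[1], 16)   # address encoded in the sub_ name
    return start + int(offset, 16)


target = resolve_xref("sub_4012E4+4D")
print(f"{int('4D', 16)} bytes past the start -> {target:08X}")
```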

The second type of display entity afforded special treatment in a navigational sense is one that uses hexadecimal values. If a displayed hexadecimal value represents a valid virtual address within the binary, then double-clicking the value will reposition the disassembly window to display the selected virtual address. In the listing that follows, double-clicking any of the values indicated by ➊ will jump the display, because each is a valid virtual address within the given binary, while double-clicking any of the values indicated by ➋ will have no effect.

A final note about double-click navigation concerns the IDA Output window, which is most often used to display informational messages. When a navigational target, as previously described, appears as the first item in a message, double-clicking the message will jump the display to the indicated target.

In the Output window excerpt just shown, the two messages indicated by ➊ can be used to navigate to the addresses indicated at the start of the respective messages. Double-clicking any of the other messages, including those at ➋, will result in no action at all.

Jump to Address

Occasionally, you will know exactly what address you would like to navigate to, yet no name will be handy in the disassembly window to offer simple double-click navigation. In such a case, you have a few options. The first, and most primitive, option is to use the disassembly window scroll bar to scroll the display up or down until the desired location comes into view. This is usually feasible only when the location you are navigating to is known by its virtual address, since the disassembly window is organized linearly by virtual address. If all you know is a named location such as a subroutine named foobar, then navigating via the scroll bar becomes something of a needle-in-a-haystack search. At that point, you might choose to sort the Functions window alphabetically, scroll to the desired name, and double-click the name. A third option is to use one of IDA's search features available via the Search menu, which typically involves specifying some search criteria before asking IDA to perform a search. In the case of searching for a known location, this is usually overkill.

Ultimately, the easiest way to get to a known disassembly location is to make use of the Jump to Address dialog shown in Figure 6-1.

The Jump to Address dialog is accessed via Jump > Jump to Address, or by using the G hotkey while the disassembly window is active. Thinking of this dialog as the Go dialog may help you remember the associated hotkey. Navigating to any location in the binary is as simple as specifying the address (a name or hex value will do) and clicking OK, which will immediately jump the display to the desired location. Values entered into the dialog are remembered and made available on subsequent use via a drop-down list. This history feature makes returning to previously requested locations somewhat easier.

Navigation History

If we compare IDA's document-navigation functions to those of a web browser, we might equate names and addresses to hyperlinks, as each can be followed relatively easily to view a new location. Another feature IDA shares with traditional web browsers is the concept of forward and backward navigation based on the order in which you navigate the disassembly. Each time you navigate to a new location within a disassembly, your current location is appended to a history list. Two menu operations are available for traversing this list. First, Jump > Jump to Previous Position repositions the disassembly to the most recent entry in the history list. The behavior is conceptually identical to a web browser's back button. The associated hotkey is ESC, and it is one of the most useful hotkeys that you can commit to memory. Be forewarned, however, that using ESC when any window other than the disassembly window is active causes the active window to be closed. (You can always reopen windows that you closed accidentally via View > Open Subviews.) Backward navigation is extremely handy when you have followed a chain of function calls several levels deep and you decide that you want to navigate back to your original position within the disassembly. Jump > Jump to Next Position is the counterpart operation that moves the disassembly window forward in the history list in a manner similar to a web browser's forward button. For the sake of completeness, the associated hotkey for this operation is CTRL-ENTER, though it tends to be less useful than using ESC for backward navigation.

Finally, two of the more useful toolbar buttons, shown in Figure 6-2, provide the familiar browser-style forward and backward behavior.

Each of the buttons is associated with a drop-down history list that offers you instant access to any location in the navigation history without having to trace your steps through the entire list.
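The browser-style behavior described in this section maps naturally onto two stacks. The NavigationHistory class below is a hypothetical model, not IDA's implementation; in particular, discarding the forward history on a fresh jump is an assumption borrowed from how web browsers behave, and the addresses are arbitrary.

```python
class NavigationHistory:
    """Toy model of back (ESC) and forward (CTRL-ENTER) navigation."""

    def __init__(self, start):
        self.current = start
        self.back_stack = []       # locations reachable by going back
        self.forward_stack = []    # locations reachable by going forward

    def jump(self, address):
        self.back_stack.append(self.current)
        self.forward_stack.clear()  # assumption: a new jump drops forward history
        self.current = address

    def back(self):
        if self.back_stack:
            self.forward_stack.append(self.current)
            self.current = self.back_stack.pop()
        return self.current

    def forward(self):
        if self.forward_stack:
            self.back_stack.append(self.current)
            self.current = self.forward_stack.pop()
        return self.current


history = NavigationHistory(0x401000)
history.jump(0x401200)             # follow a call one level deep
history.jump(0x401331)             # and another
visited = [history.back(), history.back(), history.forward()]
print([f"{address:08X}" for address in visited])
```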

Stack Frames

Because IDA Pro is such a low-level analysis tool, many of its features and displays expect the user to be somewhat familiar with the low-level details of compiled languages, many of which center on the specifics of generating machine language and managing the memory used by a high-level program. Therefore, from time to time this book covers some of the theory of compiled programs in order to make sense of the related IDA displays.

One such low-level concept is that of the stack frame. Stack frames are blocks of memory allocated within a program's runtime stack and dedicated to a specific invocation of a function. Programmers typically group executable statements into units called functions (also called procedures, subroutines, or methods). In some cases this may be a requirement of the language being used. In most cases it is considered good programming practice to build programs from such functional units. When a function is not executing, it typically requires little to no memory. When a function is called, however, it may require memory for several reasons. First, the caller of a function may wish to pass information into the function in the form of parameters (arguments), and these parameters need to be stored somewhere the function can find them. Second, the function may need temporary storage space while performing its task. This temporary space is often allocated by a programmer through the declaration of local variables, which can be used within the function but cannot be accessed once the function has completed.

Compilers utilize stack frames (also called activation records) to make the allocation and deallocation of function parameters and local variables transparent to the programmer. A compiler inserts code to place a function's parameters into the stack frame prior to transferring control to the function itself, at which point the compiler inserts code to allocate enough memory to hold the function's local variables. As a consequence of the way stack frames are constructed, the address to which the function should return is also stored within the new stack frame. A pleasant result of the use of stack frames is that recursion becomes possible, as each recursive call to a function is given its own stack frame, neatly segregating each call from its predecessor. The following steps detail the operations that take place when a function is called:

1. The caller places any parameters required by the function being called into locations as dictated by the calling convention (see "Calling Conventions") employed by the called function. This operation may result in a change to the program stack pointer if parameters are placed on the runtime stack.

2. The caller transfers control to the function being called. This is usually performed with an instruction such as the x86 CALL or the MIPS JAL. A return address is typically saved onto the program stack or in a CPU register.

3. If necessary, the called function takes steps to configure a frame pointer and saves any register values that the caller expects to remain unchanged.

4. The called function allocates space for any local variables that it may require. This is often done by adjusting the program stack pointer to reserve space on the runtime stack.

5. The called function performs its operations, potentially generating a result. In the course of performing its operations, the called function may access the parameters passed to it by the calling function. If the function returns a result, the result is often placed into a specific register or registers that the caller can examine once the function returns.

6. Once the function has completed its operations, any stack space reserved for local variables is released. This is often done by reversing the actions performed in step 4.

7. Any registers whose values were saved (in step 3) on behalf of the caller are restored to their original values. This includes the restoration of the caller's frame pointer register.

8. The called function returns control to the caller. Typical instructions for this include the x86 RET and the MIPS JR instructions. Depending on the calling convention in use, this operation may also serve to clear one or more parameters from the program stack.

9. Once the caller regains control, it may need to remove parameters from the program stack. In such cases a stack adjustment may be required to restore the program stack pointer to the value that it held prior to step 1.

Steps 3 and 4 are so commonly performed upon entry to a function that together they are called the function's prologue. Similarly, steps 6 through 8 are so frequently performed at the end of a function that together they make up the function's epilogue. With the exception of step 5, which represents the body of the function, all of these operations constitute the overhead associated with calling a function.

Calling Conventions

With a basic understanding of what stack frames are, we can take a closer look at exactly how they are structured. The examples that follow reference the x86 architecture and the behavior associated with common x86 compilers such as Microsoft Visual C/C++ or GNU's gcc/g++. One of the most important steps in the creation of a stack frame involves the placement of function parameters onto the stack by the calling function. The calling function must store parameters exactly as the function being called expects to find them; otherwise, serious problems can arise. Functions advertise the manner in which they expect to receive their arguments by selecting and adhering to a specific calling convention. A calling convention dictates exactly where a caller should place any parameters that a function requires. Calling conventions may require parameters to be placed in specific registers, on the program stack, or in both registers and on the stack. Equally important, when parameters are passed on the program stack, is determining who is responsible for removing them from the stack once the called function has completed. Some calling conventions dictate that the caller is responsible for removing parameters that it placed on the stack, while other calling conventions dictate that the called function will take care of removing the parameters from the stack. Adherence to publicized calling conventions is essential in maintaining the integrity of the program stack pointer.

The C Calling Convention

The default calling convention used by most C compilers for the x86 architecture is called the C calling convention. The _cdecl modifier may be used by C/C++ programs to force compilers to utilize the C calling convention when the default calling convention may have been overridden. We will refer to this calling convention as the cdecl calling convention from here on. The cdecl calling convention specifies that the caller place parameters to a function on the stack in right-to-left order and that the caller (as opposed to the callee) remove the parameters from the stack after the called function completes.

One result of placing parameters on the stack in right-to-left order is that the leftmost (first) parameter of the function will always be on the top of the stack when the function is called. This makes the first parameter easy to find regardless of the number of parameters the function expects, and it makes the cdecl calling convention ideally suited for use with functions that can take a variable number of arguments (such as printf).

Requiring the calling function to remove parameters from the stack means that you will often see instructions that make an adjustment to the program stack pointer immediately following the return from a called function. In the case of functions that can accept a variable number of arguments, the caller is ideally suited to make this adjustment, as the caller knows exactly how many arguments it has chosen to pass to the function and can easily make the correct adjustment, whereas the called function never knows ahead of time how many parameters it may receive and would have a difficult time making the necessary stack adjustment.

In the following examples we consider calls to a function having the following prototype:

void demo_cdecl(int w, int x, int y, int z);

By default, this function will use the cdecl calling convention, expecting the four parameters to be pushed in right-to-left order and requiring the caller to clean the parameters off the stack. A compiler might generate code for a call to this function as follows:

demo_cdecl(1, 2, 3, 4);	// programmer calls demo_cdecl

push	4	; push parameter z
push	3	; push parameter y
push	2	; push parameter x
push	1	; push parameter w
call	demo_cdecl	; call the function
add	esp, 16	; adjust esp to its former value

The four push operations result in a net change to the program stack pointer (ESP) of 16 bytes (4 * sizeof(int) on a 32-bit architecture), which is undone by the add instruction that follows the return from demo_cdecl. If demo_cdecl is called 50 times, each call will be followed by a similar adjustment. The following example also adheres to the cdecl calling convention while eliminating the need for the caller to explicitly clean parameters off the stack following each call to demo_cdecl.

demo_cdecl(1, 2, 3, 4);	// programmer calls demo_cdecl

mov	[esp+12], 4	; move parameter z to fourth position on stack
mov	[esp+8], 3	; move parameter y to third position on stack
mov	[esp+4], 2	; move parameter x to second position on stack
mov	[esp], 1	; move parameter w to top of stack
call	demo_cdecl	; call the function

In this example, the compiler has preallocated storage space for the parameters to demo_cdecl at the top of the stack during the function prologue. When the parameters for demo_cdecl are placed on the stack, there is no change to the program stack pointer, which eliminates the need to adjust the stack pointer when the call to demo_cdecl completes. The GNU compilers (gcc and g++) utilize this technique to place function parameters onto the stack.

Note that either method results in the stack pointer pointing to the leftmost argument when the function is called.
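The two cdecl call sequences above can be compared with a small model. The following Python sketch is only an illustration: the Stack class, the starting ESP value of 0x1000, and both function names are assumptions invented here, not part of any compiler or of IDA. It tracks the stack pointer for the push-based sequence and for the gcc-style sequence that writes into preallocated slots.

```python
WORD = 4  # sizeof(int) on a 32-bit architecture


class Stack:
    """Toy model of a downward-growing 32-bit stack (not an emulator)."""

    def __init__(self, esp=0x1000):
        self.esp = esp   # arbitrary starting stack pointer (assumption)
        self.mem = {}    # address -> value

    def push(self, value):
        self.esp -= WORD              # the x86 stack grows toward lower addresses
        self.mem[self.esp] = value

    def store(self, offset, value):
        self.mem[self.esp + offset] = value   # models: mov [esp+offset], value


def call_with_pushes(args):
    """Push arguments right to left, then let the caller clean up."""
    stack = Stack()
    esp_before = stack.esp
    for arg in reversed(args):
        stack.push(arg)
    top = stack.mem[stack.esp]        # value on top of the stack just before the call
    cleanup = WORD * len(args)
    stack.esp += cleanup              # models: add esp, cleanup
    return top, cleanup, stack.esp == esp_before


def call_with_preallocated_slots(args):
    """Write arguments into space the prologue already reserved."""
    stack = Stack()
    esp_before = stack.esp
    for index, arg in enumerate(args):
        stack.store(index * WORD, arg)
    top = stack.mem[stack.esp]
    return top, 0, stack.esp == esp_before   # ESP never moved, so no cleanup
```

For the arguments (1, 2, 3, 4), both functions report that the leftmost argument is on top of the stack; only the push-based version needs a 16-byte adjustment afterward.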

The Standard Calling Convention

Standard in this case is a bit of a misnomer, as it is a name that Microsoft created for its own calling convention, marked by the use of the _stdcall modifier in a function declaration, as shown here:

void _stdcall demo_stdcall(int w, int x, int y);

In order to avoid any confusion surrounding the word standard, we will refer to this calling convention as the stdcall calling convention for the remainder of the book. As with the cdecl calling convention, stdcall requires that function parameters be placed on the program stack in right-to-left order. The difference when using stdcall is that the called function is responsible for clearing the function parameters from the stack when the function has finished. In order for a function to do this, the function must know exactly how many parameters are on the stack. This is possible only for functions that accept a fixed number of parameters. As a result, variable argument functions such as printf cannot make use of the stdcall calling convention. The demo_stdcall function, for example, expects three integer parameters, occupying a total of 12 bytes on the stack (3 * sizeof(int) on a 32-bit architecture). An x86 compiler can use a special form of the RET instruction to simultaneously pop the return address from the top of the stack and add 12 to the stack pointer to clear the function parameters. In the case of demo_stdcall, we might see the following instruction used to return to the caller:

ret	12	; return and clear 12 bytes from the stack

The primary advantage to the use of stdcall is the elimination of code to clean parameters off the stack following every function call, which results in slightly smaller, slightly faster programs. By convention Microsoft utilizes the stdcall convention for all fixed-argument functions exported from shared library (DLL) files. This is an important point to remember if you are attempting to generate function prototypes or binary-compatible replacements for any shared library components.
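The effect of the ret 12 form can be traced with a few lines of Python. This is a minimal sketch under stated assumptions: the dictionary-based stack, the starting ESP of 0x1000, and the return address 0x401000 are all made up for illustration.

```python
WORD = 4  # size of one stack slot on 32-bit x86


def ret(esp, stack, cleanup=0):
    """Model `ret N`: pop the return address, then add N to ESP."""
    return_address = stack[esp]
    esp += WORD        # pop the saved return address
    esp += cleanup     # discard N bytes of parameters (0 for a plain ret)
    return return_address, esp


def stdcall_demo():
    """Call a three-int stdcall function and check ESP after it returns."""
    esp = 0x1000                   # arbitrary starting stack pointer (assumption)
    stack = {}
    caller_esp = esp
    for value in (3, 2, 1):        # push y, x, w (right to left)
        esp -= WORD
        stack[esp] = value
    esp -= WORD
    stack[esp] = 0x401000          # `call` pushes a (made-up) return address
    return_address, esp = ret(esp, stack, cleanup=12)
    return return_address, esp == caller_esp
```

Because the callee removes its own 12 bytes, the caller's stack pointer is already back at its original value when control returns, with no add esp instruction needed at the call site.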

The fastcall Convention for x86

A variation on the stdcall convention, the fastcall calling convention passes up to two parameters in CPU registers rather than on the program stack. The Microsoft Visual C/C++ and GNU gcc/g++ (version 3.4 and later) compilers recognize the _fastcall modifier in function declarations. When _fastcall is specified, the first two parameters passed to a function will be placed in the ECX and EDX registers, respectively. Any remaining parameters are placed on the stack in right-to-left order similar to stdcall. Also similar to stdcall, fastcall functions are responsible for removing parameters from the stack when they return to their caller. The following declaration demonstrates the use of the _fastcall modifier.

void _fastcall demo_fastcall(int w, int x, int y, int z);

A compiler might generate the following code in order to call demo_fastcall:

demo_fastcall(1, 2, 3, 4);	// programmer calls demo_fastcall

push	4	; move parameter z to second position on stack
push	3	; move parameter y to top position on stack
mov	edx, 2	; move parameter x to edx
mov	ecx, 1	; move parameter w to ecx
call	demo_fastcall	; call the function

Note that no stack adjustment is required upon return from the call to demo_fastcall, as demo_fastcall is responsible for clearing parameters y and z from the stack as it returns to the caller. It is important to understand that because two arguments are passed in registers, the called function needs to clear only 8 bytes from the stack even though there are four arguments to the function.
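The split between register and stack arguments is simple arithmetic, sketched below in Python. The function name fastcall_layout is hypothetical, and the sketch assumes every argument is a 4-byte integer, as in demo_fastcall; real compilers apply further rules for larger or non-integer arguments.

```python
WORD = 4
REGISTERS = ("ecx", "edx")   # first two arguments, in this order


def fastcall_layout(args):
    """Return (register assignments, push order, bytes the callee must clear)."""
    in_registers = dict(zip(REGISTERS, args))
    on_stack = list(args[len(REGISTERS):])
    push_order = list(reversed(on_stack))      # remaining args go right to left
    return in_registers, push_order, WORD * len(on_stack)
```

For demo_fastcall(1, 2, 3, 4), this yields ECX = 1, EDX = 2, pushes of 4 and then 3, and 8 bytes for the callee to clear, matching the listing above.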

C++ Calling Conventions

Nonstatic member functions in C++ classes differ from standard functions in that they must make available the this pointer, which points to the object used to invoke the function. The address of the object used to invoke the function must be supplied by the caller and is therefore provided as a parameter when calling nonstatic member functions. The C++ language standard does not specify how this should be passed to nonstatic member functions, so it should come as no surprise that different compilers use different techniques when passing this.

Microsoft Visual C++ offers the thiscall calling convention, which passes this in the ECX register and requires the nonstatic member function to clean parameters off the stack as in stdcall. The GNU g++ compiler treats this as the implied first parameter to any nonstatic member function and behaves in all other respects as if the cdecl convention is being used. Thus, for g++-compiled code, this is placed on top of the stack prior to calling the nonstatic member function, and the caller is responsible for removing parameters (there will always be at least one) from the stack once the function returns. Additional features of compiled C++ are discussed in Chapter 8.

Other Calling Conventions

Complete coverage of every existing calling convention would require a book in its own right. Calling conventions are often language-, compiler-, and CPU-specific, and some research on your part may be required as you encounter code generated by less-common compilers. A few situations deserve special mention, however: optimized code, custom assembly language code, and system calls.

When functions are exported for use by other programmers (such as library functions), it is important that they adhere to well-known calling conventions so that programmers can easily interface to those functions. On the other hand, if a function is intended for internal program use only, then the calling convention used by that function need be known only within that function's program. In such cases, optimizing compilers may choose to use alternate calling conventions in order to generate faster code. Instances in which this may occur include the use of the /GL option with Microsoft Visual C++ and the use of the regparm keyword with GNU gcc/g++.

When programmers go to the trouble of using assembly language, they gain complete control over how parameters will be passed to any functions that they happen to create. Unless they wish to make their functions available to other programmers, assembly language programmers are free to pass parameters in any way they see fit. As a result, you may need to take extra care when analyzing custom assembly code. Custom assembly code is often encountered in obfuscation routines and shellcode.

A system call is a special type of function call used to request an operating system service. System calls usually effect a state transition from user mode to kernel mode in order for the operating system kernel to service the user's request. The manner in which system calls are initiated varies across operating systems and CPUs. For example, Linux x86 system calls may be initiated using the int 0x80 instruction or the sysenter instruction, while other x86 operating systems may use only the sysenter instruction or alternate interrupt numbers. On many x86 systems (Linux being an exception) parameters for system calls are placed on the runtime stack, and a system call number is placed in the EAX register immediately prior to initiating the system call. Linux system calls accept their parameters in specific registers and occasionally in memory when there are more parameters than available registers.

Local Variable Layout

Unlike the calling conventions that dictate the manner in which parameters are passed into a function, there are no conventions that mandate the layout of a function's local variables. When compiling a function, one task a compiler is faced with is to compute the amount of space required by a function's local variables. Another task is to determine whether those variables can be allocated in CPU registers or whether they must be allocated on the program stack. The exact manner in which these allocations are made is irrelevant to both the caller of a function and to any functions that may, in turn, be called. Most notably, it is typically impossible to determine a function's local variable layout based on examination of the function's source code.

Stack Frame Examples

Consider the following function compiled on a 32-bit x86-based computer:

void bar(int j, int k);	// a function to call
void demo_stackframe(int a, int b, int c) {
   int x;
   char buffer[64];
   int y;
   int z;
   // body of function not terribly relevant other than
   bar(z, y);
}

We compute the minimum amount of stack space required for local variables as 76 bytes (three 4-byte integers and a 64-byte buffer). This function could use either stdcall or cdecl, and the stack frame will look the same. Figure 6-3 shows one possible implementation of a stack frame for an invocation of demo_stackframe, assuming that no frame pointer register is used (thus the stack pointer, ESP, serves as the frame pointer). This frame would be set up on entry to demo_stackframe with the one-line prologue:

sub	esp, 76	; allocate sufficient space for all local variables

The Offset column indicates the base+displacement address required to reference any of the local variables or parameters in the stack frame.

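Variable	Offset	Region
z	[esp]	local variables (esp points here)
y	[esp+4]	local variables
buffer	[esp+8]	local variables
x	[esp+72]	local variables
saved eip	[esp+76]	saved return address
a	[esp+80]	parameters
b	[esp+84]	parameters
c	[esp+88]	parameters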
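The offsets in Figure 6-3 follow mechanically from the sizes of the locals once an ordering is fixed. The Python sketch below assumes the ordering shown in the figure (z at the lowest address, then y, buffer, and x), which is one possible compiler choice rather than a rule; the function name esp_frame_offsets is invented for this illustration.

```python
def esp_frame_offsets(locals_low_to_high, params, word=4):
    """Compute [esp+N] offsets for an ESP-based frame.

    locals_low_to_high: list of (name, size) pairs, lowest address first.
    params: parameter names in declaration order (leftmost first).
    """
    offsets = {}
    offset = 0
    for name, size in locals_low_to_high:
        offsets[name] = offset
        offset += size
    offsets["saved eip"] = offset      # return address sits just above the locals
    offset += word
    for name in params:
        offsets[name] = offset         # each int parameter occupies one word
        offset += word
    return offsets, offset


layout, frame_size = esp_frame_offsets(
    [("z", 4), ("y", 4), ("buffer", 64), ("x", 4)],
    ["a", "b", "c"],
)
```

Here layout maps x to 72, the saved return address to 76, and c to 88, and frame_size is 92 bytes, because this frame has no saved EBP.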

Generating functions that utilize the stack pointer to compute all variable references requires a little more effort on the part of the compiler, as the stack pointer changes frequently and the compiler must make sure that proper offsets are used at all times when referencing any variables within the stack frame. Consider the call made to bar in function demo_stackframe, the code for which is shown here:

push	dword [esp+4]	; push y
push	dword [esp+4]	; push z
call	bar
add	esp, 8	; cdecl requires caller to clear parameters

The first push correctly pushes local variable y per the offset in Figure 6-3. At first glance it might appear that the second push incorrectly references local variable y a second time. However, because we are dealing with an ESP-based frame and the first push modifies ESP, all of the offsets in Figure 6-3 must be temporarily adjusted each time ESP changes. Following the first push, the new offset for local variable z becomes [esp+4], as correctly referenced in the second push. When examining functions that reference stack frame variables using the stack pointer, you must be careful to note any changes to the stack pointer and adjust all future variable offsets accordingly. One advantage of using the stack pointer to reference all stack frame variables is that all other registers remain available for other purposes.
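The bookkeeping described here, adding the number of bytes pushed so far to every frame offset, can be sketched in a few lines of Python. The FRAME table reuses the offsets from Figure 6-3; the function names are assumptions made for this example.

```python
WORD = 4
# Offsets of the locals relative to ESP immediately after `sub esp, 76`.
FRAME = {"z": 0, "y": 4, "buffer": 8, "x": 72}


def current_offset(name, bytes_pushed):
    """Offset of a local from ESP after `bytes_pushed` bytes have been pushed."""
    return FRAME[name] + bytes_pushed


def encode_call_bar():
    """Emit the two pushes for bar(z, y), tracking ESP as it moves."""
    pushed = 0
    listing = []
    for variable in ("y", "z"):        # arguments are pushed right to left
        listing.append("push dword [esp+%d]" % current_offset(variable, pushed))
        pushed += WORD                 # each push lowers ESP by one word
    return listing
```

Both emitted instructions read push dword [esp+4], even though they refer to different variables, which is exactly the situation in the listing above.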

Once demo_stackframe has completed, it needs to return to the caller. Ultimately a ret instruction will be used to pop the desired return address off the top of the stack into the instruction pointer register (EIP in this case). Before the return address can be popped, the local variables need to be removed from the top of the stack so that the stack pointer correctly points to the saved return address when the ret instruction is executed. For this particular function the resulting epilogue becomes

add	esp, 76	; adjust esp to point to the saved return address
ret		; return to the caller

At the expense of dedicating a register for use as a frame pointer and some code to configure the frame pointer on entry to the function, the job of computing local variable offsets can be made easier. In x86 programs, the EBP (extended base pointer) register is typically dedicated for use as a stack frame pointer. By default, most compilers generate code to use a frame pointer, though options typically exist for specifying that the stack pointer should be used instead. GNU gcc/g++, for example, offers the -fomit-frame-pointer compiler option, which generates functions that do not rely on a fixed frame pointer register.

In order to see what the stack frame for demo_stackframe will look like using a dedicated frame pointer, we need to consider this new prologue code:

push	ebp	; save the caller's ebp value
mov	ebp, esp	; make ebp point to the saved register value
sub	esp, 76	; allocate space for local variables

The push instruction saves the value of EBP currently being used by the caller. Functions that adhere to the System V Application Binary Interface for Intel 32-bit Processors are allowed to modify the EAX, ECX, and EDX registers but are required to preserve the caller's values for all other registers. Therefore, if we wish to use EBP as a frame pointer, we must save the current value of EBP before we change it, and we must restore the value of EBP before we return to the caller. If any other registers need to be saved on behalf of the caller (ESI or EDI, for example), compilers may choose to save them at the same time EBP is saved, or they may defer saving them until local variables have been allocated. Thus, there is no standard location within a stack frame for the storage of saved registers.

Once EBP has been saved, it can be changed to point to the current stack location. This is accomplished by the mov instruction, which copies the current value of the stack pointer into EBP. Finally, as in the non-EBP-based stack frame, space for local variables is allocated by the sub instruction. The resulting stack frame layout is shown in Figure 6-4.

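Variable	Offset	Region
z	[ebp-76]	local variables (esp points here)
y	[ebp-72]	local variables
buffer	[ebp-68]	local variables
x	[ebp-4]	local variables
saved ebp	[ebp]	saved register(s) (ebp points here)
saved eip	[ebp+4]	saved return address
a	[ebp+8]	parameters
b	[ebp+12]	parameters
c	[ebp+16]	parameters
Figure 6-4: An EBP-based stack frame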

With a dedicated frame pointer, all variable offsets are computed relative to the frame pointer register. It is most often (though not necessarily) the case that positive offsets are used to access function parameters, while negative offsets are required to access local variables. With a dedicated frame pointer in use, the stack pointer may be freely changed without affecting the offset to any variables within the frame. The call to function bar can now be implemented as follows:

push	dword [ebp-72]	; push y
push	dword [ebp-76]	; push z
call	bar
add	esp, 8	; cdecl requires caller to clear parameters

The fact that the stack pointer has changed following the first push has no effect on the access to local variable z in the succeeding push.

Finally, the use of a frame pointer necessitates a slightly different epilogue once the function completes, as the caller's frame pointer must be restored prior to returning. Local variables must be cleared from the stack before the old value of the frame pointer can be retrieved, but this is made easy by the fact that the current frame pointer points to the old frame pointer. In x86 programs utilizing EBP as a frame pointer, the following code represents a typical epilogue:

mov	esp, ebp	; clears local variables by resetting esp
pop	ebp	; restore the caller's value of ebp
ret		; pop return address to return to the caller

This operation is so common that the x86 architecture offers the leave instruction as an abbreviated means of accomplishing the same task.

leave		; copies ebp to esp and then pops into ebp
ret		; pop return address to return to the caller

While the names of registers and instructions used will certainly differ for other processor architectures, the basic process of building stack frames will remain the same. Regardless of the architecture, you will want to familiarize yourself with typical prologue and epilogue sequences so that you can quickly move on to analyzing more interesting code within functions.
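To see that the EBP-based prologue and either form of the epilogue leave the caller's registers as they were, the sequence can be traced in a toy model. The Python sketch below is an assumption-laden illustration: the Cpu class, the initial register values, and the return address 0x401000 are invented, and only ESP, EBP, and a dictionary standing in for stack memory are modeled.

```python
WORD = 4


class Cpu:
    """Tracks only ESP, EBP, and stack memory; not a real x86 emulator."""

    def __init__(self):
        self.esp = 0x1000      # arbitrary stack pointer (assumption)
        self.ebp = 0x2000      # stands in for the caller's frame pointer
        self.mem = {}

    def push(self, value):
        self.esp -= WORD
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += WORD
        return value

    def prologue(self, local_bytes):
        self.push(self.ebp)        # push ebp
        self.ebp = self.esp        # mov ebp, esp
        self.esp -= local_bytes    # sub esp, N

    def epilogue_long(self):
        self.esp = self.ebp        # mov esp, ebp
        self.ebp = self.pop()      # pop ebp

    def leave(self):
        self.epilogue_long()       # leave performs the same two steps


def run(use_leave):
    cpu = Cpu()
    cpu.push(0x401000)             # return address pushed by `call` (made up)
    entry_esp, caller_ebp = cpu.esp, cpu.ebp
    cpu.prologue(76)
    frame_ebp = cpu.ebp
    cpu.push(1)                    # pushes in the body move ESP, not EBP
    cpu.push(2)
    ebp_unchanged = cpu.ebp == frame_ebp
    if use_leave:
        cpu.leave()
    else:
        cpu.epilogue_long()
    # ESP should again point at the return address, and EBP at the caller's frame.
    return cpu.esp == entry_esp, cpu.ebp == caller_ebp, ebp_unchanged
```

Note that the two values pushed in the body are never popped; resetting ESP from EBP discards them along with the locals, which is one practical benefit of a dedicated frame pointer.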

IDA Stack Views

Stack frames are clearly a runtime concept; a stack frame can't exist without a stack and without a running program. While this is true, it doesn't mean that you should ignore the concept of a stack frame when you are performing static analysis with tools such as IDA. All of the code required to set up stack frames for each function is present within a binary. Through careful analysis of this code, we can gain a detailed understanding of the structure of any function's stack frame even when the function is not running. In fact, some of IDA's most sophisticated analysis is performed specifically to determine the layout of stack frames for every function that IDA disassembles.

During initial analysis, IDA goes to great lengths to monitor the behavior of the stack pointer over the course of a function by making note of every push or pop operation along with any arithmetic operations that may change the stack pointer, such as adding or subtracting constant values. The first goal of this analysis is to determine the exact size of the local variable area allocated to a function's stack frame. Additional goals include determining whether a dedicated frame pointer is in use in a given function (by recognizing a push ebp/mov ebp, esp sequence, for example) and recognizing all memory references to variables within a function's stack frame. For example, if IDA noted the following instruction in the body of demo_stackframe

mov	eax, [ebp+8]

it would understand that the first argument to the function (a in this case) is being loaded into the EAX register (refer to Figure 6-4). Through careful analysis of the stack frame structure, IDA can distinguish between memory references that access function arguments (those that lie below the saved return address) and references that access local variables (those that lie above the saved return address). IDA takes the additional step of determining which memory locations within a stack frame are directly referenced. For example, while the stack frame in Figure 6-4 is 96 bytes in size, there are only seven variables that we are likely to see referenced (four locals and three parameters).

Understanding the behavior of a function often comes down to understanding the types of data that the function manipulates. When reading a disassembly listing, one of the first opportunities that you will have to understand the data a function manipulates is to view the breakdown of the function's stack frame. IDA offers two views into any function's stack frame: a summary view and a detail view. In order to understand these two views, we will refer to the following version of demo_stackframe, which we have compiled using gcc.

void demo_stackframe(int a, int b, int c) {
   int x = c;
   char buffer[64];
   int y = b;
   int z = 10;
   buffer[0] = 'A';
   bar(z, y);
}

In this example, local variables x and y are initialized from parameters c and b, respectively. Local variable z is initialized with the constant value 10, and the first character in the 64-byte local array, named buffer, is initialized to the letter A.

There are many points to cover in this listing as we begin to acquaint ourselves with IDA's disassembly notation. We begin by noting that IDA believes this function uses the EBP register as a frame pointer, based on analysis of the function prologue. We also learn that gcc has allocated 120 bytes (78h equates to 120) of local variable space in the stack frame. This includes 8 bytes for passing the two parameters to bar, but it is still far greater than the 76 bytes we had estimated previously and demonstrates that compilers occasionally pad the local variable space with extra bytes in order to ensure a particular alignment within the stack frame. IDA also provides a summary stack view that lists every variable that is directly referenced within the stack frame, along with the variable's size and offset distance from the frame pointer.

IDA assigns names to variables based on their location relative to the saved return address. Local variables lie above the saved return address, while function parameters lie below the saved return address. Local variable names are derived using the var_ prefix joined with a hexadecimal suffix that indicates the distance, in bytes, that the variable lies above the saved frame pointer. Local variable var_C, in this case, is a 4-byte (dword) variable that lies 12 bytes above the saved frame pointer ([ebp-0Ch]). Function parameter names are generated using the arg_ prefix combined with a hexadecimal suffix that represents the relative distance from the topmost parameter. Thus the topmost 4-byte parameter would be named arg_0, while successive parameters would be named arg_4, arg_8, arg_C, and so on. In this particular example arg_0 is not listed because the function makes no use of parameter a. Because IDA fails to locate any memory reference to [ebp+8] (the location of the first parameter), arg_0 is not listed in the summary stack view. A quick scan of the summary stack view reveals that there are many stack locations that IDA has failed to name because no direct references to those locations exist in the program code.

NOTE The only stack variables that IDA will automatically generate names for are those that are directly referenced within a function.
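The naming rule can be approximated for a 32-bit EBP-based frame with one saved register. The Python sketch below is a simplification written for this discussion, not IDA's actual algorithm; the function name ida_stack_name and its default sizes are assumptions.

```python
def ida_stack_name(ebp_offset, saved_regs_size=4, ret_addr_size=4):
    """Approximate the name IDA would give the stack slot at [ebp+ebp_offset].

    Assumes a 32-bit EBP-based frame in which only EBP is saved.
    """
    if ebp_offset < 0:
        # Locals: suffix is the distance above the saved frame pointer.
        return "var_%X" % -ebp_offset
    first_arg = saved_regs_size + ret_addr_size
    if ebp_offset >= first_arg:
        # Parameters: suffix is the distance from the topmost parameter.
        return "arg_%X" % (ebp_offset - first_arg)
    # Saved registers and return address; the detailed stack view
    # labels these " s" and " r" respectively.
    return " s" if ebp_offset < saved_regs_size else " r"
```

Under these assumptions, [ebp-0Ch] maps to var_C, [ebp+8] to arg_0, and [ebp+10h] to arg_8.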

An important difference between IDA's disassembly listing and the stack frame analysis that we performed earlier is the fact that nowhere in the disassembly listing do we see memory references similar to [ebp-12]. Instead, IDA has replaced all constant offsets with symbolic names corresponding to the symbols in the stack view and their relative offsets from the stack frame pointer. This is in keeping with IDA's goal of generating a higher-level disassembly. It is simply easier to deal with symbolic names than numeric constants. In fact, as we will see later, IDA allows us to change the names of any stack variable to whatever we wish, making the names that much easier for us to remember. The summary stack view serves as a map from IDA-generated names to their corresponding stack frame offsets. For example, where the memory reference [ebp+arg_8] appears in the disassembly, [ebp+10h] or [ebp+16] could be used instead. If you prefer numeric offsets, IDA will happily show them to you. Right-clicking arg_8 in the disassembly yields the context-sensitive menu shown in Figure 6-5, which contains several options to change the display format.

In this example, since we have source code available for comparison, we can map the IDA-generated variable names back to the names used in the original source using a variety of clues available in the disassembly.

1. First, demo_stackframe takes three parameters: a, b, and c. These correspond to variables arg_0, arg_4, and arg_8, respectively (though arg_0 is missing in the disassembly because it is never referenced).

2. Local variable x is initialized from parameter c. Thus var_C corresponds to x, since it is initialized from arg_8.

3. Similarly, local variable y is initialized from parameter b. Thus var_5C corresponds to y, since it is initialized from arg_4.

4. Local variable z corresponds to var_60, since it is initialized with the value 10.

5. The 64-byte character array buffer begins at var_58, since buffer[0] is initialized with A (ASCII 0x41).

6. The two arguments for the call to bar are moved into the stack rather than being pushed onto the stack. This is typical of current versions of gcc (versions 3.4 and later). IDA recognizes this convention and elects not to create local variable references for the two items at the top of the stack frame.

In addition to the summary stack view, IDA offers a detailed stack frame view in which every byte allocated to a stack frame is accounted for. The detailed view is accessed by double-clicking any variable name associated with a given stack frame. Double-clicking var_C in the previous listing would bring up the stack frame view shown in Figure 6-6 (ESC closes the window).

Because the detailed view accounts for every byte in the stack frame, it occupies significantly more space than the summary view, which lists only referenced variables. The portion of the stack frame shown in Figure 6-6 spans a total of 32 bytes, which represents only a small portion of the entire stack frame. Note that no names are assigned to bytes that are not referenced directly within the function. For example, parameter a, corresponding to arg_0, was never referenced within demo_stackframe. With no memory reference to analyze, IDA opts to do nothing with the corresponding bytes in the stack frame, which occupy offsets +00000008 through +0000000B. On the other hand, arg_4 was directly referenced in the disassembly listing, where its contents were loaded into the 32-bit EAX register. Based on the fact that 32 bits of data were moved, IDA is able to infer that arg_4 is a 4-byte quantity and labels it as such (db defines 1 byte of storage; dw defines 2 bytes of storage, also called a word; and dd defines 4 bytes of storage, also called a double word).

Two special values shown in Figure 6-6 are " s" and " r" (each starts with a leading space). These pseudo variables are IDA's special representation of the saved return address (" r") and the saved register value(s) (" s" representing only EBP in this example). These values are included in the stack frame view for completeness, as every byte in the stack frame is accounted for.

Stack frame view offers a detailed look at the inner workings of compilers. In Figure 6-6 it is clear that the compiler has inserted 8 extra bytes between the saved frame pointer " s" and the local variable x (var_C). These bytes occupy offsets -00000001 through -00000008 in the stack frame. Further, a little math performed on the offset associated with each variable listed in the summary view reveals that the compiler has allocated 76 (rather than 64 per the source code) bytes to the character buffer at var_58. Unless you happen to be a compiler writer yourself or are willing to dig deep into the source code for gcc, all you can do is speculate as to why these extra bytes are allocated in this manner. In most cases we can chalk up the extra bytes to padding for alignment, and usually the presence of these extra bytes has no impact on a program's behavior. After all, if a programmer asks for 64 bytes and is given 76, the program should behave no differently, especially since the programmer shouldn't be using more than the 64 bytes requested. On the other hand, if you happen to be an exploit developer and learn that it is possible to overflow this particular buffer, then you might be very interested in the fact that nothing interesting can even begin to happen until you have supplied at least 76 bytes, which is the effective size of the buffer as far as the compiler is concerned. In Chapter 8 we will return to the stack frame view and its uses in dealing with more complex datatypes such as arrays and structures.
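The "little math" mentioned above amounts to subtracting adjacent offsets from the summary view. The following Python sketch uses the four variable names and offsets discussed in this section; the helper name gaps and the choice to measure the last gap up to the saved frame pointer at offset 0 are assumptions of this example.

```python
# EBP-relative offsets of the referenced locals, taken from the names above.
SUMMARY = {"var_60": -0x60, "var_5C": -0x5C, "var_58": -0x58, "var_C": -0x0C}


def gaps(summary, top=0):
    """Bytes from each variable to the next-higher referenced variable.

    `top` is the offset of the saved frame pointer, which bounds the last local.
    """
    ordered = sorted(summary.items(), key=lambda item: item[1])
    upper_bounds = [offset for _, offset in ordered[1:]] + [top]
    return {
        name: upper - offset
        for (name, offset), upper in zip(ordered, upper_bounds)
    }


effective_sizes = gaps(SUMMARY)
```

The result gives var_58 an effective size of 76 bytes and var_C a span of 12 bytes, that is, the 4 bytes of x plus the 8 padding bytes below the saved frame pointer.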

Searching the Database

IDA makes it easy to navigate to things that you know about and designs many of its data displays to summarize specific types of information (names, strings, imports, and so on), making them easy to find as well. However, what features are offered to help you conduct more general searches through your databases? If you take time to review the contents of the Search menu, you will find a long list of options, the majority of which take you to the next item in some category. For example, Search > Next Code moves the cursor to the next location containing an instruction. You may also wish to familiarize yourself with the options available on the Jump menu. For many of these, you are presented with a list of locations to choose from. Jump > Jump to Function, for example, brings up a list of all functions, allowing you to quickly choose one and navigate to it. While these canned search features may often be useful, two types of general-purpose searches are worth more detailed discussion: text searches and binary searches.

Text Searches

IDA text searches amount to substring searches through the disassembly listing view. Text searches are initiated via Search > Text (hotkey: ALT-T), which opens the dialog shown in Figure 6-7. A number of self-explanatory options dictate specific details concerning the search to be performed. As shown, POSIX-style regular expressions are permitted. The Identifier search is somewhat misnamed. In reality it restricts the search to find whole words only and can match any whole word on an assembly line, including opcode mnemonics or constant values. An Identifier search for 401116 would fail to find a symbol named loc_401116.

Selecting Find all occurences causes the search results to be opened in a new window, allowing easy navigation to any single match of the search criteria. Finally, the previous search can be repeated to locate the next match using CTRL-T or Search > Next Text.
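The difference between a plain substring search and a whole-word (Identifier) search can be illustrated with Python's re module. This is only an analogy: IDA's own matching rules are not documented here, and the sketch assumes that letters, digits, and the underscore all count as word characters, as they do for Python's \b.

```python
import re


def substring_search(needle, line):
    """Plain text search: match the needle anywhere in the line."""
    return needle in line


def identifier_search(needle, line):
    """Whole-word search: the needle must be bounded by non-word characters."""
    pattern = r"\b" + re.escape(needle) + r"\b"
    return re.search(pattern, line) is not None


line = "jmp     loc_401116"
```

With this line, substring_search("401116", line) succeeds, while identifier_search("401116", line) does not, because the digits are part of the larger word loc_401116. Searching for the whole word loc_401116, or for the mnemonic jmp, matches in both modes.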

The Case-sensitive option can be a cause of confusion. For string searches it is fairly straightforward; a search for "hello" will successfully find "HELLO" if Case-sensitive is not selected. Things get a little interesting if you perform a hex search and leave Case-sensitive unchecked. If you conduct a case-insensitive search for E9 41 C3, you may be surprised when your search matches E9 61 C3. The two strings are considered to match because 0x41 corresponds to the character A while 0x61 corresponds to a. So, even though you have specified a hex search, 0x41 is considered equivalent to 0x61 because you failed to specify a case-sensitive search.
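The surprising match is easy to reproduce outside IDA. The Python sketch below folds ASCII letters to lowercase before comparing, which is an assumed model of a case-insensitive byte search rather than IDA's actual implementation; find_bytes is a name invented for this example.

```python
def find_bytes(haystack, pattern, case_sensitive=True):
    """Return the index of the first match of pattern in haystack, or -1."""
    if not case_sensitive:
        # bytes.lower() changes only ASCII letters, so 0x41 ('A') becomes
        # 0x61 ('a') while 0xE9 and 0xC3 are left untouched.
        haystack, pattern = haystack.lower(), pattern.lower()
    return haystack.find(pattern)


data = bytes.fromhex("90 E9 61 C3 90")
pattern = bytes.fromhex("E9 41 C3")
```

A case-sensitive search for the pattern in data finds nothing, while the case-insensitive search reports a match at offset 1.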

Searching for subsequent matches for binary data is done using CTRL-B or Search > Next Sequence of Bytes. Finally, it is not necessary to conduct your binary searches from within the Hex View window. IDA allows you to specify binary search criteria while the disassembly view is active, in which case a successful search will jump the disassembly window to the location whose underlying bytes match the specified search criteria.