DISASSEMBLY MANIPULATION

From Wiki expérimental
 
After navigation, the next most significant features of IDA are designed to allow you to modify the disassembly to suit your needs. In this chapter we will show that because of IDA's underlying database nature, changes that you make to a disassembly are easily propagated to all IDA subviews to maintain a consistent picture of your disassembly. One of the most powerful features that IDA offers is the ability to easily manipulate disassemblies to add new information or reformat a listing to suit your particular needs. IDA automatically handles operations such as global search and replace when it makes sense to do so and makes trivial work of reformatting instructions as data and vice versa, features not available in other disassembly tools.

Names and Naming

At this point, we have encountered two categories of names in IDA disassemblies: names associated with virtual addresses (named locations) and names associated with stack frame variables. In the majority of cases IDA will automatically generate all of these names according to the guidelines previously discussed. IDA refers to such automatically generated names as dummy names.

Unfortunately, these names seldom hint at the intended purpose of a location or variable and therefore don't generally add to our understanding of a program's behavior. As you begin to analyze any program, one of the first and most common ways that you will want to manipulate a disassembly listing is to change default names into more meaningful names. Fortunately, IDA allows you to easily change any name and handles all of the details of propagating all name changes throughout the entire disassembly. In most cases, changing a name is as simple as clicking the name you wish to change (this highlights the name) and using the N hotkey to open a name-change dialog. Alternatively, right-clicking the name to be changed generally presents a context-sensitive menu that contains a Rename option, as shown in Figure 6-5. The name-change process does differ somewhat between stack variables and named locations, and these differences are detailed in the following sections.

Parameters and Local Variables

Names associated with stack variables are the simplest form of name in a disassembly listing, primarily because they are not associated with a specific virtual address and thus can never appear in the Names window. As in most programming languages, such names are considered to be restricted in scope based on the function to which a given stack frame belongs. Thus, every function in a program might have its own stack variable named arg_0, but no function may have more than one variable named arg_0. The dialog shown in Figure 7-1 is used to rename a stack variable.

Named Locations

Renaming a named location or adding a name to an unnamed location is slightly different from changing the name of a stack variable. The process for accessing the name-change dialog is identical (hotkey N), but things quickly change. Figure 7-2 shows the renaming dialog associated with named locations.

This dialog informs you exactly what address you are naming along with a list of attributes that can be associated with the name. The maximum name length merely echoes a value from one of IDA's configuration files (<IDADIR>/cfg/ida.cfg). You are free to use names longer than this value, which will cause IDA to complain weakly by informing you that you have exceeded the maximum name length and offering to increase the maximum name length for you. Should you choose to do so, the new maximum name length value will be enforced (weakly) only in the current database. Any new databases that you create will continue to be governed by the maximum name length contained in the configuration file.

Local name

A local name is restricted in scope to the current function, so the uniqueness of local names is enforced only within a given function. Like local variables, two different functions may contain identical local names, but a single function cannot contain two local names that are identical. Named locations that exist outside function boundaries cannot be designated as local names. These include names that represent function names as well as global variables. The most common use for local names is to provide symbolic names for the targets of jumps within a function, such as those associated with branching control structures.

Include in names list

Selecting this option causes a name to be added to the Names window, which can make the name easier to find when you wish to return to it. Autogenerated (dummy) names are never included in the Names window by default.

Public name

A public name is typically a name that is being exported by a binary such as a shared library. IDA's parsers typically discover public names while parsing file headers during initial loading into the database. You can force a symbol to be treated as public by selecting this attribute. In general, this has very little effect on the disassembly other than to cause public annotations to be added to the name in the disassembly listing and in the Names window.

Autogenerated name

This attribute appears to have no discernible effect on disassemblies. Selecting it does not cause IDA to automatically generate a name.

Weak name

A weak symbol is a specialized form of public symbol utilized only when no public symbol of the same name is found to override it. Marking a symbol as weak has some significance to an assembler but little significance in an IDA disassembly.

Create name anyway

As discussed previously, no two locations within a function may be given the same name. Similarly, no two locations outside any function (in the global scope) may be given the same name. This option is somewhat confusing, as it behaves differently depending on the type of name you are attempting to create.

If you are editing a name at the global scope (such as a function name or global variable) and you attempt to assign a name that is already in use in the database, IDA will display the conflicting name dialog, shown in Figure 7-3, offering to automatically generate a unique numeric suffix to resolve the conflict. This dialog is presented regardless of whether you have selected the Create name anyway option or not.

If, however, you are editing a local name within a function and you attempt to assign a name that is already in use, the default behavior is simply to reject the attempt. If you are determined to use the given name, you must select Create name anyway in order to force IDA to generate a unique numeric suffix for the local name. Of course, the simplest way to resolve any name conflict is to choose a name that is not already in use.
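
The two conflict behaviors described above can be sketched as a toy model in Python. This is not IDA's implementation or API: the NameTable class, its method names, and the "_0", "_1" suffix format are all assumptions made for illustration, and the conflicting name dialog is reduced to an accept_suffix flag standing in for the user's answer.

```python
class NameTable:
    """Toy model of global and per-function (local) name scopes."""

    def __init__(self):
        self.global_names = set()
        self.local_names = {}  # function name -> set of local names

    @staticmethod
    def _unique(name, taken):
        # Append the first unused numeric suffix (suffix format is assumed).
        n = 0
        while f"{name}_{n}" in taken:
            n += 1
        return f"{name}_{n}"

    def set_global(self, name, accept_suffix=True):
        # Global scope: a suffix is always offered on conflict, regardless
        # of any "create name anyway" setting.
        if name in self.global_names:
            if not accept_suffix:
                return None
            name = self._unique(name, self.global_names)
        self.global_names.add(name)
        return name

    def set_local(self, func, name, create_anyway=False):
        # Local scope: a conflict is rejected unless create_anyway is set.
        taken = self.local_names.setdefault(func, set())
        if name in taken:
            if not create_anyway:
                return None
            name = self._unique(name, taken)
        taken.add(name)
        return name


table = NameTable()
table.set_local("sub_401000", "loop_top")
print(table.set_local("sub_401000", "loop_top"))                      # rejected
print(table.set_local("sub_401000", "loop_top", create_anyway=True))  # suffixed
```

Keeping one set per function mirrors the rule that two functions may reuse a local name while a single function may not.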

Commenting in IDA

Another useful feature in IDA is the ability to embed comments in your databases. Comments are a particularly useful way to leave notes for yourself regarding your progress as you analyze a program. In particular, comments are helpful for describing sequences of assembly language instructions in a higher-level fashion. For example, you might opt to write comments using C language statements to summarize the behavior of a particular function. On subsequent analysis of the function, the comments would serve to refresh your memory faster than reanalyzing the assembly language statements.

IDA offers several styles of comments, each suited for a different purpose. Comments may be associated with any line of the disassembly listing using options available from Edit > Comments. Hotkeys or context menus offer alternate access to IDA's commenting features.

The majority of IDA comments are prefixed with a semicolon to indicate that the remainder of the line is to be considered a comment. This is similar to commenting styles used by many assemblers and equates to #-style comments in many scripting languages or //-style comments in C++.

Regular Comments

The most straightforward comment is the regular comment. Regular comments are placed at the end of existing assembly lines, as shown in the preceding listing. Right-click in the right margin of the disassembly or use the colon (:) hotkey to activate the comment entry dialog. Regular comments will span multiple lines if you enter multiple lines in the comment entry dialog. Each of the lines will be indented to line up on the right side of the disassembly. To edit or delete a comment, you must reopen the comment entry dialog and edit or delete all of the comment text as appropriate. By default, regular comments are displayed as blue text.

IDA itself makes extensive use of regular comments. During the analysis phase, IDA inserts regular comments to describe parameters that are being pushed for function calls. This occurs only when IDA has parameter name or type information for the function being called. This information is typically contained within type libraries, which are discussed in Chapter 8 and Chapter 13, but also may be entered manually.

Repeatable Comments

A repeatable comment is a comment that is entered once but that may appear automatically in many locations throughout the disassembly. The previous listing shows a repeatable comment. In a disassembly listing the default color for repeatable comments is blue, making them indistinguishable from regular comments. It is the behavior rather than the appearance that matters in this case. The behavior of repeatable comments is tied to the concept of cross-references. When one program location refers to a second location that contains a repeatable comment, the comment associated with the second location is echoed at the first location. By default, the echoed comment appears as gray text, making the repeated comment distinguishable from other comments. The hotkey for repeatable comments is the semicolon (;), making it very easy to confuse repeatable comments and regular comments.

In the previous listing, note that the comment at the jge instruction is identical to the repeatable comment at its target. The comment has been repeated because the instruction (jge short loc_40106C) refers to the address 0040106C, where the repeatable comment was entered.

A regular comment added at a location that is displaying a repeated comment overrides the repeated comment so that only the regular comment will be displayed. If you entered a regular comment at the referring instruction, the repeatable comment inherited from its target would no longer be displayed there. If you then deleted the regular comment, the repeatable comment would once again be displayed.
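
The echo-and-override behavior can be summarized in a short Python sketch. It is a simplified model rather than IDA's logic: the displayed_comment function and its dictionary arguments are hypothetical, the referring address 0x401050 is made up, and the ordering between a regular and a repeatable comment entered at the same address is an assumption that the text does not cover.

```python
def displayed_comment(addr, regular, repeatable, xrefs_from):
    """Return (text, kind) for the comment shown at addr, or None.

    regular, repeatable: dict mapping address -> comment text
    xrefs_from: dict mapping address -> list of referenced addresses
    """
    if addr in regular:
        # A regular comment hides any comment echoed from a target.
        return regular[addr], "regular"
    if addr in repeatable:
        return repeatable[addr], "repeatable"
    for target in xrefs_from.get(addr, []):
        if target in repeatable:
            # Echo the target's repeatable comment at the referring location.
            return repeatable[target], "repeated"
    return None


repeatable = {0x40106C: "loop exit"}
xrefs = {0x401050: [0x40106C]}  # hypothetical jge -> loc_40106C
print(displayed_comment(0x401050, {}, repeatable, xrefs))
print(displayed_comment(0x401050, {0x401050: "my note"}, repeatable, xrefs))
```

Deleting the entry from the regular dictionary makes the echoed comment visible again, matching the behavior described above.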

A variant form of repeatable comment is associated with strings. Whenever IDA automatically creates a string variable, a virtual repeatable comment is added at all locations referencing the string variable. We say virtual because the comment cannot be edited by the user. The content of the virtual comment is set to the content of the string variable and displayed throughout the database just as a repeatable comment would be. As a result, any program locations that refer to the string variable will display the contents of the string variable as a repeated comment. Three of the comments in the previous listing demonstrate such comments displayed as a result of references to string variables.

Anterior and Posterior Lines

Anterior and posterior lines are full-line comments that appear either immediately before (anterior) or after (posterior) a given disassembly line. These comments are the only IDA comments that are not prefixed with the semicolon character. An example of an anterior line comment appears in the previous listing. You can distinguish an anterior line from a posterior line by comparing the address associated with the line to the address associated with the instruction immediately preceding or following the line.

Function Comments

Function comments allow you to group comments for display at the top of a function's disassembly listing. An example of a function comment appears in the previous listing, where the function prototype has been entered. You enter function comments by first highlighting the function name at the top of the function and then adding either a regular or repeatable comment. Repeatable function comments are echoed at any locations that call the commented function. IDA will automatically generate function prototype-style comments when you use the Set Function Type command discussed in Chapter 8.

Basic Code Transformations

In many cases you will be perfectly content with the disassembly listings that IDA generates. In some cases you won't. As the types of files that you analyze diverge farther and farther from ordinary executables generated with common compilers, you may find that you need to take more control of the disassembly analysis and display processes. This will be especially true if you find yourself performing analysis of obfuscated code or files that utilize a custom (unknown to IDA) file format.

Code transformations facilitated by IDA include the following:

* Converting data into code
* Converting code into data
* Designating a sequence of instructions as a function
* Changing the starting or ending address of an existing function
* Changing the display format for instruction operands

The degree to which you utilize these operations depends on a wide variety of factors and personal preferences. In general, if a binary is very complex, or if IDA is not familiar with the code sequences generated by the compiler used to build the binary, then IDA will encounter more problems during the analysis phase, and you will need to make manual adjustments to the disassembled code.

Code Display Options

The simplest transformations that you can make to a disassembly listing involve customizing the amount of information that IDA generates for each disassembly line. Each disassembled line can be considered as a collection of parts that IDA refers to, not surprisingly, as disassembly line parts. Labels, mnemonics, and operands are always present in a disassembly line. You can select additional parts for each disassembly line via Options > General on the Disassembly tab, as shown in Figure 7-4.

The Display Disassembly Line Parts section in the upper right offers several options for customizing disassembly lines. For IDA's text disassembly view, line prefixes, comments, and repeatable comments are selected by default. Each item is described here and shown in the listing that follows.

Line prefixes

A line prefix is the section:address portion of each disassembly line. Deselecting this option causes the line prefix to be removed from each disassembly line (the default in graph view). To illustrate this option, we have disabled line prefixes in the next listing.

Stack pointer

IDA performs extensive analysis on each function in order to track changes to the program stack pointer. This analysis is essential in understanding the layout of each function's stack frame. Selecting the Stack pointer option causes IDA to display the relative change to the stack pointer throughout the course of each function. This may be useful in recognizing discrepancies in calling conventions (IDA may not understand that a particular function uses stdcall, for example) or unusual manipulations of the stack pointer. Stack pointer tracking is shown in its own column of the listing. In this example, the stack pointer has changed by four bytes following the first instruction and a total of 0x7C bytes following the third instruction. By the time the function completes, the stack pointer is restored to its original value (a relative change of zero bytes). Whenever IDA encounters a function return statement and detects that the stack pointer value is not zero, an error condition is flagged and the instruction line is highlighted in red. In some cases, this might be a deliberate attempt to frustrate automated analysis. In other cases, it may be that a compiler utilizes prologues and epilogues that IDA can't accurately analyze.
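
A heavily simplified sketch of this kind of bookkeeping is shown below in Python. It is not IDA's analysis: the track_stack function is hypothetical, it understands only push, pop, and add/sub on esp for 32-bit code, and the instruction sequence is an assumed prologue and epilogue chosen to reproduce the 4 and 0x7C values mentioned above.

```python
def track_stack(instructions, word=4):
    """Return (rows, problems) for a list of instruction strings.

    rows: list of (bytes_in_use_after_instruction, text)
    problems: return instructions reached with a nonzero stack delta
    """
    depth = 0  # bytes the stack has grown since function entry
    rows, problems = [], []
    for text in instructions:
        parts = text.replace(",", " ").split()
        op = parts[0]
        if op == "push":
            depth += word
        elif op == "pop":
            depth -= word
        elif op in ("sub", "add") and parts[1] == "esp":
            amount = int(parts[2], 0)
            depth += amount if op == "sub" else -amount
        elif op.startswith("ret") and depth != 0:
            problems.append(text)  # the case the text says is flagged in red
        rows.append((depth, text))
    return rows, problems


code = ["push ebp", "mov ebp, esp", "sub esp, 0x78",
        "add esp, 0x78", "pop ebp", "ret"]
for depth, text in track_stack(code)[0]:
    print(f"{depth:03X}  {text}")
```

A real analysis also has to account for calls that purge their own arguments, which is why the purged bytes attribute discussed later matters for this computation.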

Comments and repeatable comments

Deselecting either of these options inhibits the display of the respective comment type. This may be useful if you wish to declutter a disassembly listing.

Auto comments

IDA can automatically comment some instruction types. This can serve as a reminder as to how particular instructions behave. No comments are added for trivial instructions such as the x86 mov. The listing includes examples of auto comments. User comments take precedence over auto comments; in this case if you want to see IDA's automatic comment for a line, you'll have to remove any comments you've added (regular or repeatable).

Bad instruction <BAD> marks

IDA can mark instructions that are legal for the processor but that may not be recognized by some assemblers. Undocumented (as opposed to illegal) CPU instructions may fall in this category. In such cases IDA will disassemble the instruction as a sequence of data bytes and display the undocumented instruction as a comment prefaced with <BAD>. The intent is to generate a disassembly that most assemblers can handle. Refer to the IDA help file for more information on the use of <BAD> marks.

Number of opcode bytes

Most disassemblers are capable of generating listing files that display the generated machine language bytes side by side with the assembly language instructions from which they are derived. IDA allows you to view the machine language bytes associated with each instruction by synchronizing a hex display to the disassembly listing display. You can optionally view machine language bytes mixed with assembly language instructions by specifying the number of machine language bytes that IDA should display for each instruction.

This is fairly straightforward when you are disassembling code for processors that have a fixed instruction size, but it is somewhat more difficult for variable-length instruction processors such as the x86, for which instructions may range from one to more than a dozen bytes in size. Regardless of the instruction length, IDA reserves display space in the disassembly listing for the number of bytes that you specify here, pushing the remaining portions of the disassembly line to the right to accommodate the specified number of opcode bytes. Number of opcode bytes has been set to 5 in the following disassembly, where the bytes appear in their own columns. A + symbol indicates that the specified instruction is too long to be fully displayed given the current settings.
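
The fixed-width column with an overflow marker can be imitated in a few lines of Python. This is only a sketch of the idea: the opcode_column function is hypothetical, and the exact spacing and placement of the + marker in IDA's listing are assumptions.

```python
def opcode_column(raw, count=5):
    """Format up to `count` instruction bytes in a fixed-width column."""
    shown = " ".join(f"{b:02X}" for b in raw[:count])
    marker = "+" if len(raw) > count else " "  # "+" means bytes were cut off
    width = count * 3 - 1                      # two hex digits plus a space each
    return f"{shown:<{width}}{marker}"


print(repr(opcode_column(bytes.fromhex("55"))))              # 1-byte instruction
print(repr(opcode_column(bytes.fromhex("C745FC00000000"))))  # 7-byte instruction
```

Because every row is padded to the same width, the mnemonic column that follows stays aligned regardless of instruction length.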

You can further customize the disassembly display by adjusting the indentation values and margins shown in the lower right of Figure 7-4. Any changes to these options affect only the current database. Global settings for each of these options are stored in the main configuration file, <IDADIR>/cfg/ida.cfg.

Formatting Instruction Operands

During the disassembly process, IDA makes many decisions regarding how to format operands associated with each instruction. The biggest decisions generally revolve around how to format various integer constants used by the wide variety of instruction types. Among other things, these constants can represent relative offsets in jump or call instructions, absolute addresses of global variables, values to be used in arithmetic operations, or programmer-defined constants. In order to make a disassembly more readable, IDA attempts to use symbolic names rather than numbers whenever possible. In some cases, formatting decisions are made based on the context of the instruction being disassembled (such as a call instruction); in other cases, the decision is based on the data being used (such as access to a global variable or an offset into a stack frame). In many other cases, the exact context in which a constant is being used may not be clear. When this happens, the associated constant is typically formatted as a hexadecimal constant.

If you happen not to be one of the few people in the world who eat, sleep, and breathe hex, then you will welcome IDA's operand formatting features. Right-clicking any constant in a disassembly opens a context-sensitive menu similar to that shown in Figure 7-5.

In this case, menu options are offered enabling the constant (41h) to be reformatted as decimal, octal, or binary values. Since the constant in this example falls within the ASCII printable range, an option is also presented to format the value as a character constant. In all cases, the menu displays the exact text that will replace the operand text should a particular option be selected.
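
To make the alternatives concrete, the following Python sketch computes the renderings such a menu might offer for a constant. The operand_formats function is hypothetical, and the h, o, and b suffixes follow a common assembler convention that is assumed here rather than taken from IDA.

```python
def operand_formats(value):
    """Return alternative textual renderings of a non-negative constant."""
    hex_digits = f"{value:X}"
    if hex_digits[0] in "ABCDEF":
        hex_digits = "0" + hex_digits  # keep the token from looking like a name
    options = {
        "hex": f"{hex_digits}h",
        "decimal": str(value),
        "octal": f"{value:o}o",
        "binary": f"{value:b}b",
    }
    if 0x20 <= value <= 0x7E:  # printable ASCII only
        options["char"] = f"'{chr(value)}'"
    return options


for kind, text in operand_formats(0x41).items():
    print(f"{kind:8} {text}")
```

A value outside the printable range, such as 0Ah, simply gets no character option in this model.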

In many cases, programmers use named constants in their source code. Such constants may be the result of #define statements (or their equivalent), or they may belong to a set of enumerated constants. Unfortunately, by the time a compiler is finished with the source code, it is no longer possible to determine whether the source used a symbolic constant or a literal, numeric constant. IDA maintains a large catalog of named constants associated with many common libraries such as the C standard library or the Windows API.

This catalog is accessible via the Use standard symbolic constant option on the context-sensitive menu associated with any constant value. Selecting this option for the constant 0Ah in Figure 7-5 opens the symbol-selection dialog shown in Figure 7-6.

The dialog is populated from IDA's internal list of constants after filtering according to the value of the constant we are attempting to format. In this case we see all of the constants that IDA knows to be equated with the value 0Ah. If we determined that the value was being used in conjunction with the creation of an X.25-style network connection, then we might select AF_CCITT and end up with the following disassembly line:

.text:004010A2 mov [ebp+var_60], AF_CCITT

The list of standard constants is a useful way to determine whether a particular constant may be associated with a known name and can save a lot of time reading through API documentation in search of potential matches.
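
The filtering step behind the symbol-selection dialog amounts to a reverse lookup by value, sketched here in Python. The catalog below is a tiny made-up sample: only AF_CCITT comes from the text, EXAMPLE_LIMIT is a placeholder name invented for illustration, and the candidates function is hypothetical rather than part of IDA.

```python
# Sample catalog for illustration only; IDA's real catalog is far larger.
SAMPLE_CATALOG = {
    "AF_INET": 2,
    "AF_CCITT": 10,
    "EXAMPLE_LIMIT": 10,  # invented placeholder entry
}


def candidates(catalog, value):
    """Return the names in the catalog that are equated with value."""
    return sorted(name for name, v in catalog.items() if v == value)


print(candidates(SAMPLE_CATALOG, 0x0A))
```

Several names usually share one value, so choosing among the candidates still depends on how the surrounding code uses the constant.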

Manipulating Functions

There are a number of reasons that you may wish to manipulate functions after the initial autoanalysis has been completed. In some cases, such as when IDA fails to locate a call to a function, functions may not be recognized, as there may be no obvious way to reach them. In other cases, IDA may fail to properly locate the end of a function, requiring some manual intervention on your part to correct the disassembly. IDA may have trouble locating the end of a function if a compiler has split the function across several address ranges or when, in the process of optimizing code, a compiler merges common end sequences of two or more functions in order to save space.

Creating New Functions

Under certain circumstances, new functions can be created where no function exists. New functions can be created from existing instructions that do not already belong to a function, or they can be created from raw data bytes that have not been defined by IDA in any other manner (such as double words or strings). You create functions by placing the cursor on the first byte or instruction to be included in the new function and selecting Edit > Functions > Create Function. IDA attempts to convert data to code if necessary. Then it scans forward to analyze the structure of the function and search for a return statement. If IDA can locate a suitable end of the function, it generates a new function name, analyzes the stack frame, and restructures the code in the form of a function. If it can't locate the end of the function or encounters any illegal instructions, then the operation fails.

Deleting Functions

You can delete existing functions using Edit > Functions > Delete Function. You may wish to delete a function if you believe that IDA has erred in its autoanalysis.

Function Chunks

Function chunks are commonly found in code generated by the Microsoft Visual C++ compiler. Chunks are the result of the compiler moving blocks of code that are less frequently executed in order to squeeze frequently executed blocks into memory pages that are less likely to be swapped out.

When a function is split in such a manner, IDA attempts to locate all of the associated chunks by following the jumps that lead to each chunk. In most cases IDA does a good job of locating all of the chunks and listing each chunk in the function's header, as shown in the following partial function disassembly:

Function chunks are easily reached by double-clicking the address associated with the chunk. Within the disassembly listing, function chunks are denoted by comments that delimit their instructions and that refer to the owning function, as shown in this listing:

ChunkedFunc

In some cases IDA may fail to locate every chunk associated with a function, or functions may be misidentified as chunks rather than as functions in their own right. In such cases, you may find that you need to create your own function chunks or delete existing function chunks.

You create new function chunks by selecting the range of addresses that belong to the chunk, which must not be part of any existing function, and selecting Edit > Functions > Append Function Tail. At this point you will be asked to select the parent function from a list of all defined functions.

NOTE In disassembly listings, function chunks are referred to as just that: function chunks. In the IDA menu system, function chunks are instead referred to as function tails.

You can delete existing function chunks by positioning the cursor on any line within the chunk to be deleted and selecting Edit > Functions > Remove Function Tail. At this point you will be asked to confirm your action prior to deleting the selected chunk.

If function chunks are turning out to be more trouble than they are worth, you can ask IDA not to create function chunks by deselecting the Create function tails loader option when you first load a file into IDA. This option is one of the loader options accessible via Kernel Options (see Chapter 4) in the initial file-load dialog. If you disable function tails, the primary difference that you may notice is that functions that would otherwise have contained tails contain jumps to regions outside the function boundaries. IDA highlights such jumps using red lines and arrows in the arrow windows on the left side of the disassembly. In the graph view for the corresponding function, the targets of such jumps are not displayed.

Function Attributes

IDA associates a number of attributes with each function that it recognizes. The function properties dialog shown in Figure 7-7 can be used to edit many of these attributes. Each attribute that can be modified is explained here.

Name of function

An alternative means for changing the name of a function.

Start address

The address of the first instruction in the function. IDA most often determines this automatically, either during analysis or from the address used during the create function operation.

End address

The address following the last instruction in the function. Most frequently, this is the address of the location that follows the function's return instruction. In most cases, this address is determined automatically during the analysis phase or as part of function creation. In cases where IDA has trouble determining the true end of a function, you may need to edit this value manually. Remember, this address is not actually part of the function but follows the last instruction in the function.
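
Because the end address is exclusive, a function's extent behaves like a half-open range. The Python sketch below illustrates the convention, and extends it to a function made of several ranges as with function tails. All names and addresses here are hypothetical.

```python
def chunk_contains(start, end, addr):
    """True if addr lies in the half-open range [start, end)."""
    return start <= addr < end


def function_owns(chunks, addr):
    """True if addr falls in any (start, end) range of the function."""
    return any(chunk_contains(s, e, addr) for s, e in chunks)


# Hypothetical function whose one-byte return instruction sits at 0x40106C,
# so its end address is 0x40106D.
START, END = 0x401000, 0x40106D
print(chunk_contains(START, END, 0x40106C))  # last instruction: inside
print(chunk_contains(START, END, END))       # end address itself: outside
print(END - START)                           # size in bytes, no +1 needed
```

One convenience of the exclusive end is that the size of a range is simply the end minus the start.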

Local variables area

This represents the number of stack bytes dedicated to local variables (see Figure 6-4) for the function. In most cases, this value is computed automatically based on analysis of stack pointer behavior within the function.

Saved registers

This is the number of bytes used to save registers (see Figure 6-4) on behalf of the caller. IDA considers the saved register region to lie on top of the saved return address and below any local variables associated with the function. Some compilers choose to save registers on top of a function's local variables. IDA considers the space required to save such registers as belonging to the local variable area rather than a dedicated saved registers area.

Purged bytes

Purged bytes shows the number of bytes of parameters that a function removes from the stack when it returns to its caller. For cdecl functions, this value is always zero. For stdcall functions, this value represents the amount of space consumed by any parameters that are passed on the stack (see Figure 6-4). In x86 programs, IDA can automatically determine this value when it observes the use of the RET N variant of the return instruction.
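
Reading the purged byte count off a return instruction can be sketched as follows in Python. The purged_bytes function is hypothetical and only parses instruction text of the assumed forms "ret", "retn 8", or "retn 0Ch"; it does not inspect machine code the way a disassembler would.

```python
def purged_bytes(ret_instruction):
    """Return the number of parameter bytes a return instruction removes."""
    parts = ret_instruction.split()
    if parts[0].lower() not in ("ret", "retn"):
        raise ValueError(f"not a return instruction: {ret_instruction!r}")
    if len(parts) == 1:
        return 0  # plain return, as with cdecl: the caller cleans up
    operand = parts[1]
    if operand.lower().endswith("h"):
        return int(operand[:-1], 16)  # hexadecimal operand with h suffix
    return int(operand, 10)


print(purged_bytes("retn"))      # cdecl-style return
print(purged_bytes("retn 0Ch"))  # stdcall-style, three 4-byte parameters
```

For a stdcall function taking three 4-byte stack parameters, the operand would be 0Ch, which matches the space those parameters consume.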

Frame pointer delta

In some cases, compilers may adjust a function's frame pointer to point somewhere into the middle of the local variable area rather than at the saved frame pointer at the bottom of the local variable area. This distance from the adjusted frame pointer to the saved frame pointer is termed the frame pointer delta. In most cases any frame pointer delta will be computed automatically when the function is analyzed. Compilers utilize a stack frame delta as a speed optimization. The purpose of the delta is to keep as many stack frame variables as possible within reach of a 1-byte signed offset (-128 to +127) from the frame pointer.
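
The benefit of the delta can be checked with a little arithmetic, sketched here in Python. The model is deliberately crude and is an assumption of this example: it counts individual bytes of the local variable area, ignores variable sizes and anything stored above the saved frame pointer, and uses a hypothetical reachable_local_bytes helper.

```python
def reachable_local_bytes(local_size, delta=0):
    """Count local-area bytes addressable with a 1-byte signed offset.

    Byte k (1..local_size) sits k bytes below the saved frame pointer.
    Moving the frame pointer down by delta makes its offset delta - k.
    """
    count = 0
    for k in range(1, local_size + 1):
        offset = delta - k
        if -128 <= offset <= 127:
            count += 1
    return count


print(reachable_local_bytes(200))            # no delta
print(reachable_local_bytes(200, delta=72))  # frame pointer moved 72 bytes down
```

With no delta, only the 128 bytes nearest the saved frame pointer fit in a 1-byte offset; shifting the frame pointer into the local area lets positive offsets cover part of it as well.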
 +
 +
Additional attribute checkboxes are available to further characterize the [unction. As with other fields within the dialog, these checkboxes generally reflect the results of IDAs automatic analysis. The 1'ollowing attributes can be toggled on and off.
 +
 +
Dors not return
 +
 +
The fonction dues not return to ils caller. When such a function is called, IDA dues not assume that execution continues following the assocïated calI instruction.
 +
 +
Far fonction
 +
 +
Used to mark a [unction as a far [unction on segmented architectures. Callers o[ the fonction would need to specify both a segment and an offset value when calling the function. The need to use lac rails is typically dictated by the memory model in use within a program rather than by the fact that the architecture supports segmentation, for example, the use of the large (as opposed to flat) memory model on an x86.
 +
 +
Library func

Flags a function as library code. Library code might include support routines included by a compiler or functions that are part of a statically linked library. Marking a function as a library function causes the function to be displayed using the assigned library function coloring to distinguish it from nonlibrary code.

Static func

Does nothing other than display the static modifier in the function's attribute list.

BP based frame

Indicates that the function utilizes a frame pointer. In most cases this is determined automatically by analyzing the function's prologue. If analysis fails to recognize that a frame pointer is used in the given function, you can manually select this attribute. If you do manually select this attribute, make sure that you adjust the saved register size (usually

Even though this is an imported function, IDA allows you to edit one piece of information concerning its behavior: the number of purged bytes associated with the function. By editing this function, you can specify the number of bytes that it clears off the stack when it returns, and IDA will propagate the information that you supply to every location that calls the function, instantly correcting the stack pointer computations at each of those locations.

In order to improve its automated analysis, IDA incorporates advanced techniques that attempt to resolve stack pointer discrepancies by solving a system of linear equations related to the behavior of the stack pointer. As a result, you may not even realize that IDA has no prior knowledge of the details of functions such as some_imported_func. For more information on these techniques, refer to Ilfak's blog post titled "Simplex method in IDA Pro" at http://hexblog.com/2006/06/

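To see why the purged bytes value matters at every call site, consider a toy model of stack pointer bookkeeping. This is a minimal sketch, not IDA's algorithm: the two-argument call site, the 4-byte pushes, and the helper names stack_deltas and call_site are assumptions chosen for illustration.

```python
def stack_deltas(instructions):
    """Return the cumulative stack usage (in bytes) after each instruction.

    instructions: list of (text, change) pairs, where change is the number
    of bytes by which the instruction grows (+) or shrinks (-) the stack.
    """
    depth, after = 0, []
    for _text, change in instructions:
        depth += change
        after.append(depth)
    return after


def call_site(purged_bytes):
    """Hypothetical caller that pushes two 4-byte arguments, then calls."""
    return [
        ("push eax", 4),
        ("push ebx", 4),
        # The callee removes purged_bytes of parameters when it returns.
        ("call some_imported_func", -purged_bytes),
    ]


print(stack_deltas(call_site(8)))  # callee purges both arguments (stdcall-like)
print(stack_deltas(call_site(0)))  # callee purges nothing (cdecl-like)
```

If the function really purges 8 bytes but the analysis assumes 0, every later stack offset in the caller is off by 8, which is the kind of discrepancy that editing the purged bytes field corrects.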
Converting Data to Code (and Vice Versa)

During the automatic analysis phase, bytes are occasionally categorized incorrectly. Data bytes may be incorrectly classified as code bytes and disassembled into instructions, or code bytes may be incorrectly classified as data bytes and formatted as data values. This happens for many reasons, including the fact that some compilers embed data into the code section of programs or the fact that some code bytes are never directly referenced as code and IDA opts not to disassemble them. Obfuscated programs in particular tend to blur the distinction between code sections and data sections.

Regardless of the reason that you wish to reformat your disassembly, doing so is fairly easy. The first option for reformatting anything is to remove its current formatting (code or data). It is possible to undefine functions, code, or data by right-clicking the item that you wish to undefine and selecting Undefine (also Edit > Undefine or hotkey U) from the resulting context-sensitive menu. Undefining an item causes the underlying bytes to be reformatted as a list of raw byte values. Large regions can be undefined by using a click-and-drag operation to select a range of addresses prior to performing the undefine operation. As an example, consider the simple function listing that follows:

To disassemble a sequence of undefined bytes, right-click the first byte to be disassembled and select Code (also Edit > Code or hotkey C). This causes IDA to disassemble all bytes until it encounters a defined item or an illegal instruction. Large regions can be converted to code by using a click-and-drag operation to select a range of addresses prior to performing the code-conversion operation.

The complementary operation of converting code to data is a little more complex. First, it is not possible to convert code to data using the context menu. Available alternatives include Edit > Data and the D hotkey. Bulk conversions of instructions to data are easiest to accomplish by first undefining all of the instructions that you wish to convert to data and then formatting the data appropriately. Basic data formatting is discussed in the following section.

Basic Data Transformations

Properly formatted data can be as important in developing an understanding of a program's behavior as properly formatted code. IDA takes information from a variety of sources and uses many algorithms in order to determine the most appropriate way to format data within a disassembly. A few examples serve to illustrate how data formats are selected.

1. Datatypes and/or sizes can be inferred from the manner in which registers are used. An instruction observed to load a 32-bit register from memory implies that the associated memory location holds a 4-byte datatype (though we may not be able to distinguish between a 4-byte integer and a 4-byte pointer).

2. Function prototypes can be used to assign datatypes to function parameters. IDA maintains a large library of function prototypes for exactly this purpose. Analysis is performed on the parameters passed to functions in an attempt to tie a parameter to a memory location. If such a relationship can be uncovered, then a datatype can be applied to the associated memory location. Consider a function whose single parameter is a pointer to a CRITICAL_SECTION (a Windows API datatype). If IDA can determine the address passed in a call to this function, then IDA can flag that address as a CRITICAL_SECTION object.

3. Analysis of a sequence of bytes can reveal likely datatypes. This is precisely what happens when a binary is scanned for string content. When long sequences of ASCII characters are encountered, it is not unreasonable to assume that they represent character arrays.

In the next few sections we discuss some basic transformations that you can perform on data within your disassemblies.

Specifying Data Sizes

The simplest way to modify a piece of data is to adjust its size. IDA offers a number of data size/type specifiers. The most commonly encountered specifiers are db, dw, and dd, representing 1-, 2-, and 4-byte data, respectively. The first way to change a data item's size is via the Options > Setup Data Types dialog shown in Figure 7-8.

There are two parts to this dialog. The left side of the dialog contains a column of buttons used to immediately change the data size of the currently selected item. The right side of the dialog contains a column of checkboxes used to configure what IDA terms the data carousel. Note that for each button on the left, there is a corresponding checkbox on the right. The data carousel is a revolving list of datatypes that contains only those types whose checkboxes are selected. Modifying the contents of the data carousel has no immediate impact on the IDA display. Instead, each type on the data carousel is listed on the context-sensitive menu that appears when you right-click a data item. Thus, it is easier to reformat data to a type listed in the data carousel than to a type not listed in the data carousel. Given the datatypes selected in Figure 7-8, right-clicking a data item would offer you the opportunity to reformat that item as byte, word, or double-word data.

The name for the data carousel derives from the behavior of the associated data formatting hotkey: D. When you press D, the item at the currently selected address is reformatted to the next type in the data carousel list. With the three-item list specified previously, an item currently formatted as db toggles to dw, an item formatted as dw toggles to dd, and an item formatted as dd toggles back to db to complete the circuit around the carousel. Using the data hotkey on a nondata item such as code causes the item to be formatted as the first datatype in the carousel list (db in this case).

Toggling through datatypes causes data items to grow, shrink, or remain the same size. If an item's size remains the same, then the only observable change is in the way the data is formatted. If you reduce an item's size, from dd (4 bytes) to db (1 byte) for example, any extra bytes (3 in this case) become undefined. If you increase the size of an item, IDA complains if the bytes following the item are already defined and asks you, in a roundabout way, if you want IDA to undefine the next item in order to expand the current item. The message you encounter in such cases is "Directly convert to data?" This message generally means that IDA will undefine a sufficient number of succeeding items to satisfy your request. For example, when converting byte data (db) to double-word data (dd), 3 additional bytes must be consumed to form the new data item.

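The carousel and the size bookkeeping described above can be modeled in a few lines. This is a sketch of the behavior as described in the text, not of IDA's implementation; the names CAROUSEL, SIZES, next_type, and size_change are made up for the example, and treating any type outside the carousel like a nondata item is an assumption.

```python
SIZES = {"db": 1, "dw": 2, "dd": 4}   # bytes per item
CAROUSEL = ["db", "dw", "dd"]          # types whose checkboxes are selected


def next_type(current, carousel=CAROUSEL):
    """Model one press of the D hotkey."""
    if current not in carousel:
        # Nondata items (such as code) become the first carousel type.
        return carousel[0]
    return carousel[(carousel.index(current) + 1) % len(carousel)]


def size_change(old, new):
    """Describe what happens to neighboring bytes when an item is resized."""
    diff = SIZES[new] - SIZES[old]
    if diff > 0:
        return ("consume", diff)    # following bytes are absorbed
    if diff < 0:
        return ("undefine", -diff)  # leftover bytes become undefined
    return ("same", 0)


item = "db"
for _ in range(3):
    new = next_type(item)
    print(item, "->", new, size_change(item, new))
    item = new
```

Three presses take a db item through dw and dd and back to db, consuming 1 byte, then 2 more, then releasing 3 bytes as undefined.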
Datatypes and sizes can be specified for any location that describes data, including stack variables. To change the size of stack-allocated variables, open the detailed stack frame view by double-clicking the variable you wish to modify; then change the variable's size as you would any other variable.

Working with Strings

IDA recognizes a large number of string formats. By default, IDA searches for and formats C-style null-terminated strings. To force data to be converted to a string, utilize the options on the Edit > Strings menu to select a specific string style. If the bytes beginning at the currently selected address form a string of the selected style, IDA groups those bytes together into a single string variable. At any time, you can use the A hotkey to format the currently selected location in the default string style.

Two dialogs are responsible for the configuration of string data. The first, shown in Figure 7-9, is accessed via Options > ASCII String Style, though ASCII in this case is a bit of a misnomer, as a much wider variety of string styles are understood.

Similar to the datatype configuration dialog, the buttons on the left are used to create a string of the specified style at the currently selected location. A string is created only if the data at the current location conforms to the specified string format. For Character terminated strings, up to two termination characters can be specified toward the bottom of the dialog. The radio buttons on the right of the dialog are used to specify the default string style associated with the use of the strings hotkey (A).

The second dialog used to configure string operations is the Options > General dialog, shown in Figure 7-10, where the Strings tab allows configuration of additional strings-related options. While you can specify the default string type here as well using the available drop-down box, the majority of available options deal with the naming and display of string data, regardless of their type. The Name generation area on the right of the dialog is visible only when the Generate names option is selected. When name generation is turned off, string variables are given dummy names beginning with the asc_ prefix.

When name generation is enabled, the Name generation options control how IDA generates names for string variables. When Generate serial names is not selected (the default), the specified prefix is combined with characters taken from the string to generate a name that does not exceed the current maximum name length.

Title case is used in the name, and any characters that are not legal to use within names (such as spaces) are omitted when forming the name. The Mark as autogenerated option causes generated names to appear in a different color (dark blue by default) than user-specified names (blue by default). Preserve case forces the name to use characters as they appear within the string rather than converting them to title case. Finally, Generate serial names causes IDA to serialize names by appending numeric suffixes (beginning with Number). The number of digits in generated suffixes is controlled by the Width field. As configured in Figure 7-10, the first three names to be generated would be a000, a001, and a002.

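The two naming schemes can be approximated as follows. This is a rough sketch under stated assumptions, not IDA's actual algorithm: the prefix "a", the maximum name length of 15, and the rule that only letters, digits, and underscores are legal in names are all assumptions for illustration, and the function names are hypothetical.

```python
import re


def serial_names(prefix="a", number=0, width=3, count=3):
    """Generate serial names: prefix plus a zero-padded numeric suffix."""
    return [f"{prefix}{number + i:0{width}d}" for i in range(count)]


def name_from_string(text, prefix="a", max_len=15, preserve_case=False):
    """Build a name from a string's contents.

    Characters assumed illegal in names (anything other than letters,
    digits, and underscore) are dropped; each remaining run of characters
    is title-cased unless preserve_case is set.
    """
    words = re.findall(r"[A-Za-z0-9_]+", text)
    if not preserve_case:
        words = [w[:1].upper() + w[1:].lower() for w in words]
    return (prefix + "".join(words))[:max_len]


print(serial_names())                                        # a000, a001, a002
print(name_from_string("hello world"))                       # aHelloWorld
print(name_from_string("hello world", preserve_case=True))   # ahelloworld
```

Serial names are predictable but carry no meaning, whereas content-derived names hint at what a string says at the cost of being truncated to the maximum name length.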
Specifying Arrays

One of the drawbacks to disassembly listings derived from higher-level languages is that they provide very few clues regarding the size of arrays. In a disassembly listing, specifying an array can require a tremendous amount of space if each item in the array is specified on its own disassembly line. The following listing shows data declarations that follow the named variable unk_402060. The fact that only the first item in the listing is referenced by any instructions suggests that it may be the first element in an array. Rather than being referenced directly, additional elements within arrays are often referenced using more complex index computations to offset from the beginning of the array.

IDA provides facilities for grouping consecutive data definitions together into a single array definition. To create an array, select the first element of the array (we chose unk_402060) and use Edit > Array to launch the array-creation dialog shown in Figure 7-11. If a data item has been defined at a given location, then an Array option will be available when you right-click the item. The type of array to be created is dictated by the datatype associated with the item selected as the first item in the array. In this case we are creating an array of bytes.

Prior to creating an array, make sure that you select the proper size for array elements by changing the size of the first item in the array to the appropriate value.

Following are descriptions of useful fields for array creation:

Array element width

This value indicates the size of an individual array element (1 byte in this case) and is dictated by the size of the data value that was selected when the dialog was launched.

Maximum possible size

This value is automatically computed as the maximum number of elements (not bytes) that can be included in the array before another defined data item is encountered. Specifying a larger size may be possible but will require succeeding data items to be undefined in order to absorb them into the array.

Number of elements

This is where you specify the exact size of the array. The total number of bytes occupied by the array can be computed as Number of elements × Array element width.

Items on a line

Specifies the number of elements to be displayed on each disassembly line. This can be used to reduce the amount of space required to display the array.

Element width

This value is for formatting purposes only and controls the column width when multiple items are displayed on a single line.

Use "dup" construct

This option causes identical data values to be grouped into a single item with a repetition specifier.

Signed elements

Dictates whether data is displayed as signed or unsigned values.

Display indexes

Causes array indexes to be displayed as regular comments. This is useful if you need to locate specific data values within large arrays. Selecting this option also enables the Indexes radio buttons so you can choose the display format for each index value.

Create as array

Not checking this may seem to go against the purpose of the dialog, and it is usually left checked. Uncheck it if your goal is simply to specify some number of consecutive items without grouping them into an array.

Accepting the options specified in Figure 7-11 results in the following compact array declaration, which can be read as an array of bytes (db) named byte_402060 consisting of the value 0 repeated 416 (1A0h) times.
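The effect of the "dup" construct can be reproduced with simple run-length grouping. The sketch below only approximates the look of such a declaration; the helper name format_array and the decision to print element values in decimal are assumptions for this example.

```python
from itertools import groupby


def format_array(name, values, directive="db"):
    """Render an array declaration, collapsing runs of identical values.

    Runs longer than one element are written as '<count>h dup(<value>)',
    with the count in hexadecimal.
    """
    parts = []
    for value, run in groupby(values):
        count = len(list(run))
        parts.append(f"{count:X}h dup({value})" if count > 1 else str(value))
    return f"{name} {directive} " + ", ".join(parts)


elements, width = 416, 1
print(format_array("byte_402060", [0] * elements))
print("total bytes:", elements * width)  # Number of elements x Array element width
```

Here 416 one-byte elements collapse into a single line, where an element-per-line listing would need 416 lines.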

Current version as of 16 August 2019, 02:23

After navigation, the next most significant features of IDA are designed to allow you to modify the disassembly to suit your needs. In this chapter we will show that because of IDA's underlying database nature, changes that you make to a disassembly are easily propagated to all IDA subviews to maintain a consistent picture of your disassembly. One of the most powerful features that IDA offers is the ability to easily manipulate disassemblies to add new information or reformat a listing to suit your particular needs. IDA automatically handles operations such as global search and replace when it makes sense to do so and makes trivial work of reformatting instructions and data and vice versa, features not available in other disassembly tools.

Names and Naming

At this point, we have encountered two categories of names in IDA disassemblies: names associated with virtual addresses (named locations) and names associated with stack frame variables. In the majority of cases IDA will automatically generate all of these names according to the guidelines previously discussed. IDA refers to such automatically generated names as dummy names. Unfortunately, these names seldom hint at the intended purpose of a location or variable and therefore don't generally add to our understanding of a program's behavior. As you begin to analyze any program, one of the first and most common ways that you will want to manipulate a disassembly listing is to change default names into more meaningful names. Fortunately, IDA allows you to easily change any name and handles all of the details of propagating all name changes throughout the entire disassembly. In most cases, changing a name is as simple as clicking the name you wish to change (this highlights the name) and using the N hotkey to open a name-change dialog. Alternatively, right-clicking the name to be changed generally presents a context-sensitive menu that contains a Rename option, as shown in Figure 6-5. The name-change process does differ somewhat between stack variables and named locations, and these differences are detailed in the following sections.

Parameters and Local Variables

Names associated with stack variables are the simplest form of name in a disassembly listing, primarily because they are not associated with a specific virtual address and thus can never appear in the Names window. As in most programming languages, such names are considered to be restricted in scope based on the function to which a given stack frame belongs. Thus, every function in a program might have its own stack variable named arg_0, but no function may have more than one variable named arg_0. The dialog shown in Figure 7-1 is used to rename a stack variable.

Named Locations

Renaming a named location or adding a name to an unnamed location is slightly different from changing the name of a stack variable. The process for accessing the name-change dialog is identical (hotkey N), but things quickly change. Figure 7-2 shows the renaming dialog associated with named locations.

This dialog informs you exactly what address you are naming along with a list of attributes that can be associated with the name. The maximum name length merely echoes a value from one of IDA's configuration files (<IDADIR>/cfg/ida.cfg). You are free to use names longer than this value, which will cause IDA to complain weakly by informing you that you have exceeded the maximum name length and offering to increase the maximum name length for you. Should you choose to do so, the new maximum name length value will be enforced (weakly) only in the current database. Any new databases that you create will continue to be governed by the maximum name length contained in the configuration file.

Local name

A local name is restricted in scope to the current function, so the uniqueness of local names is enforced only within a given function. Like local variables, two different functions may contain identical local names, but a single function cannot contain two local names that are identical. Named locations that exist outside function boundaries cannot be designated as local names. These include names that represent function names as well as global variables. The most common use for local names is to provide symbolic names for the targets of jumps within a function, such as those associated with branching control structures.

Include in names list

Selecting this option causes a name to be added to the Names window, which can make the name easier to find when you wish to return to it. Autogenerated (dummy) names are never included in the Names window by default.

Public name

A public name is typically a name that is being exported by a binary such as a shared library. IDA's parsers typically discover public names while parsing file headers during initial loading into the database. You can force a symbol to be treated as public by selecting this attribute. In general, this has very little effect on the disassembly other than to cause public annotations to be added to the name in the disassembly listing and in the Names window.

Autogenerated name

This attribute appears to have no discernible effect on disassemblies. Selecting it does not cause IDA to automatically generate a name.

Weak name

A weak symbol is a specialized form of public symbol utilized only when no public symbol of the same name is found to override it. Marking a symbol as weak has some significance to an assembler but little significance in an IDA disassembly.

Create name anyway

As discussed previously, no two locations within a function may be given the same name. Similarly, no two locations outside any function (in the global scope) may be given the same name. This option is somewhat confusing, as it behaves differently depending on the type of name you are attempting to create.

If you are editing a name at the global scope (such as a function name or global variable) and you attempt to assign a name that is already in use in the database, IDA will display the conflicting name dialog, shown in Figure 7-3, offering to automatically generate a unique numeric suffix to resolve the conflict. This dialog is presented regardless of whether you have selected the Create name anyway option or not.

If, however, you are editing a local name within a function and you attempt to assign a name that is already in use, the default behavior is simply to reject the attempt. If you are determined to use the given name, you must select Create name anyway in order to force IDA to generate a unique numeric suffix for the local name. Of course, the simplest way to resolve any name conflict is to choose a name that is not already in use.
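Resolving a conflict with a numeric suffix amounts to probing for the first unused candidate. The following is a minimal sketch of that idea; the underscore-plus-number suffix format and the function name unique_name are assumptions, since the text does not specify how IDA formats the suffix.

```python
def unique_name(name, existing):
    """Return name unchanged if free, else the first free 'name_<n>' variant."""
    if name not in existing:
        return name
    n = 0
    while f"{name}_{n}" in existing:
        n += 1
    return f"{name}_{n}"


in_use = {"process_input", "process_input_0"}
print(unique_name("process_input", in_use))  # both the name and _0 are taken
print(unique_name("parse_header", in_use))   # no conflict, name is kept
```

For global names this models what the conflicting name dialog offers to do; for local names the same step happens only when Create name anyway is selected.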

Commenting in IDA

Another useful feature in IDA is the ability to embed comments in your databases. Comments are a particularly useful way to leave notes for yourself regarding your progress as you analyze a program. In particular, comments are helpful for describing sequences of assembly language instructions in a higher-level fashion. For example, you might opt to write comments using C language statements to summarize the behavior of a particular function. On subsequent analysis of the function, the comments would serve to refresh your memory faster than reanalyzing the assembly language statements. IDA offers several styles of comments, each suited for a different purpose. Comments may be associated with any line of the disassembly listing using options available from Edit > Comments. Hotkeys or context menus offer alternate access to IDA's commenting features.

The majority of IDA comments are prefixed with a semicolon to indicate that the remainder of the line is to be considered a comment. This is similar to commenting styles used by many assemblers and equates to #-style comments in many scripting languages or //-style comments in C++.

Regular Comments

The most straightforward comment is the regular comment. Regular comments are placed at the end of existing assembly lines, as marked in the preceding listing. Right-click in the right margin of the disassembly or use the colon (:) hotkey to activate the comment entry dialog. Regular comments will span multiple lines if you enter multiple lines in the comment entry dialog. Each of the lines will be indented to line up on the right side of the disassembly. To edit or delete a comment, you must reopen the comment entry dialog and edit or delete all of the comment text as appropriate. By default, regular comments are displayed as blue text.

IDA itself makes extensive use of regular comments. During the analysis phase, IDA inserts regular comments to describe parameters that are being pushed for function calls. This occurs only when IDA has parameter name or type information for the function being called. This information is typically contained within type libraries, which are discussed in Chapter 8 and Chapter 13, but also may be entered manually.

Repeatable Comments

A repeatable comment is a comment that is entered once but that may appear automatically in many locations throughout the disassembly. The previous listing includes a marked example of a repeatable comment. In a disassembly listing the default color for repeatable comments is blue, making them indistinguishable from regular comments. It is the behavior rather than the appearance that matters in this case. The behavior of repeatable comments is tied to the concept of cross-references. When one program location refers to a second location that contains a repeatable comment, the comment associated with the second location is echoed at the first location. By default, the echoed comment appears as gray text, making the repeated comment distinguishable from other comments. The hotkey for repeatable comments is the semicolon (;), making it very easy to confuse repeatable comments and regular comments.

In the previous listing, note that the comment displayed at the jge short loc_40106C instruction is identical to the repeatable comment at address 0040106C. The comment has been repeated because the jge instruction refers to that address.

A regular comment added at a location that is displaying a repeated comment overrides the repeated comment so that only the regular comment will be displayed. If you entered a regular comment at the jge instruction, the repeatable comment inherited from its target (0040106C) would no longer be displayed there. If you then deleted that regular comment, the repeatable comment would once again be displayed.
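The precedence between regular and echoed repeatable comments can be expressed as a small lookup. This sketch models only the rules stated above; the dictionary-based storage, the function name displayed_comment, and the address chosen for the jge instruction are hypothetical.

```python
def displayed_comment(addr, regular, repeatable, xrefs_from):
    """Pick the comment shown at addr.

    regular / repeatable: dicts mapping an address to its comment text.
    xrefs_from: dict mapping an address to the addresses it refers to.
    """
    if addr in regular:                 # a regular comment overrides any echo
        return regular[addr]
    if addr in repeatable:              # the location's own repeatable comment
        return repeatable[addr]
    for target in xrefs_from.get(addr, []):
        if target in repeatable:        # echo the target's repeatable comment
            return repeatable[target]
    return None


JGE_ADDR = 0x401040   # hypothetical address of "jge short loc_40106C"
TARGET = 0x40106C

repeatable = {TARGET: "loop exit"}
xrefs = {JGE_ADDR: [TARGET]}
regular = {}

print(displayed_comment(JGE_ADDR, regular, repeatable, xrefs))  # echoed comment
regular[JGE_ADDR] = "bail out when done"
print(displayed_comment(JGE_ADDR, regular, repeatable, xrefs))  # override
del regular[JGE_ADDR]
print(displayed_comment(JGE_ADDR, regular, repeatable, xrefs))  # echo returns
```

Because the echo is computed from the cross-reference rather than stored at the referring address, deleting the regular comment makes the repeated comment reappear without any further action.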

A variant form of repeatable comment is associated with strings. Whenever IDA automatically creates a string variable, a virtual repeatable comment is added at all locations referencing the string variable. We say virtual because the comment cannot be edited by the user. The content of the virtual comment is set to the content of the string variable and displayed throughout the database just as a repeatable comment would be. As a result, any program locations that refer to the string variable will display the contents of the string variable as a repeated comment. The three marked string comments in the listing demonstrate such comments displayed as a result of references to string variables.

Anterior and Posterior Lines

Anterior and posterior lines are full-line comments that appear either immediately before (anterior) or after (posterior) a given disassembly line. These comments are the only IDA comments that are not prefixed with the semicolon character. An example of an anterior line comment is marked in the previous listing. You can distinguish an anterior line from a posterior line by comparing the address associated with the line to the address associated with the instruction immediately preceding or following the line.

Function Comments

Function comments allow you to group comments for display at the top of a function's disassembly listing. An example of a function comment is marked in the listing, where the function prototype has been entered. You enter function comments by first highlighting the function name at the top of the function and then adding either a regular or repeatable comment. Repeatable function comments are echoed at any locations that call the commented function. IDA will automatically generate function prototype-style comments when you use the Set Function Type command discussed in Chapter 8.

Basic Code Transformations

In many cases you will be perfectly content with the disassembly listings that IDA generates. In some cases you won't. As the types of files that you analyze diverge farther and farther from ordinary executables generated with common compilers, you may find that you need to take more control of the disassembly analysis and display processes. This will be especially true if you find yourself performing analysis of obfuscated code or files that utilize a custom (unknown to IDA) file format.

Code transformations facilitated by IDA include the following:

  • Converting data into code
  • Converting code into data
  • Designating a sequence of instructions as a function
  • Changing the starting or ending address of an existing function
  • Changing the display format for instruction operands

The degree to which you utilize these operations depends on a wide variety of factors and personal preferences. In general, if a binary is very complex, or if IDA is not familiar with the code sequences generated by the compiler used to build the binary, then IDA will encounter more problems during the analysis phase, and you will need to make manual adjustments to the disassembled code.

Code Display Options

The simplest transformations that you can make to a disassembly listing involve customizing the amount of information that IDA generates for each disassembly line. Each disassembled line can be considered as a collection of parts that IDA refers to, not surprisingly, as disassembly line parts. Labels, mnemonics, and operands are always present in a disassembly line. You can select additional parts for each disassembly line via Options > General on the Disassembly tab, as shown in Figure 7-4.

The Display Disassembly Line Parts section in the upper right offers several options for customizing disassembly lines. For IDA's text disassembly view, line prefixes, comments, and repeatable comments are selected by default. Each item is described here and shown in the listing that follows.

Line prefixes

A line prefix is the section:address portion of each disassembly line. Deselecting this option causes the line prefix to be removed from each disassembly line (the default in graph view). To illustrate this option, we have disabled line prefixes in the next listing.

Stack pointer

IDA performs extensive analysis on each function in order to track changes to the program stack pointer. This analysis is essential in understanding the layout of each function's stack frame. Selecting the Stack pointer option causes IDA to display the relative change to the stack pointer throughout the course of each function. This may be useful in recognizing discrepancies in calling conventions (IDA may not understand that a particular function uses stdcall, for example) or unusual manipulations of the stack pointer. Stack pointer tracking is shown in its own marked column of the listing. In this example, the stack pointer has changed by four bytes following the first instruction and a total of 0x7C bytes following the third instruction. By the time the function completes, the stack pointer is restored to its original value (a relative change of zero bytes). Whenever IDA encounters a function return statement and detects that the stack pointer value is not zero, an error condition is flagged and the instruction line highlighted in red. In some cases, this might be a deliberate attempt to frustrate automated analysis. In other cases, it may be that a compiler utilizes prologues and epilogues that IDA can't accurately analyze.

Comments and repeatable comments

Deselecting either of these options inhibits the display of the respective comment type. This may be useful if you wish to declutter a disassembly listing.

Auto comments

IDA can automatically comment some instruction types. This can serve as a reminder as to how particular instructions behave. No comments are added for trivial instructions such as the x86 mov. The marked comments in the listing are examples of auto comments. User comments take precedence over auto comments; in this case if you want to see IDA's automatic comment for a line, you'll have to remove any comments you've added (regular or repeatable).

Bad instruction <BAD> marks

IDA can mark instructions that are legal for the processor but that may not be recognized by some assemblers. Undocumented (as opposed to illegal) CPU instructions may fall in this category. In such cases IDA will disassemble the instruction as a sequence of data bytes and display the undocumented instruction as a comment prefaced with <BAD>. The intent is to generate a disassembly that most assemblers can handle. Refer to the IDA help file for more information on the use of <BAD> marks.

Number of opcode bytes

Most disassemblers are capable of generating listing files that display the generated machine language bytes side by side with the assembly language instructions from which they are derived. IDA allows you to view the machine language bytes associated with each instruction by synchronizing a hex display to the disassembly listing display. You can optionally view machine language bytes mixed with assembly language instructions by specifying the number of machine language bytes that IDA should display for each instruction.

This is fairly straightforward when you are disassembling code for processors that have a fixed instruction size, but it is somewhat more difficult for variable-length instruction processors such as the x86, for which instructions may range from one to more than a dozen bytes in size. Regardless of the instruction length, IDA reserves display space in the disassembly listing for the number of bytes that you specify here, pushing the remaining portions of the disassembly line to the right to accommodate the specified number of opcode bytes. Number of opcode bytes has been set to 5 in the following disassembly and can be seen in the columns under O. The + symbol at O indicates that the specified instruction is too long to be fully displayed given the current settings.
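The fixed-width opcode column can be approximated with a few lines of Python. This is only a sketch of the layout idea, not IDA's rendering code. The function name opcode_column and the exact spacing are assumptions, and the two byte strings are hypothetical examples (a 1-byte instruction and a 7-byte one) chosen to fall under and over a limit of 5.

```python
def opcode_column(raw: bytes, shown: int = 5) -> str:
    """Render up to `shown` opcode bytes in a fixed-width column.

    A trailing '+' marks an instruction that has more bytes than fit.
    """
    cells = [f"{b:02X}" for b in raw[:shown]]
    width = shown * 3 - 1              # two hex digits per byte plus separators
    marker = "+" if len(raw) > shown else " "
    return " ".join(cells).ljust(width) + marker


short = opcode_column(bytes([0x55]))                    # 1 byte, fits
long_ = opcode_column(bytes.fromhex("C745A00A000000"))  # 7 bytes, truncated

for column, text in ((short, "push ebp"), (long_, "mov ...")):
    print(f"{column}  {text}")
```

Because every column has the same width, the mnemonics stay aligned no matter how long each instruction is, at the cost of pushing the rest of the line to the right.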

You can further customize the disassembly display by adjusting the indentation values and margins shown in the lower right of Figure 7-4. Any changes to these options affect only the current database. Global settings for each of these options are stored in the main configuration file, <IDADIR>/cfg/ida.cfg.

Formatting Instruction Operands

During the disassembly process, IDA makes many decisions regarding how to format operands associated with each instruction. The biggest decisions generally revolve around how to format various integer constants used by the wide variety of instruction types. Among other things, these constants can represent relative offsets in jump or call instructions, absolute addresses of global variables, values to be used in arithmetic operations, or programmer-defined constants. In order to make a disassembly more readable, IDA attempts to use symbolic names rather than numbers whenever possible. In some cases, formatting decisions are made based on the context of the instruction being disassembled (such as a call instruction); in other cases, the decision is based on the data being used (such as access to a global variable or an offset into a stack frame). In many other cases, the exact context in which a constant is being used may not be clear. When this happens, the associated constant is typically formatted as a hexadecimal constant. If you happen not to be one of the few people in the world who eat, sleep, and breathe hex, then you will welcome IDA's operand formatting features. Right-clicking any constant in a disassembly opens a context-sensitive menu similar to that shown in Figure 7-5.

In this case, menu options are offered enabling the constant (41h) to be reformatted as decimal, octal, or binary values. Since the constant in this example falls within the ASCII printable range, an option is also presented to format the value as a character constant. In all cases, the menu displays the exact text that will replace the operand text should a particular option be selected. In many cases, programmers use named constants in their source code. Such constants may be the result of #define statements (or their equivalent), or they may belong to a set of enumerated constants. Unfortunately, by the time a compiler is finished with the source code, it is no longer possible to determine whether the source used a symbolic constant or a literal, numeric constant. IDA maintains a large catalog of named constants associated with many common libraries such as the C standard library or the Windows API.

This catalog is accessible via the Use standard symbolic constant option on the context-sensitive menu associated with any constant value. Selecting this option for the constant 0Ah in Figure 7-5 opens the symbol-selection dialog shown in Figure 7-6.

The dialog is populated from IDA's internal list of constants after filtering according to the value of the constant we are attempting to format. In this case we see all of the constants that IDA knows to be equated with the value 0Ah. If we determined that the value was being used in conjunction with the creation of an X.25-style network connection, then we might select AF_CCITT and end up with the following disassembly line:

.text:004010A2	mov	[ebp+var_60], AF_CCITT

The list of standard constants is a useful way to determine whether a particular constant may be associated with a known name and can save a lot of time reading through API documentation in search of potential matches.
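The filtering step behind the symbol-selection dialog amounts to a reverse lookup from value to names. The Python sketch below illustrates that idea with a four-entry table. The table is a tiny hand-picked stand-in, not IDA's catalog, and the names CATALOG and candidates_for are hypothetical.

```python
# A deliberately tiny stand-in for a catalog of named constants.
# The values are the usual Windows SDK definitions.
CATALOG = {
    "AF_INET": 2,
    "ERROR_FILE_NOT_FOUND": 2,
    "AF_CCITT": 0x0A,
    "ERROR_BAD_ENVIRONMENT": 0x0A,
}


def candidates_for(value, catalog=CATALOG):
    """Return every known symbolic name equated with `value`, sorted by name."""
    return sorted(name for name, known in catalog.items() if known == value)


print(candidates_for(0x0A))  # names that could replace the operand 0Ah
print(candidates_for(0x41))  # nothing known for this value
```

Several unrelated names usually share a value, which is why choosing among the candidates still depends on how the constant is used in the surrounding code.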

Manipulating Functions

There are a number of reasons that you may wish to manipulate functions after the initial autoanalysis has been completed. In some cases, such as when IDA fails to locate a call to a function, functions may not be recognized, as there may be no obvious way to reach them. In other cases, IDA may fail to properly locate the end of a function, requiring some manual intervention on your part to correct the disassembly. IDA may have trouble locating the end of a function if a compiler has split the function across several address ranges or when, in the process of optimizing code, a compiler merges common end sequences of two or more functions in order to save space.

Creating New Functions

Under certain circumstances, new functions can be created where no function exists. New functions can be created from existing instructions that do not already belong to a function, or they can be created from raw data bytes that have not been defined by IDA in any other manner (such as double words or strings). You create functions by placing the cursor on the first byte or instruction to be included in the new function and selecting Edit > Functions > Create Function. IDA attempts to convert data to code if necessary. Then it scans forward to analyze the structure of the function and search for a return statement. If IDA can locate a suitable end of the function, it generates a new function name, analyzes the stack frame, and restructures the code in the form of a function. If it can't locate the end of the function or encounters any illegal instructions, then the operation fails.

Deleting Functions

You can delete existing functions using Edit > Functions > Delete Function. You may wish to delete a function if you believe that IDA has erred in its autoanalysis.

Function Chunks

Function chunks are commonly found in code generated by the Microsoft Visual C++ compiler. Chunks are the result of the compiler moving blocks of code that are less frequently executed in order to squeeze frequently executed blocks into memory pages that are less likely to be swapped out. When a function is split in such a manner, IDA attempts to locate all of the associated chunks by following the jumps that lead to each chunk. In most cases IDA does a good job of locating all of the chunks and listing each chunk in the function's header, as shown in the following partial function disassembly:

Function chunks are easily reached by double-clicking the address associated with the chunk, as at O. Within the disassembly listing, function chunks are denoted by comments that delimit their instructions and that refer to the owning function, as shown in this listing:

ChunkedFunc

In some cases IDA may fail to locate every chunk associated with a function, or functions may be misidentified as chunks rather than as functions in their own right. In such cases, you may find that you need to create your own function chunks or delete existing function chunks.

You create new function chunks by selecting the range of addresses that belong to the chunk, which must not be part of any existing function, and selecting Edit > Functions > Append Function Tail. At this point you will be asked to select the parent function from a list of all defined functions.

NOTE In disassembly listings, function chunks are referred to as just that: function chunks. In the IDA menu system, function chunks are instead referred to as function tails.

You can delete existing function chunks by positioning the cursor on any line within the chunk to be deleted and selecting Edit > Functions > Remove Function Tail. At this point you will be asked to confirm your action prior to deleting the selected chunk.

If function chunks are turning out to be more trouble than they are worth, you can ask IDA not to create function chunks by deselecting the Create function tails loader option when you first load a file into IDA. This option is one of the loader options accessible via Kernel Options (see Chapter 4) in the initial file-load dialog. If you disable function tails, the primary difference that you may notice is that functions that would otherwise have contained tails contain jumps to regions outside the function boundaries. IDA highlights such jumps using red lines and arrows in the arrow windows on the left side of the disassembly. In the graph view for the corresponding function, the targets of such jumps are not displayed.

Function Attributes

IDA associates a number of attributes with each function that it recognizes. The function properties dialog shown in Figure 7-7 can be used to edit many of these attributes. Each attribute that can be modified is explained here.

Name of function

An alternative means for changing the name of a function.

Start address

The address of the first instruction in the function. IDA most often determines this automatically, either during analysis or from the address used during the create function operation.

End address

The address following the last instruction in the function. Most frequently, this is the address of the location that follows the function's return instruction. In most cases, this address is determined automatically during the analysis phase or as part of function creation. In cases where IDA has trouble determining the true end of a function, you may need to edit this value manually. Remember, this address is not actually part of the function but follows the last instruction in the function.

Local variables area

This represents the number of stack bytes dedicated to local variables (see Figure 6-4) for the function. In most cases, this value is computed automatically based on analysis of stack pointer behavior within the function.

Saved registers

This is the number of bytes used to save registers (see Figure 6-4) on behalf of the caller. IDA considers the saved register region to lie on top of the saved return address and below any local variables associated with the function. Some compilers choose to save registers on top of a function's local variables. IDA considers the space required to save such registers as belonging to the local variable area rather than a dedicated saved registers area.

Purged bytes

Purged bytes shows the number of bytes of parameters that a function removes from the stack when it returns to its caller. For cdecl functions, this value is always zero. For stdcall functions, this value represents the amount of space consumed by any parameters that are passed on the stack (see Figure 6-4). In x86 programs, IDA can automatically determine this value when it observes the use of the RET N variant of the return instruction.

Frame pointer delta

In some cases, compilers may adjust a function's frame pointer to point somewhere into the middle of the local variable area rather than at the saved frame pointer at the bottom of the local variable area. This distance from the adjusted frame pointer to the saved frame pointer is termed the frame pointer delta. In most cases any frame pointer delta will be computed automatically when the function is analyzed. Compilers utilize a stack frame delta as a speed optimization. The purpose of the delta is to keep as many stack frame variables as possible within reach of a 1-byte signed offset (-128..+127) from the frame pointer.
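The benefit of a frame pointer delta is easy to check with arithmetic. The Python sketch below counts how many local variables stay within reach of a 1-byte signed displacement, with and without a delta. The frame layout (64 four-byte locals) and the sign convention for the delta are assumptions made for illustration only.

```python
def fits_in_disp8(offset: int) -> bool:
    """True if `offset` can be encoded as a 1-byte signed displacement."""
    return -128 <= offset <= 127


def reachable(local_offsets, delta: int) -> int:
    """Count locals reachable with disp8 once the frame pointer is moved.

    `local_offsets` are offsets from the saved frame pointer; moving the
    frame pointer `delta` bytes into the locals shifts each offset by +delta.
    """
    return sum(fits_in_disp8(offset + delta) for offset in local_offsets)


# Hypothetical frame: 64 four-byte locals at offsets -4, -8, ..., -256.
LOCALS = list(range(-4, -260, -4))

print(reachable(LOCALS, delta=0))     # only the first 128 bytes are in reach
print(reachable(LOCALS, delta=0x80))  # frame pointer moved into the middle
```

With no delta only half of this frame is addressable with short displacements. Shifting the frame pointer 0x80 bytes into the local variable area brings the whole frame into range.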

Additional attribute checkboxes are available to further characterize the function. As with other fields within the dialog, these checkboxes generally reflect the results of IDA's automatic analysis. The following attributes can be toggled on and off.

Does not return

The function does not return to its caller. When such a function is called, IDA does not assume that execution continues following the associated call instruction.

Far function

Used to mark a function as a far function on segmented architectures. Callers of the function would need to specify both a segment and an offset value when calling the function. The need to use far calls is typically dictated by the memory model in use within a program rather than by the fact that the architecture supports segmentation, for example, the use of the large (as opposed to flat) memory model on an x86.

Library func

Flags a function as library code. Library code might include support routines included by a compiler or functions that are part of a statically linked library. Marking a function as a library function causes the function to be displayed using the assigned library function coloring to distinguish it from nonlibrary code.

Static func

Does nothing other than display the static modifier in the function's attribute list.

BP based frame

Indicates that the function utilizes a frame pointer. In most cases this is determined automatically by analyzing the function's prologue. If analysis fails to recognize that a frame pointer is used in the given function, you can manually select this attribute. If you do manually select this attribute, make sure that you adjust the saved register size (usually [...]

Even though this is an imported function, IDA allows you to edit one piece of information concerning its behavior: the number of purged bytes associated with the function. By editing this function, you can specify the number of bytes that it clears off the stack when it returns, and IDA will propagate the information that you supply to every location that calls the function, instantly correcting the stack pointer computations at each of those locations.

In order to improve its automated analysis, IDA incorporates advanced techniques that attempt to resolve stack pointer discrepancies by solving a system of linear equations related to the behavior of the stack pointer. As a result, you may not even realize that IDA has no prior knowledge of the details of functions such as some_imported_func. For more information on these techniques, refer to Ilfak's blog post titled "Simplex method in IDA Pro" at http://hexblog.com/2006/06/

Converting Data to Code (and Vice Versa)

During the automatic analysis phase, bytes are occasionally categorized incorrectly. Data bytes may be incorrectly classified as code bytes and disassembled into instructions, or code bytes may be incorrectly classified as data bytes and formatted as data values. This happens for many reasons, including the fact that some compilers embed data into the code section of programs or the fact that some code bytes are never directly referenced as code and IDA opts not to disassemble them. Obfuscated programs in particular tend to blur the distinction between code sections and data sections.

Regardless of the reason that you wish to reformat your disassembly, doing so is fairly easy. The first option for reformatting anything is to remove its current formatting (code or data). It is possible to undefine functions, code, or data by right-clicking the item that you wish to undefine and selecting Undefine (also Edit > Undefine or hotkey U) from the resulting context-sensitive menu. Undefining an item causes the underlying bytes to be reformatted as a list of raw byte values. Large regions can be undefined by using a click-and-drag operation to select a range of addresses prior to performing the undefine operation. As an example, consider the simple function listing that follows:

To disassemble a sequence of undefined bytes, right-click the first byte to be disassembled and select Code (also Edit > Code or hotkey C). This causes IDA to disassemble all bytes until it encounters a defined item or an illegal instruction. Large regions can be converted to code by using a click-and-drag operation to select a range of addresses prior to performing the code-conversion operation.

The complementary operation of converting code to data is a little more complex. First, it is not possible to convert code to data using the context menu. Available alternatives include Edit > Data and the D hotkey. Bulk conversions of instructions to data are easiest to accomplish by first undefining all of the instructions that you wish to convert to data and then formatting the data appropriately. Basic data formatting is discussed in the following section.

Basic Data Transformations

Properly formatted data can be as important in developing an understanding of a program's behavior as properly formatted code. IDA takes information from a variety of sources and uses many algorithms in order to determine the most appropriate way to format data within a disassembly. A few examples serve to illustrate how data formats are selected.

1. Datatypes and/or sizes can be inferred from the manner in which registers are used. An instruction observed to load a 32-bit register from memory implies that the associated memory location holds a 4-byte datatype (though we may not be able to distinguish between a 4-byte integer and a 4-byte pointer).

2. Function prototypes can be used to assign datatypes to function parameters. IDA maintains a large library of function prototypes for exactly this purpose. Analysis is performed on the parameters passed to functions in an attempt to tie a parameter to a memory location. If such a relationship can be uncovered, then a datatype can be applied to the associated memory location. Consider a function whose single parameter is a pointer to a CRITICAL_SECTION (a Windows API datatype). If IDA can determine the address passed in a call to this function, then IDA can flag that address as a CRITICAL_SECTION object.

3. Analysis of a sequence of bytes can reveal likely datatypes. This is precisely what happens when a binary is scanned for string content. When long sequences of ASCII characters are encountered, it is not unreasonable to assume that they represent character arrays.

In the next few sections we discuss some basic transformations that you can perform on data within your disassemblies.

Specifying Data Sizes

The simplest way to modify a piece of data is to adjust its size. IDA offers a number of data size/type specifiers. The most commonly encountered specifiers are db, dw, and dd, representing 1-, 2-, and 4-byte data, respectively. The first way to change a data item's size is via the Options > Setup Data Types dialog shown in Figure 7-8.

There are two parts to this dialog.

The left side of the dialog contains a column of buttons used to immediately change the data size of the currently selected item. The right side of the dialog contains a column of checkboxes used to configure what IDA terms the data carousel. Note that for each button on the left, there is a corresponding checkbox on the right. The data carousel is a revolving list of datatypes that contains only those types whose checkboxes are selected. Modifying the contents of the data carousel has no immediate impact on the IDA display. Instead, each type on the data carousel is listed on the context-sensitive menu that appears when you right-click a data item. Thus, it is easier to reformat data to a type listed in the data carousel than to a type not listed in the data carousel. Given the datatypes selected in Figure 7-8, right-clicking a data item would offer you the opportunity to reformat that item as byte, word, or double-word data.

The name for the data carousel derives from the behavior of the associated data formatting hotkey: D. When you press D, the item at the currently selected address is reformatted to the next type in the data carousel list. With the three-item list specified previously, an item currently formatted as db toggles to dw, an item formatted as dw toggles to dd, and an item formatted as dd toggles back to db to complete the circuit around the carousel. Using the data hotkey on a nondata item such as code causes the item to be formatted as the first datatype in the carousel list (db in this case).

Toggling through datatypes causes data items to grow, shrink, or remain the same size. If an item's size remains the same, then the only observable change is in the way the data is formatted. If you reduce an item's size, from dd (4 bytes) to db (1 byte) for example, any extra bytes (3 in this case) become undefined. If you increase the size of an item, IDA complains if the bytes following the item are already defined and asks you, in a roundabout way, if you want IDA to undefine the next item in order to expand the current item. The message you encounter in such cases is "Directly convert to data?" This message generally means that IDA will undefine a sufficient number of succeeding items to satisfy your request. For example, when converting byte data (db) to double-word data (dd), 3 additional bytes must be consumed to form the new data item.
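The carousel behavior and the byte accounting described above can be modeled in a few lines of Python. This is a sketch of the rules as stated in the text, not of IDA internals. The names CAROUSEL, SIZES, next_type, and resize_effect are hypothetical, and the carousel is assumed to hold only db, dw, and dd.

```python
CAROUSEL = ("db", "dw", "dd")        # assumed carousel contents
SIZES = {"db": 1, "dw": 2, "dd": 4}  # size of each type in bytes


def next_type(current, carousel=CAROUSEL):
    """Type selected by one press of the data hotkey."""
    if current not in carousel:      # nondata item such as code
        return carousel[0]
    position = carousel.index(current)
    return carousel[(position + 1) % len(carousel)]


def resize_effect(old, new):
    """Describe what happens to neighboring bytes when an item changes size."""
    difference = SIZES[new] - SIZES[old]
    if difference > 0:
        return ("consume", difference)    # following bytes are absorbed
    if difference < 0:
        return ("undefine", -difference)  # leftover bytes become undefined
    return ("reformat", 0)


item = "db"
for _ in range(3):
    following = next_type(item)
    print(item, "->", following, resize_effect(item, following))
    item = following
```

Three presses of the hotkey return the item to db, and the final step from dd to db is the one that leaves 3 bytes undefined.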

Datatypes and sizes can be specified for any location that describes data, including stack variables. To change the size of stack-allocated variables, open the detailed stack frame view by double-clicking the variable you wish to modify; then change the variable's size as you would any other variable.

Working with Strings

IDA recognizes a large number of string formats. By default, IDA searches for and formats C-style null-terminated strings. To force data to be converted to a string, utilize the options on the Edit > Strings menu to select a specific string style. If the bytes beginning at the currently selected address form a string of the selected style, IDA groups those bytes together into a single-string variable. At any time, you can use the A hotkey to format the currently selected location in the default string style.

Two dialogs are responsible for the configuration of string data. The first, shown in Figure 7-9, is accessed via Options > ASCII String Style, though ASCII in this case is a bit of a misnomer, as a much wider variety of string styles are understood.

Similar to the datatype configuration dialog, the buttons on the left are used to create a string of the specified style at the currently selected location. A string is created only if the data at the current location conforms to the specified string format. For Character terminated strings, up to two termination characters can be specified toward the bottom of the dialog. The radio buttons on the right of the dialog are used to specify the default string style associated with the use of the strings hotkey (A).

The second dialog used to configure string operations is the Options > General dialog, shown in Figure 7-10, where the Strings tab allows configuration of additional strings-related options. While you can specify the default string type here as well using the available drop-down box, the majority of available options deal with the naming and display of string data, regardless of their type. The Name generation area on the right of the dialog is visible only when the Generate names option is selected. When name generation is turned off, string variables are given dummy names beginning with the asc_ prefix.

When name generation is enabled, the Name generation options control how IDA generates names for string variables. When Generate serial names is not selected (the default), the specified prefix is combined with characters taken from the string to generate a name that does not exceed the current maximum name length.

Title case is used in the name, and any characters that are not legal to use within names (such as spaces) are omitted when forming the name. The Mark as autogenerated option causes generated names to appear in a different color (dark blue by default) than user-specified names (blue by default). Preserve case forces the name to use characters as they appear within the string rather than converting them to title case. Finally, Generate serial names causes IDA to serialize names by appending numeric suffixes (beginning with Number). The number of digits in generated suffixes is controlled by the Width field. As configured in Figure 7-10, the first three names to be generated would be a000, a001, and a002.
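The two naming schemes can be imitated in Python to show how the options interact. This is a rough approximation of the rules described above rather than IDA's implementation. The helper names, the 15-character limit, and the set of characters treated as legal are assumptions, while the prefix a, starting number 0, and width 3 are inferred from the example names in the text.

```python
import re


def serial_names(prefix, start, width, count):
    """Names formed from a prefix plus a zero-padded serial number."""
    return [f"{prefix}{number:0{width}d}" for number in range(start, start + count)]


def name_from_string(text, prefix="a", max_len=15, preserve_case=False):
    """Build a name from the string's own characters.

    Runs of letters, digits, and underscores are kept; everything else
    (spaces, punctuation, control characters) is dropped.
    """
    words = re.findall(r"[A-Za-z0-9_]+", text)
    if not preserve_case:
        words = [word[:1].upper() + word[1:].lower() for word in words]
    return (prefix + "".join(words))[:max_len]


print(serial_names("a", start=0, width=3, count=3))
print(name_from_string("hello world\n"))
print(name_from_string("hello world\n", preserve_case=True))
```

Serial names are compact but say nothing about content, whereas names derived from the string make cross-references readable at a glance.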

Specifying Arrays

One of the drawbacks to disassembly listings derived from higher-level languages is that they provide very few clues regarding the size of arrays. In a disassembly listing, specifying an array can require a tremendous amount of space if each item in the array is specified on its own disassembly line. The following listing shows data declarations that follow the named variable unk_402060. The fact that only the first item in the listing is referenced by any instructions suggests that it may be the first element in an array. Rather than being referenced directly, additional elements within arrays are often referenced using more complex index computations to offset from the beginning of the array.

IDA provides facilities for grouping consecutive data definitions together into a single array definition. To create an array, select the first element of the array (we chose unk_402060) and use Edit > Array to launch the array-creation dialog shown in Figure 7-11. If a data item has been defined at a given location, then an Array option will be available when you right-click the item. The type of array to be created is dictated by the datatype associated with the item selected as the first item in the array. In this case we are creating an array of bytes.

Prior to creating an array, make sure that you select the proper size for array elements by changing the size of the first item in the array to the appropriate value.

Following are descriptions of useful fields for array creation:

Array element width

This value indicates the size of an individual array element (1 byte in this case) and is dictated by the size of the data value that was selected when the dialog was launched.

Maximum possible size

This value is automatically computed as the maximum number of elements (not bytes) that can be included in the array before another defined data item is encountered. Specifying a larger size may be possible but will require succeeding data items to be undefined in order to absorb them into the array.

Number of elements

This is where you specify the exact size of the array. The total number of bytes occupied by the array can be computed as Number of elements × Array element width.

Items on a line

Specifies the number of elements to be displayed on each disassembly line. This can be used to reduce the amount of space required to display the array.

Element width

This value is for formatting purposes only and controls the column width when multiple items are displayed on a single line.

Use "dup" construct

This option causes identical data values to be grouped into a single item with a repetition specifier.
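The grouping performed by the dup construct is a form of run-length encoding, which the following Python sketch illustrates. The output is loosely modeled on MASM-style syntax and is not guaranteed to match IDA's formatting character for character. The helper names masm_number and dup_format are hypothetical.

```python
from itertools import groupby


def masm_number(value):
    """Format a non-negative integer: decimal below 10, otherwise hex with an h suffix."""
    if value < 10:
        return str(value)
    text = f"{value:X}h"
    # Hex literals must start with a digit, so prepend 0 when needed.
    return text if text[0].isdigit() else "0" + text


def dup_format(values, directive="db"):
    """Collapse runs of identical values into 'count dup(value)' items."""
    parts = []
    for value, run in groupby(values):
        count = len(list(run))
        if count > 1:
            parts.append(f"{masm_number(count)} dup({masm_number(value)})")
        else:
            parts.append(masm_number(value))
    return f"{directive} " + ", ".join(parts)


print(dup_format([0] * 416))            # one long run of zero bytes
print(dup_format([1, 1, 1, 0x0A, 2]))   # mixed data with a single short run
```

An array of 416 identical bytes collapses to a single item, which is why the option saves so much space for zero-filled buffers.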

Signed elements

Dictates whether data is displayed as signed or unsigned values.

Display indexes

Causes array indexes to be displayed as regular comments. This is useful if you need to locate specific data values within large arrays. Selecting this option also enables the Indexes radio buttons so you can choose the display format for each index value.

Create as array

Not checking this may seem to go against the purpose of the dialog, and it is usually left checked. Uncheck it if your goal is simply to specify some number of consecutive items without grouping them into an array.

Accepting the options specified in Figure 7-11 results in the following compact array declaration, which can be read as an array of bytes (db) named byte_402060 consisting of the value 0 repeated 416 (1A0h) times.