Author: Zhuang Xiaoli (liigo), 2010/7/12
Original address of this article: http://blog.csdn.net/liigo/archive/2010/07/12/5727859.aspx
When reprinting, please credit the source: http://blog.csdn.net/liigo
Purpose of this article: search a specified lib or obj file for a piece of executable code (x86 instructions) and determine which function it belongs to.
Background: suppose software we wrote is falsely flagged as a virus by antivirus software, and that by some means we have learned which piece of code (x86 instructions) in the executable (exe) is treated as the virus signature (see liigo's previous blog post for details). Suppose further that we have learned this code comes from a particular lib/obj file pulled in during compiling and linking (again, see my previous post). Next, we need to search that lib/obj file for the signature and determine which function it comes from. That is the subject of this article.
Basic idea: parse the binary format of the lib file (for the basic structure of lib/obj files, see an earlier blog post of mine (liigo)), traverse every obj in the lib, traverse every section in each obj, and search for the signature in the section's raw data. When a match is found, print the symbols defined in that section; from the function symbols and the offset you can determine which function the signature belongs to. Parsing the binary format of lib files is not covered in this article. The main code framework is as follows:
// Search
clibinfo libinfo;
if (!libinfo.loadclibfile(szlibfile))
{
    printf("can not load the lib file: %s\n", szlibfile);
    return 0;
}
cobjinfo* pobj = NULL;
for (unsigned int objindex = 0; objindex < libinfo.m_numberofmembers; objindex++)
{
    pobj = libinfo.m_pobjs[objindex];

    coff_sectionheader* psectionheader = pobj->m_sectionheaders;
    for (int sectionnumber = 1; sectionnumber <= pobj->m_pcoffheader->numberofsections; sectionnumber++, psectionheader++)
    {
        void* psectionrawdata = pobj->m_pcoffdata + psectionheader->pointertorawdata;
        int lensectionrawdata = psectionheader->sizeofrawdata;
        // search in section raw data
        int matchedrate = 0, matchoffset = 0;
        matchoffset = searchdata((unsigned char*)psectionrawdata, lensectionrawdata,
            (unsigned char*)signaturemem.getdata(), signaturemem.getdatasize(),
            ignorebyte, minmatchrate, matchedrate);
        if (matchoffset >= 0)
        {
            // offset of the match, counted from the start of the lib file
            int libfileoffset = (int)(pobj->m_pcoffdata + psectionheader->pointertorawdata - libinfo.m_pclibdata + matchoffset);
            // "%%" prints a literal percent sign after the match rate
            printf("in obj #%d (%s), match %d%% (at section offset %d, lib offset %d) in section #%d (%s), which contains symbol(s):\n",
                objindex + 1, pobj->m_szfilename, matchedrate, matchoffset, libfileoffset,
                sectionnumber, pobj->getsectionname(psectionheader));
            printsymbolnameinsection(pobj, sectionnumber);
            printf("\tmatched data: ");
            printdatabytes((unsigned char*)psectionrawdata + matchoffset, signaturemem.getdatasize());
            printf("\n");
        }
    }
}
In the code above, once a section matches the signature, the program prints the number and name of the section, the number and name of the obj that the section belongs to, and the offset of the matched data both within the section and within the lib file. Note that computing "libfileoffset" correctly requires familiarity with the internal lib/obj format.
The following code prints the symbol information of a specified section. Note that the auxiliary symbols (aux symbols) must be skipped. You can also filter out the section's own symbol and other irrelevant symbols. The output includes the symbol name, whether it is a function, and the (presumed) offset of the symbol's data within the section, which is enough to determine the function that the signature belongs to. In lib/obj files generated by the Microsoft Visual C++ compilers, each section generally contains only one function definition, which makes the judgment easier.
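The "libfileoffset" arithmetic can be read as three terms added together: where the obj's COFF data starts inside the lib file, where the section's raw data starts inside that COFF data, and where the match starts inside the raw data. A minimal sketch follows; the helper and its parameter names are hypothetical (not part of the original program), and it assumes the whole lib file was loaded into one contiguous buffer, as the pointer subtraction in the code above implies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only.
//   plibdata         - start of the whole lib file image in memory
//   pcoffdata        - start of one member's COFF (obj) data inside that image
//   pointertorawdata - the section's raw data position, relative to pcoffdata
//   matchoffset      - offset of the match inside the section's raw data
std::ptrdiff_t libfileoffset(const unsigned char* plibdata,
                             const unsigned char* pcoffdata,
                             unsigned long pointertorawdata,
                             int matchoffset)
{
    // The obj data must lie inside the lib image, otherwise the
    // subtraction below is meaningless.
    assert(pcoffdata >= plibdata);
    return (pcoffdata - plibdata)
         + static_cast<std::ptrdiff_t>(pointertorawdata)
         + matchoffset;
}
```

For example, if an obj starts 1000 bytes into the lib image, its section's raw data starts 60 bytes into the obj, and the match is 5 bytes into the section, the lib file offset is 1065.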
void printsymbolnameinsection(cobjinfo* pobj, int sectionnumber)
{
    coff_symbol* psymbol = pobj->m_symbols;
    for (unsigned int symindex = 0; symindex < pobj->m_pcoffheader->numberofsymbols; symindex++, psymbol++)
    {
        const char* szsymbolname = pobj->getsymbolname(psymbol);

        if (psymbol->sectionnumber == sectionnumber)
        {
            // type 0x20 marks a function symbol; print "()" after its name
            printf("\t%s%s, section offset %d\n", szsymbolname,
                (psymbol->type == 0x20 ? "()" : ""), psymbol->value);
        }

        // skip the auxiliary symbol records that follow this symbol
        symindex += psymbol->numberofauxsymbols;
        psymbol += psymbol->numberofauxsymbols;
    }
}
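When a section happens to contain more than one function symbol, the printed offsets can be used to narrow the match down: the owning function is most likely the function symbol with the largest offset that does not exceed the match offset. The sketch below illustrates this under simplifying assumptions; the symbolinfo struct and findowningfunction are hypothetical names, not part of the original program, and the symbol table is reduced to the few fields needed here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified view of a symbol (illustration only).
struct symbolinfo {
    std::string name;
    int sectionnumber;    // 1-based number of the section defining the symbol
    unsigned long value;  // offset of the symbol within that section
    bool isfunction;      // true when the symbol type is 0x20
};

// Returns the index of the function symbol in `sectionnumber` whose offset is
// the largest one still <= matchoffset, or -1 if there is no such symbol.
int findowningfunction(const std::vector<symbolinfo>& symbols,
                       int sectionnumber, unsigned long matchoffset)
{
    int best = -1;
    for (std::size_t i = 0; i < symbols.size(); ++i) {
        const symbolinfo& s = symbols[i];
        if (s.sectionnumber != sectionnumber || !s.isfunction)
            continue;                       // wrong section, or not a function
        if (s.value > matchoffset)
            continue;                       // starts after the match
        if (best < 0 || s.value >= symbols[best].value)
            best = static_cast<int>(i);     // closest start so far
    }
    return best;
}
```

With one function per section, as is typical for Visual C++ output, this simply returns that single function symbol.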
For the signature search, liigo introduced two parameters: a minimum match rate (minmatchrate) and a byte value to be ignored when computing the match rate (ignorebyte). They are needed because some x86 instructions (such as E8, i.e. call XXX) carry a relative address or an address that is subject to relocation, and those bytes may not be exactly the same in the exe as in the lib/obj. The code is as follows:
bool matchdata(unsigned char* psearchfrom, unsigned char* psearchwhat, int lensearchwhat,
               unsigned char ignorebyte, int minmatchrate, int& matchedrate)
{
    int matchtimes = 0, matchtimesall = lensearchwhat;
    for (int i = 0; i < lensearchwhat; i++)
    {
        if (psearchwhat[i] == ignorebyte)
        {
            matchtimesall--;   // ignored bytes do not count toward the rate
        }
        else
        {
            if (psearchwhat[i] == psearchfrom[i])
                matchtimes++;
        }
    }

    // nothing left to compare (every byte was ignored): avoid dividing by zero
    if (matchtimesall <= 0)
        return false;

    int rate = (matchtimes * 100 / matchtimesall);
    // a rate equal to the minimum counts as a match
    if (rate >= minmatchrate)
    {
        matchedrate = rate;
        return true;
    }
    else
        return false;
}

// If found, return the offset; if not found, return -1
int searchdata(unsigned char* psearchfrom, int lensearchfrom, unsigned char* psearchwhat, int lensearchwhat,
               unsigned char ignorebyte, int minmatchrate, int& matchedrate)
{
    for (int i = 0; i < lensearchfrom - lensearchwhat + 1; i++)
    {
        if (matchdata(psearchfrom + i, psearchwhat, lensearchwhat, ignorebyte, minmatchrate, matchedrate))
        {
            return i;
        }
    }
    return -1;
}
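To see how the ignored byte affects the match rate, it helps to try the idea on concrete bytes. The following is a self-contained sketch of the same rate computation at a single position, written with std::vector so it can be compiled on its own; the function name matchrateat is hypothetical, and returning -1 for "cannot be compared" is a choice made for this sketch, not the original program's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Percentage of the non-ignored signature bytes that equal the bytes of
// `data` starting at `pos`. Returns -1 if the signature does not fit at
// `pos` or if every signature byte equals `ignorebyte`.
int matchrateat(const std::vector<unsigned char>& data, std::size_t pos,
                const std::vector<unsigned char>& sig, unsigned char ignorebyte)
{
    if (pos > data.size() || sig.size() > data.size() - pos)
        return -1;
    int matched = 0, counted = 0;
    for (std::size_t i = 0; i < sig.size(); ++i) {
        if (sig[i] == ignorebyte)
            continue;                 // ignored: counted neither way
        ++counted;
        if (sig[i] == data[pos + i])
            ++matched;
    }
    return counted == 0 ? -1 : matched * 100 / counted;
}
```

Take the signature FF 74 24 10 E8 00 00 00 00 C2 10 00 with ignorebyte 0x00, and section data in which the call operand is 12 34 56 78 instead of zeros: the rate at the right position is still 100, because the four operand bytes are ignored. One side effect worth knowing is that the final 00 of C2 10 00 is ignored as well, since the ignored value is an ordinary byte value and not a separate wildcard marker.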
In addition, users are allowed to enter the signature as hexadecimal text, such as "ff7424 10 E8 00 00 00 00 C2 1000". The program has to convert this text to binary data in memory, turning every two hex digits into one byte value and handling separator characters such as spaces:
// Value of one hexadecimal digit (0 for any other character)
int hexchar2decimal(char c)
{
    if (c >= '0' && c <= '9')
        return (c - '0');
    else if (c >= 'a' && c <= 'f')
        return (c - 'a' + 10);
    else if (c >= 'A' && c <= 'F')
        return (c - 'A' + 10);
    else
        return 0;
}

bool hextext2mem(char* szsignature, bufferedmem& mem)
{
    int len = (int)strlen(szsignature);
    char firstchar = '\0';
    for (int i = 0; i < len; i++)
    {
        char c = szsignature[i];

        // separators end the current byte; a lone digit becomes a byte by itself
        if (c == ' ' || c == '\t' || c == ',')
        {
            if (firstchar)
                mem.appendbyte(hexchar2decimal(firstchar));
            firstchar = '\0';
            continue;
        }

        bool isletterchar = ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'));
        bool isnumchar = (c >= '0' && c <= '9');
        if (!isletterchar && !isnumchar)
        {
            // cut the text right after the invalid character before printing it
            szsignature[i + 1] = '\0';
            printf("\nerror in hexadecimal text of signature data, the printed last char is invalid:\n\t%s\n", szsignature);
            return false;
        }

        if (firstchar == '\0')
        {
            firstchar = c;
        }
        else
        {
            mem.appendbyte(hexchar2decimal(firstchar) * 16 + hexchar2decimal(c));
            firstchar = '\0';
        }
    }

    // a lone digit at the very end of the text is also a byte
    if (firstchar)
        mem.appendbyte(hexchar2decimal(firstchar));

    return true;
}
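The conversion rules can be checked on the example text from above. The following is a self-contained sketch of the same parsing logic that writes into a std::vector instead of the program's own buffer class, so it can be compiled and tried in isolation; the names hexvalue and parsehex are hypothetical, and std::isxdigit stands in for the hand-written character range checks.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Value of one hexadecimal digit; the caller must pass a valid hex digit.
static int hexvalue(char c)
{
    assert(std::isxdigit(static_cast<unsigned char>(c)));
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return c - 'A' + 10;
}

// Converts text such as "ff7424 10 E8" to bytes appended to `out`.
// Spaces, tabs and commas separate groups; returns false on any other
// non-hex character.
bool parsehex(const std::string& text, std::vector<unsigned char>& out)
{
    char first = '\0';                       // pending high digit, if any
    for (char c : text) {
        if (c == ' ' || c == '\t' || c == ',') {
            if (first)                       // lone digit before a separator
                out.push_back(static_cast<unsigned char>(hexvalue(first)));
            first = '\0';
            continue;
        }
        if (!std::isxdigit(static_cast<unsigned char>(c)))
            return false;
        if (first == '\0') {
            first = c;
        } else {
            out.push_back(static_cast<unsigned char>(hexvalue(first) * 16 + hexvalue(c)));
            first = '\0';
        }
    }
    if (first)                               // lone digit at the end of the text
        out.push_back(static_cast<unsigned char>(hexvalue(first)));
    return true;
}
```

Parsing "ff7424 10 E8 00 00 00 00 C2 1000" this way yields the 12 bytes FF 74 24 10 E8 00 00 00 00 C2 10 00: digits are paired left to right within each group, so "ff7424" and "1000" expand to three and two bytes respectively.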
The final output of the program is as follows. The search results are consistent with those obtained with Easy Language in the previous article (figure); compare the file offset and the match rate of the located signature. From the output we can see that the signature "ff7424 10 ff7424 10 ff7424 10 ff7424 10 E8 00 00 00 C2 1000" (look up the corresponding assembly instructions) may come from COleControl::OnDoVerb(), WinMain(), CPropertyPageEx::Construct() and other functions.
End of article. The compiled program, locatesym.exe, is available for download (a CSDN account may be required).