Compatible kernel 6: Binary Image Type Recognition

Source: Internet
Author: User

Apart from some embedded systems, an operating system generally has to solve the problem of how to load the binary image of a target program and start it running when a new process is created (or an existing one is converted). Because the history of computing never produced a single standard shared by all operating systems and compiler/linker tools, this load/start process is inevitably diverse. Moreover, even a single operating system adopts several different image formats and loading mechanisms over its lifetime. Dynamic link libraries complicate the process further: not only the image of the target program but also the images of the dynamic link libraries must be loaded, and the dynamic linking between the target program and specific library functions must be resolved. The importance of this process is self-evident; without it an operating system is either not usable at all or loses its generality and flexibility.
Take Linux application software as an example: both the a.out format and the ELF format support dynamic link libraries. In "Situational Analysis" I only covered loading and starting a.out images, because a.out is relatively simple and the discussion would otherwise have been too long. Readers may ask why the a.out format should be retained when the more complex and more powerful ELF format exists. The answer is backward compatibility: once widely used, a technology does not disappear quickly. Compared with Linux, Windows uses even more formats, because it also has to support applications from the DOS era.
A kernel that is compatible with both Linux and Windows faces an even more complex and difficult problem. Fortunately, Wine has already solved this problem in its own way, so we at least have an example to learn from.
Before discussing how Wine loads and starts software images, let's look at the image formats that Wine must support to be compatible with Windows software, and at how it identifies the format and type of an image. To that end, let's read a piece of Wine code from dlls/kernel/module.c, which runs in user space.
The function is named MODULE_GetBinaryType(). It identifies the image format of an open file and determines its type. The defined types are:

Code:

enum binary_type
{
    BINARY_UNKNOWN,
    BINARY_PE_EXE,
    BINARY_PE_DLL,
    BINARY_WIN16,
    BINARY_OS216,
    BINARY_DOS,
    BINARY_UNIX_EXE,
    BINARY_UNIX_LIB
};
Besides BINARY_UNKNOWN, seven image types are defined here. BINARY_PE_EXE and BINARY_PE_DLL are 32-bit Windows "PE format" images: the former is an application, the latter a dynamic link library (DLL). Note that the former is an "active" entity that can become a process, while the latter is a "passive" library that cannot become a process on its own. BINARY_WIN16 and BINARY_OS216 are 16-bit applications; the latter are actually OS/2 applications, but because Microsoft and IBM once cooperated closely, Windows also supports OS/2 applications. Next, BINARY_DOS is DOS application software; executables on DOS come as .exe and .com files, but the two are not distinguished here. Finally, BINARY_UNIX_EXE and BINARY_UNIX_LIB are Unix images; since Linux is a successor of Unix, they also cover Linux applications and dynamic link libraries.
The code follows. We will read it in segments.

Code:

enum binary_type
MODULE_GetBinaryType( HANDLE hfile,  void **res_start,  void **res_end )
{
    union
    {
        struct
        {
            unsigned char magic[4];
            unsigned char ignored[12];
            unsigned short type;
        } elf;
        struct
        {
            unsigned long magic;
            unsigned long cputype;
            unsigned long cpusubtype;
            unsigned long filetype;
        } macho;
        IMAGE_DOS_HEADER mz;
    } header;

    DWORD len;

    /* Seek to the start of the file and read the header information. */
    if (SetFilePointer( hfile, 0, NULL, SEEK_SET ) == -1)
        return BINARY_UNKNOWN;
    if (!ReadFile( hfile, &header, sizeof(header), &len, NULL ) || len != sizeof(header))
        return BINARY_UNKNOWN;
Neither Linux nor Windows records the file format or type in the file system's directory entries, so the information can only be carried by a header at the start of the file's actual content. The header structure and size differ from one executable image format to another. Headers may also be cascaded or nested: a first-level header makes a coarse classification, and a second-level header subdivides it further. The code therefore defines a union containing several first-level header structures. elf is the header of a Linux ELF image (the a.out format is not included, so the coverage is incomplete); macho is the header of the Mach-O format used on Mach-based systems, which does not concern us here; and mz is the first-level header of the DOS and Windows formats, a relatively large data structure:

Code:

typedef struct _IMAGE_DOS_HEADER {
    WORD  e_magic;      /* 00: MZ Header signature */
    WORD  e_cblp;       /* 02: Bytes on last page of file */
    WORD  e_cp;         /* 04: Pages in file */
    WORD  e_crlc;       /* 06: Relocations */
    WORD  e_cparhdr;    /* 08: Size of header in paragraphs */
    WORD  e_minalloc;   /* 0a: Minimum extra paragraphs needed */
    WORD  e_maxalloc;   /* 0c: Maximum extra paragraphs needed */
    WORD  e_ss;         /* 0e: Initial (relative) SS value */
    WORD  e_sp;         /* 10: Initial SP value */
    WORD  e_csum;      /* 12: Checksum */
    WORD  e_ip;         /* 14: Initial IP value */
    WORD  e_cs;         /* 16: Initial (relative) CS value */
    WORD  e_lfarlc;       /* 18: File address of relocation table */
    WORD  e_ovno;       /* 1a: Overlay number */
    WORD  e_res[4];      /* 1c: Reserved words */
    WORD  e_oemid;      /* 24: OEM identifier (for e_oeminfo) */
    WORD  e_oeminfo;    /* 26: OEM information; e_oemid specific */
    WORD  e_res2[10];    /* 28: Reserved words */
    DWORD e_lfanew;     /* 3c: Offset to extended header */
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
This data structure carries a lot of information closely related to loading and starting the target image in a DOS environment. For example, e_ss and e_sp show that the stack position is predetermined (rather than dynamically allocated), and the use of e_ss shows that the target program runs in "real mode". I will not go into more detail here. When Microsoft moved from DOS to Windows and Windows NT, it kept this data structure as the first-level header of application images, even though Windows NT obviously runs in "protected mode". Many of the fields are therefore no longer used for Windows images.
The code first moves the file's read/write pointer to the beginning of the file with SetFilePointer(), which is similar to lseek(), and then reads as many bytes as the union described above occupies. One read thus covers any of the first-level headers, in particular those of Linux and Windows images.
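The same "rewind, then require a full read" pattern can be sketched with the standard C library instead of the Win32 calls. This is only an illustration: read_header() is a hypothetical helper, not part of Wine, and it uses fseek()/fread() in place of SetFilePointer()/ReadFile().

Code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewind an open stream and read exactly `size`
 * bytes into `header`. Returns 1 on success, 0 on a seek failure or a
 * short read, mirroring the two early returns in the Wine code. */
static int read_header(FILE *fp, void *header, size_t size)
{
    assert(fp != NULL && header != NULL);

    if (fseek(fp, 0L, SEEK_SET) != 0)
        return 0;
    return fread(header, 1, size, fp) == size;
}
```

A file shorter than the requested size is rejected, just as the Wine code returns BINARY_UNKNOWN when len != sizeof(header).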
Next comes the identification:

Code:

    if (!memcmp( header.elf.magic, "\177ELF", 4 ))
    {
        /* FIXME: we don't bother to check byte order, architecture, etc. */
        switch(header.elf.type)
        {
        case 2: return BINARY_UNIX_EXE;
        case 3: return BINARY_UNIX_LIB;
        }
        return BINARY_UNKNOWN;
    }

    ......
The code first checks for the ELF format used by Linux. In the data structure for the ELF header, the first field is the 4-byte identification code magic, also known as the "signature". The first byte of the ELF signature is octal '\177', that is, hexadecimal 0x7f, followed by the three characters 'E', 'L' and 'F'. The type field in the ELF header further indicates the nature of the image. Only two values are recognized here, 2 and 3, which map to BINARY_UNIX_EXE and BINARY_UNIX_LIB.
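As a minimal sketch of the same test on a plain byte buffer, the function below checks the 4-byte signature and then the 16-bit type field at offset 16 (after the 4 magic bytes and the 12 ignored bytes). The names elf_kind and classify_elf() are hypothetical, and the sketch assumes a little-endian ELF file; like the Wine code, it does not check byte order or architecture.

Code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch (not Wine's enum). */
enum elf_kind { ELF_NOT_ELF, ELF_EXECUTABLE, ELF_SHARED_OBJECT, ELF_OTHER };

/* Classify a buffer holding the first bytes of a file.
 * Assumption: the type field is stored little-endian. */
static enum elf_kind classify_elf(const unsigned char *buf, size_t len)
{
    unsigned type;

    assert(buf != NULL);

    /* Need the 16 identification bytes plus the 2-byte type field. */
    if (len < 18 || memcmp(buf, "\177ELF", 4) != 0)
        return ELF_NOT_ELF;

    type = buf[16] | ((unsigned)buf[17] << 8);
    switch (type)
    {
    case 2:  return ELF_EXECUTABLE;     /* executable file */
    case 3:  return ELF_SHARED_OBJECT;  /* shared object (library) */
    default: return ELF_OTHER;
    }
}
```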
We skip the recognition of the macho header and move on to the DOS/Windows header. The header signatures are defined in include/winnt.h:

Code:

#define IMAGE_DOS_SIGNATURE   0x5A4D   /* MZ   */
#define IMAGE_OS2_SIGNATURE     0x454E   /* NE   */
#define IMAGE_OS2_SIGNATURE_LE  0x454C   /* LE   */
#define IMAGE_OS2_SIGNATURE_LX  0x584C   /* LX */
#define IMAGE_VXD_SIGNATURE   0x454C   /* LE   */
#define IMAGE_NT_SIGNATURE    0x00004550  /* PE00 */
The value 0x5A4D is simply the character codes of 'M' and 'Z'. Because Intel CPUs are little-endian, the byte order appears reversed. Note that only MZ is used in the first-level header; the rest are used in second-level headers.
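The byte-order point can be checked with a few lines of C. The helpers read_le16() and has_mz_signature() are hypothetical names for this sketch; they assemble the value byte by byte, so the result does not depend on the byte order of the machine running the code.

Code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value: p[0] is the low byte. */
static uint16_t read_le16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 1 if the buffer starts with the bytes 'M','Z', which read
 * as a little-endian word give 0x5A4D (IMAGE_DOS_SIGNATURE). */
static int has_mz_signature(const unsigned char *buf, size_t len)
{
    return len >= 2 && read_le16(buf) == 0x5A4D;
}
```

'M' is 0x4D and 'Z' is 0x5A; since 'M' comes first in the file it lands in the low byte, which is why the constant reads "ZM" when written as a number.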
Continue to read the code:

Code:

    /* Not ELF, try DOS */

    if (header.mz.e_magic == IMAGE_DOS_SIGNATURE)
    {
        union
        {
            IMAGE_OS2_HEADER os2;
            IMAGE_NT_HEADERS nt;
        } ext_header;

        /* We do have a DOS image so we will now try to seek into
         * the file by the amount indicated by the field
         * "Offset to extended header" and read in the
         * "magic" field information at that location.
         * This will tell us if there is more header information
         * to read or not.
         */
        if (SetFilePointer( hfile, header.mz.e_lfanew, NULL, SEEK_SET ) == -1)
            return BINARY_DOS;
        if (!ReadFile( hfile, &ext_header, sizeof(ext_header), &len, NULL ) || len < 4)
            return BINARY_DOS;

        /* Reading the magic field succeeded so we will try to determine what type it is.*/
        if (!memcmp( &ext_header.nt.Signature, "PE\0\0", 4 ))
        {
            if (len >= sizeof(ext_header.nt.FileHeader))
            {
              if (len < sizeof(ext_header.nt))  /* clear remaining part of header if missing */
                    memset( (char *)&ext_header.nt + len, 0, sizeof(ext_header.nt) - len );
              if (res_start) *res_start = (void *)ext_header.nt.OptionalHeader.ImageBase;
              if (res_end) *res_end = (void *)(ext_header.nt.OptionalHeader.ImageBase +
                     ext_header.nt.OptionalHeader.SizeOfImage);
              if (ext_header.nt.FileHeader.Characteristics & IMAGE_FILE_DLL)
                     return BINARY_PE_DLL;
              return BINARY_PE_EXE;
            }
            return BINARY_DOS;
        }
If the first-level header signature is "MZ", the file is an image of the DOS family. Windows grew out of DOS, so its images belong to the DOS family too. Further subdivision is possible only through the second-level, or "extended", header, so a union (ext_header) is defined here to distinguish Windows (PE) images from OS/2 (NE) images. We only care about Windows images, so we only need to look at the definition of the IMAGE_NT_HEADERS data structure:

Code:

typedef struct _IMAGE_NT_HEADERS {
  DWORD       Signature;  /* "PE"\0\0 */ /* 0x00 */
  IMAGE_FILE_HEADER    FileHeader;  /* 0x04 */
  IMAGE_OPTIONAL_HEADER  OptionalHeader; /* 0x18 */
} IMAGE_NT_HEADERS, *PIMAGE_NT_HEADERS;
Two headers are nested in this "header"; their data structure definitions are both in winnt.h and are not listed here. Note, however, that for an executable image IMAGE_OPTIONAL_HEADER is not "optional" at all but very important. For example, it contains the fields AddressOfEntryPoint, BaseOfCode and BaseOfData, as well as the array DataDirectory[], whose number of valid entries varies; its importance is evident.
Note also that IMAGE_OS2_HEADER is used not only for OS/2 software images, as its name suggests, but also for some 16-bit Windows and DOS software images.
The last field of the IMAGE_DOS_HEADER structure, e_lfanew, gives the offset of the extended header in the file, so this time the read/write pointer is moved to that position.
After reading the extended header, the code first checks for the "PE" format signature. If it is present, the IMAGE_FILE_DLL flag in Characteristics decides whether this is an EXE image or a DLL image. In addition, when the caller passes res_start and res_end, the locations they point to are set from ImageBase and SizeOfImage in the header that was just read. This, however, is irrelevant to image type recognition.
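The two-level lookup can be sketched on an in-memory copy of the file, assuming the standard PE layout: e_lfanew at offset 0x3c, a 4-byte signature, then the 20-byte file header whose last 2 bytes are Characteristics, with IMAGE_FILE_DLL being 0x2000. The names pe_kind and classify_pe() are hypothetical. Unlike the Wine code, which falls back to BINARY_DOS, this sketch simply reports "not PE" when the extended header is missing or truncated.

Code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum pe_kind { PE_NOT_PE, PE_EXE, PE_DLL };  /* hypothetical codes */

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify an in-memory copy of a file as PE EXE, PE DLL, or neither. */
static enum pe_kind classify_pe(const unsigned char *img, size_t len)
{
    uint32_t off;

    assert(img != NULL);

    /* First level: the 64-byte DOS header must start with "MZ". */
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return PE_NOT_PE;

    /* Second level: e_lfanew (offset 0x3c) locates the extended header.
     * We need the 4-byte signature plus the 20-byte file header. */
    off = le32(img + 0x3c);
    if (off > len || len - off < 24)
        return PE_NOT_PE;
    if (memcmp(img + off, "PE\0\0", 4) != 0)
        return PE_NOT_PE;

    /* Characteristics sits 18 bytes into the file header. */
    return (le16(img + off + 22) & 0x2000) ? PE_DLL : PE_EXE;
}
```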

If the extended header signature is not "PE", the file may be an OS/2 image or another kind of Windows/DOS executable image. The signature of such images in the extended header is "NE". Let's read on.

Code:

        if (!memcmp( &ext_header.os2.ne_magic, "NE", 2 ))
        {
            /* This is a Windows executable (NE) header.  This can
             * mean either a 16-bit OS/2 or a 16-bit Windows or even a
             * DOS program (running under a DOS extender).  To decide
             * which, we'll have to read the NE header.
             */
            if (len >= sizeof(ext_header.os2))
            {
              switch ( ext_header.os2.ne_exetyp )
              {
              case 1:  return BINARY_OS216;  /* OS/2 */
              case 2:  return BINARY_WIN16;  /* Windows */
              case 3:  return BINARY_DOS;  /* European MS-DOS 4.x */
              case 4:  return BINARY_WIN16; /* Windows 386; FIXME: is this 32bit??? */
              case 5:  return BINARY_DOS;
                                            /* BOSS, Borland Operating System Services */
              /* other types, e.g. 0 is: "unknown" */
              default:
              return MODULE_Decide_OS2_OldWin(hfile, &header.mz, &ext_header.os2);
              }
            }
            /* Couldn't read header, so abort. */
            return BINARY_DOS;
        }

        /* Unknown extended header, but this file is nonetheless DOS-executable. */
        return BINARY_DOS;
    }

    return BINARY_UNKNOWN;
}
Clearly, the ne_exetyp field in IMAGE_OS2_HEADER further describes the specific image type. We can also see from this that DOS, Windows and OS/2 are closely intertwined. Type codes other than 1 to 5 occur in some very old OS/2 and Windows (versions earlier than 3.0) images; these are identified further by MODULE_Decide_OS2_OldWin() using other header features, which I will not go into here. Interested readers can study it on their own.
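The mapping from ne_exetyp to an image type can be written as a small standalone function over the raw bytes of the NE header. This is a sketch under one stated assumption: ne_exetyp is the single byte at offset 0x36 of the NE header. The names ne_target and classify_ne() are hypothetical, and the NE_UNDECIDED result stands in for the cases that Wine hands to MODULE_Decide_OS2_OldWin().

Code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch (not Wine's enum). */
enum ne_target { NE_NOT_NE, NE_OS2, NE_WINDOWS, NE_DOS, NE_UNDECIDED };

/* Assumption: ne_exetyp is the byte at offset 0x36 of the NE header. */
#define NE_EXETYP_OFFSET 0x36

/* `ne` points at the start of the extended (NE) header. */
static enum ne_target classify_ne(const unsigned char *ne, size_t len)
{
    assert(ne != NULL);

    if (len <= NE_EXETYP_OFFSET || memcmp(ne, "NE", 2) != 0)
        return NE_NOT_NE;

    switch (ne[NE_EXETYP_OFFSET])
    {
    case 1:  return NE_OS2;      /* OS/2 */
    case 2:  return NE_WINDOWS;  /* Windows */
    case 3:  return NE_DOS;      /* European MS-DOS 4.x */
    case 4:  return NE_WINDOWS;  /* Windows 386 */
    case 5:  return NE_DOS;      /* Borland Operating System Services */
    default: return NE_UNDECIDED; /* needs further header inspection */
    }
}
```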
Finally, the return value of MODULE_GetBinaryType() is the type code of the target image.
With this function and these data structure definitions, you can write a program that prints the image type of a given binary image file, along with many of its features and parameters. Such functions already exist in the tools/winedump directory of the Wine source, including dump_pe_header(), dump_le_header() and dump_ne_header(), which print image headers of the various formats. The code of dump_pe_header() is listed below for readers to study; its benefit is that it shows the role and meaning of many fields in headers such as IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER:

Code:

static void dump_pe_header(void)
{
    const char   *str;
    IMAGE_FILE_HEADER  *fileHeader;
    IMAGE_OPTIONAL_HEADER *optionalHeader;
    unsigned   i;

    printf("File Header\n");
    fileHeader = &PE_nt_headers->FileHeader;

    printf("  Machine:                      %04X (%s)\n",
    fileHeader->Machine, get_machine_str(fileHeader->Machine));
    printf("  Number of Sections:           %d\n", fileHeader->NumberOfSections);
    printf("  TimeDateStamp:                %08lX (%s) offset %lu\n",
    fileHeader->TimeDateStamp, get_time_str(fileHeader->TimeDateStamp),
    Offset(&(fileHeader->TimeDateStamp)));
    printf("  PointerToSymbolTable:         %08lX\n", fileHeader->PointerToSymbolTable);
    printf("  NumberOfSymbols:              %08lX\n", fileHeader->NumberOfSymbols);
    printf("  SizeOfOptionalHeader:         %04X\n", fileHeader->SizeOfOptionalHeader);
    printf("  Characteristics:              %04X\n", fileHeader->Characteristics);
#define X(f,s) if (fileHeader->Characteristics & f) printf("    %s\n", s)
    X(IMAGE_FILE_RELOCS_STRIPPED,  "RELOCS_STRIPPED");
    X(IMAGE_FILE_EXECUTABLE_IMAGE,  "EXECUTABLE_IMAGE");
    X(IMAGE_FILE_LINE_NUMS_STRIPPED,  "LINE_NUMS_STRIPPED");
    X(IMAGE_FILE_LOCAL_SYMS_STRIPPED,  "LOCAL_SYMS_STRIPPED");
    X(IMAGE_FILE_16BIT_MACHINE,  "16BIT_MACHINE");
    X(IMAGE_FILE_BYTES_REVERSED_LO,  "BYTES_REVERSED_LO");
    X(IMAGE_FILE_32BIT_MACHINE,  "32BIT_MACHINE");
    X(IMAGE_FILE_DEBUG_STRIPPED,  "DEBUG_STRIPPED");
    X(IMAGE_FILE_SYSTEM,   "SYSTEM");
    X(IMAGE_FILE_DLL,    "DLL");
    X(IMAGE_FILE_BYTES_REVERSED_HI,  "BYTES_REVERSED_HI");
#undef X
    printf("\n");

    /* hope we have the right size */
    printf("Optional Header\n");
    optionalHeader = &PE_nt_headers->OptionalHeader;
    printf("  Magic                              0x%-4X         %u\n",
    optionalHeader->Magic, optionalHeader->Magic);
    printf("  linker version                     %u.%02u\n",
    optionalHeader->MajorLinkerVersion, optionalHeader->MinorLinkerVersion);
    printf("  size of code                       0x%-8lx     %lu\n",
    optionalHeader->SizeOfCode, optionalHeader->SizeOfCode);
    printf("  size of initialized data           0x%-8lx     %lu\n",
    optionalHeader->SizeOfInitializedData, optionalHeader->SizeOfInitializedData);
    printf("  size of uninitialized data         0x%-8lx     %lu\n",
    optionalHeader->SizeOfUninitializedData, optionalHeader->SizeOfUninitializedData);
    printf("  entrypoint RVA                     0x%-8lx     %lu\n",
    optionalHeader->AddressOfEntryPoint, optionalHeader->AddressOfEntryPoint);
    printf("  base of code                       0x%-8lx     %lu\n",
    optionalHeader->BaseOfCode, optionalHeader->BaseOfCode);
    printf("  base of data                       0x%-8lX     %lu\n",
    optionalHeader->BaseOfData, optionalHeader->BaseOfData);
    printf("  image base                         0x%-8lX     %lu\n",
    optionalHeader->ImageBase, optionalHeader->ImageBase);
    printf("  section align                      0x%-8lx     %lu\n",
    optionalHeader->SectionAlignment, optionalHeader->SectionAlignment);
    printf("  file align                         0x%-8lx     %lu\n",
    optionalHeader->FileAlignment, optionalHeader->FileAlignment);
    printf("  required OS version                %u.%02u\n",
    optionalHeader->MajorOperatingSystemVersion,
           optionalHeader->MinorOperatingSystemVersion);
    printf("  image version                      %u.%02u\n",
    optionalHeader->MajorImageVersion, optionalHeader->MinorImageVersion);
    printf("  subsystem version                  %u.%02u\n",
    optionalHeader->MajorSubsystemVersion, optionalHeader->MinorSubsystemVersion);
    printf("  Win32 Version       0x%lX\n", optionalHeader->Win32VersionValue);
    printf("  size of image        0x%-8lx     %lu\n",
    optionalHeader->SizeOfImage, optionalHeader->SizeOfImage);
    printf("  size of headers       0x%-8lx     %lu\n",
    optionalHeader->SizeOfHeaders, optionalHeader->SizeOfHeaders);
    printf("  checksum           0x%lX\n", optionalHeader->CheckSum);
    switch (optionalHeader->Subsystem)
    {
    default:
    case IMAGE_SUBSYSTEM_UNKNOWN:   str = "Unknown";  break;
    case IMAGE_SUBSYSTEM_NATIVE:    str = "Native";  break;
    case IMAGE_SUBSYSTEM_WINDOWS_GUI:  str = "Windows GUI";  break;
    case IMAGE_SUBSYSTEM_WINDOWS_CUI:  str = "Windows CUI";  break;
    case IMAGE_SUBSYSTEM_OS2_CUI:    str = "OS/2 CUI";  break;
    case IMAGE_SUBSYSTEM_POSIX_CUI:   str = "Posix CUI";  break;
    }
    printf("  Subsystem      0x%X (%s)\n", optionalHeader->Subsystem, str);
    printf("  DLL flags       0x%X\n", optionalHeader->DllCharacteristics);
    printf("  stack reserve size    0x%-8lx     %lu\n",
    optionalHeader->SizeOfStackReserve, optionalHeader->SizeOfStackReserve);
    printf("  stack commit size    0x%-8lx     %lu\n",
    optionalHeader->SizeOfStackCommit, optionalHeader->SizeOfStackCommit);
    printf("  heap reserve size     0x%-8lx     %lu\n",
    optionalHeader->SizeOfHeapReserve, optionalHeader->SizeOfHeapReserve);
    printf("  heap commit size     0x%-8lx     %lu\n",
    optionalHeader->SizeOfHeapCommit, optionalHeader->SizeOfHeapCommit);
    printf("  loader flags         0x%lX\n", optionalHeader->LoaderFlags);
    printf("  RVAs & sizes       0x%lX\n", optionalHeader->NumberOfRvaAndSizes);
    printf("\n");

    printf("Data Directory\n");
    printf("%ld\n",
          optionalHeader->NumberOfRvaAndSizes* sizeof(IMAGE_DATA_DIRECTORY));

    for (i = 0;  i < optionalHeader->NumberOfRvaAndSizes && i < 16;  i++)
    {
        printf("  %-12s  rva: 0x%-8lX  size: %8lu\n",
         DirectoryNames[i],
         optionalHeader->DataDirectory[i].VirtualAddress,
         optionalHeader->DataDirectory[i].Size);
    }
    printf("\n");
}
In the code, PE_nt_headers is a pointer to the PE format header, whose content has already been read. RVA stands for "relative virtual address": the address of an item relative to the base address at which the image is loaded.
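To make the RVA arithmetic concrete, here is a minimal sketch for a 32-bit image. rva_to_va() is a hypothetical helper, and the numbers in the usage note are illustrative values rather than output from a real file.

Code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert an RVA into an absolute virtual address,
 * assuming the image is actually loaded at image_base. */
static uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    /* The sum must not wrap around the 32-bit address space. */
    assert(rva <= UINT32_MAX - image_base);
    return image_base + rva;
}
```

For example, with an ImageBase of 0x00400000 and an AddressOfEntryPoint of 0x1000, rva_to_va() gives 0x00401000 as the address where execution would begin if the image is loaded at its preferred base.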
By now the reader should understand how the target image is identified and what role many of the fields and flag bits in the image header play. This lays the foundation for investigating the loading and startup of the target image. In later installments I will explain how Wine loads and starts the target image; by then the role and meaning of these fields will be clearer.
The Wine code above runs in user space, but it would also be easy to move it into the kernel.
