Reading binary files in C#

Source: Internet
Author: User
Tags: fread

It would be nice to think that every file has been converted to XML. However, this is not true. There are still a large number of file formats that are not XML, or even ASCII. Binary files are still sent across networks, stored on disk, and passed between applications, and for these jobs they are more efficient than text files.


In C and C++, it is easy to read a binary file. Apart from some carriage return and line feed translation issues, every file read in C++ is a binary file. In fact, C++ only knows about binary files, and how to make a binary file look like a text file. As the languages we use have become more abstract, we have ended up with languages that cannot directly and easily read the files we create. These languages want to process data automatically, each in its own way.

Where the problem lies
In many areas of computing, C and C++ programs still store and read data directly as in-memory structures. In C and C++, it is very simple to read and write files according to in-memory data structures. In C, you only need to call fwrite() with the following parameters: a pointer to your data, the size of each item, the number of items, and the file handle. The data is then written directly into the file in binary form.
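
As a minimal sketch of the fwrite() call described above: the record type `struct point`, the function name `write_points`, and the idea of passing a file path are assumptions made for this illustration, not something taken from the original article.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record type, invented for this example. */
struct point {
    int32_t x;
    int32_t y;
};

/* Write `count` records to `path` exactly as they sit in memory.
   Returns the number of records written, or 0 if the file cannot be
   opened or closed cleanly. */
size_t write_points(const char *path, const struct point *pts, size_t count)
{
    FILE *fp = fopen(path, "wb");   /* "b": no CR/LF translation */
    if (fp == NULL)
        return 0;

    /* pointer to the data, size of one item, number of items, file handle */
    size_t written = fwrite(pts, sizeof pts[0], count, fp);

    if (fclose(fp) != 0)
        return 0;
    return written;
}
```

A caller would pass an array and its length, for example `write_points("points.bin", pts, 3)`, and compare the return value with the count it asked for.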

Because the data is written this way, reading the file back is just as easy, provided you know the correct data structure. You call fread() with the following parameters: a pointer to the destination buffer, the size of each item, the number of items to read, and the file handle. fread() does the rest, and the data is back in memory. There is no parsing and no object model; the file is simply read straight into memory.
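
The matching read can be sketched the same way. Again, `struct point` and `read_points` are hypothetical names chosen for this example, and the sketch assumes the file was written by a program using the same structure layout.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record type; must match the layout used by the writer. */
struct point {
    int32_t x;
    int32_t y;
};

/* Read up to `max` records from `path` into `out`.
   Returns the number of complete records actually read. */
size_t read_points(const char *path, struct point *out, size_t max)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;

    /* destination buffer, size of one item, number of items, file handle */
    size_t got = fread(out, sizeof out[0], max, fp);

    fclose(fp);
    return got;
}
```

The return value matters: fread() reports how many whole items it read, which may be fewer than requested if the file is shorter than expected.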

In C and C++, the two biggest problems are data alignment (structure alignment) and byte swapping. Data alignment refers to the fact that the compiler sometimes skips bytes between the fields of a structure, because a field that starts at a misaligned address takes the processor more time to access (as a rule of thumb, about twice as long) and may need more instructions. A compiler that optimizes for execution speed therefore inserts unused padding bytes so that each field starts at an aligned address. Byte swapping, on the other hand, refers to reversing the order of the bytes in a value, because different processors store the bytes of a multi-byte value in different orders.

Data alignment
Because a processor handles more than one byte at a time (within a clock cycle), it expects the data it handles to be arranged in a certain way. Most Intel processors prefer a 32-bit integer to start at an address that is divisible by 4. An integer stored at any other address is slower to access on Intel processors, and on some other processors it cannot be accessed directly at all. The compiler knows this, so when it encounters data that would cause this problem, it has the following three choices.

First, it can add unused padding bytes to the data, so that the starting address of each integer is divisible by 4. This is the most common practice. Second, it can reorder the fields so that the integers fall on 4-byte boundaries. Because this causes other interesting problems, this approach is rarely used (standard C, for instance, requires structure members to stay in their declared order). The third option is to leave the integers off 4-byte boundaries and have the generated code copy each one to a suitably aligned place before using it. This approach costs some extra time, but it is useful when the data has to be packed tightly.
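
The first choice, padding, can be observed from inside a program. The sketch below uses a hypothetical `struct tagged` invented for this example; the exact amount of padding is up to the compiler, so the helper functions report it rather than assume it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a 1-byte tag followed by a 32-bit integer. */
struct tagged {
    uint8_t tag;
    int32_t value;
};

/* Number of padding bytes the compiler placed between `tag` and `value`. */
size_t padding_before_value(void)
{
    return offsetof(struct tagged, value) - sizeof(uint8_t);
}

/* Nonzero if `value` starts at a multiple of the alignment its type needs. */
int value_is_aligned(void)
{
    return offsetof(struct tagged, value) % _Alignof(int32_t) == 0;
}
```

On many common compilers `sizeof(struct tagged)` comes out larger than the 5 bytes of actual data, which is why writing this structure with fwrite() also writes the padding bytes into the file.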

These are mostly compiler details that you do not need to worry about too much. If the program that writes the data and the program that reads it are built with the same compiler and the same settings, this is not a problem: the compiler lays out the same data in the same way, and everything works. But once files move between platforms, it becomes important to lay out all the data in an agreed way so that the information can be exchanged. In addition, some programmers know how to make the compiler ignore alignment for their data.
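
One common way to make a C compiler ignore alignment for a particular structure is `#pragma pack`. This is a compiler extension rather than standard C (GCC, Clang, and MSVC all accept the form used here), and the structure `struct tagged_packed` is a hypothetical example, so treat this as a sketch under those assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)   /* compiler extension: no padding between members */
struct tagged_packed {
    uint8_t tag;
    int32_t value;      /* now starts right after `tag`, at offset 1 */
};
#pragma pack(pop)       /* restore the previous alignment setting */

/* Size of the packed record: 1 byte for the tag plus 4 for the value. */
size_t packed_size(void)
{
    return sizeof(struct tagged_packed);
}

/* Offset of `value` inside the packed record. */
size_t packed_value_offset(void)
{
    return offsetof(struct tagged_packed, value);
}
```

The trade-off is the one described above: the file layout becomes predictable and compact, while access to the misaligned `value` field may be slower.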

Byte swapping: big endian and little endian

Big endian and little endian are the two different ways in which integers are stored in a computer. Since an integer occupies more than one byte, the question is whether the most significant byte should be read and written first. The least significant byte is the one that changes most often: if you keep adding one to an integer, the least significant byte changes 256 times for every single change in the next most significant byte.

Different processors store integers in different ways. Intel processors store integers in little-endian order; in other words, the low-order byte is read and written first. Many other processors store integers in big-endian order. Therefore, when a binary file is written on one platform and read on another, you may have to reorder the bytes to get the correct value.
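
The reordering itself is short. The following sketch shows one way to detect the byte order of the machine the program is running on and to reverse the bytes of 16-bit and 32-bit values; the function names are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* the byte stored at the lowest address */
    return first == 1;
}

/* Reverse the byte order of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return  (v >> 24)
         | ((v >>  8) & 0x0000FF00u)
         | ((v <<  8) & 0x00FF0000u)
         |  (v << 24);
}
```

For example, `swap32(0x11223344)` yields `0x44332211`, and applying the swap twice returns the original value.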

UNIX platforms have a particular problem here, because UNIX runs on a variety of processors, such as Sun SPARC processors, HP processors, IBM PowerPC, Intel chips, and more. Moving data from one processor to another means that the byte order of those variables must be flipped to match the order required by the new processor.
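
An alternative to flipping after the fact, sketched here as an assumption rather than as the article's method, is to decode each integer from the raw file bytes with shifts. The result then depends only on the byte order of the file, not on the processor running the code. The names `read_be32` and `read_le32` are hypothetical.

```c
#include <stdint.h>

/* Decode a 32-bit integer stored most significant byte first. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] <<  8)
         |  (uint32_t)p[3];
}

/* Decode a 32-bit integer stored least significant byte first. */
uint32_t read_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] <<  8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Given the four bytes `12 34 56 78` (hexadecimal), `read_be32` returns `0x12345678` and `read_le32` returns `0x78563412` on any host.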

Working with binary files in C#
There are two additional challenges to working with binary files in C#. The first is that all .NET languages are strongly typed, so you have to convert the byte stream from the file into the data types you want. The second is that some data types are more complex than they appear on the surface and require some kind of conversion.

Type breaking
Because .NET languages, including C#, are strongly typed, you cannot simply read bytes from a file, drop them into a data structure, and expect everything to work. When you want to break the type conversion rules, you first read the number of bytes you need into a byte array, and then copy them, from beginning to end, into the data structure.

If you search the Usenet archives (Usenet is a worldwide newsgroup network), you will find several postings in the microsoft.public.dotnet hierarchy with routines that convert any object into a series of bytes and back into an object. They can be found in Listing A.

Complex data types
In C++, you know what is an object, what is an array, and what is neither an object nor an array. In C#, things are not as simple as they seem. A string is an object, and so is an array. Because C# has no true arrays in this sense and many objects have no fixed size, some complex data types do not fit into a fixed-size binary structure.

Fortunately, .NET provides a way to solve this problem. You can tell C# how you want your strings and other array types to be handled. This is done with the MarshalAs attribute. The following example applies it to a string in C#; the attribute must be placed immediately before the field it controls:

[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 50)]

The length of the string that you want to read from, or store in, the binary file determines the value of the SizeConst parameter, which sets the maximum length of the string.

Solving the earlier problems

You now know how the problems introduced by .NET are solved. The following sections show how easily the binary file problems described earlier can be solved as well.

Packing (Pack)
There is no need to change compiler settings to control how the data is arranged. You can simply use the StructLayout attribute to arrange or pack the data as you wish. This is useful when different structures need to be packed differently. With StructLayout you decide, structure by structure, whether the data is wrapped in a compact package or laid out loosely, as long as it can be read back. The StructLayout attribute is used as follows:

[StructLayout(LayoutKind.Sequential, Pack = 1)]

This makes the layout ignore boundary alignment, so that the data is packed as tightly as possible. The setting must match the layout of the data you read from the binary file; that is, the packing used when the file was written must also be used when it is read.

You may find that adding this attribute to your data does not solve the problem completely. In some cases you will have to go through tedious, repeated experiments. The reason is that different computers and compilers work differently at the binary level. Binary data has to be handled with extreme care, especially across platforms. .NET is a good tool for working with binary files from other sources, but it is not a perfect one.

Flipping byte order (endian flipping)
One of the classic problems with reading and writing binary files is that some computers store the least significant byte first (Intel, for example), while others store the most significant byte first. In C and C++, you have to deal with this problem manually, and the flip has to be coded one field at a time. One of the advantages of the .NET Framework is that code can access a type's metadata at run time, so you can read that information and use it to fix the byte order of each field automatically. The source code in Listing B shows how this is handled.

Once you know the type of the object, you can get each of its fields, check each one, and determine whether it is a 16-bit or 32-bit unsigned integer. In either case, you can reverse the byte order without destroying the data.
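
For comparison, the manual approach that C requires, where every integer field is flipped by hand, could be sketched as follows. The structure `struct file_header` and all function names are hypothetical and were invented for this example.

```c
#include <stdint.h>

/* Hypothetical file header, invented for this example. */
struct file_header {
    uint16_t version;
    uint32_t length;
    char     name[8];   /* byte string: byte order does not apply */
};

static uint16_t flip16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t flip32(uint32_t v)
{
    return  (v >> 24)
         | ((v >>  8) & 0x0000FF00u)
         | ((v <<  8) & 0x00FF0000u)
         |  (v << 24);
}

/* Flip each multi-byte integer field, one field at a time. */
void flip_header(struct file_header *h)
{
    h->version = flip16(h->version);
    h->length  = flip32(h->length);
    /* h->name is deliberately left untouched */
}
```

Every new integer field added to the structure needs its own line in `flip_header`, which is the maintenance burden that a metadata-driven approach avoids.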

Note: You do not need to do anything with strings. Byte order does not affect the string class, so those fields are left alone by the flip code. You only have to pay attention to unsigned integers, because negative numbers are not represented the same way on every system. A negative number can be represented in one's complement, but it is more commonly represented in two's complement. This makes negative numbers harder to handle across platforms. Fortunately, negative numbers are rarely used in binary files.

As an aside, floating-point numbers are likewise not always represented in the standard way. Although most systems use the IEEE format for floating-point numbers, a few older systems use other formats.

Overcoming the difficulties
While C# has some problems of its own, you can still use it to read binary files. In fact, the access that C# gives you to an object's metadata makes it a better language for reading binary files, because it can resolve byte-swapping problems automatically for an entire structure.
