Windows and UNIX file formats: carriage returns, line feeds, space/tab conversion, and related tools

Source: Internet
Author: User

http://casec12.javaeye.com/blog/523160

Today I finally figured out where carriage return and line feed come from and how they differ.
Before computers appeared, there was a device called the Teletype Model 33, which could print 10 characters per second. It had one problem: moving to a new line took 0.2 seconds, exactly the time needed to print two characters. Any character that arrived during those 0.2 seconds was lost.

The designers solved this by adding two end-of-line characters after each line. One is "carriage return", which tells the teletype to move the print head back to the left margin. The other is "line feed", which tells it to advance the paper by one line.

This is where the terms "carriage return" and "line feed" come from. Their English names describe exactly what each one does.

Later, when computers were invented, these two concepts were carried over. Memory was very expensive at the time, and some designers thought it was wasteful to spend two characters on the end of every line, so they used only one. Different systems picked different characters, and that is where the differences come from.

On UNIX systems, each line ends with "<line feed>", i.e. "\n". On Windows, each line ends with "<carriage return><line feed>", i.e. "\r\n". On classic Mac OS (before Mac OS X), each line ends with "<carriage return>", i.e. "\r". One direct consequence is that when a UNIX or Mac file is opened on Windows, all the text may appear on a single line. When a Windows file is opened on UNIX or Mac, a ^M symbol may appear at the end of each line.
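To check which convention a particular file uses, you can count the three kinds of line ending in its bytes. The following is a minimal C sketch. The struct and function names (`eol_stats`, `count_eols`) are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Counts of each line-ending style found in a buffer. */
struct eol_stats {
    size_t crlf; /* "\r\n" pairs (Windows) */
    size_t lf;   /* lone '\n' (UNIX) */
    size_t cr;   /* lone '\r' (classic Mac OS) */
};

/* Scan buf[0..len) and count line endings of each kind. */
struct eol_stats count_eols(const char *buf, size_t len)
{
    struct eol_stats s = {0, 0, 0};
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                s.crlf++;
                i++; /* skip the '\n' that belongs to this pair */
            } else {
                s.cr++;
            }
        } else if (buf[i] == '\n') {
            s.lf++;
        }
    }
    return s;
}
```

A file with only `crlf` endings is in Windows format, one with only `lf` endings is in UNIX format, and a file with both is a mixed file.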

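C/C++ programming (Windows)

\r means "return to the beginning of the current line". Anything printed after it overwrites what was already printed on that line.

For example:

#include <iostream>
using namespace std;

int main() {
    cout << "HAHAHA" << "\r" << "Xixi";
    return 0;
}

The console ends up showing "XixiHA". "Xixi" overwrites the first four characters of "HAHAHA", and the remaining "HA" stays visible. Only output at least as long as "HAHAHA" would hide it completely.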
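The overwrite effect can be modeled in a few lines of C. This is a simplified sketch of what a terminal does with one line of output, not how any real console is implemented. `render_line` is a hypothetical name, and the input is assumed to contain no '\n'.

```c
#include <stddef.h>

/* Simulate how a terminal draws one line containing '\r': '\r' moves
 * the cursor back to column 0, and later characters overwrite the cells
 * already drawn. The visible text is written to out (NUL-terminated,
 * at most outsz - 1 characters). Assumes the input has no '\n'. */
void render_line(const char *in, char *out, size_t outsz)
{
    size_t col = 0, width = 0;
    if (outsz == 0)
        return;
    for (; *in != '\0'; in++) {
        if (*in == '\r') {
            col = 0;                 /* carriage return: back to column 0 */
        } else if (col + 1 < outsz) {
            out[col++] = *in;        /* overwrite the cell under the cursor */
            if (col > width)
                width = col;         /* remember how far the line reaches */
        }
    }
    out[width] = '\0';
}
```

Calling `render_line("HAHAHA\rXixi", buf, sizeof buf)` leaves "XixiHA" in `buf`, which matches what the console shows for the program above.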

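\n is the line feed. When a program writes it to the Windows console in text mode, it acts as carriage return + line feed. The cursor first moves to the beginning of the line and then down one line, which places it at the start of the next line.

#include <iostream>
using namespace std;

int main() {
    cout << "HAHAHA" << "\n" << "Xixi";
    return 0;
}

This prints "HAHAHA" and "Xixi" on two separate lines.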
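On Windows, the translation of '\n' into CR LF is done by the C runtime's text mode. The string itself is not changed. The hedged sketch below writes the same three bytes, "a\nb", once in text mode and once in binary mode, and reports the resulting file size. `written_size` and the file name are hypothetical. On Windows, the text-mode file is expected to be 4 bytes and the binary-mode file 3 bytes. On UNIX both are 3 bytes, because no translation takes place.

```c
#include <stdio.h>

/* Write "a\nb" to path using the given fopen mode ("w" = text,
 * "wb" = binary) and return the resulting file size in bytes,
 * or -1 on error. */
long written_size(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    long size;
    if (f == NULL)
        return -1;
    fputs("a\nb", f);
    fclose(f);

    f = fopen(path, "rb");          /* reopen in binary to see the raw bytes */
    if (f == NULL)
        return -1;
    fseek(f, 0, SEEK_END);
    size = ftell(f);
    fclose(f);
    return size;
}
```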

http://www.php-oa.com/

This article explains the differences between the Windows and UNIX file formats and the problems these differences cause.
It also introduces some related viewing and conversion tools, with examples for both Windows and UNIX.
1. Questions:
You may have run into the following puzzles:
(1) How do I view a file or data stream in binary (displayed in hexadecimal)?
(2) Why can't shell scripts written on Windows be executed on UNIX?
Why do C source files edited on Windows fail to compile with some GCC compilers?
(3) Why does a text file show ^M when I open it in the vi editor?
Why does a UNIX file show no line breaks when I open it in Notepad on Windows?
(4) How do I delete spaces or tabs at the end of lines in a file?
How do I convert tabs in a file to spaces, or spaces to tabs?
How do I convert only the leading tabs of a line to spaces?
...

2. Analysis and solutions:
(1) How do I view a file in binary (displayed in hexadecimal)?
Viewing the binary contents of a file or data stream is a very common need.
Method 1: in UltraEdit, press Ctrl+H to switch to hex editing mode.
**Note**:
This method has a flaw: it displays a single line feed at the end of a line as two characters, "carriage return" and "line feed".
As a result, this tool cannot show problems (2), (3), and (4) correctly.
Method 2: use fbin, a binary viewer for files and streams.
fbin runs on Windows and various UNIX platforms.
The following command shows the first 48 (0x30) bytes of the file:
$ fbin xx.c 0x30
Filename: 'xx.c'
Filelen: 0x68 (104), offset: 0x38, Max output: 0x30
00000000: 2369 6e63 6c75 6465 3c73 7464 696f 2e68 #include<stdio.h
00000010: 3e0d 0a0d 0a69 6e74 6d61 696e 2829 0d0a >....intmain()..
00000020: 7b0d 0a20 2020 2063 6861 7209 2020 2020 {..    char.    
fbin displays every byte in the file exactly as it is stored. (See section 3 below for details.)
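The dump layout above is easy to reproduce: an offset, the bytes grouped in 2-byte words, then an ASCII column in which non-printable bytes (including 0d and 0a) are shown as '.'. The C sketch below formats one 16-byte line in a similar layout. It is a hypothetical helper written for illustration, not fbin's actual source, and fbin's exact spacing may differ.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Format up to 16 bytes as one hex-dump line:
 * "OOOOOOOO: hhhh hhhh ... ascii". Short lines are padded so the ASCII
 * column stays aligned. out must hold at least 80 bytes. */
void hexdump_line(unsigned long offset, const unsigned char *p, size_t n,
                  char *out)
{
    int pos = sprintf(out, "%08lx:", offset);
    for (size_t i = 0; i < 16; i++) {
        if (i % 2 == 0)
            out[pos++] = ' ';                  /* start a new 2-byte word */
        if (i < n)
            pos += sprintf(out + pos, "%02x", p[i]);
        else
            pos += sprintf(out + pos, "  ");   /* pad a short last line */
    }
    out[pos++] = ' ';
    for (size_t i = 0; i < n && i < 16; i++)
        out[pos++] = isprint(p[i]) ? (char)p[i] : '.';  /* CR/LF -> '.' */
    out[pos] = '\0';
}
```

Calling it in a loop over a file, 16 bytes at a time, gives a dump in which a Windows line ending always shows up as "0d0a" and a UNIX one as a lone "0a".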

(2) Why can't a shell script edited on Windows be executed on UNIX?
Why do C source files edited on Windows fail to compile with some GCC compilers?
Cause analysis:
The UNIX shell does not recognize the carriage return (CR, '\r', shown as 0d in hex).
Files in Windows format always end each line with a carriage return + line feed pair
(you can use the fbin tool from the previous question to check whether a file contains CR LF pairs),
so when such a script is moved to UNIX, the shell cannot interpret it correctly.
Solution:
Remove the Windows-format carriage returns.
Method 1: open the source file in vi and replace "\r\n" with "\n" (in Vim, for example, run :set fileformat=unix and then :w).
**Disadvantage**: not suitable for batch jobs involving many files.
Method 2: use UltraEdit to convert files from Windows format to UNIX format:
(menu) File -> Conversions -> DOS to UNIX
**Disadvantage**: not suitable for batch jobs involving many files.
Method 3: use the dos2unix command on UNIX, e.g. $ dos2unix -k xx.c
**Disadvantages**:
Some versions of dos2unix have a serious flaw: they change the original file's attributes.
For example, an executable shell script may lose its executable permission after conversion.
(The -k option only keeps the original file date.)
**Advantage**: suitable for batch jobs involving many ordinary files.
Method 4: win2unix (Windows and UNIX), which works like dos2unix,
e.g. win2unix xx.c (see section 3 below for more examples)
**Advantages**:
It overcomes the disadvantages of dos2unix described above and keeps all attributes of the source file.
It can also convert in the opposite direction (UNIX to Windows).
It is suitable for batch jobs involving many files.
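Whether attributes are preserved depends largely on how the converted data is written back. One approach is to rewrite the existing file in place instead of creating a new file and renaming it over the old one. On UNIX, an in-place rewrite keeps the file's permission bits (such as the executable bit of a shell script) and its owner. The C sketch below takes this approach and assumes the whole file fits in memory. `crlf_to_lf_inplace` is a hypothetical name; this is not the source of win2unix or dos2unix.

```c
#include <stdio.h>
#include <stdlib.h>

/* Convert a file from Windows to UNIX line endings in place.
 * Reopening the same path with "wb" truncates the existing file rather
 * than replacing it, so on UNIX its permission bits and owner are kept.
 * Its modification time does change. Returns 0 on success, -1 on error. */
int crlf_to_lf_inplace(const char *path)
{
    FILE *f = fopen(path, "rb");
    char *buf;
    long len;
    size_t n, out = 0;

    if (f == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0 || (len = ftell(f)) < 0) {
        fclose(f);
        return -1;
    }
    rewind(f);
    buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }
    n = fread(buf, 1, (size_t)len, f);
    fclose(f);

    /* Drop every '\r' that is immediately followed by '\n'. */
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\r' && i + 1 < n && buf[i + 1] == '\n')
            continue;
        buf[out++] = buf[i];
    }

    f = fopen(path, "wb");          /* truncates the existing file */
    if (f == NULL) {
        free(buf);
        return -1;
    }
    fwrite(buf, 1, out, f);
    fclose(f);
    free(buf);
    return 0;
}
```

Lone carriage returns are deliberately left alone. Only CR LF pairs are reduced to LF.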

(3) Why does a text file show ^M when I open it in the vi editor? (See Conclusions 4 and 5.)
Why does a UNIX file show no line breaks when I open it in Notepad on Windows? (See Conclusion 1.)
Cause analysis:
To answer these questions, you first need to understand how UNIX and Windows text files differ:
1) Windows text files on disk always end each line with "carriage return" + "line feed".
2) UNIX text files on disk always end each line with a single line feed (LF, '\n'), with no carriage return.
(The UNIX convention: when a text file is saved to disk, the end of each line is stored as a single line feed.
When output goes to a terminal, the terminal converts each line feed into carriage return + line feed.)
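On UNIX, the conversion on output to a terminal is done by the terminal driver. Its ONLCR output flag (together with OPOST) tells it to turn each '\n' into "\r\n" on the way to the screen. The POSIX sketch below queries that flag with `tcgetattr`. `tty_maps_lf_to_crlf` is a hypothetical name.

```c
#include <termios.h>
#include <unistd.h>

/* Report whether the terminal attached to fd turns each '\n' written
 * by a program into "\r\n" (output flags OPOST and ONLCR).
 * Returns 1 if it does, 0 if it does not, -1 if fd is not a terminal. */
int tty_maps_lf_to_crlf(int fd)
{
    struct termios t;
    if (tcgetattr(fd, &t) != 0)
        return -1;                  /* e.g. output redirected to a file */
    return (t.c_oflag & OPOST) && (t.c_oflag & ONLCR) ? 1 : 0;
}
```

When output is redirected to a file, there is no terminal and no translation, so the file receives plain '\n' bytes.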
**It follows that**:
A file in Windows format always has one more character per line than the same file in UNIX format: the carriage return ('\r'). This is inherited from the old typewriters. There, carriage return moved the print head back to the beginning of the line, and line feed advanced the paper so that the next line could be printed.
**Conclusion 1**:
When a UNIX-format file is opened in Windows Notepad, the line breaks cannot be displayed because the file contains no '\r'.
As a result, all the content appears on a single line. (This applies to older versions of Notepad; since Windows 10 version 1809, Notepad also recognizes UNIX line endings.)
**Conclusion 2**:
Tools such as UltraEdit automatically check whether the file contains '\r'. When the '\r' is missing at the end of lines, they usually prompt
you to convert the file from UNIX to Windows format. (Most users will have seen this prompt.)
**Conclusion 3**:
Tools such as UltraEdit and vi save a file in its original format. That is:
if the file was opened in Windows format, it is saved in Windows format (no automatic conversion);
if the file was opened in UNIX format, it is saved in UNIX format (no automatic conversion).
**Conclusion 4**:
You may paste several lines in Windows format through the clipboard into a UNIX-format file opened in UltraEdit
(or, conversely, paste UNIX-format lines into a Windows file).
In that case, the "carriage return + line feed" pairs in the pasted snippet are not automatically converted into single "line feeds" (and vice versa).
The file then contains a mix of lone "line feeds" and "carriage return + line feed" pairs,
that is, both line-ending styles at once.
**Conclusion 5**:
vi and similar editors display both "regular" UNIX-format files and "regular" Windows-format files correctly.
However, for an irregular file that contains both lone "line feeds" and "carriage return + line feed" pairs (see Conclusion 4),
vi displays the carriage returns as ^M.
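Conclusion 5 can be expressed as a small rule. The C sketch below models it under an assumption about the editor, based on Vim's default 'fileformats' detection. A file is treated as Windows format only if every line ends in "\r\n". Otherwise it is treated as UNIX format, and each '\r' before a '\n' is shown as ^M. The function name is hypothetical, and other vi implementations may behave differently.

```c
#include <stddef.h>

/* Return how many lines would show a trailing ^M under the rule above:
 * a file whose every line ends in "\r\n" is read as Windows format
 * (nothing shown). Otherwise it is read as UNIX format, and every line
 * whose '\n' is preceded by '\r' shows ^M. */
size_t trailing_ctrl_m_lines(const char *buf, size_t len)
{
    size_t lines = 0, cr_lines = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            lines++;
            if (i > 0 && buf[i - 1] == '\r')
                cr_lines++;         /* this line ends in "\r\n" */
        }
    }
    if (lines > 0 && cr_lines == lines)
        return 0;                   /* regular Windows file: no ^M */
    return cr_lines;                /* UNIX or mixed file */
}
```

A regular Windows file returns 0 and a regular UNIX file has no '\r' to show. Only the mixed case from Conclusion 4 produces ^M.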
Solution:
Use the solutions given for question (2).
To convert from UNIX to Windows, use unix2dos or win2unix -r (-r means the reverse direction).
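The reverse direction (unix2dos, or win2unix -r) amounts to inserting a carriage return before each bare line feed. Below is a minimal in-memory sketch in C. `lf_to_crlf` is a hypothetical name, not the source of either tool.

```c
#include <stddef.h>

/* Convert UNIX line endings to Windows ones: every '\n' that is not
 * already preceded by '\r' becomes "\r\n". Existing "\r\n" pairs are
 * left alone, so converting a file twice does no harm. out must have
 * room for 2 * len bytes (worst case: input made only of '\n').
 * Returns the number of bytes written to out. */
size_t lf_to_crlf(const char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'))
            out[o++] = '\r';        /* insert the missing carriage return */
        out[o++] = in[i];
    }
    return o;
}
```

Skipping line feeds that already have a carriage return also repairs the mixed files described in Conclusion 4.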

(4) How do I delete spaces or tabs at the end of lines in a file?
How do I convert tabs in a file to spaces, or spaces to tabs?
How do I convert only the leading tabs of a line to spaces?
Problem analysis:
For various reasons, especially when editing C/C++, Java, and other source code, you often want to turn the tabs in a source file into spaces,
or spaces into tabs, and delete unnecessary spaces or tabs at the end of each line.
If the body of the code itself contains spaces or tabs that must be kept, you may want to convert only the spaces or tabs at the beginning of each line.
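Deleting trailing whitespace is the simplest of these operations. The hedged C sketch below assumes the file is processed one line at a time. The function name is hypothetical.

```c
#include <string.h>

/* Remove spaces and tabs at the end of one line, in place. A trailing
 * "\n" or "\r\n" (if present) is kept, so the line ending is unchanged.
 * Returns the new length of the string. */
size_t strip_trailing_ws(char *line)
{
    size_t len = strlen(line);
    size_t eol = len;

    /* Find where the line ending starts, if there is one. */
    if (eol > 0 && line[eol - 1] == '\n')
        eol--;
    if (eol > 0 && line[eol - 1] == '\r')
        eol--;

    /* Walk back over the trailing spaces and tabs. */
    size_t end = eol;
    while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
        end--;

    /* Shift the line ending and the terminating NUL to the left. */
    memmove(line + end, line + eol, len - eol + 1);
    return len - (eol - end);
}
```

Keeping the line ending intact means this step can be combined with any of the line-ending conversions without affecting them.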
Solution:
For a single file, use the conversion function provided by the editor.
For batch conversion, try tab2sp, which works both on batches of files and on data streams.
Method 1: use UltraEdit's conversion function, e.g. (menu) Format -> convert tabs to spaces, ...
**Disadvantage**:
Not suitable for batch jobs involving many files.
Method 2: tab2sp (Windows and UNIX platforms),
e.g. tab2sp -t -w8 xx.c (see section 3 below for details)
**Advantage**:
Suitable for batch jobs involving many files.
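Converting tabs to spaces correctly means honoring tab stops rather than replacing each tab with a fixed number of spaces. The C sketch below illustrates that idea, in the spirit of tab2sp's default mode. The name `expand_tabs` is hypothetical, and this is not tab2sp's source.

```c
#include <stddef.h>

/* Expand tabs to spaces using tab stops every `width` columns, as an
 * editor displays them: a tab advances to the next multiple of width,
 * so it becomes 1..width spaces. The column is reset after '\n'.
 * Writes at most outsz - 1 bytes plus a NUL; returns the output length. */
size_t expand_tabs(const char *in, char *out, size_t outsz, int width)
{
    size_t o = 0, col = 0;
    if (outsz == 0)
        return 0;
    if (width < 1)
        width = 1;
    for (; *in != '\0' && o + 1 < outsz; in++) {
        if (*in == '\t') {
            /* Pad with spaces up to the next tab stop. */
            do {
                out[o++] = ' ';
                col++;
            } while (col % (size_t)width != 0 && o + 1 < outsz);
        } else {
            out[o++] = *in;
            col = (*in == '\n') ? 0 : col + 1;   /* new line: column 0 */
        }
    }
    out[o] = '\0';
    return o;
}
```

With a width of 4, "a\tb" becomes "a" plus three spaces and "b", because the tab only has to reach column 4.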

3. Tool details: fbin, win2unix, tab2sp, and other tools for viewing or converting files and streams in batches
(1) Introduction
fbin, win2unix, tab2sp, and related tools view or convert files in batches
and run on Windows and various UNIX platforms.
(2) fbin - view a stream or file in binary
Type the following command to view the online help (some content is omitted):
$ fbin -help
fbin - display file with hex format, version 1.0.4
Copyright (c) eybuild group, 2005, 2006. All rights reserved.
http://www.eybuild.com, eybuild@hotmail.com
Usage: fbin [options] [fname [[0x]offset] [maxlen] | [file1] ...]
  -h, -help   show this help
  -w [num]    specify word-width [2/4], default 2
  -p          pause for every screen
  -v          verbose mode
  -l          process file list replace 'fname' ...
  fname       file name to display
  offset      hex number, '0x' is optional.
              offset >= 0 from the beginning of input file,
              offset < 0 from the end of input file
  maxlen      max length to print
Examples:
  fbin -p foo.bin
  Print at most 64 (0x40) bytes from offset 0x200:
  fbin foo.bin 0x200 0x40
  Print last 32 (0x20) bytes with 4-bytes word-width:
  fbin -w4 foo.bin -20
  Process file list:
  fbin -v -l f1 f2 f3 f4 f5 f6
Example 1. View the first 64 bytes of a file in binary:
(Running the following command on UNIX or Windows gives the same result.)
$ fbin fbin.c 0 40
00000000: 2f2a 2066 6269 6e2e 6320 2d20 6c69 7374 /* fbin.c - list
00000010: 2066 696c 6520 7769 7468 2062 696e 6e61  file with binna
00000020: 7279 2066 6f72 6d61 7420 2a2f 0a0a 2f2a ry format */../*
00000030: 2043 6f70 7972 6967 6874 2843 2920 6579  Copyright(C) ey
Example 2. View the last 64 bytes of a file in binary:
(Running the following command on UNIX or Windows gives the same result.)
$ fbin fbin.c -40
000022d5: 2020 2061 7267 632d 2d2c 2061 7267 762b    argc--, argv+
000022e5: 2b3b 0a20 2020 2020 2020 2067 6f74 6f20 +;.        goto 
000022f5: 4e45 5854 3b0a 2020 2020 7d0a 0a20 2020 NEXT;.    }..   
00002305: 2072 6574 7572 6e20 4f4b 3b0a 7d0a 0a0a  return OK;.}...
Example 3. View 64 bytes of a file in binary, starting at offset 64 (0x40), with a 4-byte word width:
(Running the following command on UNIX or Windows gives the same result.)
$ fbin -w4 fbin.c 40 40
00000040: 4275696c 64204772 6f75702c 20323030 Build Group, 200
00000050: 352c2032 3030362e 20416c6c 20526967 5, 2006. All Rig
00000060: 68747320 52657365 72766564 2e202a2f hts Reserved. */
00000070: 0a0a2f2a 0a6d6f64 69666963 6174696f ../*.modificatio
Example 4. Display, in batch, all files found by a search (including subdirectories), with a 4-byte word width:
UNIX command:
$ find ../bin -name "*.exe" | xargs fbin -w4 -l -v | less
Filename: '../bin/csp2bin.exe'
Filelen: 0x18000 (98304)
00000000: 4d5a9000 03000000 04000000 ffff0000 MZ..............
00000010: b8000000 00000000 40000000 00000000 ........@.......
00000020: 00000000 00000000 00000000 00000000 ................
00000030: 00000000 00000000 00000000 e0000000 ................
00000040: 0e1fba0e 00b409cd 21b8014c cd215468 ........!..L.!Th
00000050: 69732070 726f6772 616d2063 616e6e6f is program canno
00000060: 74206265 2072756e 20696e20 444f5320 t be run in DOS 
00000070: 6d6f6465 2e0d0d0a 24000000 00000000 mode....$.......
00000080: 08a64111 4cc72f42 4cc72f42 4cc72f42 ..A.L./BL./BL./B
00000090: 37db2342 4ec72f42 7ae12442 4dc72f42 7.#BN./Bz.$BM./B
000000a0: cfdb2142 59c72f42 2ed83c42 4fc72f42 ..!BY./B..<BO./B
000000b0: 4cc72e42 0fc72f42 7ae12542 26c72f42 L..B../Bz.%B&./B
000000c0: 52696368 4cc72f42 00000000 00000000 RichL./B........
000000d0: 00000000 00000000 00000000 00000000 ................
000000e0: 50450000 4c010300 3642a445 00000000 PE..L...6B.E....
000000f0: 00000000 e0000f01 0b010600 00000100 ................
00000100: 00901000 00000000 f5a80000 00100000 ................
00000110: 00100100 00004000 00100000 00100000 ......@.........
...
Windows command:
E:\> for /F %i in ('dir /W /B /S /A:-D eybuild\bin') do fbin -w2 -v -l %i | more
(3) win2unix - convert files between Windows and UNIX formats
Type the following command to view the online help (some content is omitted):
$ win2unix -help
win2unix - translate file between windows and unix format, version 1.0.5
Usage: win2unix [options] [[src] [dst] | [file1] ...]
  -h, -help   show this help
  -r          translate file from unix format to windows
  -v          verbose mode
  -l          process file list replace 'src' & 'dst' pair
  src         source file or directory
  dst         destination file or directory
Examples:
  win2unix foo.txt
  Convert unix to windows format:
  win2unix -r -b src.txt dst.txt
  Process file list:
  win2unix -v -l f1 f2 f3 f4 f5 f6
Example 1. Convert UNIX format to Windows format:
$ win2unix -r fbin.c
Comparing the binary result below with fbin's "Example 1" shows that the original 0a0a (two line feeds) in the third row
has been converted to 0d0a0d0a (two carriage return + line feed pairs).
$ fbin fbin.c 0 40
00000000: 2f2a 2066 6269 6e2e 6320 2d20 6c69 7374 /* fbin.c - list
00000010: 2066 696c 6520 7769 7468 2062 696e 6e61  file with binna
00000020: 7279 2066 6f72 6d61 7420 2a2f 0d0a 0d0a ry format */....
00000030: 2f2a 2043 6f70 7972 6967 6874 2843 2920 /* Copyright(C) 
Example 2. Convert Windows format to UNIX format:
$ win2unix fbin.c
Comparing the binary result below with Example 1 above shows that the 0d0a0d0a (two carriage return + line feed pairs) in the third row
has been converted back to 0a0a (two line feeds).
$ fbin fbin.c 0 40
00000000: 2f2a 2066 6269 6e2e 6320 2d20 6c69 7374 /* fbin.c - list
00000010: 2066 696c 6520 7769 7468 2062 696e 6e61  file with binna
00000020: 7279 2066 6f72 6d61 7420 2a2f 0a0a 2f2a ry format */../*
00000030: 2043 6f70 7972 6967 6874 2843 2920 6579  Copyright(C) ey
Example 3. Batch-convert all files found by a search (including subdirectories):
$ find src -name "*.c" | xargs win2unix -l -v
Convert 'win' to 'unix' format...
src/csp2bin.c
src/tab2sp.c
src/fbin.c
src/win2unix.c
...
Windows command:
E:\> for /F %i in ('dir /W /B /S /A:-D src\*.c') do win2unix -v -l %i
(4) tab2sp - convert between tabs and spaces
Type the following command to view the online help (some content is omitted):
$ tab2sp -help
tab2sp - convert tabs to spaces or revert, version 1.0.2
Usage: tab2sp [options] [[src] [dst] | [file1] ...]
  -h, -help   show this help
  -r          convert spaces to tabs
  -p          only convert line prefixed spaces or tabs
  -t          remove tail tabs and spaces
  -w [num]    specify tab width [1-8], default 4
  -no         don't do any convert
  -v          verbose mode
  -l          process file list replace 'src' & 'dst' pair
  src         source file
  dst         destination file
Examples:
  tab2sp -r < foo.txt
  Convert tabs to spaces:
  tab2sp foo.txt
  Convert spaces to tabs and remove tail tabs, spaces:
  tab2sp -r -t foo.txt
  Only remove tail tabs and spaces:
  tab2sp -no -t foo.txt
  Process file list:
  tab2sp -v -l f1 f2 f3 f4 f5 f6
Example 1. Convert the tabs in a file to spaces (tab width 4 by default) and delete trailing tabs and spaces:
$ tab2sp -t fbin.c
You can use fbin to check the binary result: all tabs (09) should have been converted to spaces (20).
The output is not listed here.
Example 2. Convert the spaces in a file to tabs (tab width 4 by default) and delete trailing tabs and spaces:
$ tab2sp -r -t fbin.c
Example 3. Convert the tabs in a file to spaces (tab width set to 8) and delete trailing tabs and spaces:
$ tab2sp -t -w8 fbin.c
Example 4. Only delete trailing spaces and tabs in files f1, f2, f3, and f4:
$ tab2sp -no -t -l f1 f2 f3 f4
Example 5. Convert only the tabs at the beginning of each line to spaces (tab width set to 8) and delete trailing tabs and spaces:
$ tab2sp -p -t -w8 -l f1 f2 f3 f4
Example 6. Batch-convert all files found by a search (including subdirectories); the options can be combined freely:
$ find src -name "*.c" | xargs tab2sp -l -v -w8 -p
Convert 'tab' to 'space' format...
src/csp2bin.c
src/tab2sp.c
src/fbin.c
src/win2unix.c
...
Windows command:
E:\> for /F %i in ('dir /W /B /S /A:-D src\*.c') do tab2sp -l -v -w8 -p %i
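Converting spaces to tabs (Example 2) is riskier than the other direction. Runs of spaces inside the code, for example in string literals or aligned comments, may be meaningful. The sketch below therefore converts only the leading indentation, which corresponds roughly to combining -r with -p. It is a hypothetical C helper, not tab2sp's source.

```c
#include <stddef.h>

/* Re-indent one line with tabs. The leading run of spaces and tabs is
 * measured in columns (tab stops every `width`), then rewritten as
 * as many tabs as fit plus the leftover spaces. Everything after the
 * first non-blank character is copied unchanged.
 * out must hold at least strlen(in) + 1 bytes (the result is never
 * longer than the input). Returns the output length. */
size_t leading_spaces_to_tabs(const char *in, char *out, int width)
{
    size_t col = 0, o = 0;
    size_t w = (size_t)(width < 1 ? 1 : width);

    /* Measure the indentation in columns. */
    for (; *in == ' ' || *in == '\t'; in++)
        col = (*in == '\t') ? (col / w + 1) * w : col + 1;

    /* Emit it as tabs followed by the remaining spaces. */
    for (size_t t = 0; t < col / w; t++)
        out[o++] = '\t';
    for (size_t s = 0; s < col % w; s++)
        out[o++] = ' ';

    /* Copy the rest of the line as-is. */
    while (*in != '\0')
        out[o++] = *in++;
    out[o] = '\0';
    return o;
}
```

With a width of 4, eight leading spaces become two tabs and six become one tab plus two spaces. The two spaces before "// c" in a line such as "x = 1;  // c" are left untouched.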
