Unix/DOS text file format _ MySQL

Source: Internet
Author: User
I struggled with the Unix/DOS text file format for an hour or two. A batch of text files could not be imported into the database; after re-saving them with Notepad++, the import worked.


Later, I found that these text files were in UNIX format. When an import fails like this, check the file format first.


In Linux, text files use "\n" to indicate a line break, while Windows uses "\r\n".
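
One way to check which format a file uses is to look for carriage-return bytes. The following is a minimal sketch, not the method the author used: `has_cr` is a hypothetical helper name, and the two sample files are created only for the demonstration.

```shell
#!/bin/sh
# Sketch only: has_cr is a hypothetical helper, not a standard command.

# Succeeds (exit status 0) if the file contains at least one carriage return.
has_cr() {
    cr=$(printf '\r')        # a literal CR character
    grep -q "$cr" "$1"
}

# Build one sample file of each kind in a temporary directory.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'a\r\nb\r\n' > "$tmpdir/dos.txt"
printf 'a\nb\n'     > "$tmpdir/unix.txt"

for f in "$tmpdir/dos.txt" "$tmpdir/unix.txt"; do
    if has_cr "$f"; then
        echo "$(basename "$f"): contains CR (DOS/Windows style)"
    else
        echo "$(basename "$f"): no CR (UNIX style)"
    fi
done
```

Where the `file` utility is installed, `file name.txt` typically reports CRLF line terminators as well, and `od -c name.txt` shows the raw bytes.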


----------------------------------------------------

Conversion between Linux and Windows text formats


Principle:

In Linux, text files use "\n" to indicate a line break, while Windows uses "\r\n". As a result, Windows text files often cause errors when used on Linux.

Because I have a dual-boot setup, some files are shared between the two systems, but I had never paid much attention to this problem. In Windows I open files with Word or Notepad++; in Linux I have always used the vim editor. Yesterday, however, I could only use vi on the command-line interface, which was frustrating: Chinese support was a headache, and every line of a text file edited in Windows ended with ^M...

----------------------------------------------------

Linux provides two commands for converting text formats: dos2unix and unix2dos. dos2unix converts "\r\n" to "\n"; unix2dos converts "\n" to "\r\n".

Neither dos2unix nor unix2dos is installed by default in Ubuntu 7.10.

The shell suggests installing the tofrodos package (Converts DOS <-> Unix text files, alias tofromdos):

$ sudo apt-get install tofrodos

dos2unix --> converts a Windows text file to Linux format

unix2dos --> converts a Linux text file to Windows format

Usage:

man dos2unix

man unix2dos

----------------------------------------------------

Use the stream editor sed in Linux

DOS/Windows and Linux/Unix files use different line-break formats. DOS/Windows text files have a CR (carriage return) and an LF (line feed) at the end of each line, while UNIX text files have only a line feed.

1. Moving files from DOS/Windows to Linux/Unix

Although many programs do not mind being fed CR/LF text files in DOS/Windows format, several programs definitely do. The best known is bash, which runs into problems as soon as it encounters a carriage return. The following sed invocation converts DOS/Windows text to a trusted UNIX format:

$ sed -e 's/.$//' mydos.txt > myunix.txt

The script works in a simple way: the substitution regular expression matches the last character of each line, which happens to be the carriage return. We replace it with nothing, so it is deleted from the output entirely. If you use this script and notice that the last character of every line in the output has been deleted, you specified a text file that was already in UNIX format, so there was no need to convert it.

2. Moving Linux/UNIX text to a Windows system. Use the following script to perform the required format conversion:

$ sed -e 's/$/\r/' myunix.txt > mydos.txt

In this script, the '$' regular expression matches the end of the line, and '\r' tells sed to insert a carriage return right before it. With a carriage return inserted before the line feed, each line ends with CR/LF. Note that '\r' is replaced with a CR only when GNU sed 3.02.80 or later is used.
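
The two sed scripts above change every line unconditionally, so running them on a file that is already in the target format damages it. The sketch below is one possible variant, not taken from the original article: it only touches a carriage return at the end of a line, and it builds the CR with printf so it does not depend on sed understanding '\r'. The function names `dos_to_unix` and `unix_to_dos` are hypothetical.

```shell
#!/bin/sh
# Sketch only: dos_to_unix and unix_to_dos are hypothetical helper names.
cr=$(printf '\r')   # a literal carriage return, so sed needs no \r support

# Remove a trailing CR from each line; lines without one are left untouched.
dos_to_unix() {
    sed -e "s/${cr}\$//" "$1"
}

# End every line with exactly one CR, whether or not it already had one.
unix_to_dos() {
    sed -e "s/${cr}*\$/${cr}/" "$1"
}

# Sample input files, created only for this demonstration.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'one\r\ntwo\r\n' > "$tmpdir/mydos.txt"
printf 'one\ntwo\n'     > "$tmpdir/myunix.txt"

dos_to_unix "$tmpdir/mydos.txt"  > "$tmpdir/to_unix.txt"
dos_to_unix "$tmpdir/myunix.txt" > "$tmpdir/still_unix.txt"
unix_to_dos "$tmpdir/myunix.txt" > "$tmpdir/to_dos.txt"
unix_to_dos "$tmpdir/mydos.txt"  > "$tmpdir/still_dos.txt"

cmp -s "$tmpdir/to_unix.txt"    "$tmpdir/myunix.txt" && echo "DOS -> UNIX: converted"
cmp -s "$tmpdir/still_unix.txt" "$tmpdir/myunix.txt" && echo "UNIX input: left intact"
cmp -s "$tmpdir/to_dos.txt"     "$tmpdir/mydos.txt"  && echo "UNIX -> DOS: converted"
cmp -s "$tmpdir/still_dos.txt"  "$tmpdir/mydos.txt"  && echo "DOS input: left intact"
```

Because both functions are safe to apply twice, they can be run over a mixed batch of files without first checking each file's format.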

----------------------------------------------------

It is said that vim can also be used for this (not tested):

Enter :%s/^M$//g in Vim command mode and press Enter to delete the ^M character at the end of every line in the file.

Command analysis:

% applies the command to the entire file, and s means substitute. Note that ^M must be typed as Ctrl+V followed by Ctrl+M. The $ after ^M anchors the match to the end of the line, the empty replacement between the last two slashes deletes the match, and g means that every match within a line is replaced.

----------------------------------------------------

Use the tr command:

1. tr is used like this: tr abc xyz, which replaces every letter "a" with "x", every "b" with "y", and every "c" with "z".

2. tr a-z A-Z replaces all lowercase letters with the corresponding uppercase letters (for example, it converts "no smoking" to "NO SMOKING").

3. This technique is very convenient when you want to emphasize part of the text you are editing in the vi editor. Press the Escape key, then press :, then enter 2,4!tr 'a-z' 'A-Z' and press Return. All the letters in lines 2 through 4 are now converted to uppercase.

4. In addition, tr is very useful when someone sends you a text file created on Mac OS or DOS/Windows. If the file was not saved with UNIX line breaks marking the end of each line, you need to convert it to UNIX format; otherwise, some command-line utilities will not process it correctly. On Mac OS, lines end with a carriage return character, so many text processing tools treat such a file as a single line. To correct this problem, use the following tips:

* Mac -> UNIX: tr '\r' '\n' < macfile > unixfile

* UNIX -> Mac: tr '\n' '\r' < unixfile > macfile
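
The same tool also covers the DOS/Windows case, which the tips above do not show. A minimal sketch, assuming the hypothetical file names `dosfile` and `unixfile`: `tr -d` deletes every carriage return, leaving only the line feeds.

```shell
#!/bin/sh
# Sketch only: the file names are hypothetical and created for the demonstration.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'one\r\ntwo\r\n' > "$tmpdir/dosfile"

# DOS/Windows -> UNIX: delete every CR byte, keeping the LF that follows it.
tr -d '\r' < "$tmpdir/dosfile" > "$tmpdir/unixfile"

# Count the CR bytes that are left (tr -cd keeps only the listed characters).
remaining=$(tr -cd '\r' < "$tmpdir/unixfile" | wc -c)
echo "carriage returns left: $((remaining))"
```

Note that this removes carriage returns anywhere in the file, not only at line ends, which is usually what you want for plain text but not for binary data.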



----------------------------------------------------


Tips:
The origins of Carriage Return and Line Feed, and the difference between them.
Before computers appeared, there was a machine called the Teletype Model 33 (the tty concept in Linux/Unix also comes from it). It could print 10 characters per second. But there was a problem: moving to a new line took 0.2 seconds, exactly the time needed for two characters. If a new character arrived during those 0.2 seconds, it was lost.

So the developers came up with a way to solve this problem: add two characters marking the end of each line. One is "carriage return", which tells the typewriter to move the print head to the left margin, and the other is "line feed", which tells the typewriter to move the paper down one line. This is the origin of "line feed" and "carriage return", as their English names also show.

Later, computers were invented, and these two concepts were carried over to them. At that time, memory was very expensive, and some scientists thought that adding two characters at the end of each line was too wasteful and that one would be enough. As a result, the conventions diverged.

In Unix systems, each line ends with only <line feed>, that is, "\n"; in Windows, each line ends with <carriage return><line feed>, that is, "\r\n"; in Mac systems, each line ends with <carriage return>, that is, "\r". One direct consequence is that if a Unix/Mac file is opened in Windows, all the text may appear as a single line, and if a Windows file is opened in Unix/Mac, an extra ^M symbol may appear at the end of each line.
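
The three conventions can be told apart by counting the two control characters. The following is a rough heuristic sketch rather than a robust detector: `line_ending_style` is a hypothetical function name, and files with mixed endings are simply reported as "unknown".

```shell
#!/bin/sh
# Sketch only: line_ending_style is a hypothetical helper using a simple heuristic.
line_ending_style() {
    # tr -cd keeps only the listed character, so wc -c counts its occurrences.
    cr_count=$(tr -cd '\r' < "$1" | wc -c)
    lf_count=$(tr -cd '\n' < "$1" | wc -c)
    if [ "$cr_count" -eq 0 ] && [ "$lf_count" -gt 0 ]; then
        echo "unix"      # LF only
    elif [ "$cr_count" -gt 0 ] && [ "$lf_count" -eq 0 ]; then
        echo "mac"       # CR only
    elif [ "$cr_count" -gt 0 ] && [ "$cr_count" -eq "$lf_count" ]; then
        echo "dos"       # as many CRs as LFs, assumed to be CR/LF pairs
    else
        echo "unknown"   # empty file or mixed endings
    fi
}

# Sample files, created only for this demonstration.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'a\nb\n'     > "$tmpdir/unix.txt"
printf 'a\r\nb\r\n' > "$tmpdir/dos.txt"
printf 'a\rb\r'     > "$tmpdir/mac.txt"

for f in unix.txt dos.txt mac.txt; do
    echo "$f: $(line_ending_style "$tmpdir/$f")"
done
```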
