Various file encoding conversion methods in Mac OS X

Source: Internet
Author: User

A long time ago, when the cat was still coding on Windows, all of the Ruby source files were encoded as GBK. As a result, countless Chinese characters now show up as garbled text. So I wrote a small tool to convert a file from GBK to UTF-8:


#!/usr/bin/ruby
# Tool for converting a GBK-encoded file to UTF-8

src_path = ARGV[0]
unless src_path
  puts "usage: #{File.basename($0)} gbk_file"
  exit 1
end

dir_name, base_name = File.split(src_path)
# Write the result next to the source file, prefixed with u8_
dst_path = File.join(dir_name, "u8_#{base_name}")

# The block form closes both files even if the conversion raises
File.open(src_path, "r:gbk") do |f_src|
  File.open(dst_path, "w:utf-8") do |f_dst|
    f_src.each_with_index do |line, i|
      line = line.encode("utf-8")
      # A magic comment such as "# coding: gbk" can only sit in the first two lines
      line.gsub!(/gbk/i, "utf-8") if i < 2 && line =~ /^#.*coding/
      f_dst.puts line
    end
  end
end

# Make the converted script executable
system("chmod", "+x", dst_path)
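
The heart of the script above is String#encode. Here is a minimal sketch of the same conversion on an in-memory string instead of a file. The sample bytes are my own assumption for illustration (they should be the GBK encoding of the two characters 中文) and do not come from the original tool:

```ruby
# GBK bytes for the two characters U+4E2D U+6587 (illustrative sample)
gbk_bytes = "\xD6\xD0\xCE\xC4".b

# Tell Ruby how to interpret the raw bytes; the bytes themselves do not change
gbk_text = gbk_bytes.force_encoding("GBK")

# Transcode: the characters stay the same, the bytes change
utf8_text = gbk_text.encode("UTF-8")

puts utf8_text            # the same text, now stored as UTF-8
puts gbk_text.bytesize    # byte count of the GBK form
puts utf8_text.bytesize   # byte count of the UTF-8 form
```

force_encoding only relabels the bytes, while encode actually rewrites them. Mixing the two up is a common source of garbled text.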


Then I found out that the Mac already ships with iconv:


ICONV(1)                Linux Programmer's Manual                ICONV(1)

NAME
       iconv - character set conversion

SYNOPSIS
       iconv [OPTION...] [-f encoding] [-t encoding] [inputfile ...]
       iconv -l

DESCRIPTION
       The iconv program converts text from one encoding to another encoding.
       More precisely, it converts from the encoding given for the -f option
       to the encoding given for the -t option. Either of these encodings
       defaults to the encoding of the current locale. All the inputfiles are
       read and converted in turn; if no inputfile is given, the standard
       input is used. The converted text is printed to standard output.

       The encodings permitted are system dependent. For the libiconv
       implementation, they are listed in the iconv_open(3) manual page.

       Options controlling the input and output format:

       -f encoding, --from-code=encoding

Let's try to create a UTF-8 text:


Passerby A: I've been learning German lately. Now I know Chinese, English, and German.
Cat: Speaking of which, I have mastered more than a dozen languages.
Passerby A: I don't believe it.
Cat: Assembly language, C language, C++ language, C# language, Ruby language, JavaScript language...
Passerby A: ...

Use iconv to convert it to GBK (the reverse conversion works the same way):


apple@kissAir: ruby_src$ iconv -f UTF-8 -t GBK ex_u8.txt > ex_gbk.txt

apple@kissAir: ruby_src$ cat ex_gbk.txt

·? Why? :???? Why? Why? Why ?? Why ??? Parameters ???? Why? Why ?? Why ?? Bytes

????????? There are too many problems ??????????

·? Too many ?????? Too many ???

?????????, C ????, C ++ ??? Why? C #????, Ruby ????, Javascript ????...

·? Why? :... Apple @ kissAir: ruby_src $
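
The garbled output above is what you would expect if the terminal decodes everything as UTF-8 (that is my assumption about the terminal settings), because ex_gbk.txt now holds GBK bytes. A minimal Ruby sketch of the same round trip, using a made-up two-character sample string rather than the file above:

```ruby
original = "\u4E2D\u6587"   # hypothetical sample text, stored as UTF-8

# Same idea as: iconv -f UTF-8 -t GBK
gbk = original.encode("GBK")

# Reading the GBK bytes as if they were UTF-8 is what a UTF-8 terminal does on `cat`
misread = gbk.dup.force_encoding("UTF-8")
puts misread.valid_encoding?   # false: these bytes are not well-formed UTF-8

# Same idea as the reverse conversion: iconv -f GBK -t UTF-8
restored = gbk.encode("UTF-8")
puts restored == original      # true: nothing was lost on the way
```

The file itself is fine. Only the interpretation of its bytes is wrong, which is why converting back restores the text.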

Let's see which encodings iconv supports. There are quite a lot of them:


apple@kissAir: ruby_src$ iconv -l

ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV:1991 US US-ASCII CSASCII

UTF-8

UTF-8-MAC UTF8-MAC

ISO-10646-UCS-2 UCS-2 CSUNICODE

UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11

UCS-2LE UNICODELITTLE

ISO-10646-UCS-4 UCS-4 CSUCS4

UCS-4BE

UCS-4LE

UTF-16

UTF-16BE

UTF-16LE

UTF-32

UTF-32BE

UTF-32LE

UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7

UCS-2-INTERNAL

UCS-2-SWAPPED

UCS-4-INTERNAL

UCS-4-SWAPPED

C99

JAVA

CP819 IBM819 ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1:1987 L1 LATIN1 CSISOLATIN1

ISO-8859-2 ISO-IR-101 ISO8859-2 ISO_8859-2:1987 L2 LATIN2 CSISOLATIN2

ISO-8859-3 ISO-IR-109 ISO8859-3 ISO_8859-3:1988 L3 LATIN3 CSISOLATIN3

ISO-8859-4 ISO-IR-110 ISO8859-4 ISO_8859-4:1988 L4 LATIN4 CSISOLATIN4

CYRILLIC ISO-8859-5 ISO-IR-144 ISO8859-5 ISO_8859-5:1988 CSISOLATINCYRILLIC

ARABIC ASMO-708 ECMA-114 ISO-8859-6 ISO-IR-127 ISO8859-6 ISO_8859-6:1987 CSISOLATINARABIC

ECMA-118 ELOT_928 GREEK GREEK8 ISO-8859-7 ISO-IR-126 ISO8859-7 ISO_8859-7:1987 ISO_8859-7:2003 CSISOLATINGREEK

HEBREW ISO-8859-8 ISO-IR-138 ISO8859-8 ISO_8859-8:1988 CSISOLATINHEBREW

ISO-8859-9 ISO-IR-148 ISO8859-9 ISO_8859-9:1989 L5 LATIN5 CSISOLATIN5

ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10:1992 L6 LATIN6 CSISOLATIN6

ISO-8859-11 ISO8859-11 ISO_8859-11

ISO-8859-13 ISO-IR-179 ISO8859-13 L7 LATIN7

ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14:1998 L8 LATIN8

ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15:1998 LATIN-9

ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16:2001 L10 LATIN10

KOI8-R CSKOI8R

KOI8-U

KOI8-RU

CP1250 MS-EE WINDOWS-1250

CP1251 MS-CYRL WINDOWS-1251

CP1252 MS-ANSI WINDOWS-1252

CP1253 MS-GREEK WINDOWS-1253

CP1254 MS-TURK WINDOWS-1254

CP1255 MS-HEBR WINDOWS-1255

CP1256 MS-ARAB WINDOWS-1256

CP1257 WINBALTRIM WINDOWS-1257

CP1258 WINDOWS-1258

850 CP850 IBM850 CSPC850MULTILINGUAL

862 CP862 IBM862 CSPC862LATINHEBREW

866 CP866 IBM866 CSIBM866

CP1131

MAC MACINTOSH MACROMAN CSMACINTOSH

MACCENTRALEUROPE

MACICELAND

MACCROATIAN

MACROMANIA

MACCYRILLIC

MACUKRAINE

MACGREEK

MACTURKISH

MACHEBREW

MACARABIC

MACTHAI

HP-ROMAN8 R8 ROMAN8 CSHPROMAN8

NEXTSTEP

ARMSCII-8

GEORGIAN-ACADEMY

GEORGIAN-PS

KOI8-T

CP154 CYRILLIC-ASIAN PT154 PTCP154 CSPTCP154

KZ-1048 RK1048 STRK1048-2002 CSKZ1048

MULELAO-1

CP1133 IBM-CP1133

ISO-IR-166 TIS-620 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 TIS620.2533-1

CP874 WINDOWS-874

VISCII VISCII1.1-1 CSVISCII

TCVN TCVN-5712 TCVN5712-1 TCVN5712-1:1993

ISO-IR-14 ISO646-JP JIS_C6220-1969-RO JP CSISO14JISC6220RO

JISX0201-1976 JIS_X0201 X0201 CSHALFWIDTHKATAKANA

ISO-IR-87 JIS0208 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 X0208 CSISO87JISX0208

ISO-IR-159 JIS_X0212 JIS_X0212-1990 JIS_X0212.1990-0 X0212 CSISO159JISX02121990

CN GB_1988-80 ISO-IR-57 ISO646-CN CSISO57GB1988

CHINESE GB_2312-80 ISO-IR-58 CSISO58GB231280

CN-GB-ISOIR165 ISO-IR-165

ISO-IR-149 KOREAN KSC_5601 KS_C_5601-1987 KS_C_5601-1989 CSKSC56011987

EUC-JP EUCJP EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE CSEUCPKDFMTJAPANESE

MS_KANJI SHIFT-JIS SHIFT_JIS SJIS CSSHIFTJIS

CP932

ISO-2022-JP CSISO2022JP

ISO-2022-JP-1

ISO-2022-JP-2 CSISO2022JP2

CN-GB EUC-CN EUCCN GB2312 CSGB2312

GBK

CP936 MS936 WINDOWS-936

GB18030

ISO-2022-CN CSISO2022CN

ISO-2022-CN-EXT

HZ HZ-GB-2312

EUC-TW EUCTW CSEUCTW

BIG-5 BIG-FIVE BIG5 BIGFIVE CN-BIG5 CSBIG5

CP950

BIG5-HKSCS:1999

BIG5-HKSCS:2001

BIG5-HKSCS:2004

BIG5-HKSCS BIG5-HKSCS:2008 BIG5HKSCS

EUC-KR EUCKR CSEUCKR

CP949 UHC

CP1361 JOHAB

ISO-2022-KR CSISO2022KR

CP856

CP922

CP943

CP1046

CP1124

CP1129

CP1161 IBM-1161 IBM1161 CSIBM1161

CP1162 IBM-1162 IBM1162 CSIBM1162

CP1163 IBM-1163 IBM1163 CSIBM1163

DEC-KANJI

DEC-HANYU

437 CP437 IBM437 CSPC8CODEPAGE437

CP737

CP775 IBM775 CSPC775BALTIC

852 CP852 IBM852 CSPCP852

CP853

855 CP855 IBM855 CSIBM855

857 CP857 IBM857 CSIBM857

CP858

860 CP860 IBM860 CSIBM860

861 CP-IS CP861 IBM861 CSIBM861

863 CP863 IBM863 CSIBM863

CP864 IBM864 CSIBM864

865 CP865 IBM865 CSIBM865

869 CP-GR CP869 IBM869 CSIBM869

CP1125

EUC-JIS-2004 EUC-JISX0213

SHIFT_JISX0213 SHIFT_JIS-2004

ISO-2022-JP-2004 ISO-2022-JP-3

BIG5-2003

ISO-IR-230 TDS565

ATARI ATARIST

RISCOS-LATIN1
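
Ruby keeps its own list of encodings, separate from what iconv -l prints. A quick sketch for checking whether a name is known to Ruby, assuming a standard Ruby build (the four names looked up are just examples I picked from the list above):

```ruby
# Names (including aliases) of the encodings known to this Ruby build
names = Encoding.name_list
puts names.size

# Look up a few of the encodings that appear in the iconv list
%w[GBK GB18030 UTF-8 Big5].each do |name|
  enc = Encoding.find(name)   # raises ArgumentError for an unknown name
  puts "#{name} -> #{enc.name}"
end
```

A name that iconv accepts is not guaranteed to be accepted by Ruby, and the other way round, so it is worth checking before hard-coding one in a script.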

Finally, a few words about the consistency of UNIX systems. This consistency sharply reduces the cost of learning and brings a real sense of accomplishment. For example, I knew that in Ruby you append i to a regular expression to make it ignore case. When I wanted grep to ignore case as well, you can guess which option I tried: grep -i xxx. So unified and harmonious. And Windows? Oh, right, on Windows people don't play with the console, they all play with windows...
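
For the record, here is the Ruby side of that comparison as a minimal sketch. The sample string is made up for illustration:

```ruby
text = "Ruby and RUBY and ruby"

# Without the i flag only the exact lowercase spelling matches
exact = text.scan(/ruby/).length
puts exact

# With the i flag every spelling matches, the same idea as grep -i
any_case = text.scan(/ruby/i).length
puts any_case
```

Same letter, same meaning, in both tools.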




