Mac OS X下各種檔案編碼的轉換方法,macos

來源:互聯網
上載者:User

Mac OS X下各種檔案編碼的轉換方法,macos

    何曾幾時本貓還在windows下編碼的時候,那時ruby的原始碼的編碼格式都是gbk啊!導致N多中文顯示為亂碼。後來無奈寫了個轉碼從gbk編碼轉為utf-8格式的小工具:

#!/usr/bin/ruby
#tool 4 gbk encoding to utf8 src_path = $*[0]unless src_pathputs "usage #{$0[2..-1]} gbk_file"exit 1enddir_name,base_name = File.split(src_path)dst_path = dir_name << '/u8_' << base_namef_src = File.open(src_path,"r:gbk")f_dst = File.open(dst_path,"w:utf-8")f_src.each_with_index do |line,i|line.encode!("utf-8")if(i < 2)#line.gsub!(/gbk/,"utf-8") if(line =~ /^#[ ]*coding*/)line.gsub!(/gbk/,"utf-8") if(line =~ /^*coding*/)endf_dst.puts lineendf_src.closef_dst.close`chmod +x #{dst_path}`

再後來發現mac系統下內建iconv這個好東東啊:

ICONV(1)                   Linux Programmer's Manual                  ICONV(1)


NAME

       iconv - character set conversion


SYNOPSIS

       iconv [OPTION...] [-fencoding] [-tencoding] [inputfile ...]

       iconv -l


DESCRIPTION

       The  iconv program converts text from one encoding to another encoding.

       More precisely, it converts from the encoding given for the -f  option

       to  the  encoding  given  for  the -t option. Either of these encodings

       defaults to the encoding of the current locale. All the inputfiles  are

       read  and  converted  in  turn;  if no inputfile is given, the standard

       input is used. The converted text is printed to standard output.


       The encodings permitted are system dependent. For the  libiconv  imple-

       mentation, they are listed in the iconv_open(3) manual page.


       Options controlling the input and output format:


       -f encoding, --from-code=encoding

我們來試一下,建立一個utf-8格式的文本:

路人甲:最近又多學了德語,現在懂中文,英語和德語啊
貓貓:靠,我早精通十幾門語言了
路人甲:擦,我才不信
貓貓:組合語言,C語言,C++語言,C#語言,ruby語言,javascript語言...
路人甲:...

用iconv轉換為gbk格式(或者反向轉換也可以):

apple@kissAir: ruby_src$iconv -f UTF-8 -t GBK ex_u8.txt > ex_gbk.txt

apple@kissAir: ruby_src$cat ex_gbk.txt

·?˼?:????ֶ?ѧ?˵?????ڶ????ģ?Ӣ??͵??ﰡ

èè?????????羫ͨʮ??????????

·?˼ף??????ҲŲ???

èè?????????,C????,C++???ԣ?C#????,ruby????,javascript????...

·?˼?:...apple@kissAir: ruby_src$

我們可以看一下iconv到底支援多少種編碼格式,貌似是超多的啊:

apple@kissAir: ruby_src$iconv -l

ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV:1991 US US-ASCII CSASCII

UTF-8

UTF-8-MAC UTF8-MAC

ISO-10646-UCS-2 UCS-2 CSUNICODE

UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11

UCS-2LE UNICODELITTLE

ISO-10646-UCS-4 UCS-4 CSUCS4

UCS-4BE

UCS-4LE

UTF-16

UTF-16BE

UTF-16LE

UTF-32

UTF-32BE

UTF-32LE

UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7

UCS-2-INTERNAL

UCS-2-SWAPPED

UCS-4-INTERNAL

UCS-4-SWAPPED

C99

JAVA

CP819 IBM819 ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1 ISO_8859-1:1987 L1 LATIN1 CSISOLATIN1

ISO-8859-2 ISO-IR-101 ISO8859-2 ISO_8859-2 ISO_8859-2:1987 L2 LATIN2 CSISOLATIN2

ISO-8859-3 ISO-IR-109 ISO8859-3 ISO_8859-3 ISO_8859-3:1988 L3 LATIN3 CSISOLATIN3

ISO-8859-4 ISO-IR-110 ISO8859-4 ISO_8859-4 ISO_8859-4:1988 L4 LATIN4 CSISOLATIN4

CYRILLIC ISO-8859-5 ISO-IR-144 ISO8859-5 ISO_8859-5 ISO_8859-5:1988 CSISOLATINCYRILLIC

ARABIC ASMO-708 ECMA-114 ISO-8859-6 ISO-IR-127 ISO8859-6 ISO_8859-6 ISO_8859-6:1987 CSISOLATINARABIC

ECMA-118 ELOT_928 GREEK GREEK8 ISO-8859-7 ISO-IR-126 ISO8859-7 ISO_8859-7 ISO_8859-7:1987 ISO_8859-7:2003 CSISOLATINGREEK

HEBREW ISO-8859-8 ISO-IR-138 ISO8859-8 ISO_8859-8 ISO_8859-8:1988 CSISOLATINHEBREW

ISO-8859-9 ISO-IR-148 ISO8859-9 ISO_8859-9 ISO_8859-9:1989 L5 LATIN5 CSISOLATIN5

ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 ISO_8859-10:1992 L6 LATIN6 CSISOLATIN6

ISO-8859-11 ISO8859-11 ISO_8859-11

ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 L7 LATIN7

ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14 ISO_8859-14:1998 L8 LATIN8

ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 LATIN-9

ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10

KOI8-R CSKOI8R

KOI8-U

KOI8-RU

CP1250 MS-EE WINDOWS-1250

CP1251 MS-CYRL WINDOWS-1251

CP1252 MS-ANSI WINDOWS-1252

CP1253 MS-GREEK WINDOWS-1253

CP1254 MS-TURK WINDOWS-1254

CP1255 MS-HEBR WINDOWS-1255

CP1256 MS-ARAB WINDOWS-1256

CP1257 WINBALTRIM WINDOWS-1257

CP1258 WINDOWS-1258

850 CP850 IBM850 CSPC850MULTILINGUAL

862 CP862 IBM862 CSPC862LATINHEBREW

866 CP866 IBM866 CSIBM866

CP1131

MAC MACINTOSH MACROMAN CSMACINTOSH

MACCENTRALEUROPE

MACICELAND

MACCROATIAN

MACROMANIA

MACCYRILLIC

MACUKRAINE

MACGREEK

MACTURKISH

MACHEBREW

MACARABIC

MACTHAI

HP-ROMAN8 R8 ROMAN8 CSHPROMAN8

NEXTSTEP

ARMSCII-8

GEORGIAN-ACADEMY

GEORGIAN-PS

KOI8-T

CP154 CYRILLIC-ASIAN PT154 PTCP154 CSPTCP154

KZ-1048 RK1048 STRK1048-2002 CSKZ1048

MULELAO-1

CP1133 IBM-CP1133

ISO-IR-166 TIS-620 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 TIS620.2533-1

CP874 WINDOWS-874

VISCII VISCII1.1-1 CSVISCII

TCVN TCVN-5712 TCVN5712-1 TCVN5712-1:1993

ISO-IR-14 ISO646-JP JIS_C6220-1969-RO JP CSISO14JISC6220RO

JISX0201-1976 JIS_X0201 X0201 CSHALFWIDTHKATAKANA

ISO-IR-87 JIS0208 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 X0208 CSISO87JISX0208

ISO-IR-159 JIS_X0212 JIS_X0212-1990 JIS_X0212.1990-0 X0212 CSISO159JISX02121990

CN GB_1988-80 ISO-IR-57 ISO646-CN CSISO57GB1988

CHINESE GB_2312-80 ISO-IR-58 CSISO58GB231280

CN-GB-ISOIR165 ISO-IR-165

ISO-IR-149 KOREAN KSC_5601 KS_C_5601-1987 KS_C_5601-1989 CSKSC56011987

EUC-JP EUCJP EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE CSEUCPKDFMTJAPANESE

MS_KANJI SHIFT-JIS SHIFT_JIS SJIS CSSHIFTJIS

CP932

ISO-2022-JP CSISO2022JP

ISO-2022-JP-1

ISO-2022-JP-2 CSISO2022JP2

CN-GB EUC-CN EUCCN GB2312 CSGB2312

GBK

CP936 MS936 WINDOWS-936

GB18030

ISO-2022-CN CSISO2022CN

ISO-2022-CN-EXT

HZ HZ-GB-2312

EUC-TW EUCTW CSEUCTW

BIG-5 BIG-FIVE BIG5 BIGFIVE CN-BIG5 CSBIG5

CP950

BIG5-HKSCS:1999

BIG5-HKSCS:2001

BIG5-HKSCS:2004

BIG5-HKSCS BIG5-HKSCS:2008 BIG5HKSCS

EUC-KR EUCKR CSEUCKR

CP949 UHC

CP1361 JOHAB

ISO-2022-KR CSISO2022KR

CP856

CP922

CP943

CP1046

CP1124

CP1129

CP1161 IBM-1161 IBM1161 CSIBM1161

CP1162 IBM-1162 IBM1162 CSIBM1162

CP1163 IBM-1163 IBM1163 CSIBM1163

DEC-KANJI

DEC-HANYU

437 CP437 IBM437 CSPC8CODEPAGE437

CP737

CP775 IBM775 CSPC775BALTIC

852 CP852 IBM852 CSPCP852

CP853

855 CP855 IBM855 CSIBM855

857 CP857 IBM857 CSIBM857

CP858

860 CP860 IBM860 CSIBM860

861 CP-IS CP861 IBM861 CSIBM861

863 CP863 IBM863 CSIBM863

CP864 IBM864 CSIBM864

865 CP865 IBM865 CSIBM865

869 CP-GR CP869 IBM869 CSIBM869

CP1125

EUC-JIS-2004 EUC-JISX0213

SHIFT_JIS-2004 SHIFT_JISX0213

ISO-2022-JP-2004 ISO-2022-JP-3

BIG5-2003

ISO-IR-230 TDS565

ATARI ATARIST

RISCOS-LATIN1

最後說點題外話,誇一下UNIX系統的整體性和統一性,這種統一性帶來學習成本的急劇下降,而且讓人很有成就感。比如我在ruby中知道Regex最後加i表示忽略大小寫,我有次用grep尋找的時候發覺也要忽略大小寫尋找,你猜猜我用神馬選項:grep -i xxx,就是這麼統一,這麼和諧。windows下可以嗎?哦,對了windows下人家不玩console,人家都玩視窗...




相關文章

聯繫我們

該頁面正文內容均來源於網絡整理,並不代表阿里雲官方的觀點,該頁面所提到的產品和服務也與阿里云無關,如果該頁面內容對您造成了困擾,歡迎寫郵件給我們,收到郵件我們將在5個工作日內處理。

如果您發現本社區中有涉嫌抄襲的內容,歡迎發送郵件至: info-contact@alibabacloud.com 進行舉報並提供相關證據,工作人員會在 5 個工作天內聯絡您,一經查實,本站將立刻刪除涉嫌侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.