2009-05-13 javaeye http://angeloce.iteye.com/admin/blogs/385976
==================================================
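YAML is an intuitive, machine-readable data serialization format that is easy for humans to read and easy to use from scripting languages. YAML is similar to XML, but its syntax is much simpler, and it converts simply and efficiently to and from arrays and hashes.
YAML syntax references: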
http://www.ibm.com/developerworks/cn/xml/x-cn-yamlintro/
http://www.yaml.org/
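Many people consider YAML a file format that can surpass XML and JSON. Compared with XML, it keeps many of XML's strengths while being simple enough to use comfortably. Compared with JSON, YAML can be written as a properly laid out configuration file (I consider this a big advantage over JSON; writing configuration files in JSON can drive you mad).
YAML uses the data types of the host language, which can cause compatibility problems when a document is passed between several languages.
How do you write YAML? (Copied example)

    name: Tom Smith
    age: 37
    spouse:
        name: Jane Smith
        age: 25
    children:
        - name: Jimmy Smith
          age: 15
        - name1: Jenny Smith
          age1: 12

For the detailed syntax, see the YAML syntax references above.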
--------------------------------------------------------------------------------------------
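YAML in Python: PyYAML
Save the YAML above as a configuration script named test.yaml; the following shows how to read and write YAML configuration.
Use Python's YAML library, PyYAML: http://pyyaml.org/
Once it is installed into Python's lib directory, it is ready to use.

    # load the yaml module
    import yaml

    # open the file
    f = open('test.yaml')

    # parse it
    x = yaml.load(f)

    print x

You will probably get something like the following dict printed: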
{'age': 37, 'spouse': {'age': 25, 'name': 'Jane Smith'}, 'name': 'Tom Smith', 'children': [{'age': 15, 'name': 'Jimmy Smith'}, {'age1': 12, 'name1': 'Jenny Smith'}]}
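The snippet above is Python 2 era code. As a minimal sketch of the same read on Python 3, assuming a recent PyYAML release (where yaml.load requires an explicit Loader, so yaml.safe_load is used here instead): the temporary file only stands in for test.yaml, and the names YAML_TEXT and config are illustrative.

```python
import os
import tempfile

import yaml  # PyYAML, a third-party package

# Same content as the test.yaml example (written to a throwaway temp file).
YAML_TEXT = """\
name: Tom Smith
age: 37
spouse:
    name: Jane Smith
    age: 25
children:
    - name: Jimmy Smith
      age: 15
    - name1: Jenny Smith
      age1: 12
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.yaml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(YAML_TEXT)

    # safe_load only builds plain Python types (dict, list, str, int, ...).
    with open(path, encoding="utf-8") as f:
        config = yaml.safe_load(f)

print(config["name"])
print(config["children"][1]["age1"])
```

Using a with block also closes the file, which the original snippet leaves open.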
Using the YAML library from Python is simple; you basically need just two functions:
yaml.load
yaml.dump
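For anyone who has used pickle, there is no need to spell out what that means, right?
Warning: It is not safe to call yaml.load with any data received from an untrusted source! yaml.load is as powerful as pickle.load and so may call any Python function.
When reading YAML, the hardest part is writing correctly formatted YAML data. A small mistake usually makes load raise an exception, but sometimes no exception is raised and you simply get no data back.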
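A small sketch of those failure modes, assuming Python 3 and a recent PyYAML; try_load is a hypothetical helper written for this illustration, not part of the library. It shows a syntax error that raises, an empty document that quietly yields None, and a typo that parses without error but not as intended.

```python
import yaml  # PyYAML, a third-party package


def try_load(text):
    """Hypothetical helper: return (value, error); error is None on success."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as exc:
        return None, exc


# Case 1: a syntax error (unclosed flow sequence) raises yaml.YAMLError.
value, error = try_load("ages: [37, 25")
print(type(error).__name__)

# Case 2: an empty document parses without error and yields None.
empty_value, empty_error = try_load("")
print(empty_value)

# Case 3: a missing space after the colon is still valid YAML, but it is
# read as one plain string rather than as a key/value pair.
typo_value, typo_error = try_load("age:37")
print(repr(typo_value))
```

Checking the type of the loaded value (for example, that a configuration file really produced a dict) is one way to catch the silent cases.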
PyYAML is implemented entirely in Python and is claimed to be even more awesome than pickle. (Who knows?)
yaml.load accepts a byte string, a Unicode string, an open binary file object, or an open text file object. A byte string or a file must be encoded with utf-8, utf-16-be or utf-16-le encoding. yaml.load detects the encoding by checking the BOM (byte order mark) sequence at the beginning of the string/file. If no BOM is present, the utf-8 encoding is assumed.
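From Baidu, on the BOM: UCS defines a character called "ZERO WIDTH NO-BREAK SPACE", whose code is FEFF. FFFE, on the other hand, is not a character in UCS, so it should never appear in real data. The UCS specification recommends transmitting the character "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself. If the receiver then sees FEFF, the stream is big-endian; if it sees FFFE, the stream is little-endian. This is why "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to mark the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF, so a receiver that sees a byte stream starting with EF BB BF knows it is UTF-8. Windows uses the BOM in exactly this way to mark the encoding of text files.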
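The BOM check can be sketched with the standard library alone. This is only an illustration of the idea, not PyYAML's actual implementation; sniff_encoding is a hypothetical helper name.

```python
import codecs

# BOM byte sequences provided by the standard library's codecs module.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def sniff_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding from a leading BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return "utf-8"  # no BOM: fall back to utf-8


# U+FEFF encoded as little-endian UTF-16 becomes the bytes FF FE.
sample = "\ufeffname: Tom".encode("utf-16-le")
print(sniff_encoding(sample))  # utf-16-le
```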
yaml.load returns a Python object. What kind of object you get depends on what your data looks like.
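To make that concrete, here is a short sketch (Python 3, using yaml.safe_load; the sample strings are made up for illustration) that loads a few tiny documents and prints the Python type each one produces.

```python
import yaml  # PyYAML, a third-party package

# Tiny illustrative documents, one per kind of YAML node.
samples = {
    "mapping": "name: Tom\nage: 37",
    "sequence": "- 1\n- 2\n- 3",
    "integer": "42",
    "float": "3.5",
    "boolean": "true",
    "string": "hello",
}

loaded = {label: yaml.safe_load(text) for label, text in samples.items()}

for label, value in loaded.items():
    print(label, type(value).__name__, value)
```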
If a string or a file contains several documents, you may load them all with the yaml.load_all function.

    yaml.load(stream, Loader=<class 'yaml.loader.Loader'>)
        Parse the first YAML document in a stream    # only the first one
        and produce the corresponding Python object.

    yaml.load_all(stream, Loader=<class 'yaml.loader.Loader'>)
        Parse all YAML documents in a stream
        and produce corresponding Python objects.

yaml.load_all returns an iterator; all you have to do is read it with a for loop.

    documents = """
    name: The Set of Gauntlets 'Pauraegen'
    description: >
        A set of handgear with sparks that crackle
        across its knuckleguards.
    ---
    name: The Set of Gauntlets 'Paurnen'
    description: >
        A set of gauntlets that gives off a foul,
        acrid odour yet remains untarnished.
    ---
    name: The Set of Gauntlets 'Paurnimmen'
    description: >
        A set of handgear, freezing with unnatural cold.
    """

    for data in yaml.load_all(documents):
        print data

    # {'description': 'A set of handgear with sparks that crackle across its knuckleguards.\n',
    #  'name': "The Set of Gauntlets 'Pauraegen'"}
    # {'description': 'A set of gauntlets that gives off a foul, acrid odour yet remains untarnished.\n',
    #  'name': "The Set of Gauntlets 'Paurnen'"}
    # {'description': 'A set of handgear, freezing with unnatural cold.\n',
    #  'name': "The Set of Gauntlets 'Paurnimmen'"}

PyYAML allows you to construct a Python object of any type.
Even instances of Python classes can be constructed using the !!python/object tag.
More on this later; it is a very useful feature.
Note that the ability to construct an arbitrary Python object may be dangerous if you receive a YAML document from an untrusted source such as the Internet. The function yaml.safe_load limits this ability to simple Python objects like integers or lists, so stick to simple objects where you can.
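A minimal sketch of that restriction, assuming Python 3 and a recent PyYAML; the two sample documents are made up. yaml.safe_load reads plain data as usual but refuses a document carrying a Python-specific tag.

```python
import yaml  # PyYAML, a third-party package

plain_doc = "hp: [2, 6]\nac: 16"        # only a mapping, a list and ints
tagged_doc = "!!python/tuple [2, 6]"    # uses a Python-specific tag

plain = yaml.safe_load(plain_doc)

try:
    yaml.safe_load(tagged_doc)
    rejected = False
except yaml.YAMLError:
    # safe_load has no constructor registered for python/* tags.
    rejected = True

print(plain, rejected)
```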
---------------------------------------
Dumping YAML
The yaml.dump function accepts a Python object and produces a YAML document. It is the counterpart of yaml.load.

    dump(data, stream=None, Dumper=<class 'yaml.dumper.Dumper'>, **kwds)
        Serialize a Python object into a YAML stream.
        If stream is None, return the produced string instead.
        # Nice: if no stream is given, you get the YAML document back as a string.

    aproject = {'name': 'Silenthand Olleander',
                'race': 'Human',
                'traits': ['ONE_HAND', 'ONE_EYE']}

    print yaml.dump(aproject)

    # Output:
    # name: Silenthand Olleander
    # race: Human
    # traits: [ONE_HAND, ONE_EYE]

yaml.dump accepts the second optional argument, which must be an open text or binary file. In this case, yaml.dump will write the produced YAML document into the file. Otherwise, yaml.dump returns the produced document.
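Both behaviours can be seen side by side in a short sketch (Python 3, using yaml.safe_dump; the io.StringIO buffer is an assumption standing in for an open text file).

```python
import io

import yaml  # PyYAML, a third-party package

aproject = {
    "name": "Silenthand Olleander",
    "race": "Human",
    "traits": ["ONE_HAND", "ONE_EYE"],
}

# No stream: the YAML document is returned as a string.
as_string = yaml.safe_dump(aproject)

# Stream given: the document is written to it and None is returned.
buffer = io.StringIO()  # stands in for an open text file
result = yaml.safe_dump(aproject, buffer)

print(result)
print(buffer.getvalue())
```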
If you need to dump several YAML documents to a single stream, use the function yaml.dump_all. yaml.dump_all accepts a list or a generator producing Python objects to be serialized into a YAML document. The second optional argument is an open file. This is the exact counterpart of yaml.load_all.
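A round trip through the two functions might look like the sketch below (Python 3, using the safe_ variants; the monsters generator and its data are hypothetical).

```python
import yaml  # PyYAML, a third-party package


def monsters():
    """Hypothetical generator yielding one dict per YAML document."""
    yield {"name": "Cave spider", "ac": 16}
    yield {"name": "Cave lizard", "ac": 16}


# Serialize every yielded object as its own document in one stream.
stream_text = yaml.safe_dump_all(monsters())
print(stream_text)

# load_all yields the documents back one by one.
restored = list(yaml.safe_load_all(stream_text))
print(restored)
```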
You may even dump instances of Python classes.
yaml.dump supports a number of keyword arguments that specify formatting details for the emitter. For instance, you may set the preferred indentation and width, use the canonical YAML format or force preferred style for scalars and collections.

    dump_all(documents, stream=None, Dumper=<class 'yaml.dumper.Dumper'>,
        default_style=None, default_flow_style=None, canonical=None,
        indent=None, width=None, allow_unicode=None, line_break=None,
        encoding='utf-8', explicit_start=None, explicit_end=None,
        version=None, tags=None)
    # The signature shows the parameters just described:
    # canonical, indent, width, and so on.

Examples:

    >>> print yaml.dump(range(50))
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
      23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
      43, 44, 45, 46, 47, 48, 49]

    >>> print yaml.dump(range(50), width=50, indent=4)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
        28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
        40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

    >>> print yaml.dump(range(5), canonical=True)
    ---
    !!seq [
      !!int "0",
      !!int "1",
      !!int "2",
      !!int "3",
      !!int "4",
    ]

    >>> print yaml.dump(range(5), default_flow_style=False)
    - 0
    - 1
    - 2
    - 3
    - 4

    >>> print yaml.dump(range(5), default_flow_style=True, default_style='"')
    [!!int "0", !!int "1", !!int "2", !!int "3", !!int "4"]

The key is all in those trailing keyword arguments.
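One point worth checking is that these arguments change only how the document looks, not what it contains. A small sketch, assuming Python 3 and a recent PyYAML (the variants dict and its labels are illustrative): each differently formatted output loads back to the same data.

```python
import yaml  # PyYAML, a third-party package

data = {"name": "Silenthand Olleander", "traits": ["ONE_HAND", "ONE_EYE"]}

# The same data emitted with different formatting options.
variants = {
    "flow": yaml.safe_dump(data, default_flow_style=True),
    "block": yaml.safe_dump(data, default_flow_style=False, indent=4),
    "canonical": yaml.safe_dump(data, canonical=True),
    "explicit_start": yaml.safe_dump(data, explicit_start=True),
}

for label, text in variants.items():
    print("#", label)
    print(text)
    # Whatever the layout, loading gives back the original data.
    assert yaml.safe_load(text) == data
```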
------------------------------------------------------
Constructors, representers, resolvers
You may define your own application-specific tags. The easiest way to do it is to define a subclass of yaml.YAMLObject:

    class Monster(yaml.YAMLObject):
        yaml_tag = u'!Monster'

        def __init__(self, name, hp, ac, attacks):
            self.name = name
            self.hp = hp
            self.ac = ac
            self.attacks = attacks

        def __repr__(self):
            return "%s(name=%r, hp=%r, ac=%r, attacks=%r)" % (
                self.__class__.__name__, self.name, self.hp, self.ac,
                self.attacks)

The above definition is enough to automatically load and dump Monster objects:

    >>> yaml.load("""
    ... --- !Monster
    ... name: Cave spider
    ... hp: [2,6]    # 2d6
    ... ac: 16
    ... attacks: [BITE, HURT]
    ... """)
    Monster(name='Cave spider', hp=[2, 6], ac=16, attacks=['BITE', 'HURT'])

    >>> print yaml.dump(Monster(
    ...     name='Cave lizard', hp=[3,6], ac=16, attacks=['BITE','HURT']))
    !Monster
    ac: 16
    attacks: [BITE, HURT]
    hp: [3, 6]
    name: Cave lizard

yaml.YAMLObject uses metaclass magic to register a constructor, which transforms a YAML node to a class instance, and a representer, which serializes a class instance to a YAML node.
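For readers on Python 3, the same round trip might be written as the sketch below. It assumes a recent PyYAML, where yaml.load needs an explicit Loader; yaml.Loader is chosen here because the document is trusted, and it should not be used on untrusted input.

```python
import yaml  # PyYAML, a third-party package


class Monster(yaml.YAMLObject):
    # Subclassing YAMLObject registers the tag with the default loader/dumper.
    yaml_tag = "!Monster"

    def __init__(self, name, hp, ac, attacks):
        self.name = name
        self.hp = hp
        self.ac = ac
        self.attacks = attacks

    def __repr__(self):
        return "%s(name=%r, hp=%r, ac=%r, attacks=%r)" % (
            self.__class__.__name__, self.name, self.hp, self.ac, self.attacks)


doc = """
--- !Monster
name: Cave spider
hp: [2, 6]    # 2d6
ac: 16
attacks: [BITE, HURT]
"""

# yaml.Loader can build arbitrary objects: trusted input only.
spider = yaml.load(doc, Loader=yaml.Loader)
text = yaml.dump(spider)
clone = yaml.load(text, Loader=yaml.Loader)

print(spider)
print(text)
```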
If you don't want to use metaclasses, you may register your constructors and representers using the functions yaml.add_constructor and yaml.add_representer. For instance, you may want to add a constructor and a representer for the following Dice class:

    >>> class Dice(tuple):
    ...     def __new__(cls, a, b):
    ...         return tuple.__new__(cls, [a, b])
    ...     def __repr__(self):
    ...         return "Dice(%s,%s)" % self

    >>> print Dice(3,6)
    Dice(3,6)

The default representation for Dice objects is not nice:

    >>> print yaml.dump(Dice(3,6))
    !!python/object/new:__main__.Dice
    - !!python/tuple [3, 6]

Suppose you want a Dice object to be represented as AdB in YAML (where A and B are the two values).
First we define a representer that converts a Dice object to a scalar node with the tag !dice, and register it.

    >>> def dice_representer(dumper, data):
    ...     return dumper.represent_scalar(u'!dice', u'%sd%s' % data)

    >>> yaml.add_representer(Dice, dice_representer)

Now you may dump an instance of the Dice object:

    >>> print yaml.dump({'gold': Dice(10,6)})
    {gold: !dice '10d6'}

Let us add the code to construct a Dice object:

    >>> def dice_constructor(loader, node):
    ...     value = loader.construct_scalar(node)
    ...     a, b = map(int, value.split('d'))
    ...     return Dice(a, b)

    >>> yaml.add_constructor(u'!dice', dice_constructor)

Then you may load a Dice object as well:

    >>> print yaml.load("""
    ... initial hit points: !dice 8d4
    ... """)
    {'initial hit points': Dice(8,4)}

As you can see from this, a constructor and a representer are counterparts: one serves load, the other serves dump.
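Put together, the pairing can be sketched as one Python 3 round trip. This variant is an assumption of mine rather than what the original session does: it registers both functions on yaml.SafeDumper and yaml.SafeLoader (through the optional Dumper and Loader arguments), so that yaml.safe_dump and yaml.safe_load understand the !dice tag.

```python
import yaml  # PyYAML, a third-party package


class Dice(tuple):
    def __new__(cls, a, b):
        return tuple.__new__(cls, [a, b])

    def __repr__(self):
        return "Dice(%s,%s)" % self


def dice_representer(dumper, data):
    # dump direction: Dice(10, 6) -> scalar "10d6" tagged !dice
    return dumper.represent_scalar("!dice", "%sd%s" % data)


def dice_constructor(loader, node):
    # load direction: scalar "10d6" tagged !dice -> Dice(10, 6)
    value = loader.construct_scalar(node)
    a, b = map(int, value.split("d"))
    return Dice(a, b)


# Register on the safe classes (a choice made for this sketch).
yaml.add_representer(Dice, dice_representer, Dumper=yaml.SafeDumper)
yaml.add_constructor("!dice", dice_constructor, Loader=yaml.SafeLoader)

original = {"gold": Dice(10, 6)}
text = yaml.safe_dump(original)
restored = yaml.safe_load(text)

print(text)
print(restored)
```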
-------------------------------------------------------
Most of the above comes from http://pyyaml.org/wiki/PyYAMLDocumentation