IPFS Data Model: IPLD

Source: Internet
Author: User
Tags format definition rfc
  • Ipld.io
  • Github:ipld
  • Original: IPLD Specs

Many systems use data structures inspired by merkle-trees and hash-chains (for example Git, BitTorrent, IPFS, Tahoe-LAFS, SFSRO). IPLD (InterPlanetary Linked Data) defines:

    • merkle-links: the core unit of a merkle-graph.
    • merkle-dag: any graph whose edges are merkle-links. "dag" stands for "directed acyclic graph".
    • merkle-paths: UNIX-style paths for traversing merkle-dags by following named merkle-links.
    • IPLD formats: a set of formats in which IPLD objects can be represented, for example JSON, CBOR, CSON, YAML, Protobuf, XML, RDF.
    • IPLD canonical format: a deterministic description of a serialized format that ensures the same logical object is always serialized to exactly the same sequence of bits. This is critical for merkle-linking and for all cryptographic applications.
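To make the role of a canonical format concrete, here is a minimal Python sketch. It uses JSON with sorted keys and fixed separators as a stand-in for a canonical encoding, which is an assumption made purely for illustration (it is not the canonical format IPLD specifies). The helper names canonical_bytes and digest are hypothetical.

```python
import hashlib
import json


def canonical_bytes(obj):
    """Serialize obj deterministically (a stand-in for a canonical format)."""
    # Sorted keys and fixed separators remove the variation that plain
    # json.dumps would otherwise allow between equal objects.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def digest(obj):
    """Hex SHA-256 of the canonical bytes (hypothetical helper)."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


# The same logical object, built with different key insertion order.
first = {"name": "Vannevar Bush", "title": "As We May Think"}
second = {"title": "As We May Think", "name": "Vannevar Bush"}

same_hash = digest(first) == digest(second)                  # canonical form agrees
naive_same = json.dumps(first) == json.dumps(second)         # naive form does not
```

Without the deterministic step, two peers holding the same logical object could compute different hashes and therefore different links.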

Introduction

What is a merkle-link?

A merkle-link is a link between two objects that is content-addressed with the cryptographic hash of the target object and embedded in the source object. Content addressing with merkle-links allows:

    • Cryptographic integrity checking: the result of resolving a link can be verified by hashing it. This in turn allows wide, secure, and trustless exchange of data (as in Git or BitTorrent), because others cannot give you any data that does not hash to the link value.
    • Immutable data structures: data structures built with merkle-links cannot mutate, which is a useful property for distributed systems. It helps with versioning, with representing distributed mutable state (for example CRDTs), and with long-term archival.
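The integrity check can be sketched in a few lines of Python. This is an illustration under stated assumptions: a hex SHA-256 digest stands in for the real link value, and make_link and verify are hypothetical helper names.

```python
import hashlib


def make_link(data: bytes) -> str:
    """Return a link value for data (hex SHA-256 used as a stand-in hash)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, link_value: str) -> bool:
    """Check that data received for link_value really hashes to it."""
    return hashlib.sha256(data).hexdigest() == link_value


original = b'{"name": "Vannevar Bush"}'
link = make_link(original)

ok = verify(original, link)                          # data from an honest peer
forged = verify(b'{"name": "Someone Else"}', link)   # tampered data is rejected
```

Because the check needs only the data and the link, it does not matter which peer supplied the bytes.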

In the IPLD object model, a merkle-link is represented by a map containing a single key, "/", mapped to the link value. For example:

A link, represented in JSON as a "link object":

    {
      "/": "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"
    }
    // "/" is the link key
    // "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k" is the link value

An object with a link at foo/baz:

    {
      "foo": {
        "bar": "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k", // not a link
        "baz": {"/": "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"} // link
      }
    }

An object with an actual link at files/cat.jpg/link and a pseudo "link object" at files/cat.jpg:

    {
      "files": {
        "cat.jpg": { // wraps the link properties in another object
          "link": {"/": "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"}, // the link
          "mode": 0755,
          "owner": "jbenet"
        }
      }
    }

When a link is dereferenced, the map itself is replaced by the object it points to, unless the link path is invalid.

The link value can either be a multihash, in which case it is assumed to be a link in the /ipfs hierarchy, or the absolute path to the object. Currently, only the /ipfs hierarchy is allowed.

If an application wants to use objects with a single "/" key for other purposes, the application itself is responsible for escaping the "/" key in the IPLD object so that its own keys do not conflict with IPLD's special "/" key.
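The rule that distinguishes a link from ordinary data can be sketched as follows. The function names is_link and find_links are hypothetical, and the sketch assumes link values are strings, as they are in the examples above.

```python
def is_link(node):
    """True if node is a map whose only key is "/" with a string value."""
    return (isinstance(node, dict) and len(node) == 1
            and isinstance(node.get("/"), str))


def find_links(node, prefix=""):
    """Yield (path, link value) for every link object nested in node."""
    if is_link(node):
        yield prefix, node["/"]
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_links(value, f"{prefix}/{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            child = f"{prefix}/{index}" if prefix else str(index)
            yield from find_links(value, child)


HASH = "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"
doc = {
    "foo": {
        "bar": HASH,         # plain string: not a link
        "baz": {"/": HASH},  # map with the single key "/": a link
    }
}

links = list(find_links(doc))
```

A map that carries "/" next to other keys is not treated as a link here, which is why extra link properties have to live in a wrapper object.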

What is a merkle-graph or a merkle-dag?

Objects with merkle-links form a graph (a merkle-graph). This graph is necessarily directed and, as long as the properties of the cryptographic hash function hold, acyclic: an object can only embed the hash of an object that already exists. Hence every graph that uses merkle-linking is also a directed acyclic graph (DAG), which is where the name merkle-dag comes from.

What is a merkle-path?

A merkle-path is a UNIX-style path (for example /a/b/c/d) that initially dereferences through a merkle-link and then allows access to elements of the referenced node and, transitively, of other nodes.

General-purpose file systems are encouraged to design an object model on top of IPLD that is specialized for file manipulation and that has its own path algorithms for querying the model.

How do merkle-paths work?

A merkle-path is a UNIX-style path that initially dereferences through a merkle-link and then follows named merkle-links in the intermediate objects. Following a name means looking into the object, finding the name, and resolving the associated merkle-link.

For example, suppose we have this merkle-path:

/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/c/d

Path Description:

    • ipfs is a protocol namespace (it lets the computer discern what to do).
    • QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k is a cryptographic hash.
    • a/b/c/d is a path traversal, as in UNIX.

Path traversals, denoted with /, happen over two kinds of links:

    • in-object traversals traverse data within the same object.
    • cross-object traversals traverse from one object to another, resolving through a merkle-link.

Example

Use the following data set:

    > ipfs object cat --fmt=yaml QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k
    ---
    a:
      b:
        link:
          /: QmV76pUdAAukxEHt9Wp2xwyTpiCmzJCvjnMxyQBreaUeKT
        c: "d"
        foo:
          /: QmQmkZPNPoRkPd7wj2xUJe5v5DsY6MX33MFaGhZKB2pRSE

    > ipfs object cat --fmt=yaml QmV76pUdAAukxEHt9Wp2xwyTpiCmzJCvjnMxyQBreaUeKT
    ---
    c: "e"
    d:
      e: "f"
    foo:
      name: "second foo"

    > ipfs object cat --fmt=yaml QmQmkZPNPoRkPd7wj2xUJe5v5DsY6MX33MFaGhZKB2pRSE
    ---
    name: "third foo"

Example paths:

    • /ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/c traverses the first object and yields the string "d".
    • /ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/link/c traverses two objects and yields the string "e".
    • /ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/link/d/e traverses two objects and yields the string "f".
    • /ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/link/foo/name traverses the first and second objects and yields the string "second foo".
    • /ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/foo/name traverses the first and last objects and yields the string "third foo".
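The traversal rules can be sketched as a small resolver over an in-memory store. This is a minimal illustration, not the resolver used by any IPFS implementation: the store is a plain dict keyed by hash, the function names are hypothetical, and link values that carry a longer absolute path are not handled.

```python
def is_link(node):
    """True if node is a map with the single key "/"."""
    return isinstance(node, dict) and list(node) == ["/"]


def resolve(store, path):
    """Resolve a merkle-path against store, a dict of hash -> object."""
    parts = path.strip("/").split("/")
    if parts[0] != "ipfs":
        raise ValueError("only the /ipfs namespace is handled here")
    node = store[parts[1]]                     # initial dereference by hash
    for name in parts[2:]:
        # In-object traversal: step into a map key or a list index.
        node = node[int(name)] if isinstance(node, list) else node[name]
        if is_link(node):
            # Cross-object traversal: replace the link by its target.
            target = node["/"]
            if target.startswith("/ipfs/"):
                target = target[len("/ipfs/"):]
            node = store[target]
    return node


ROOT = "QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"
SECOND = "QmV76pUdAAukxEHt9Wp2xwyTpiCmzJCvjnMxyQBreaUeKT"
THIRD = "QmQmkZPNPoRkPd7wj2xUJe5v5DsY6MX33MFaGhZKB2pRSE"

store = {
    ROOT: {"a": {"b": {"link": {"/": SECOND}, "c": "d", "foo": {"/": THIRD}}}},
    SECOND: {"c": "e", "d": {"e": "f"}, "foo": {"name": "second foo"}},
    THIRD: {"name": "third foo"},
}

value = resolve(store, f"/ipfs/{ROOT}/a/b/link/foo/name")
```

With the dataset above loaded into store, the five example paths resolve to the five strings listed.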

What is the IPLD data model?

The IPLD data model defines a simple JSON-based structure for all merkle-dags and identifies a set of formats in which that structure can be encoded.

Constraints and desires

Some constraints:

    • IPLD paths must be unambiguous. A given path string must always traverse deterministically to the same object (for example, avoid duplicate link names).
    • IPLD paths must be universal and must not disadvantage non-English-speaking societies (for example, use UTF-8 rather than ASCII).
    • IPLD paths must layer cleanly over UNIX and the Web (use /, and have deterministic transforms for ASCII systems).
    • Given the wide success of JSON, a huge number of systems present JSON interfaces. IPLD must be able to import from and export to JSON trivially.
    • The JSON data model is also very simple and easy to use. IPLD must be just as easy to use.
    • Defining new data structures must be trivially easy. Experimenting with new definitions on top of IPLD should not be cumbersome or require much knowledge.
    • Because IPLD is based on the JSON data model, it is fully compatible with RDF and Linked Data standards through JSON-LD.
    • IPLD serialized formats (on disk and on the wire) must be fast and space-efficient. (JSON should not be used as the storage format; use CBOR or a similar format instead.)
    • IPLD cryptographic hashes must be upgradeable (use multihash).
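The last constraint relies on multihash, which prefixes a digest with a code for the hash function and the digest length, so a different function can be adopted later without ambiguity. The sketch below builds a sha2-256 multihash by hand and prints it in base58; the helper names are hypothetical and only the sha2-256 case is covered.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes with the base58 alphabet used by Bitcoin and IPFS."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is written as the first alphabet character.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return BASE58_ALPHABET[0] * leading_zeros + encoded


def multihash_sha256(data: bytes) -> bytes:
    """Return <function code><digest length><digest> for sha2-256."""
    digest = hashlib.sha256(data).digest()
    # 0x12 identifies sha2-256 and 0x20 is the digest length (32 bytes).
    return bytes([0x12, len(digest)]) + digest


mh = multihash_sha256(b"hello world")
text = base58_encode(mh)
```

The two prefix bytes are the reason every sha2-256 hash in this document starts with "Qm" when written in base58.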

Some nice-to-haves:

    • IPLD should not carry over mistakes, such as the lack of integers in JSON.
    • IPLD should be upgradeable: if a better on-disk format emerges, systems should be able to migrate to it at minimal cost.
    • IPLD objects should be able to resolve properties as paths too, not just merkle-links.
    • The IPLD canonical format should be easy to write a parser for.
    • The IPLD canonical format should allow seeking without parsing the full object. (CBOR and Protobuf allow this.)

Format definition

(Note: here we use both JSON and YML to show what the formats look like. We explicitly use both to show the equivalence of an object across two different formats.)

At its core, the IPLD data model "is just JSON" in that it (a) is also a tree-based document format with a few primitive types, (b) maps 1:1 to JSON, and (c) can be used through JSON itself. It "is not JSON" in that it (a) improves on some of JSON's mistakes, (b) has an efficient serialized representation, and (c) does not actually specify a single on-wire format, since the world is known to improve.

Basic node

Here is an example IPLD object in JSON:

    {
      "name": "Vannevar Bush"
    }

Suppose it hashes to the multihash value QmAAA...AAA. Note that it has no links at all, just a string value under name. We can still "resolve" the key name under it:

    > ipld cat --json QmAAA...AAA
    {
      "name": "Vannevar Bush"
    }

    > ipld cat --json QmAAA...AAA/name
    "Vannevar Bush"

And, of course, we can view it in other formats:

    > ipld cat --yml QmAAA...AAA
    ---
    name: Vannevar Bush

    > ipld cat --xml QmAAA...AAA
    <!xml> <!-- todo -->
    <node>
      <name>Vannevar Bush</name>
    </node>

Links between nodes

Merkle-linking between nodes is the reason IPLD exists. A link in IPLD is just an embedded node with a special format:

    {
      "title": "As We May Think",
      "author": {
        "/": "QmAAA...AAA" // links to the node above.
      }
    }

Suppose this hashes to the multihash value QmBBB...BBB. The node links the subpath author to QmAAA...AAA, the node from the section above. So now we can do this:

    > ipld cat --json QmBBB...BBB
    {
      "title": "As We May Think",
      "author": {
        "/": "QmAAA...AAA" // links to the node above.
      }
    }

    > ipld cat --json QmBBB...BBB/author
    {
      "name": "Vannevar Bush"
    }

    > ipld cat --yml QmBBB...BBB/author
    ---
    name: "Vannevar Bush"

    > ipld cat --json QmBBB...BBB/author/name
    "Vannevar Bush"

Link Property Conventions

IPLD allows users to build complex data structures in which other properties are associated with links. This is useful for encoding extra information along with a link, such as the kind of relationship or ancillary data the link requires. This is different from the "link object" convention, which is very useful in its own right. But sometimes you just want to add a bit of data to a link without having to create another object. IPLD does not get in your way: simply nest the actual IPLD link inside another object that carries the additional properties.

Important: link properties are not allowed directly inside the link object, because that would make traversal ambiguous. Read the specification history for a discussion of the difficulties.

For example, suppose you have a file system and you want to attach metadata such as permissions or owners to the links between objects. Suppose you have a directory object with hash QmCCC...CCC like this:

    {
      "foo": { // link wrapper with more properties
        "link": {"/": "QmCCC...111"}, // the link
        "mode": "0755",
        "owner": "jbenet"
      },
      "cat.jpg": {
        "link": {"/": "QmCCC...222"},
        "mode": "0644",
        "owner": "jbenet"
      },
      "doge.jpg": {
        "link": {"/": "QmCCC...333"},
        "mode": "0644",
        "owner": "jbenet"
      }
    }

or in YML:

    ---
    foo:
      link:
        /: QmCCC...111
      mode: 0755
      owner: jbenet
    cat.jpg:
      link:
        /: QmCCC...222
      mode: 0644
      owner: jbenet
    doge.jpg:
      link:
        /: QmCCC...333
      mode: 0644
      owner: jbenet

Although the links now carry new properties that are specific to this data structure, we can still resolve them just fine:

    > ipld cat --json QmCCC...CCC/cat.jpg
    {
      "data": "\u0008\u0002\u0012��\u0008����\u0000\u0010JFIF\u0000\u0001\u0001\u0001\u0000H\u0000H..."
    }

    > ipld cat --json QmCCC...CCC/doge.jpg
    {
      "subfiles": [
        {
          "/": "QmPHPs1P3JaWi53q5qqiNauPhiTqa3S1mbszcVPHKGNWRh"
        },
        {
          "/": "QmPCuqUTNb21VDqtp5b8VsNzKEMtUsZCCVsEUBrjhERRSR"
        },
        {
          "/": "QmS7zrNSHEt5GpcaKrwdbnv1nckBreUxWnLaV4qivjaNr3"
        }
      ]
    }

    > ipld cat --yml QmCCC...CCC/doge.jpg
    ---
    subfiles:
      - /: QmPHPs1P3JaWi53q5qqiNauPhiTqa3S1mbszcVPHKGNWRh
      - /: QmPCuqUTNb21VDqtp5b8VsNzKEMtUsZCCVsEUBrjhERRSR
      - /: QmS7zrNSHEt5GpcaKrwdbnv1nckBreUxWnLaV4qivjaNr3

    > ipld cat --json QmCCC...CCC/doge.jpg/subfiles/1/
    {
      "data": "\u0008\u0002\u0012��\u0008����\u0000\u0010JFIF\u0000\u0001\u0001\u0001\u0000H\u0000H..."
    }

However, we cannot extract the link itself as easily as the other properties, because links are meant to be resolved through.

Duplicate property keys

Having two properties with the same name is not allowed, but it is impossible to prevent in practice (someone will do it and feed it to a parser). To be safe, the value of a path traversal is defined to be the first matching entry in the serialized representation. For example, suppose we have the object:

    {
      "name": "J.C.R. Licklider",
      "name": "Hans Moravec"
    }

Suppose this is the exact order in the canonical format (not JSON, but CBOR), and that it hashes to QmDDD...DDD. We would always get:

    > ipld cat --json QmDDD...DDD
    {
      "name": "J.C.R. Licklider",
      "name": "Hans Moravec"
    }

    > ipld cat --json QmDDD...DDD/name
    "J.C.R. Licklider"
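The "first entry wins" rule differs from what many JSON parsers do by default. As an illustration (JSON is used here only because it is convenient, whereas the text above refers to the CBOR canonical form), Python's json module keeps the last duplicate unless told otherwise. The helper name first_wins is hypothetical.

```python
import json


def first_wins(pairs):
    """Build a dict from key/value pairs, keeping the first entry per key."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)   # later duplicates are ignored
    return result


raw = '{"name": "J.C.R. Licklider", "name": "Hans Moravec"}'

# Default behaviour of json.loads: the last duplicate replaces earlier ones.
default_value = json.loads(raw)["name"]

# With the hook, the first entry in serialized order is the one returned.
first_value = json.loads(raw, object_pairs_hook=first_wins)["name"]
```

A parser used for path traversal therefore needs an explicit rule like this rather than relying on library defaults.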

Path restrictions

Path descriptions in UNIX and on the Web raise some important problems (the original spec links to a discussion of them). To stay compatible with the models and expectations of UNIX and the Web, IPLD explicitly disallows paths with certain path components. Note that the data itself may still contain these properties (someone will do it, and there are legitimate uses for it), so it is only the path resolver that must not resolve through such paths. These restrictions are the same as in typical UNIX and UTF-8 path systems:

Todo:

    • [ ] List path resolution restrictions
    • [ ] Show examples

Integers in JSON

IPLD is directly compatible with JSON in order to benefit from JSON's success, but it does not need to be held back by JSON's mistakes. This is where we can afford to follow the idiomatic choices of each format, as long as care is taken that a well-defined 1:1 mapping always exists.

On the subject of integers, several formats represent integers as strings in JSON, for example EJSON. These can be used, and conversion to and from other formats should happen naturally. That is, when converting JSON to CBOR, an EJSON integer should be converted to a proper CBOR integer rather than kept as a map with string values.
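As a sketch of what such a conversion might look like, the Python snippet below unwraps string-encoded integers before the data is handed to a binary encoder. The wrapper shape {"$numberLong": "<digits>"} follows MongoDB Extended JSON and is an assumption chosen for illustration; the text above does not say which representation is meant. The function name normalize_integers is hypothetical.

```python
def normalize_integers(node):
    """Replace {"$numberLong": "<digits>"} maps with native integers."""
    if isinstance(node, dict):
        # A wrapper is a map whose only key is "$numberLong" (assumed shape).
        if set(node) == {"$numberLong"} and isinstance(node["$numberLong"], str):
            return int(node["$numberLong"])
        return {key: normalize_integers(value) for key, value in node.items()}
    if isinstance(node, list):
        return [normalize_integers(value) for value in node]
    return node


doc = {"size": {"$numberLong": "1424119"}, "chunks": [{"$numberLong": "3"}, "x"]}
converted = normalize_integers(doc)
```

After this step a CBOR encoder would see real integers and could emit them as CBOR integers rather than as nested maps.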

Serialized data formats

IPLD supports a variety of serialized data formats through multicodec. Each can be used in whatever way is idiomatic to the format. In CBOR, for example, we can use CBOR type tags to represent a merkle-link and avoid writing out the full string key @link. Users are encouraged to use these formats to their fullest and to store and transmit IPLD data in whatever format makes the most sense. The only requirement is that there must be a well-defined one-to-one mapping to the IPLD canonical format. This allows data to be translated from one format to another, and back, without changing its meaning or its cryptographic hashes.

Serializing with CBOR tags

In CBOR, IPLD links can be represented using tags, which are defined in RFC 7049 section 2.4.

A tag <tag-link-object> is defined. This tag can be followed by either a text string (major type 3) or a byte string (major type 2) corresponding to the link target.

When encoding an IPLD "link object" to CBOR, use the following algorithm:

    • The link value is extracted.
    • If the link value is a valid multiaddress, and converting the link text to a binary multiaddress and back to text is guaranteed to produce exactly the same text, the link is converted to a binary multiaddress and stored in CBOR as a byte string (major type 2).
    • Otherwise, the link value is stored as text (major type 3).
    • The resulting encoding is the <tag-link-object> followed by the CBOR representation of the link value.
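The encoding steps above can be sketched in Python without a CBOR library by writing the item heads by hand. Several things here are assumptions: the tag number is passed in by the caller because the text leaves <tag-link-object> unspecified, the multiaddress conversion functions to_binary and to_text are hypothetical callables supplied by the caller, and EXAMPLE_TAG is an arbitrary placeholder rather than a registered value.

```python
import struct


def cbor_head(major_type: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument (RFC 7049 section 2.1)."""
    prefix = major_type << 5
    if value < 24:
        return bytes([prefix | value])
    if value < 1 << 8:
        return bytes([prefix | 24, value])
    if value < 1 << 16:
        return bytes([prefix | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([prefix | 26]) + struct.pack(">I", value)
    return bytes([prefix | 27]) + struct.pack(">Q", value)


def encode_link(link_object, tag_number, to_binary=None, to_text=None):
    """Encode {"/": value} as a tag followed by a byte or text string."""
    value = link_object["/"]                       # extract the link value
    if to_binary is not None and to_text is not None:
        try:
            binary = to_binary(value)
            if to_text(binary) == value:           # exact round trip only
                return (cbor_head(6, tag_number)
                        + cbor_head(2, len(binary)) + binary)
        except ValueError:
            pass                                   # not a valid multiaddress
    text = value.encode("utf-8")                   # fall back to text
    return cbor_head(6, tag_number) + cbor_head(3, len(text)) + text


EXAMPLE_TAG = 1000  # arbitrary placeholder, NOT the real <tag-link-object>
encoded = encode_link({"/": "QmAAA"}, EXAMPLE_TAG)
```

The round-trip check matters because two different texts could otherwise map to the same binary form, and decoding would then change the link text and with it the meaning of the object.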

When decoding CBOR and converting it to IPLD, each occurrence of <tag-link-object> is transformed by the following algorithm:

    • The value following the tag must be the link value, which is extracted.
    • If the link is a byte string, it is interpreted as a multiaddress and converted to its textual form. Otherwise, the text string is used directly.
    • A map with a single key-value pair is created. The key is the standard IPLD link key "/", and the value is the text string containing the link value.
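A matching decoder can be sketched the same way. As before, this is an illustration under assumptions: the tag number is a parameter because <tag-link-object> is not given a value here, to_text is a hypothetical caller-supplied function that turns a binary multiaddress into text, and indefinite-length strings are not handled.

```python
def read_head(data: bytes, offset: int):
    """Return (major type, argument, next offset) for the item at offset."""
    initial = data[offset]
    major_type, info = initial >> 5, initial & 0x1F
    if info < 24:
        return major_type, info, offset + 1
    if info > 27:
        raise ValueError("indefinite or reserved lengths are not handled")
    size = 1 << (info - 24)                  # 1, 2, 4 or 8 argument bytes
    start = offset + 1
    argument = int.from_bytes(data[start:start + size], "big")
    return major_type, argument, start + size


def decode_link(data: bytes, tag_number: int, to_text=None):
    """Decode a tagged string into an IPLD link object {"/": text}."""
    major_type, tag, offset = read_head(data, 0)
    if major_type != 6 or tag != tag_number:
        raise ValueError("not a tagged link")
    major_type, length, offset = read_head(data, offset)
    payload = data[offset:offset + length]   # the extracted link value
    if major_type == 2:                      # byte string: binary multiaddress
        if to_text is None:
            raise ValueError("a multiaddress decoder is required")
        value = to_text(payload)
    elif major_type == 3:                    # text string: use it directly
        value = payload.decode("utf-8")
    else:
        raise ValueError("the tag must be followed by a byte or text string")
    return {"/": value}                      # map with the single key "/"


# Tag 1000 (an arbitrary placeholder) followed by the text string "QmAAA".
sample = bytes([0xD9, 0x03, 0xE8, 0x65]) + b"QmAAA"
link = decode_link(sample, 1000)
```

The result has the same shape as a link object written in JSON, so the rest of a resolver does not need to know which serialization the object came from.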

When an IPLD object contains these tags in the way described here, the multicodec header used to represent the object's codec must be /cbor/ipld-tagsv1 rather than just /cbor. Readers should then be able to use an optimized reading process to detect links that use these tags.

Canonical format

To preserve the power of merkle-linking, we must ensure that an IPLD document has a single canonical serialized representation. This ensures that applications arrive at the same cryptographic hashes. Note, however, that this is a system-wide parameter. Future systems may change it to evolve the representation, although we estimate this will need to happen no more than once per decade.

The IPLD canonical format is canonicalized CBOR with tags.

In addition to the rules defined here, the canonical CBOR format must follow the rules defined in RFC 7049 section 3.9.

Users of this format should not expect any particular ordering of keys, because keys may be ordered differently in non-canonical formats.

The legacy canonical format is protocol buffers.

The canonical format determines which format is used when an object is first created and its hash is computed. Once the format has been decided for an IPLD object, it must be used in all communications so that senders and receivers can check the data against the hash.

For example, when sending a legacy object encoded in protocol buffers over the wire, the sender must not send the CBOR version, because the receiver would not be able to check the validity of the file.

In the same way, when a receiver stores an object, it must make sure that the object's canonical format is stored along with it so that the object can be shared with other peers.

A simple way to store such objects together with their format is to store them with their multicodec header.
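The following Python sketch illustrates why the original bytes and their format have to travel together. It makes several simplifying assumptions: a codec label string stored next to the bytes stands in for a real multicodec header, hex SHA-256 stands in for a multihash, two JSON layouts stand in for two different serialization formats, and the class name FormatAwareStore and the label "/json" are hypothetical.

```python
import hashlib
import json


class FormatAwareStore:
    """Keep each object's original bytes together with its codec label."""

    def __init__(self):
        self._blocks = {}

    def put(self, codec: str, raw: bytes) -> str:
        key = hashlib.sha256(raw).hexdigest()   # stand-in for a multihash
        self._blocks[key] = (codec, raw)
        return key

    def get(self, key: str):
        codec, raw = self._blocks[key]
        # Re-check on the way out, as a receiving peer would.
        if hashlib.sha256(raw).hexdigest() != key:
            raise ValueError("stored bytes do not match their hash")
        return codec, raw


store = FormatAwareStore()
obj = {"data": "hello world", "size": "11"}
compact = json.dumps(obj, separators=(",", ":")).encode("utf-8")
key = store.put("/json", compact)

codec, served = store.get(key)

# The same logical object in a different encoding no longer matches the key.
pretty = json.dumps(obj, indent=2).encode("utf-8")
reencoded_matches = hashlib.sha256(pretty).hexdigest() == key
```

Serving the stored bytes unchanged lets any peer verify them, whereas re-encoding the object on the fly would produce bytes that fail the hash check even though the logical content is identical.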

Data structure examples

It is important that IPLD be a simple, nimble, and flexible format that does not get in the way of users defining new data structures or importing old ones. To illustrate this, a few example data structures are shown below.

UNIX File System

A small file

    {
      "data": "hello world",
      "size": "11"
    }

A chunked file

Split into multiple independent sub-files.

    {
      "size": "1424119",
      "subfiles": [
        {
          "link": {"/": "QmAAA..."},
          "size": "100324"
        },
        {
          "link": {"/": "QmAA1..."},
          "size": "120345",
          "repeat": "10"
        },
        {
          "link": {"/": "QmAA1..."},
          "size": "120345"
        }
      ]
    }
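As a quick consistency check on the chunked file above, the Python sketch below adds up the sub-file sizes. It assumes that "repeat" means the sub-file occurs that many times in a row, which is an interpretation rather than something the text states; the arithmetic does match the declared total under that reading. The function name total_size is hypothetical.

```python
def total_size(chunked_file):
    """Sum sub-file sizes, counting an entry `repeat` times when present."""
    total = 0
    for entry in chunked_file["subfiles"]:
        # Sizes and repeat counts are strings in the example, so convert them.
        total += int(entry["size"]) * int(entry.get("repeat", "1"))
    return total


chunked = {
    "size": "1424119",
    "subfiles": [
        {"link": {"/": "QmAAA..."}, "size": "100324"},
        {"link": {"/": "QmAA1..."}, "size": "120345", "repeat": "10"},
        {"link": {"/": "QmAA1..."}, "size": "120345"},
    ],
}

consistent = total_size(chunked) == int(chunked["size"])
```

Here 100324 + 10 × 120345 + 120345 = 1424119, which equals the "size" recorded on the parent object.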

Directory

    {
      "foo": {
        "link": {"/": "QmCCC...111"},
        "mode": "0755",
        "owner": "jbenet"
      },
      "cat.jpg": {
        "link": {"/": "QmCCC...222"},
        "mode": "0644",
        "owner": "jbenet"
      },
      "doge.jpg": {
        "link": {"/": "QmCCC...333"},
        "mode": "0644",
        "owner": "jbenet"
      }
    }

Git

Git blob

    {
      "data": "hello world"
    }

Git tree

    {
      "foo": {
        "link": {"/": "QmCCC...111"},
        "mode": "0755"
      },
      "cat.jpg": {
        "link": {"/": "QmCCC...222"},
        "mode": "0644"
      },
      "doge.jpg": {
        "link": {"/": "QmCCC...333"},
        "mode": "0644"
      }
    }

Git commit

    {
      "tree": {"/": "e4647147e940e2fab134e7f3d8a40c2022cb36f3"},
      "parents": [
        {"/": "b7d3ead1d80086940409206f5bd1a7a858ab6c95"},
        {"/": "ba8fbf7bc07818fa2892bd1a302081214b452afb"}
      ],
      "author": {
        "name": "Juan Batiz-Benet",
        "email": "juan@benet.ai",
        "time": "1435398707 -0700"
      },
      "committer": {
        "name": "Juan Batiz-Benet",
        "email": "juan@benet.ai",
        "time": "1435398707 -0700"
      },
      "message": "Merge pull request #7 from ipfs/iprs\n\n(WIP) records + merkledag specs"
    }

Bitcoin

Bitcoin block

    {
      "parent": {"/": "Qm000000002CPGAzmfdYPghgrFtYFB6pf1BqMvqfiPDam8"},
      "transactions": {"/": "QmTgzctfxxE8ZwBNGn744rL5R826EtZWzKvv2TF2dAcd9n"},
      "nonce": "UJPTFZnR2CPGAzmfdYPghgrFtYFB6pf1BqMvqfiPDam8"
    }

Bitcoin transaction

This time, in YML. TODO: make this a real txn.

    ---
    inputs:
      - input: {/: Qmes5e1x9YEku2Y4kDgT6pjf91TPGsE2nJAaAKgwnUqR82}
        amount: 100
    outputs:
      - output: {/: Qmes5e1x9YEku2Y4kDgT6pjf91TPGsE2nJAaAKgwnUqR82}
        amount: 50
      - output: {/: QmbcfRVZqMNVRcarRN3JjEJCHhQBcUeqzZfa3zoWMaSrTW}
        amount: 30
      - output: {/: QmV9PkR2gXcmUgNH7s7zMg9dsk7Hy7bLS18S9SHK96m7zV}
        amount: 15
      - output: {/: QmP8r8fLUnEywGnRRUrHB28nnBKwmshMLiYeg8udzYg7TK}
        amount: 5
    script: OP_VERIFY