Summary
JavaScript Object Notation (JSON) is a lightweight, text-based, language-independent data interchange format. It was derived from the ECMAScript Programming Language Standard. JSON defines a small set of formatting rules for the portable representation of structured data.
1. Description
JSON is a text format for the serialization of structured data. It is derived from the object literals of JavaScript, as defined in the ECMAScript Programming Language Standard, Third Edition [ECMA].
JSON can represent four primitive types (strings, numbers, booleans, and null) and two structured types (objects and arrays).
A string is a sequence of zero or more Unicode characters.
An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.
An array is an ordered sequence of zero or more values.
The terms "object" and "array" come from the conventions of JavaScript.
JSON's design goals were for it to be minimal, portable, textual, and a subset of JavaScript.
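The six types map naturally onto JavaScript values. As a sketch, assuming an ECMAScript 5 or later runtime (the built-in `JSON.parse` is not part of the Third Edition cited above), the hypothetical helper `jsonTypeOf` reports which JSON type a parsed value has:

```javascript
// Hypothetical helper (not a standard API): name the JSON type of a value
// produced by JSON.parse.
function jsonTypeOf(value) {
  if (value === null) return "null"; // typeof null is "object" in JavaScript
  if (Array.isArray(value)) return "array";
  switch (typeof value) {
    case "string":
    case "number":
    case "boolean":
      return typeof value;
    case "object":
      return "object";
    default:
      throw new TypeError("not a JSON value: " + typeof value);
  }
}

const sample = JSON.parse(
  '{"s": "text", "n": 3.5, "b": true, "z": null, "o": {}, "a": [1, 2]}'
);
for (const [name, value] of Object.entries(sample)) {
  console.log(name, jsonTypeOf(value));
}
```

The `null` and array checks come first because JavaScript's `typeof` reports both as `"object"`.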
1.1. Terms/conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The grammatical rules in this document are to be interpreted as described in [RFC4234].
2. JSON syntax
A JSON text is a sequence of tokens. The set of tokens includes six structural characters, strings, numbers, and three literal names.
A JSON text is a serialized object or array.
These are the six structural characters:
- begin-array = ws %x5B ws ; [ left square bracket
- begin-object = ws %x7B ws ; { left curly bracket
- end-array = ws %x5D ws ; ] right square bracket
- end-object = ws %x7D ws ; } right curly bracket
- name-separator = ws %x3A ws ; : colon
- value-separator = ws %x2C ws ; , comma
Insignificant whitespace is allowed before or after any of the six structural characters.
- ws = *(
%x20 / ; space
%x09 / ; horizontal tab
%x0A / ; line feed or new line
%x0D ; carriage return
)
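To make the roles of the structural characters and of insignificant whitespace concrete, here is a minimal tokenizer sketch. `tokenize` is a hypothetical name; it assumes reasonably well-formed input and does not validate numbers, literal names, or escape sequences. It only splits the text into tokens and drops the four whitespace characters.

```javascript
// Hypothetical sketch: split a JSON text into tokens. Not a validator.
const STRUCTURAL = new Set(["[", "{", "]", "}", ":", ","]);
const WS = new Set([" ", "\t", "\n", "\r"]);

function tokenize(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (WS.has(ch)) {
      i++; // insignificant whitespace is dropped
    } else if (STRUCTURAL.has(ch)) {
      tokens.push(ch);
      i++;
    } else if (ch === '"') {
      // A string runs to the next quotation mark that is not escaped;
      // a backslash skips the character after it.
      let j = i + 1;
      while (j < text.length && text[j] !== '"') {
        j += text[j] === "\\" ? 2 : 1;
      }
      if (j >= text.length) throw new SyntaxError("unterminated string");
      tokens.push(text.slice(i, j + 1));
      i = j + 1;
    } else {
      // Numbers and literal names run until whitespace or a structural character.
      let j = i;
      while (j < text.length && !WS.has(text[j]) && !STRUCTURAL.has(text[j])) j++;
      tokens.push(text.slice(i, j));
      i = j;
    }
  }
  return tokens;
}

console.log(tokenize('{ "a" :\t[1, true,\r\n null] }'));
```

Strings are scanned separately so that structural characters inside them (for example the comma in `"a,b"`) are not mistaken for separators.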
2.1. Value
A JSON value MUST be an object, array, number, or string, or one of the following three literal names: false, null, true.
The literal names MUST be lowercase. No other literal names are allowed.
- value = false / null / true / object / array / number / string
false = %x66.61.6c.73.65 ; false
null = %x6e.75.6c.6c ; null
true = %x74.72.75.65 ; true
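The lowercase rule can be observed with the built-in parser of ECMAScript 5 and later. Because Section 2 only allows an object or array as a complete JSON text (while `JSON.parse` also accepts a bare value), this sketch wraps each candidate in brackets; `isValidJsonValue` is a hypothetical name.

```javascript
// Hypothetical helper: does `token` parse as a JSON value inside an array?
function isValidJsonValue(token) {
  try {
    JSON.parse("[" + token + "]");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false; // not JSON
    throw err; // anything else is unexpected
  }
}

for (const token of ["true", "false", "null", "True", "NULL", "undefined"]) {
  console.log(token, isValidJsonValue(token));
}
```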
2.2. Object
An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique.
- object = begin-object [ member *( value-separator member ) ] end-object
- member = string name-separator value
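Because unique names are only a SHOULD, parsers differ in how they treat a repeated name. The ECMAScript `JSON.parse` (ECMAScript 5 and later) keeps the last value; other parsers are free to keep the first value or reject the text, which is why unique names are the interoperable choice. A short demonstration:

```javascript
// A repeated name: JSON.parse lets the later value overwrite the earlier one.
const parsed = JSON.parse('{"name": "first", "count": 2, "name": "second"}');

console.log(parsed.name);         // second
console.log(Object.keys(parsed)); // the keys are "name" and "count"
```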
2.3. Array
An array structure is represented as square brackets surrounding zero or more values (or elements). Elements are separated by commas.
- array = begin-array [ value *( value-separator value ) ] end-array
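Elements keep their order and may be of any value type, including nested arrays and objects. A quick check with the built-in parser and generator (ECMAScript 5 and later):

```javascript
// Mixed element types; the whitespace around values is insignificant.
const list = JSON.parse('[ 1, "two", [3, 4], {"five": 5}, null ]');

console.log(list.length);             // 5
console.log(JSON.stringify(list));    // [1,"two",[3,4],{"five":5},null]
console.log(JSON.parse("[]").length); // 0, since an array may be empty
```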
2.4. Number
The representation of numbers is similar to that used in most programming languages. A number contains an integer component that may be prefixed with an optional minus sign, which may be followed by a fraction part and/or an exponent part.
Octal and hex forms are not allowed. Leading zeros are not allowed.
A fraction part is a decimal point followed by one or more digits.
An exponent part begins with the letter E in uppercase or lowercase, which may be followed by a plus or minus sign. The E and optional sign are followed by one or more digits.
Numeric values that cannot be represented as sequences of digits (such as Infinity and NaN) are not permitted.
- number = [ minus ] int [ frac ] [ exp ]
decimal-point = %x2E ; .
digit1-9 = %x31-39 ; 1-9
e = %x65 / %x45 ; e E
exp = e [ minus / plus ] 1*DIGIT
frac = decimal-point 1*DIGIT
int = zero / ( digit1-9 *DIGIT )
minus = %x2D ; -
plus = %x2B ; +
zero = %x30 ; 0
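The `number` rule translates directly into a regular expression, one group per grammar element. This is a sketch: `JSON_NUMBER` and `isJsonNumber` are hypothetical names, and the pattern checks only the syntax, not whether the value fits in a JavaScript double.

```javascript
// [ minus ] int [ frac ] [ exp ], where int = zero / ( digit1-9 *DIGIT )
const JSON_NUMBER = /^-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?$/;

function isJsonNumber(text) {
  return JSON_NUMBER.test(text);
}

const candidates = ["0", "-0", "42", "3.14", "-1.5e10", "2E-3", "1e+2",
  "01", "+1", ".5", "5.", "0x1F", "Infinity", "NaN"];
for (const c of candidates) {
  console.log(c, isJsonNumber(c));
}
```

The first seven candidates match; the rest are rejected for a leading zero, a leading plus sign, a missing integer or fraction digit, hex notation, or a non-numeric spelling.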
2.5. String
The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus (backslash), and the control characters (U+0000 through U+001F).
Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A through F can be uppercase or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".
Alternatively, there are two-character escape sequences for some popular characters. So, for example, a string containing only a single reverse solidus character may be represented more compactly as "\\".
To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
- string = quotation-mark *char quotation-mark
char = unescaped /
escape (
%x22 / ; " quotation mark U+0022
%x5C / ; \ reverse solidus U+005C
%x2F / ; / solidus U+002F
%x62 / ; b backspace U+0008
%x66 / ; f form feed U+000C
%x6E / ; n line feed U+000A
%x72 / ; r carriage return U+000D
%x74 / ; t tab U+0009
%x75 4HEXDIG ) ; uXXXX U+XXXX
escape = %x5C ; \
quotation-mark = %x22 ; "
unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
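The three ways of writing a character can be checked against the built-in parser (ECMAScript 5 and later). In the JavaScript source below, each `\\` inside a single-quoted literal becomes one backslash in the JSON text being parsed:

```javascript
// Each JSON string literal below denotes exactly one character.
const backslashLong = JSON.parse('"\\u005C"');  // six-character \u escape
const backslashShort = JSON.parse('"\\\\"');    // two-character escape
const gClef = JSON.parse('"\\uD834\\uDD1E"');   // UTF-16 surrogate pair

console.log(backslashLong === "\\", backslashShort === "\\"); // true true
console.log(gClef === "\u{1D11E}", gClef.codePointAt(0).toString(16)); // true 1d11e

// In the other direction, JSON.stringify escapes the quotation mark,
// the reverse solidus, and control characters such as U+0001.
console.log(JSON.stringify('say "hi"\\\u0001'));
```

The G clef occupies two UTF-16 code units in a JavaScript string, matching the twelve-character surrogate-pair escape.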
3. Encoding
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
- 00 00 00 xx UTF-32BE
00 xx 00 xx UTF-16BE
xx 00 00 00 UTF-32LE
xx 00 xx 00 UTF-16LE
xx xx xx xx UTF-8
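The null-pattern table can be turned into a small detection function. This is a sketch: `detectJsonEncoding` is a hypothetical name, `bytes` is assumed to be a `Uint8Array` holding the start of the octet stream, inputs shorter than four octets are assumed to be UTF-8 (the default), and byte order marks are not handled.

```javascript
// Hypothetical sketch: guess the encoding of a JSON text from the pattern of
// zero octets among its first four octets ("xx" in the table means non-zero).
function detectJsonEncoding(bytes) {
  if (bytes.length < 4) return "UTF-8"; // too short to tell; assume the default
  const [b0, b1, b2, b3] = bytes;
  if (b0 === 0 && b1 === 0 && b2 === 0 && b3 !== 0) return "UTF-32BE";
  if (b0 === 0 && b1 !== 0 && b2 === 0 && b3 !== 0) return "UTF-16BE";
  if (b0 !== 0 && b1 === 0 && b2 === 0 && b3 === 0) return "UTF-32LE";
  if (b0 !== 0 && b1 === 0 && b2 !== 0 && b3 === 0) return "UTF-16LE";
  return "UTF-8";
}

// "[]" encoded as UTF-16LE: 5B 00 5D 00
console.log(detectJsonEncoding(Uint8Array.of(0x5b, 0x00, 0x5d, 0x00)));
```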
4. Parsers
A JSON parser transforms a JSON text into another representation. A JSON parser MUST accept all texts that conform to the JSON grammar. A JSON parser MAY accept non-JSON forms or extensions.
An implementation may set limits on the size of texts that it accepts, on the maximum depth of nesting, on the range of numbers, and on the length and character contents of strings.
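As a sketch of the first two kinds of limits, the hypothetical wrapper `parseWithLimits` below checks the text length and the nesting depth before delegating to the built-in `JSON.parse`. The default limits are arbitrary assumptions, since the document leaves the actual values to the implementation.

```javascript
// Hypothetical limits; this document does not prescribe any values.
const MAX_TEXT_LENGTH = 1000000;
const MAX_DEPTH = 64;

function parseWithLimits(text, maxLength = MAX_TEXT_LENGTH, maxDepth = MAX_DEPTH) {
  if (text.length > maxLength) {
    throw new RangeError(`text length ${text.length} exceeds ${maxLength}`);
  }
  // Measure nesting depth, ignoring brackets that appear inside strings.
  let depth = 0;
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "[" || ch === "{") {
      depth++;
      if (depth > maxDepth) throw new RangeError(`nesting depth exceeds ${maxDepth}`);
    } else if (ch === "]" || ch === "}") {
      depth--;
    }
  }
  // Grammar checking is left to the built-in parser.
  return JSON.parse(text);
}

console.log(parseWithLimits('[["a"], {"b": [1]}]', 100, 3));
```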
5. Generators
A JSON generator produces JSON text. The resulting text MUST strictly conform to the JSON grammar.
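The built-in `JSON.stringify` (ECMAScript 5 and later) always emits grammatical text, but it does so partly by silently writing `NaN` and `Infinity` as `null`, since Section 2.4 gives them no representation. A generator that should fail loudly instead can use a replacer function; `strictStringify` is a hypothetical name for such a sketch.

```javascript
// Default behavior: non-finite numbers become null.
console.log(JSON.stringify({ ratio: NaN, big: Infinity, ok: 1.5 }));

// Hypothetical strict generator: refuse values JSON cannot represent.
function strictStringify(value) {
  return JSON.stringify(value, (key, v) => {
    if (typeof v === "number" && !Number.isFinite(v)) {
      throw new RangeError(`cannot represent ${v} at key "${key}" in JSON`);
    }
    return v;
  });
}

console.log(strictStringify({ ok: [1, 2] }));
```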
6. IANA considerations
The MIME media type for JSON text is application/json.
Type name: application
Subtype name: json
Required parameters: n/a
Optional parameters: n/a
Encoding considerations: 8bit if UTF-8; binary if UTF-16 or UTF-32
- JSON may be represented using UTF-8, UTF-16, or UTF-32. When JSON is written in UTF-8, JSON is 8bit compatible. When JSON is written in UTF-16 or UTF-32, the binary content-transfer-encoding must be used.
Security considerations:
- Generally, there are security issues with scripting languages. JSON is a subset of JavaScript, but it is a safe subset that excludes assignment and invocation.
- A JSON text can be safely passed into JavaScript's eval() function (which compiles and executes a string) if all the characters not enclosed in strings are in the set of characters that form JSON tokens. This can be quickly determined in JavaScript with two regular expressions and calls to the test and replace methods.
// Remove every string literal, then reject the text if any character outside the JSON token set remains.
var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
    text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
    eval('(' + text + ')');
Interoperability considerations: n/a
Published specification: RFC 4627
Applications that use this media type:
- JSON has been used to exchange data between applications written in all of these programming languages: ActionScript, C, C#, ColdFusion, Common Lisp, E, Erlang, Java, JavaScript, Lua, Objective CAML, Perl, PHP, Python, Rebol, Ruby, and Scheme.
Additional information (omitted)
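A receiver that dispatches on the registered media type has to compare the Content-Type value it sees. Type and subtype names compare case-insensitively, and headers in practice sometimes carry parameters even though this registration defines none. The hypothetical helper `isJsonMediaType` below is a sketch that ignores anything after the first ";".

```javascript
// Hypothetical helper: does a Content-Type value name application/json?
function isJsonMediaType(contentType) {
  const essence = contentType.split(";")[0].trim().toLowerCase();
  return essence === "application/json";
}

console.log(isJsonMediaType("application/json"));                // true
console.log(isJsonMediaType("Application/JSON; charset=utf-8")); // true
console.log(isJsonMediaType("text/json"));                       // false
```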
7. Security considerations
See the security considerations in Section 6.
8. Example
This is a JSON object:
{
  "Image": {
    "Width": 800,
    "Height": 600,
    "Title": "View from 15th Floor",
    "Thumbnail": {
      "Url": "http://www.example.com/image/481989943",
      "Height": 125,
      "Width": "100"
    },
    "IDs": [116, 943, 234, 38793]
  }
}
The Image member of this object is an object whose Thumbnail member is an object and whose IDs member is an array of numbers.
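Parsing the example with the built-in parser (ECMAScript 5 and later) confirms the member types just described. It also shows one detail that is easy to miss: the Width of the Thumbnail is quoted in the text, so it arrives as a string rather than a number.

```javascript
const text = `{
  "Image": {
    "Width": 800,
    "Height": 600,
    "Title": "View from 15th Floor",
    "Thumbnail": {
      "Url": "http://www.example.com/image/481989943",
      "Height": 125,
      "Width": "100"
    },
    "IDs": [116, 943, 234, 38793]
  }
}`;

const image = JSON.parse(text).Image;
console.log(typeof image.Thumbnail);       // object
console.log(Array.isArray(image.IDs));     // true
console.log(image.IDs[3]);                 // 38793
console.log(typeof image.Thumbnail.Width); // string, because "100" is quoted
```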
The following is a JSON array containing two objects:
9. References
9.1. Normative references
[ECMA] European Computer Manufacturers Association, "ECMAScript
Language Specification 3rd Edition", December 1999,
ecma-st/ECMA-262.pdf>.
[RFC0020] Cerf, V., "ASCII format for network interchange", RFC 20,
October 1969.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 4234, October 2005.
[UNICODE] The Unicode Consortium, "The Unicode Standard Version 4.0",
2003, http://www.unicode.org/versions/Unicode4.1.0/.
Author's address
Douglas Crockford
JSON.org
EMail: douglas@crockford.com