Lexical structure of JavaScript

Source: Internet
Author: User

The lexical structure of a programming language is the set of elementary rules that describe how you write programs in that language. As the lowest level of syntax, it specifies what variable names look like, how comments are written, and how one statement is separated from the next. This section briefly describes the lexical structure of JavaScript.

1. Character Set

JavaScript programs are written using the Unicode character set. Unicode is a superset of ASCII and Latin-1 and supports virtually every written language in use today. ECMAScript 3 requires JavaScript implementations to support Unicode 2.1 or later, and ECMAScript 5 requires Unicode 3 or later.

I. Case Sensitive

JavaScript is a case-sensitive language. This means that keywords, variables, function names, and all other identifiers must always be written with the same capitalization. For example, the keyword while must be written as while, not While or WHILE.

Note, however, that HTML is not case sensitive (although XHTML is). Because HTML is so closely tied to client-side JavaScript, this difference is easy to trip over. For example, in HTML the onclick event handler attribute can be written as onClick, but in JavaScript code it must be written in lowercase as onclick.
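As a small illustration of the case-sensitivity rule, the sketch below declares variables whose names differ only in capitalization. The names total, Total, and While are hypothetical and chosen only for this example.

```javascript
// Two distinct variables: the names differ only in case.
var total = 1;
var Total = 2;

// "While" is not the keyword "while", so it is an ordinary identifier
// and can legally (if confusingly) name a variable.
var While = "just a variable";

// The two variables hold different values because they are different names.
var distinct = total !== Total;
```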

II. Whitespace, Line Breaks, and Format Control Characters

JavaScript ignores spaces that appear between tokens in a program. In most cases it also ignores line breaks. Because spaces and line breaks can be used freely, you can format and indent your code in a neat, consistent way that makes it easier to read.

In addition to the regular space character (\u0020), JavaScript also recognizes the following characters as whitespace: horizontal tab (\u0009), vertical tab (\u000B), form feed (\u000C), no-break space (\u00A0), byte order mark (\uFEFF), and every character in Unicode category Zs. JavaScript recognizes the following characters as line terminators: line feed (\u000A), carriage return (\u000D), line separator (\u2028), and paragraph separator (\u2029). A carriage return followed by a line feed is treated as a single line terminator.
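The sketch below checks a few of these characters by building source text at run time with the Function constructor. The variable names are illustrative, and the approach assumes an environment in which new Function is permitted.

```javascript
// No-break space (\u00A0), vertical tab (\u000B), and the byte order mark
// (\uFEFF) are all accepted between tokens, just like an ordinary space.
var sum = new Function("return\u00A01\u000B+\uFEFF2")();

// The line separator (\u2028) ends a line, so each declaration below
// is treated as a statement on its own line.
var viaLineSeparator = new Function(
  "var a = 1\u2028var b = 2\u2028return a + b"
)();

// A carriage return followed by a line feed counts as one line terminator.
var viaCrLf = new Function("var c = 4\r\nreturn c")();
```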

Unicode format control characters (category Cf), such as the right-to-left mark (\u200F) and the left-to-right mark (\u200E), control the visual presentation of the text they occur in. They are important for displaying some non-English text correctly. These characters are allowed in JavaScript comments, string literals, and regular expression literals, but not in identifiers (variable names, for example). The exceptions are the zero-width joiner (\u200D) and the zero-width non-joiner (\u200C), which may appear in an identifier but not as its first character. As noted above, the byte order mark (\uFEFF) is treated as whitespace.

III. Unicode Escape Sequences

Some computer hardware and software cannot display or input the full set of Unicode characters. To support programmers using such older technology, JavaScript defines special sequences of six ASCII characters that represent any 16-bit Unicode code unit. These Unicode escapes begin with the characters \u, followed by exactly four hexadecimal digits (the digits 0-9 and the letters A-F in upper or lower case). Unicode escapes may appear in JavaScript string literals, regular expression literals, and identifiers (but not in keywords). For example, the Unicode escape for the character é is \u00E9. The following two JavaScript strings are identical:

"café" === "caf\u00e9" // => true
Unicode escapes may also appear in comments, but because comments are ignored, they are treated there as ordinary ASCII characters and are not interpreted as Unicode characters.
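The three places where Unicode escapes are interpreted (strings, identifiers, and regular expressions) can be sketched as follows. The variable names are hypothetical, and the example assumes the source file is saved as UTF-8 with é stored as the single character \u00E9.

```javascript
// The escape \u00e9 and the literal character é produce the same string.
var sameString = "café" === "caf\u00e9";

// An escape can also be used inside an identifier: both lines below
// refer to the same variable.
var caf\u00e9 = "coffee";
var readBack = café;

// The same escape works in a regular expression literal.
var matches = /caf\u00e9/.test("café");
```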

IV. Normalization

Unicode allows more than one way of encoding the same character. The character é, for example, can be encoded as the single Unicode character \u00E9 or as an ordinary ASCII e followed by the combining acute accent \u0301. The two encodings may look identical in a text editor, but their binary representations differ and the computer does not consider them equal. The Unicode standard defines a preferred encoding for all characters and specifies a normalization procedure that converts text to a canonical form suitable for comparison. JavaScript assumes that the source code it interprets has already been normalized and makes no attempt to normalize identifiers, strings, or regular expressions itself.
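A minimal sketch of the two encodings follows. It assumes a runtime that provides String.prototype.normalize, which was added in ECMAScript 2015 and is not part of the ECMAScript 3 or 5 editions discussed here; the variable names are illustrative.

```javascript
var composed = "caf\u00E9";      // é as a single code point
var decomposed = "cafe\u0301";   // e followed by the combining acute accent

// The language compares code units, so the two strings are not equal.
var equalAsWritten = composed === decomposed;

// Normalizing both strings to the same form (NFC here) makes them comparable.
var equalAfterNfc = composed.normalize("NFC") === decomposed.normalize("NFC");

// The lengths differ because the decomposed form uses an extra code unit.
var lengths = [composed.length, decomposed.length];
```

Normalizing explicitly before a comparison is the programmer's responsibility, since the language does not do it automatically.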

2. Comments

JavaScript supports two styles of comments. Any text between // and the end of a line is treated as a comment and is ignored by JavaScript.
Any text between /* and */ is also treated as a comment. Such comments may span multiple lines, but they may not be nested.

// Single-line comment
/*
 * A comment
 * that spans
 * several lines
 */
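The rule that block comments cannot be nested can be checked with a short sketch. It passes source text to the Function constructor so that the parse error can be observed without stopping the program; the variable names are hypothetical.

```javascript
// A block comment ends at the first */, so the text after an "inner"
// comment is parsed as code and the source is rejected.
var nestedSource = "/* outer /* inner */ still outer */";
var nestedFails = false;
try {
  new Function(nestedSource);
} catch (err) {
  nestedFails = err instanceof SyntaxError;
}

// A // sequence inside a block comment has no special meaning.
var mixedOk = new Function("/* outer // not special */ return 1;")() === 1;
```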
3. Literals

A literal is a data value that appears directly in a program. The following are all literals:

12 // The number twelve
1.2 // The number one point two
"Hello World" // A string of text
'Hi' // Another string
true // A Boolean value
false // The other Boolean value
/javascript/gi // A regular expression literal (for pattern matching)
null // The absence of an object

Chapter 2 describes numeric and string literals in detail. Regular expression literals are explained in Chapter 10. More complex expressions can be written as array or object literals:

{ x: 1, y: 2 } // An object
[1, 2, 3, 4, 5] // An array
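To see what kind of value each literal produces, the following sketch applies typeof to the literals listed above. The variable names are illustrative only.

```javascript
var kinds = [
  typeof 12,               // "number"
  typeof 1.2,              // "number"
  typeof "Hello World",    // "string"
  typeof true,             // "boolean"
  typeof /javascript/gi,   // "object" (a RegExp instance)
  typeof null,             // "object" (a long-standing quirk of typeof)
  typeof { x: 1, y: 2 },   // "object"
  typeof [1, 2, 3, 4, 5]   // "object"
];

// typeof cannot tell arrays or regular expressions apart from other objects,
// so more specific checks are needed for them.
var isArray = Array.isArray([1, 2, 3, 4, 5]);
var isRegExp = /javascript/gi instanceof RegExp;
```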

4. Identifiers and Reserved Words

An identifier is simply a name. In JavaScript, identifiers are used to name variables and functions, and to label certain loops so that they can serve as jump targets. A JavaScript identifier must begin with a letter, an underscore (_), or a dollar sign ($). Subsequent characters can be letters, digits, underscores, or dollar signs. (Digits are not allowed as the first character so that JavaScript can easily distinguish identifiers from numbers.) These are all legal identifiers:

my_variable_name
b13
_dummy
$str

For portability and ease of editing, it is common to use only ASCII letters and digits in identifiers. Note, however, that JavaScript allows identifiers to contain letters and digits from the entire Unicode character set. (Technically, ECMAScript also allows characters from the Unicode categories Mn, Mc, and Pc to appear after the first character of an identifier.) Programmers can therefore use non-English languages and mathematical symbols in identifiers:

var sá = true;
var π = 3.14;
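The ASCII part of the identifier rule can be sketched as a regular expression. The helper isAsciiIdentifier is hypothetical and deliberately simplified: it ignores Unicode letters, Unicode escapes, and reserved words.

```javascript
// Hypothetical helper: checks only the ASCII subset of the identifier rule.
// First character: letter, underscore, or dollar sign.
// Later characters: the same set plus digits.
function isAsciiIdentifier(name) {
  return /^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name);
}

var results = {
  my_variable_name: isAsciiIdentifier("my_variable_name"), // legal
  b13: isAsciiIdentifier("b13"),                           // legal
  _dummy: isAsciiIdentifier("_dummy"),                     // legal
  $str: isAsciiIdentifier("$str"),                         // legal
  leadingDigit: isAsciiIdentifier("13b"),                  // digit first: rejected
  withHyphen: isAsciiIdentifier("my-name")                 // hyphen: rejected
};
```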

JavaScript reserves a number of identifiers as keywords of the language itself. These words cannot be used as identifiers in your programs:

break
case
catch
continue
default
delete
do
else
finally
for
function
if
in
instanceof
new
return
switch
this
throw
try
typeof
var
void
while
with

JavaScript also reserves certain words that the language does not currently use but that might be used in future versions:

class const enum export
extends import super
In addition, the following words are legal identifiers in ordinary JavaScript code but are reserved in strict mode:

implements let private public yield interface package
protected static
Strict mode also restricts the use of the following identifiers. They are not fully reserved, but they cannot be used as variable, function, or parameter names:

arguments eval
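The difference between ordinary and strict mode can be probed with a short sketch. The helper parses is hypothetical; it relies on the Function constructor, whose body is strict only when it begins with a "use strict" directive.

```javascript
// Hypothetical helper: reports whether a function body parses.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

// "let" is an ordinary identifier in non-strict code, but reserved in strict mode.
var sloppyLet = parses("var let = 1;");
var strictLet = parses('"use strict"; var let = 1;');

// "eval" may be used as a variable name only outside strict mode.
var sloppyEval = parses("var eval = 1;");
var strictEval = parses('"use strict"; var eval = 1;');

// "class" is reserved in both modes.
var reservedEverywhere = parses("var class = 1;");
```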
JavaScript implementations may define their own global variables and functions. Each JavaScript runtime environment (client side, server side, and so on) has its own list of global properties, and you should keep this in mind. (See the Window object for the list of global variables and functions defined by client-side JavaScript.)

5. Optional Semicolons

Like many programming languages, JavaScript uses the semicolon (;) to separate statements. This is important for making the meaning of your code clear: without a separator, the end of one statement might appear to be the beginning of the next, or vice versa.
In JavaScript, you can usually omit the semicolon between two statements if they are written on separate lines. (You can also omit a semicolon at the end of a program or when the next token is a closing curly brace }.) Many JavaScript programmers (and the code examples in this book) use semicolons to mark the ends of statements explicitly, even where they are not required. Another style is to omit semicolons whenever possible and to use them only in the few situations that require them. Whichever style you choose, there are a few details you should understand about optional semicolons in JavaScript.
In the following code, the first semicolon could be omitted because the two statements are on separate lines:

a = 3;
b = 4;
Written on a single line as follows, however, the first semicolon is required:

a = 3; b = 4;
Note that JavaScript does not treat every line break as a semicolon: it usually treats a line break as a semicolon only if it cannot parse the code without one. More formally (and with the two exceptions described later in this section), JavaScript treats a line break as a semicolon if the next non-space character cannot be interpreted as a continuation of the current statement. Consider the following code:

var
a
=
3
console.log()
JavaScript interprets this code as:

var a; a = 3; console.log();
JavaScript treats the first line break as a semicolon because it cannot parse the code var a a without one. The second a could stand alone as the statement a;, but JavaScript does not treat the second line break as a semicolon because it can continue parsing the longer statement a = 3;.

These statement termination rules lead to some surprising cases. The following code is written on two lines and looks like two separate statements:

var y = x + f
(a+b).toString()
But the parentheses on the second line can be interpreted as a function invocation of f from the first line, so JavaScript interprets the code as:

var y = x + f(a+b).toString();
Most likely, this is not the interpretation the author intended. To make the code work as two separate statements, an explicit semicolon is required at the end of the first line.
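The effect can be observed directly. In this sketch, f is a hypothetical function that records its argument so we can tell whether it was called; the values of x, a, and b are arbitrary.

```javascript
// Hypothetical helper: records its argument so we can see whether it was called.
const seen = [];
function f(value) {
  seen.push(value);
  return 10;
}

const x = 1, a = 2, b = 3;

// No semicolon after f: the "(" on the next line continues the statement,
// so this is parsed as  x + f(a + b).toString()
var y = x + f
(a + b).toString()

// Explicit semicolon: f is not called here. x + f concatenates x with the
// source text of f, and the second line is a separate statement.
var z = x + f;
(a + b).toString();
```

After this code runs, seen contains a single entry (the one call made while computing y), which shows that only the first form invoked f.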

In general, if a statement begins with (, [, /, +, or -, there is a chance that it will be interpreted as a continuation of the statement before it. Statements beginning with /, +, and - are quite rare in practice, but statements beginning with ( or [ are not uncommon, at least in some styles of JavaScript programming. Some programmers like to put a defensive semicolon at the beginning of any such statement so that it continues to be parsed correctly even if the statement before it is modified and its terminating semicolon is removed.
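The defensive-semicolon style can be sketched as follows. The first part uses the Function constructor so that the failure can be caught and inspected; all variable names are hypothetical.

```javascript
// Without a semicolon, the second line continues the first:
//   var n = 1[1, 2].forEach(...)
// 1[2] is undefined, so calling forEach on it throws a TypeError at run time.
var joinedFails = false;
try {
  new Function("var n = 1\n[1, 2].forEach(function () {})")();
} catch (err) {
  joinedFails = err instanceof TypeError;
}

// With a defensive leading semicolon, the array statement stands alone
// even though the preceding lines omit their semicolons.
var collected = []
var n = 1
;[1, 2].forEach(function (v) { collected.push(v + n) })
```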
As a general rule, JavaScript interprets a line break as a semicolon only when it cannot parse the next line as a continuation of the current statement. There are two exceptions to this rule. The first exception involves the return, break, and continue statements. If a line break appears immediately after any of these keywords, JavaScript always interprets that line break as a semicolon. For example, if you write:

return
true;
JavaScript parses it as:

return; true;
However, the intention of the code was probably:

return true;
This means that you must not insert a line break between return, break, or continue and the expression that follows the keyword. If you do insert a line break, your code is likely to fail in a nonobvious way that is difficult to debug.
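A minimal sketch of the return case, using two hypothetical functions that differ only in the line break:

```javascript
// The line break after "return" is treated as a semicolon,
// so the function returns undefined and "true;" is never reached.
function broken() {
  return
  true;
}

// Keeping the expression on the same line returns it as intended.
function intended() {
  return true;
}

var brokenResult = broken();     // undefined
var intendedResult = intended(); // true
```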

The second exception involves the ++ and -- operators. These operators can be prefix operators that appear before an expression or postfix operators that appear after one. If you want to use either of them as a postfix operator, it must appear on the same line as the expression it applies to. Otherwise, the line break is treated as a semicolon, and the ++ or -- is parsed as a prefix operator applied to the code that follows. Consider this code, for example:

x
++
y

The code above is parsed as:

x;
++y;
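Running the same layout with concrete values shows which variable is incremented. The starting values are arbitrary and chosen only for this sketch.

```javascript
var x = 1;
var y = 1;

// Parsed as  x; ++y;  so the ++ applies to y, not to x.
x
++
y

var after = [x, y]; // [1, 2]
```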
