PostgreSQL Lexical Structure

SQL input consists of a sequence of commands. A command is composed of a sequence of tokens, terminated by a semicolon (";"). The end of the input stream also terminates a command. Which tokens are valid depends on the syntax of the particular command.

A token can be a keyword, an identifier, a quoted identifier, a literal (or constant), or a special character symbol. Tokens are normally separated by whitespace (space, tab, newline), but need not be if there is no ambiguity (which is generally only the case if a special character is adjacent to some other token type).

Additionally, comments can occur in SQL input. They are not tokens; they are effectively equivalent to whitespace.

For example, the following commands are (syntactically) valid SQL input:

SELECT * FROM my_table;
UPDATE my_table SET A = 5;
INSERT INTO my_table VALUES (3, 'hi there');

This is a sequence of three commands, one per line (although this is not required: more than one command can be on a line, and a single command can be split across lines).

The SQL syntax is not very consistent regarding which tokens identify commands and which are operands or parameters. The first few tokens are generally the command name, so in the above example we would usually speak of a "SELECT", an "UPDATE", and an "INSERT" command. But the UPDATE command always requires a SET token to appear in a certain position, and this particular variation of INSERT also requires a VALUES token in order to be complete. The precise syntax rules for each command are described in Part VI.

4.1.1. Identifiers and Keywords

Tokens such as SELECT, UPDATE, and VALUES in the example above are examples of keywords, that is, words that have a fixed meaning in the SQL language. The tokens my_table and A are examples of identifiers. They identify names of tables, columns, or other database objects, depending on the command they are used in. Therefore they are sometimes simply called "names". Keywords and identifiers have the same lexical structure, meaning that one cannot know whether a token is an identifier or a keyword without knowing the language. A complete list of keywords can be found in Appendix C.

SQL identifiers and keywords must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the SQL standard, so using them may make applications less portable. The SQL standard will not define a keyword that contains digits or that starts or ends with an underscore, so identifiers of this form are safe against possible conflicts with future extensions of the standard.

The system uses no more than NAMEDATALEN-1 characters of an identifier; longer names can be written in commands, but they will be truncated. By default, NAMEDATALEN is 64, so the maximum identifier length is 63. If this limit is problematic, it can be raised by changing the NAMEDATALEN constant in src/include/postgres_ext.h.
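As a rough illustration of the truncation rule, the sketch below asks the server for its limit and then uses an over-long name. It assumes a default build (NAMEDATALEN of 64) and a server that exposes the read-only max_identifier_length setting; the 70-character alias is made up for this example.

```sql
-- Longest identifier the server keeps, i.e. NAMEDATALEN - 1
-- (63 on a default build).
SHOW max_identifier_length;

-- The alias below is 70 characters long. On a default build the server
-- should emit a NOTICE and keep only the first 63 characters.
SELECT 1 AS a123456789b123456789c123456789d123456789e123456789f123456789g123456789;
```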

Identifier and keyword names are case-insensitive. Therefore

UPDATE my_table SET A = 5;

can equivalently be written as

UPDaTE my_table SeT a = 5;

A common convention is to write keywords in uppercase and names in lowercase:

UPDATE my_table SET a = 5;

There is also a second kind of identifier: the delimited identifier or quoted identifier. It is formed by enclosing an arbitrary sequence of characters in double quotes ("). A delimited identifier is always an identifier, never a keyword. So "select" could be used to refer to a column or table named "select", whereas an unquoted select would be taken as a keyword and would therefore provoke a parse error when used where a table or column name is expected. The example above can be written with quoted identifiers like this:

UPDATE "my_table" SET "a" = 5;

Quoted identifiers can contain any character except the character with code zero. (To include a double quote, write two double quotes.) This makes it possible to construct table or column names that would otherwise not be allowed, such as ones containing spaces or ampersands (&). The length limitation still applies.

Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lowercase. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and from each other. The folding of unquoted names to lowercase in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be folded to uppercase. Thus, according to the standard, foo should be equivalent to "FOO", not "foo". If you want to write portable applications, you are advised either to always quote a particular name or to never quote it.
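The folding rules can be tried with a throwaway table. This is only a sketch: the table and column names are invented for the example, and the third SELECT is expected to fail.

```sql
CREATE TABLE Foo (Id integer);     -- unquoted: stored as foo(id)

SELECT id FROM FOO;                -- works, FOO is folded to foo
SELECT "id" FROM "foo";            -- works, quoted names match the folded ones
SELECT id FROM "Foo";              -- expected error: relation "Foo" does not exist

-- Quoting makes a keyword, a name with a space, or a name containing
-- a double quote (written twice) usable as an identifier.
CREATE TABLE "select" ("my column" integer, "say ""hi""" integer);

DROP TABLE foo, "select";
```

Mixing quoted and unquoted spellings of the same name is the usual source of "does not exist" surprises, which is why the text recommends picking one style per name.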

4.1.2. Constants

There are three kinds of implicitly typed constants in PostgreSQL: strings, bit strings, and numbers. Constants can also be specified with explicit types, which can enable more accurate representation and more efficient handling by the system. These alternatives are discussed in the following subsections.

4.1.2.1. String constants

A string constant in SQL is an arbitrary sequence of characters bounded by single quotes ('), for example 'This is a string'. This style of specifying a string constant is defined by the SQL standard. The standard-compliant way of writing a single-quote character within a string constant is to write two adjacent single quotes, e.g. 'Dianne''s horse'. Note that two adjacent single quotes are not the same as a double-quote character (").

Two string constants that are separated only by whitespace containing at least one newline are concatenated and effectively treated as if the string had been written as one constant. For example:

SELECT 'http://www.infocool.net'
'bar';

is equivalent to

SELECT 'http://www.infocool.netbar';

but

SELECT 'http://www.infocool.net'      'bar';

is not valid syntax. This slightly bizarre behavior is specified by SQL; PostgreSQL is following the standard.

PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote, e.g. E'foo'. When continuing an escape string constant across lines, write E only before the first opening quote. Within an escape string, a backslash character (\) begins a C-like backslash escape sequence: \b (backspace), \f (form feed), \n (newline), \r (carriage return), \t (horizontal tab). Also supported are \digits, where digits represents an octal byte value, and \xhexdigits, where hexdigits represents a hexadecimal byte value. It is your responsibility to ensure that the byte sequences you create are valid characters in the server character set encoding. Any other character following a backslash is taken literally. Thus, to include a backslash character in a string constant, write two backslashes (\\). In addition, PostgreSQL allows a single quote to be escaped with a backslash (\'), but future versions of PostgreSQL will not allow this, so it is best to stick with the standard ''.

Warning

If the configuration parameter standard_conforming_strings is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants (that is, with or without the leading E). This is for backward compatibility with the historical behavior. Although standard_conforming_strings currently defaults to off, the default will change to on in a future release for improved standards compliance. Applications are therefore encouraged to migrate away from using backslash escapes. If you do need a backslash escape to represent a special character, write the string constant with a leading E to be sure it will be handled correctly.

In addition to standard_conforming_strings, the configuration parameters escape_string_warning and backslash_quote govern the treatment of backslashes in string constants.

The character with code zero cannot appear in a string constant.
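A few statements that exercise the quoting rules above. The comments describe the results one would expect from the rules as stated; the E'' form is used for every backslash escape so that the outcome should not depend on the standard_conforming_strings setting.

```sql
SHOW standard_conforming_strings;      -- on or off, depending on version and configuration

SELECT 'Dianne''s horse';              -- doubled single quote: Dianne's horse
SELECT E'first line\nsecond line';     -- \n becomes a newline
SELECT E'C:\\temp';                    -- doubled backslash: C:\temp
SELECT E'\x41\101';                    -- hex 41 and octal 101 are both the byte for A: AA
```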

4.1.2.2. Dollar-Quoted String Constants

While the standard syntax for specifying string constants is usually convenient, it can be difficult to read when the desired string contains many single quotes or backslashes, since each of them must be doubled. To allow more readable queries in such situations, PostgreSQL provides another way, called "dollar quoting", to write string constants. A dollar-quoted string constant consists of a dollar sign ($), an optional "tag" of zero or more characters, another dollar sign, an arbitrary sequence of characters that makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign. For example, here are two different ways to specify the string "Dianne's horse" using dollar quoting:

$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$

Notice that inside the dollar-quoted string, single quotes can be used without needing to be escaped. Indeed, no characters inside a dollar-quoted string are ever escaped: the string content is always written literally. Backslashes are not special, and neither are dollar signs, unless they are part of a sequence matching the opening tag.

It is possible to nest dollar-quoted string constants by choosing different tags at each nesting level. This is most commonly used in writing function definitions. For example:

$function$
BEGIN
    RETURN ($1 ~ $q$[\t\r\n\v\\]$q$);
END;
$function$

Here, the sequence $q$[\t\r\n\v\\]$q$ represents a dollar-quoted literal string [\t\r\n\v\\], which will be recognized when the function body is executed by PostgreSQL. But since the sequence does not match the outer dollar-quoting delimiter $function$, it is just some more characters within the constant so far as the outer string is concerned.

The tag, if any, of a dollar-quoted string follows the same rules as an unquoted identifier, except that it cannot contain a dollar sign. Tags are case-sensitive, so $tag$String content$tag$ is correct, but $TAG$String content$tag$ is not.

A dollar-quoted string that follows a keyword or identifier must be separated from it by whitespace; otherwise the dollar-quoting delimiter would be taken as part of the preceding identifier.

Dollar quoting is not part of the SQL standard, but it is often a more convenient way to write complicated string literals than the standard-compliant single-quote syntax. It is particularly useful when representing string constants inside other constants, as is often needed in procedural function definitions. With single-quote syntax, each backslash in the above example would have to be written as four backslashes, which would be reduced to two backslashes when the original string constant is parsed, and then to one when the inner string constant is re-parsed during function execution.
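The following sketch puts dollar quoting to work, first on plain literals and then around a complete function definition that reuses the body from the nesting example above. The function name has_special_char is invented for this illustration, and the sketch assumes the plpgsql language is installed in the database.

```sql
SELECT $$Dianne's horse$$;
SELECT $SomeTag$Dianne's horse$SomeTag$;
SELECT $$a\b$$ = E'a\\b';              -- true: the backslash is literal inside $$...$$

-- has_special_char is a hypothetical name; the body is the nested example above.
CREATE FUNCTION has_special_char(text) RETURNS boolean AS $function$
BEGIN
    RETURN ($1 ~ $q$[\t\r\n\v\\]$q$);
END;
$function$ LANGUAGE plpgsql;

SELECT has_special_char(E'a\tb');      -- true: the value contains a tab
SELECT has_special_char('ab');         -- false

DROP FUNCTION has_special_char(text);
```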

4.1.2.3. Bit-String Constants

Bit-string constants look like regular string constants with a B (upper or lower case) immediately before the opening quote (no intervening whitespace), e.g. B'1001'. The only characters allowed within bit-string constants are 0 and 1.

Alternatively, bit-string constants can be specified in hexadecimal notation, using a leading X (upper or lower case), e.g. X'1FF'. This notation is equivalent to a bit-string constant with four binary digits for each hexadecimal digit.

Both forms of bit-string constant can be continued across lines in the same way as regular string constants. Dollar quoting cannot be used in a bit-string constant.
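A short sketch of both notations; the column aliases are arbitrary, and the comments show what the four-bits-per-hex-digit rule implies.

```sql
SELECT B'1001' AS bits, X'1FF' AS hex_bits;   -- 1001 | 000111111111
SELECT X'F' = B'1111';                        -- true: one hex digit is four bits
SELECT length(X'1FF');                        -- 12: three hex digits, four bits each
```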

4.1.2.4. Numeric constants

Numeric constants are accepted in these general forms:

digits
digits.[digits][e[+-]digits]
[digits].digits[e[+-]digits]
digitse[+-]digits

where digits is one or more decimal digits (0 through 9). If a decimal point is used, at least one digit must be before or after it. If an exponent marker (e) is present, at least one digit must follow it. There cannot be any spaces or other characters embedded in the constant. Note that any leading plus or minus sign is not actually considered part of the constant; it is an operator applied to the constant.

Here are some examples of valid numeric constants:

42
3.5
4.
.001
5e2
1.925e-3

A numeric constant that contains neither a decimal point nor an exponent is initially presumed to be type integer if its value fits in type integer (32 bits); otherwise it is presumed to be type bigint if its value fits in type bigint (64 bits); otherwise it is taken to be type numeric. Constants that contain decimal points and/or exponents are always initially presumed to be type numeric.
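The initial typing can be observed directly. This sketch assumes a server that provides the pg_typeof function, which may not exist in the older release this text describes.

```sql
SELECT pg_typeof(42);                     -- integer
SELECT pg_typeof(3000000000);             -- bigint: too large for 32 bits
SELECT pg_typeof(99999999999999999999);   -- numeric: too large for 64 bits
SELECT pg_typeof(4.);                     -- numeric: has a decimal point
SELECT pg_typeof(5e2);                    -- numeric: has an exponent
```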

The initially assigned data type of a numeric constant is just a starting point for the type resolution algorithms. In most cases the constant will be automatically coerced to the most appropriate type depending on context. When necessary, you can force a numeric value to be interpreted as a specific data type by casting it. For example, you can force a numeric value to be treated as type real (float4) by writing:

REAL '1.23'  -- string style
1.23::real

These are actually just special cases of the general casting notations discussed next.

4.1.2.5. Constants of Other Types

A constant of an arbitrary type can be entered using any one of the following notations:

type 'string'
'string'::type
CAST ( 'string' AS type )

The text of 'string' is converted to a constant of the type named type. The explicit type cast may be omitted if there is no ambiguity as to the type the constant must be (for example, when it is assigned directly to a table column), in which case it is automatically coerced.

The string constant can be written using either regular SQL notation or dollar quoting.

It is also possible to specify a type coercion using a function-like syntax:

typename ( 'string' )

However, not all type names can be used in this way; see Section 4.2.8 for details.

The ::, CAST(), and function-call syntaxes can also be used to specify run-time type conversions of arbitrary expressions, as discussed in Section 4.2.8. But the form type 'string' can only be used to specify the type of a literal constant. Another restriction on type 'string' is that it does not work for array types; use :: or CAST() to specify the type of an array constant.

The CAST() syntax conforms to the SQL standard. The type 'string' syntax is a generalization of the standard: SQL specifies this syntax only for a few data types, but PostgreSQL allows it for all types. The syntax with :: is historical PostgreSQL usage, as is the function-call syntax.
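The notations above, side by side. The date and array values are arbitrary sample data chosen for this sketch; float8 is used for the function-style form on the assumption that it is one of the type names accepted there.

```sql
SELECT DATE '2001-02-03';              -- type 'string'
SELECT '2001-02-03'::date;             -- PostgreSQL :: notation
SELECT CAST('2001-02-03' AS date);     -- SQL-standard CAST
SELECT float8('1.5');                  -- function-style notation
SELECT $$2001-02-03$$::date;           -- the string may also be dollar-quoted

-- Array constants need :: or CAST; the type 'string' form is not available.
SELECT '{1,2,3}'::integer[];
SELECT CAST('{1,2,3}' AS integer[]);
```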

4.1.3. Operators

An operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following list:

+ - * / < > = ~ ! @ # % ^ & | ` ?

There are a few restrictions on operator names, however:
    • -- and /* cannot appear anywhere in an operator name, since they will be taken as the start of a comment.

    • A multiple-character operator name cannot end in + or -, unless the name also contains at least one of these characters:

      ~ ! @ # % ^ & | ` ?

      For example, @- is an allowed operator name, but *- is not. This restriction allows PostgreSQL to parse SQL-compliant queries without requiring whitespace between tokens.

When working with non-SQL-standard operator names, you will usually need to separate adjacent operators with whitespace to avoid ambiguity. For example, if you have defined a left unary operator named @, you cannot write X*@Y; you must write X* @Y to ensure that PostgreSQL reads it as two operator names, not one.
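The effect of the "cannot end in + or -" rule can be seen with built-in operators alone. In this sketch the adjacent characters are read as two separate operators, so no whitespace is needed:

```sql
SELECT 2+-3;     -- -1: read as 2 + (-3), since +- is not a valid operator name
SELECT 2*-3;     -- -6: read as 2 * (-3), since *- is not a valid operator name
SELECT 2 - -3;   -- 5: the space matters here, because 2--3 would start a comment
```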

4.1.4. Special characters

Some characters that are not alphanumeric have a special meaning that is different from being an operator. Details on their usage can be found where the respective syntax element is described. This section only notes their existence and summarizes their purposes.

    • A dollar sign ($) followed by digits is used to represent a positional parameter in the body of a function definition or a prepared statement. In other contexts the dollar sign may be part of an identifier or a dollar-quoted string constant.

    • Parentheses (()) have their usual meaning to group expressions and enforce precedence. In some cases parentheses are required as part of the fixed syntax of a particular SQL command.

    • Brackets ([]) are used to select the elements of an array. See Section 8.10 for more information.

    • Commas (,) are used in some syntactic constructs to separate the elements of a list.

    • The semicolon (;) terminates an SQL command. It cannot appear anywhere within a command, except within a string constant or quoted identifier.

    • The colon (:) is used to select "slices" from arrays (see Section 8.10). In certain SQL dialects (such as Embedded SQL), the colon is used to prefix variable names.

    • The asterisk (*) is used in some contexts to denote all the fields of a table row or composite value. It also has a special meaning when used as the argument of an aggregate function, namely that the aggregate does not require any explicit parameter.

    • The period (.) is used in numeric constants, and to separate schema, table, and column names.
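Several of these characters appear in the short sketch below. The prepared-statement name add_one is made up for the example; pg_catalog.pg_class is a system catalog and serves only as a table that always exists.

```sql
-- $1 is a positional parameter of the prepared statement.
PREPARE add_one(integer) AS SELECT $1 + 1;
EXECUTE add_one(41);                         -- 42
DEALLOCATE add_one;

-- Brackets select an array element; a colon inside them selects a slice.
SELECT (ARRAY[10,20,30])[2];                 -- 20
SELECT (ARRAY[10,20,30])[1:2];               -- {10,20}

-- The asterisk as an aggregate argument; the period separates schema and table.
SELECT count(*) FROM pg_catalog.pg_class;
```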

4.1.5. Comments

A comment is an arbitrary sequence of characters beginning with double dashes and extending to the end of the line, e.g.:

-- This is a standard SQL comment

Alternatively, C-style block comments can be used:

/* multiline comment
 * with nesting: /* nested block comment */
 */

where the comment begins with /* and extends to the matching occurrence of */. These block comments nest, as specified in SQL99 but unlike C, so that one can comment out larger blocks of code that might contain existing block comments.

A comment is removed from the input stream before further syntax analysis and is effectively replaced by whitespace.
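Because a comment behaves like whitespace, it can sit in the middle of an expression. A minimal sketch:

```sql
SELECT 1 /* outer /* inner */ still inside the outer comment */ + 1;   -- 2

SELECT 1 -- the rest of this line is ignored
     + 1;                                                              -- 2
```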

4.1.6. Lexical precedence

Table 4-1 shows the precedence and associativity of the operators in PostgreSQL. Most operators have the same precedence and are left-associative. The precedence and associativity of the operators is hard-wired into the parser. This can lead to non-intuitive behavior; for example, the Boolean operators < and > have a different precedence than the Boolean operators <= and >=. Also, you will sometimes need to add parentheses when using combinations of binary and unary operators. For instance

SELECT 5 ! - 6;

will be parsed as

SELECT 5 ! (- 6);

because the parser has no idea, until it is too late, that ! is defined as a postfix operator, not an infix one. To get the desired behavior in this case, you must write

SELECT (5 !) - 6;

This is the price one pays for extensibility.

Table 4-1. Operator Precedence (decreasing)

Operator/Element    Associativity  Description
.                   left           table/column name separator
::                  left           PostgreSQL-style typecast
[ ]                 left           array element selection
-                   right          unary minus
^                   left           exponentiation
* / %               left           multiplication, division, modulo
+ -                 left           addition, subtraction
IS                                 IS TRUE, IS FALSE, IS UNKNOWN, IS NULL
ISNULL                             test for null
NOTNULL                            test for not null
(any other)         left           all other native and user-defined operators
IN                                 set membership
BETWEEN                            range containment
OVERLAPS                           time interval overlap
LIKE ILIKE SIMILAR                 string pattern matching
< >                                less than, greater than
=                   right          equality, assignment
NOT                 right          logical negation
AND                 left           logical conjunction
OR                  left           logical disjunction

Note that the operator precedence rules also apply to user-defined operators that have the same names as the built-in operators mentioned above. For example, if you define a "+" operator for some custom data type, it will have the same precedence as the built-in "+" operator, no matter what yours does.
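A few expressions that follow from the rows of Table 4-1; this is a sketch, with the expected results derived from the precedence and associativity listed there.

```sql
SELECT 2 + 3 * 4;      -- 14: * binds tighter than +
SELECT (2 + 3) * 4;    -- 20: parentheses override precedence
SELECT -2 ^ 2;         -- 4: unary minus binds tighter than ^, so this is (-2) ^ 2
SELECT 2 ^ 3 ^ 2;      -- 64: ^ is left-associative, so this is (2 ^ 3) ^ 2
```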

When a schema-qualified operator name is used in the OPERATOR syntax, as for example in

SELECT 3 OPERATOR(pg_catalog.+) 4;

the OPERATOR construct is taken to have the default precedence shown in Table 4-1 for "any other" operator. This is true no matter which specific operator name appears inside OPERATOR().
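The difference becomes visible when the qualified operator would normally bind more tightly than its neighbor. In this sketch the same multiplication is written both ways; the second result follows from "any other" ranking below + and - in Table 4-1.

```sql
SELECT 2 * 3 + 4;                          -- 10: built-in * binds tighter than +
SELECT 2 OPERATOR(pg_catalog.*) 3 + 4;     -- 14: read as 2 * (3 + 4)
```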
