Database in-depth learning notes

Source: Internet
Author: User
Tags abstract join logical operators sort mysql index mysql locate oracle database

-------------------------------------------Split Line-------------------------------------------
Database in-depth learning note-sql execution engine

Here are a few questions to start today's discussion:
1, what is the SQL engine.
2, what is SQL storage engine.
How the 3,sql engine works (workflow).
The order in which the 4,SQL statements are executed.

One, the pre-compilation of SQL statements
Precompiled statement PreparedStatement is an interface in the java.sql package, PreparedStatement inherits the statement interface. When you execute SQL through statement, the application
The program first sends the SQL statement to the DBMS, which is compiled and executed by the DBMS. The difference between PreparedStatement and statement is that PreparedStatement is creating
When you preparedstatement an object, SQL is specified. The statement is immediately sent to the DBMS for compilation. When PreparedStatement executes, the DBMS runs the compiled SQL directly. That is
said that the process of eliminating the compilation process, so PreparedStatement execution speed is higher than statement.
Java.sql.PreparedStatement preparedstatementsql = null;
Com.mysql.jdbc.PreparedStatement preparedstatementmysql = null;

1,mysql engine
The MySQL engine depends on how MySQL was compiled when it was installed. To add a new engine, you must recompile MySQL. By default, MySQL supports 3 engines: isam/myisam/
HEAP. The other 2 types are: Innodb/berkley. Here are some of these MySQL engines:

Isam:isam is a well-defined and time-tested data table management approach, at the beginning of the design, ISAM takes into account the number of times the database is queried
is much larger than the number of changes. As a result, ISAM queries are fast and do not consume large amounts of memory and hard disks. The two shortcomings of ISAM are that ISAM does not support
Transactions, and ISAM is not capable of fault tolerance. If the hard drive crashes, the data file cannot be recovered.
Myisam:myisam is the extended format of the MySQL ISAM and the default database engine. MyISAM provides a large number of indexes and field management that ISAM does not have
function, MyISAM also has the table lock function. The MyISAM table locking feature optimizes multiple concurrent read and write operations, at the expense of running optimize table
command to restore the space wasted by the updated mechanism. MyISAM's query is fast, which may be one of the reasons MySQL receives web favor. One of the MyISAM
The drawback is that no further damage can be done after data recovery.
HEAP:HEAP allows temporary tables that reside only in memory. The memory is much faster than the hard disk, so the heap is faster than the Isam/myisam. But the heap is shaky.
Data is easily lost.
Innodb:innodb engine supports transactions, supports foreign keys

The difference between MyISAM and InnoDB:
* MyISAM does not support transactions, InnoDB support transactions;
* MyISAM does not support foreign keys, INNODB support foreign keys;
* MyISAM does not support row-level lock, INNODB support row-level lock;
* MyISAM execution speed is faster than InnoDB;
* The exact number of rows in the table is not saved in InnoDB, that is, when you execute select COUNT (*) from table, InnoDB scans the entire table. And in MyISAM
Save the total number of rows in the table and read directly. However, if you include a where statement, both InnoDB and MyISAM need to scan the entire table.
* For fields of type auto_increment, InnoDB must contain only the index of the field, but in the MyISAM table, you can work with other fields
Establish a federated index.
* Delete from table, InnoDB does not reestablish the table, but deletes one row at a time.

Generally speaking, MyISAM is suitable for the following scenarios:
* Fewer insert operations, more query operations;
* No business;

The InnoDB is suitable for the following scenarios:
* High reliability requirements;
* Request for business;

MySQL supports table-level database engine settings, which means you can set up a separate database engine for each table in MySQL. This is also a manifestation of MySQL flexibility.
Because we can set different database engines for different tables for the application scenario. For example, for a table with transactional operations, we need to configure InnoDB, for a table with no transactions, I
You can use MyISAM to improve performance.

How the 2,sql engine compiles the SQL statements that the application sends over
When an application sends an SQL statement to the DBMS, the SQL engine processes the SQL and executes it. Here's a question: If a SQL needs to be executed multiple times, the SQL engine
How to do it. OK, let's start with the SQL engine processing of SQL, which can be broadly divided into three steps:
The first step: parsing SQL statements;
Step two: Check syntax and semantics;
Step three: Generate SQL that the DB can execute;
This process is still a bit abstract, we take the Oracle database as an example to illustrate that the SQL engine of the Oracle database to the SQL processing process is as follows:
The first step: grammar check;
The second step: semantic check, check whether the object exists, determine the type is correct;
The third step: check the user rights;
Fourth step: Select one of the many possible execution paths as the execution plan;
Fifth step: Generate the compiled version of the statement;
From here we can see that parsing is a fairly cost-intensive operation. Especially for databases.
Precompiled SQL, completed the first 2 steps, the SQL engine only needs a third step when it needs to be executed. So, here's the question, where is the precompiled SQL statement stored, and the system
How much is the overhead. (Oracle has a SQL cache that, when executing SQL, first uses the hash algorithm to derive a value from the SQL statement's string, and then checks to see if the value exists in the shared pool.) If present, the cache
Hit, execute the statement directly with the execution plan that has been cached. If it does not exist, SQL will need to be processed)

Second, the order in which SQL statements are executed

In general, SQL statements can be divided into 7 categories based on the underlying sequence of operations and operation types:
* GROUP by
* Having
* ORDER by

These 7 types of keywords are executed in the following order:
* GROUP by
* Having
* ORDER by

Third, the MySQL parsing process is detailed

The function of parsing is to transform an input string into a struct that describes the string. That is, the process of translating the SQL language closer to the machine language.
Parsing consists of three processes: lexical analysis, parsing, and output abstract syntax tree.

1, lexical analysis
In the lexical analysis phase, the lexical parser is used. The lexical analyzer is a deterministic finite automaton (DFA). The lexical parser can follow our definition of the lexical, the input character
Set to "word". As follows:
ABC--Identifier (identifier)
' ABC '--stringliteral (String)
123--Number (digital)
SELECT--Keyword (keyword)

For example, when you enter an SQL statement in the console, the lexical Analyzer completes the following procedure after a carriage return:
--(Keyword:select) (Identifier:id) (Keyword:from) (identifier:payment)

2, Syntax analysis
After lexical analysis, enter the grammar analysis stage. The output of the lexical analysis will be used as input to the parsing. Syntax analysis is primarily to determine whether a user-entered word conforms to the grammatical logic.

3, Output abstract syntax tree
The final step in parsing is to output an abstract syntax tree. Abstract syntax tree, full name adt,abstract Syntax trees. Abstract syntax trees represent user input in a tree-shaped structure.
Each node on the abstract syntax tree is a word. At this point, the SQL statement entered by the user becomes a "tree-like structure".

Four, the semantic analysis process of MySQL

The semantic analysis phase is the most complex one in the SQL parsing process. Semantic analysis involves the relevant theories of SQL standard, SQL optimization, MapReduce, and so on. Abstract syntax tree generated during the parsing phase,
After semantic analysis, a query plan is generated. The query plan directs the physical execution operator to run on our distributed system in a single step, reading the contents of the table.
Semantic analysis can be divided into 2 stages: logical analysis and physical analysis.

1, logic analysis
Logical analysis is essentially the analysis process of pure algebra. Logical analysis is independent of the underlying distributed environment. Logical analysis is to analyze the user input SQL statement exactly what to do, need to execute
which operations. Generally speaking, an SQL statement always has one input, one output, and the input data is processed by SQL to get the output data.

2, physical analysis
Physical analysis is the transformation of the results of logical analysis. Physical analysis is closely related to the underlying execution environment. In general, physical analysis requires determining how data is partitioned at MapReduce,
Sort, read the amount of data, start the number of processes to perform tasks, and so on.

Five, the logical operator of MySQL

The following SQL operations, in fact, all correspond to a logical operator, the logical operator can be understood as a number of MySQL internal operations. These operations are atomic and non-detachable. Logical operators can
As the smallest execution unit of MySQL internal operations.
* GROUP by
* Having
* ORDER by

These SQL operations correspond to the logical operators of MySQL internal operations as follows:
* SELECT---Selectoperator (SEL)
* Distinct/group by--Gbyoperator (Gby)
* FROM--tablescanoperator (TS)
* Where/having--filteroperator (FIL)
* Order By/limit-orderbyoperator (order)
* Join---joinoperator (join)
* Union/union All--unionalloperator (UNION)

1, function of the logic operator
A logical query plan is made up of logical operators. A logical plan is a directed acyclic graph (DAG). General logic operators have an input dataset and an output dataset. Joinoperator
And Unionalloperator are very special, have two or more than two input datasets, because the function of these two logical operators is to correlate multiple datasets.
We call the input dataset and the output dataset of the logical operator a virtual table. The user can not see the virtual table, the virtual table is used for internal analysis, is a bridge between the logical operators.

Six, generate a logical query plan

With the concept of a logical operator, we can generate our query plan. Based on the execution order of the SQL statements entered by the user, the abstract syntax tree generated during the compilation phase is traversed, with
The logical operator replaces the "word" of the tree node.
Once the logical query plan is generated, an optimization is required. These optimizations include the following:
* Calculates the result of a constant expression
* Column Clipping
* Predict Push down

Here we have one of the following cropping and predict Push down. When you build a query plan, all of the columns in the table are read by default, but you may need to make only a few columns
Calculations, the other columns do not need to be queried. This is the column clipping. Predict Push down refers to the problem of first filtering and then connecting, or first connecting and then filtering. Predict Push Down
Logical judgment, if the filtering operation (Filteroperator) can be placed before the connection operation (Joinoperator), then the first filter and then connect. Predict Push down optimization for
The performance of the lift is very helpful.

Seven. Generate a physical query plan

A logical query plan is an input to a physical query plan. We follow the topological sequence to traverse every logical operator on the logical query plan and generate the corresponding physical operator. When you generate a physical query plan
Also needs to be optimized.

The above syntax analysis and semantic analysis refer to the blog "SQL parsing process in detail"

-------------------------------------------Split Line-------------------------------------------
Database in-depth learning notes-how applications communicate with databases

One, how the application server communicates with the database server

Personally, the application server communicates with the database server through the socket.

-------------------------------------------Split Line-------------------------------------------
Database in-depth learning note-mysql Architecture

MySQL is a portable database that can be run on almost all of the current operating systems. Various systems vary in the bottom-level implementation, but MySQL is basically guaranteed to be in each operating system
The consistency of the physical architecture on the

One, the part of MySQL

1, Connection pool component (Connections pools)
2, Management Services and tools components (Management services and Utilities)
3,sql interface Components (SQL Interface)
4, Query Analyzer component (Parse)
5, Optimizer component (Optimize)
6, buffer (cache) component/caching component (caches and buffer)
7, plug-in storage engine (pluggable Storage engines)
8, physical files

One important feature of MySQL that is different from other relational databases is that MySQL supports plug-in storage engines. The MySQL5.6 version of the default storage engine is InnoDB.

-------------------------------------------Split Line-------------------------------------------

Database in-depth learning notes-database index underlying implementation principles

One, background knowledge

1,b-tree and B+tree

2,binary Search (binary find)

3, Database performance issues
* Disk IO performance is very low, seriously affecting the performance of the database;
* Disk sequential read-write performance is much higher than random read and write;

4, the basic storage structure of data
* Disk space is divided into many blocks or pages of the same size (page)
* Data blocks of a table are linked together in a list
* Data is in blocks that are stored on disk in the unit of behavior
* When accessing data, read from disk or write at least one full block at a time (block)

5, how to locate the data
Before you can manipulate data, you need to locate it first. How does MySQL locate the data? In simple terms, it's a table scan. A table scan reads all blocks of data from the disk sequentially.
to match. The time complexity of the table scan is O (n).

6, how to improve the efficiency of data location
There are several ways to reduce the disk space consumed by data, that is, to compress the data, optimize the data storage structure, reduce the total amount of data access, and reduce the access to invalid data.
The advent of indexes greatly improves the efficiency of data location.

>>>>> think of a problem: the difference between a federated index and a single-column index.

7, index-generated background
If we match only one field in a row of records, then efficiency can be greatly improved, which effectively reduces the access to invalid data; If we put a field in a single
block, the efficiency can be greatly improved, which effectively reduces the amount of disk space consumed. The index is generated in this case. Indexes are replaced at the expense of space.
Of time. Everything has two sides, so the rational use of the index is particularly important. The MySQL index is saved in the B-tree format.

8,binary Search (binary find)
Binary lookup, which is to sort the index, stores the index block address in an array order. This array is also stored on the disk. The time complexity of the binary lookup is O (log (n)). On this basis
, continue the abstraction until there is only one record in the topmost block, so it becomes a b+tree. The height of the tree is the number of times you need to find it, and the higher the index tree, the less efficient it is.

Second, MySQL index classification

1, normal index: The most basic index type, there is no uniqueness limit.

2, UNIQUE index: The difference between a unique index and a normal index is that a unique index must ensure that the value of the indexed column is not duplicated, that is, the value of the indexed column is unique.

3, PRIMARY key: The primary key is an index and is a uniqueness index. The primary key is equivalent to the aggregate index, which is the fastest index of the query.

4, full-text index: a special index. A full-text index generates a list of all the words that appear in a field in a data table. Full-text indexes can only be created in the MyISAM data table. Less than
A 3-character word is usually ignored. You can modify the MySQL configuration file my.cnf by modifying it. Ft_min_word_len = 3.

5, single-column index: a single-column index is the index of a field.

6, Multi-column index: Multi-column index is the index of multiple fields, and multi-column index is also called Union Index. When using multi-column indexes, be aware that only the first column fields in the index are included in the Where condition
Effective. Examples are as follows:
First create an index: ALTER TABLE PAYMENT ADD index Union_index (id,amount,date);
Then look at the following several query operations:
SELECT * from PAYMENT WHERE ID = ' 10000001 ' and AMOUNT = ' 100.00 '; --The federated index is valid
SELECT * from PAYMENT WHERE ID = ' 10000001 ' and DATE = ' 20161208 '; --The federated index is valid
SELECT * from PAYMENT WHERE ID = ' 10000001 '; --The federated index is valid
SELECT * from PAYMENT WHERE AMOUNT = ' 100.00 ' and DATE = ' 20161208 '; --Invalid federated index

7, leftmost index: Can you understand that MySQL's combined index must conform to the rule of the leftmost index. As mentioned above, only the first column field in the index is included in the Where condition.
The index is only valid.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.